I See What You're Saying: Sentiment Analysis With OpenTok and Azure Face API
Building a multi-party video conference that allows us to analyze the sentiment of each participant based on their facial expression.
You know that person. It could be your significant other, a child, a co-worker, or a friend. That person who says one thing, but you can tell by their face that they mean something completely different. You probably just pictured them in your head. Maybe you remember the exact conversation. Perhaps it went like this:
You: Okay?
Them: Fine.
Spoiler Alert: It wasn’t fine.
Wouldn’t it be great if you could know the sentiment behind what they were saying? With OpenTok and Azure’s Face API, you can!
In this tutorial, we will build a multi-party video conference that allows us to analyze the sentiment of each participant based on their facial expression. Then we’ll display that sentiment as an emoji over their video.
We’ll get the values for the OPENTOK_API_KEY, OPENTOK_SESSION_ID, and OPENTOK_TOKEN settings from our TokBox Account.
In your TokBox Account, click the ‘Projects’ menu and ‘Create New Project.’ Then click the ‘Create Custom Project’ button. Give your new project a name and press the ‘Create’ button. You can leave the preferred codec as ‘VP8’.
You can then copy your API Key and paste it as the value for the OPENTOK_API_KEY setting.
Next, click on “View Project”. At the bottom of the project detail page, you’ll find the Project Tools where you can create a Session ID and Token. Choose “Routed” for your session’s media mode and press the “Create Session ID” button. Then, copy the generated Session ID and paste it as the value of the OPENTOK_SESSION_ID setting.
Finally, paste the generated Session ID into the Session ID field of the Generate Token form and press the “Generate Token” button. Copy the generated token and paste it as the value of the OPENTOK_TOKEN setting.
Log in to your Azure account and create a new Face API Cognitive Service. Once it’s created, click on the service and go to the “Quick start” blade. There you’ll find your Key and Endpoint. Copy these two values to the AZURE_FACE_API_SUBSCRIPTION_KEY and AZURE_FACE_API_ENDPOINT settings, respectively.
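The contents of config.js aren’t shown here, so the following is a minimal sketch of what it might look like, assuming it simply declares one constant per setting. The empty strings are placeholders for the values you just copied. The missingSettings helper is hypothetical (not part of the original project); it just reports any setting you forgot to fill in.

```javascript
// config.js: a minimal sketch, assuming one top-level constant per setting.
// Replace each empty string with the value copied from TokBox or Azure.
const OPENTOK_API_KEY = "";
const OPENTOK_SESSION_ID = "";
const OPENTOK_TOKEN = "";
const AZURE_FACE_API_SUBSCRIPTION_KEY = "";
const AZURE_FACE_API_ENDPOINT = ""; // the Endpoint shown on the "Quick start" blade

// Hypothetical helper: returns the names of settings that are still empty
function missingSettings(settings) {
  return Object.entries(settings)
    .filter(([, value]) => typeof value !== "string" || value.trim() === "")
    .map(([name]) => name);
}
```

Because top-level `const` declarations in a classic script are visible to scripts loaded after it, app.js can read these constants directly as long as config.js is included first. A quick check such as `missingSettings({ OPENTOK_API_KEY, OPENTOK_TOKEN })` in the browser console will list anything left blank.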
With our configuration ready, let’s add some JavaScript to connect to an OpenTok session. Add an app.js file to the js folder and copy the following to it.
Four things are going on here:
We load variables based on those we specified in the config.js file
We create a handleError method that we’ll use throughout when an error occurs
We add a dataURItoBlob method that we’ll use to convert a base64- or URL-encoded data URI into a blob for sending to the Azure Face API
We add two arrays named streams and emotions
The streams array will hold all active participant streams so we can access them to capture images to send to the Azure Face API.
The emotions array will hold strings that represent any emotions returned by Azure Face API. This will be used to display a legend of emojis to the user dynamically.
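Here is a sketch of how the opening of app.js could look, covering the two arrays, handleError, and dataURItoBlob described above. It assumes config.js is loaded first, and that handleError only needs to log the error; the original may report errors differently.

```javascript
// app.js (opening): a sketch, assuming config.js is loaded before this file.

// Active participant streams (OpenTok Stream objects)
const streams = [];
// CSS classes for the emotions reported so far; used to build the emoji legend
const emotions = [];

// Log errors from OpenTok callbacks and Face API requests
function handleError(error) {
  if (error) {
    console.error(error.message || error);
  }
}

// Convert a data URI (base64 or URL-encoded) into a Blob for the Face API request body
function dataURItoBlob(dataURI) {
  const comma = dataURI.indexOf(",");
  const header = dataURI.slice(0, comma); // e.g. "data:image/png;base64"
  const data = dataURI.slice(comma + 1);
  const byteString = header.includes(";base64") ? atob(data) : decodeURIComponent(data);
  const mimeType = header.slice("data:".length).split(";")[0];

  // Copy each character code into a byte array (fine for binary image data)
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    bytes[i] = byteString.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}
```

For example, `dataURItoBlob(canvas.toDataURL("image/png"))` yields a PNG blob that can be sent as the raw body of the Face API request.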
Add the initializeSession method below to the bottom of the app.js file.
The initializeSession method initializes our OpenTok client with the session we specified with the Session ID. It then adds event handlers for the streamCreated and streamDestroyed events to manage adding and removing streams from our streams array. Finally, it connects to the session using the Token we set in our config.js file.
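The following sketch shows one way initializeSession could be written with the OpenTok.js client API (OT.initSession, session.on, session.subscribe, OT.initPublisher, session.connect, session.publish). Several details are assumptions rather than the tutorial’s exact code: the “publisher” and “subscriber” element ids, the addStream/removeStream helpers, tracking our own published stream in the same array, and passing the OT object in as a parameter (it defaults to the browser global) so the function can be exercised without a browser.

```javascript
// Hypothetical helpers that keep the streams array in sync, keyed on streamId
function addStream(streams, stream) {
  if (!streams.some((s) => s.streamId === stream.streamId)) {
    streams.push(stream);
  }
}

function removeStream(streams, stream) {
  const i = streams.findIndex((s) => s.streamId === stream.streamId);
  if (i !== -1) {
    streams.splice(i, 1);
  }
}

// Sketch of initializeSession; `ot` defaults to the OpenTok.js global `OT`
function initializeSession(apiKey, sessionId, token, streams, ot = globalThis.OT) {
  const report = (error) => {
    if (error) console.error(error.message);
  };
  const session = ot.initSession(apiKey, sessionId);
  const videoProps = { insertMode: "append", width: "100%", height: "100%" };

  // Other participants: remember their stream and show their video
  session.on("streamCreated", (event) => {
    addStream(streams, event.stream);
    session.subscribe(event.stream, "subscriber", videoProps, report);
  });
  session.on("streamDestroyed", (event) => removeStream(streams, event.stream));

  // Our own camera: track the published stream too, so it can be analyzed
  const publisher = ot.initPublisher("publisher", videoProps, report);
  publisher.on("streamCreated", (event) => addStream(streams, event.stream));
  publisher.on("streamDestroyed", (event) => removeStream(streams, event.stream));

  // Connect with the token, then start publishing
  session.connect(token, (error) => {
    if (error) report(error);
    else session.publish(publisher, report);
  });
  return session;
}
```

In the page you would call it as `initializeSession(OPENTOK_API_KEY, OPENTOK_SESSION_ID, OPENTOK_TOKEN, streams)`. Keying on streamId means a duplicate streamCreated event can’t add the same participant twice.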
You can now open index.html in Chrome or Firefox. When you load the page, you may need to allow the browser to access your webcam and microphone. After that, you should see a video stream of yourself (or whatever your webcam is looking at) displayed on the page.
If that worked, mute your audio, then open another tab (keeping the original open) and load the same file. You should now be able to see a second video.
Troubleshooting tip: If no video shows up on the page, open the “Console” tab in your browser’s developer tools (Cmd+Option+I on Mac, Ctrl+Shift+I on Windows) and check for errors. The most likely issue is that your OpenTok API key, session ID, or token is not set up properly. Since you hardcoded your credentials, it’s also possible that your token has expired.
Now we can see and hear participants, but what is their face telling us that their mouth isn’t? Let’s add a button that allows us to analyze each participant.
In the index.html file, replace the comment that says <!-- Footer will go here --> with the following:
This adds a footer at the bottom of the page with an “Analyze” button and an unordered list that we’ll use as a legend mapping emojis to sentiments.
Now let’s add the JavaScript to handle our sentiment analysis. Add the following to the bottom of the app.js file.
Let’s review what this code does.
The assignEmoji method takes in a CSS class associated with the emotion for a specific video stream and the index of that stream in our UI. It does the following:
Adds the provided class to our emotions array
Adds a div over the appropriate video panel with the class for the emoji to display
Adds a “used” class to the li for that emoji in our footer so that it displays in the legend
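The three steps above can be sketched as follows. The markup details are assumptions, not taken from the tutorial’s index.html: the overlay is a div with an “emoji” class appended to the video’s parent element (your CSS would need to position it over the video), and each legend entry is an li in the footer carrying the same CSS class as the emoji. The emotions array and document are passed in so the function doesn’t depend on globals.

```javascript
// Sketch of assignEmoji; selectors and class names are assumptions about the markup
function assignEmoji(emojiClass, index, emotions, doc = globalThis.document) {
  // 1. Remember the emotion (once) so the legend can be built from it
  if (!emotions.includes(emojiClass)) {
    emotions.push(emojiClass);
  }

  // 2. Overlay the emoji on the panel that holds the index-th video
  const video = doc.querySelectorAll("video")[index];
  if (video && video.parentElement) {
    const overlay = doc.createElement("div");
    overlay.classList.add("emoji", emojiClass);
    video.parentElement.appendChild(overlay);
  }

  // 3. Mark the matching legend item as used so CSS can reveal it
  const legendItem = doc.querySelector(`footer li.${emojiClass}`);
  if (legendItem) {
    legendItem.classList.add("used");
  }
}
```

A legend rule such as `footer li { display: none; } footer li.used { display: inline-block; }` would then show only the emojis that have actually appeared.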
The processEmotion method receives the face data payload from the Azure Face API and identifies the emotion with the highest confidence score. It then calls assignEmoji with the CSS class for that emotion and the index of the video it is processing.
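Here is a sketch of that selection logic. When the Face API detect call is made with returnFaceAttributes=emotion, each detected face includes a faceAttributes.emotion object that scores anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. The CSS class names in EMOJI_CLASSES are hypothetical placeholders, and the assignEmoji step is passed in as an `assign` callback so the logic can be checked on its own.

```javascript
// Face API emotion names mapped to CSS classes (class names are hypothetical)
const EMOJI_CLASSES = {
  anger: "emoji-anger",
  contempt: "emoji-contempt",
  disgust: "emoji-disgust",
  fear: "emoji-fear",
  happiness: "emoji-happiness",
  neutral: "emoji-neutral",
  sadness: "emoji-sadness",
  surprise: "emoji-surprise",
};

// Return the emotion name with the highest confidence score
function topEmotion(scores) {
  let best = null;
  for (const [name, score] of Object.entries(scores)) {
    if (best === null || score > scores[best]) best = name;
  }
  return best;
}

// Sketch of processEmotion: like the tutorial, only the first detected face is used.
// `assign` receives (cssClass, index); in the app it would call assignEmoji.
function processEmotion(faces, index, assign) {
  if (!Array.isArray(faces) || faces.length === 0) return null;
  const emotion = topEmotion(faces[0].faceAttributes.emotion);
  const cssClass = EMOJI_CLASSES[emotion] || null;
  if (cssClass && assign) assign(cssClass, index);
  return cssClass;
}
```

Returning null when no face is detected lets the caller skip that video instead of throwing on an empty response.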
The sendToAzure method receives an HTML video element and the index of that video element on our page. It gets the stream associated with that video element and creates an HTML canvas with the same dimensions as the stream. Next, it draws the current frame of the video onto the canvas and sends the resulting image to the Azure Face API in an XMLHttpRequest. The Azure Face API returns a JSON payload, which we then pass to the processEmotion method.
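The request itself can be sketched like this. It uses the Face API v1.0 detect operation with returnFaceAttributes=emotion, the raw image as an application/octet-stream body, and the key in the Ocp-Apim-Subscription-Key header. Some choices here are assumptions: buildDetectRequest is a hypothetical helper that accepts the endpoint with or without the /face/v1.0 path, dependencies are passed in an options object, and canvas.toBlob stands in for the tutorial’s dataURItoBlob conversion. Also note that Microsoft has since restricted emotion recognition in the Face API, so a current resource may reject this attribute.

```javascript
// Hypothetical helper: URL and headers for a Face API "detect" call requesting emotions
function buildDetectRequest(endpoint, subscriptionKey) {
  let base = endpoint.replace(/\/+$/, ""); // drop trailing slashes
  if (!base.endsWith("/face/v1.0")) {
    base += "/face/v1.0"; // assumption: the endpoint may or may not include this path
  }
  return {
    url: `${base}/detect?returnFaceAttributes=emotion`,
    headers: {
      "Content-Type": "application/octet-stream", // raw image bytes in the body
      "Ocp-Apim-Subscription-Key": subscriptionKey,
    },
  };
}

// Sketch of sendToAzure (browser only); dependencies are passed in explicitly
function sendToAzure(video, index, { stream, endpoint, subscriptionKey, onFaces, onError }) {
  // Draw the current frame at the stream's native resolution
  const canvas = document.createElement("canvas");
  canvas.width = stream.videoDimensions.width;
  canvas.height = stream.videoDimensions.height;
  canvas.getContext("2d").drawImage(video, 0, 0, canvas.width, canvas.height);

  canvas.toBlob((blob) => {
    if (!blob) {
      onError(new Error("Could not capture a frame from the video"));
      return;
    }
    const { url, headers } = buildDetectRequest(endpoint, subscriptionKey);
    const xhr = new XMLHttpRequest();
    xhr.open("POST", url);
    for (const [name, value] of Object.entries(headers)) {
      xhr.setRequestHeader(name, value);
    }
    xhr.onload = () => {
      if (xhr.status === 200) {
        onFaces(JSON.parse(xhr.responseText), index); // array of detected faces
      } else {
        onError(new Error(`Face API returned ${xhr.status}: ${xhr.responseText}`));
      }
    };
    xhr.onerror = () => onError(new Error("Network error while calling the Face API"));
    xhr.send(blob);
  }, "image/jpeg");
}
```

Drawing at stream.videoDimensions rather than the on-screen size keeps the full camera resolution, which gives the Face API more detail to work with than a shrunken thumbnail would.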
Lastly, the processImages method clears any existing emojis from the UI, then gets every HTML video element in the DOM and sends each one to the sendToAzure method for processing. Our “Analyze” button in the footer calls this method.
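A sketch of processImages follows. It assumes emoji overlays carry an “emoji” class, and it takes the per-video analysis function as a `send` parameter (sendToAzure in the tutorial) plus an optional document, so it can be tried without a browser.

```javascript
// Sketch of processImages; the ".emoji" selector is an assumption about the overlay markup
function processImages(send, doc = globalThis.document) {
  // Remove emojis left over from the previous analysis
  doc.querySelectorAll(".emoji").forEach((overlay) => overlay.remove());

  // Send every video currently on the page, using its position as the index
  Array.from(doc.querySelectorAll("video")).forEach((video, index) => send(video, index));
}

// Wiring it to the button (the "footer button" selector is hypothetical):
// document.querySelector("footer button").addEventListener("click", () => processImages(sendToAzure));
```

Because the index comes from the video’s position in the DOM, it stays aligned with the overlay placement, which also looks up videos by position.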
Now when we open index.html in our browsers, we can press the “Analyze” button to see which emotion Azure’s Face API has identified. There are a few limitations at the moment. For instance, if the Azure Face API detects two faces in the frame, it will return data for both, but our code currently adds an emoji only for the first.
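One way to lift the single-face limitation is to process every entry in the response instead of only the first. The sketch below is a hypothetical extension, not part of the tutorial: it picks the top emotion per face and keeps the faceRectangle (top, left, width, height in image pixels) that the detect call returns for each face, so an emoji could be positioned over each person.

```javascript
// Return the emotion name with the highest score (assumes a non-empty scores object)
function topEmotion(scores) {
  return Object.keys(scores).reduce((best, name) => (scores[name] > scores[best] ? name : best));
}

// Hypothetical extension: top emotion and face rectangle for every detected face
function emotionsForAllFaces(faces) {
  return faces.map((face) => ({
    emotion: topEmotion(face.faceAttributes.emotion),
    rectangle: face.faceRectangle, // { top, left, width, height } in captured-image pixels
  }));
}
```

Since the rectangle is measured in the captured image’s pixels, placing an emoji over a smaller on-screen video would mean scaling each coordinate by the displayed width divided by the stream’s width.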
Also, I’m not certain, but it may not work for teenagers. I made my teenage daughter test it dozens of times but it only returned “disgust” and “contempt” as the emotions. Maybe this wasn’t such a good idea. Maybe it’s better to not know what they really think. 😂