Sentiment Analysis for Online Chat Therapy (Python, ML)

A few nonprofits, like Mindright, DC, are doing a wonderful job of venturing into e-therapy by using volunteer coaches to support youth over SMS chat. This is a basic analysis that proposes a Sentiment Analysis approach for evaluating how effective an SMS or online chat support system can be, that is, how much coaches influence a mentee's wellbeing through daily or weekly text conversations, both within a conversation and over a period of weeks. Typed text is the only real input coaches have about the mentee's wellbeing.

What is Sentiment Analysis?

Sentiment Analysis (SA) is the computational study of people’s opinions or emotions expressed in online text and social media. It is also sometimes referred to as opinion mining or emotion analysis. SA is a classification process, typically carried out at the document level, the sentence level, or the aspect level. Aspect-level SA classifies sentiment with respect to specific aspects of an entity, because an opinion holder can have different opinions about different aspects of the same entity. For example, in SA of restaurant reviews, the same person can write about the food with positive sentiment, “The food was awesome,” and about the wait with negative sentiment, “but we had to wait for very long, I hated it!”

An introduction to different study methods

A mix of machine learning (ML) and lexicon-based methods is used to evaluate emotion in text. Machine learning techniques first train an algorithm on inputs with known outputs so that it can later work with new, unseen data. Lexicon-based analysis rests on the assumption that the overall polarity of a sentence or piece of text is the sum of the polarities of its individual words. The Naïve Bayes algorithm, n-gram sentiment analysis, and SVM methods are a few of the commonly used approaches in such studies.
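The lexicon assumption described above can be sketched in a few lines of Python. This is only an illustration: the word list, its scores, and the function names are made up for the example, and a real study would rely on an established sentiment lexicon rather than this toy one.

```python
import re

# Hypothetical toy lexicon (word -> polarity score), for illustration only.
TOY_LEXICON = {
    "awesome": 2.0,
    "good": 1.0,
    "happy": 1.5,
    "hated": -2.0,
    "sucks": -2.0,
    "sad": -1.5,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


def polarity_label(score):
    """Map a summed score to one of three coarse labels."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("The food was awesome", "It really sucks!", "Hello"):
        score = lexicon_polarity(sentence)
        print(sentence, "->", score, polarity_label(score))
```

A plain sum like this ignores negation ("not good") and context, which is one reason lexicon methods are usually compared against trained classifiers.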

Online text/sms coaching

Let us go back to one-on-one online chat conversations between a coach and a mentee, where the goal of emotion/sentiment analysis is to quantify the mental state or happiness level of the mentee. Typically in SA, as described above, we study one-off tweets or reviews written by different people. The opinion takes a side, or can be characterized by some polarity: good or not good, in favor or not in favor, and so on. And because it is an opinion, it is about something. When we study a conversation, however, we are dealing with a very different problem. A conversation between two people can be about many things, and its context can shift with the flow of the conversation. What the mentee seeks is support, and through emotion analysis we wish to evaluate the mentee's mental health and how the chat conversations with the coach affect it, both within a conversation and over weeks or months of support. To evaluate that effectively with ML, software, or logic, we first study the conversations to understand their structure, if there is any. For example, let us explore how a typical conversation between a coach and a mentee may go.

Most conversations begin with a "Hello" or "How are you doing?". In written text our conversations are normally shorter and, especially in a professional therapy setup, time-bound. The initial responses to such questions depend on the mentee's level of happiness and on how comfortable he or she is with online therapy and with the therapist. Only after a series of further exchanges does the coach really start to learn how the person is doing. Still, for ML we need to look at the first responses from the patient or mentee to get a general idea of how the person is feeling at the beginning of the day, before getting help from the therapist. For example, a first response like “It really sucks!” to "How are you doing today?" can serve as baseline information about the patient's state, which can be used later to evaluate whether the person felt better at the end of the chat. Normally the coach then asks a series of questions to understand what is affecting the patient. At this point in the conversation the coach begins to make suggestions and apply principles of therapy to help the patient deal with his or her current situation. Similarly, when the conversation ends, the last few text exchanges can be used to analyze whether anything changed in the mentee's emotional wellbeing after chatting with the coach. These pieces of text can be collected, saved, and used as training data for SA processing.
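As a rough sketch of collecting those opening and closing pieces of text, the snippet below pulls the mentee's first and last few messages out of one chat. The (speaker, text) message format, the "coach"/"mentee" tags, the function name, and the sample chat are all assumptions made for illustration; real exports from an SMS platform will look different.

```python
from typing import List, Tuple

# Assumed format: each message is a (speaker, text) pair.
Message = Tuple[str, str]

# Made-up sample chat, for illustration only.
SAMPLE_CHAT: List[Message] = [
    ("coach", "How are you doing today?"),
    ("mentee", "It really sucks!"),
    ("coach", "I'm sorry to hear that. What happened?"),
    ("mentee", "I had a fight with my friend."),
    ("coach", "That sounds hard. What would help right now?"),
    ("mentee", "Thanks, I feel a bit better now."),
]


def mentee_edges(chat: List[Message], n: int = 2, speaker: str = "mentee"):
    """Return (first n, last n) messages written by the given speaker.

    If the speaker wrote fewer than 2 * n messages, the two lists overlap.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    texts = [text for who, text in chat if who == speaker]
    return texts[:n], texts[-n:]


if __name__ == "__main__":
    opening, closing = mentee_edges(SAMPLE_CHAT, n=1)
    print("baseline:", opening)
    print("end of chat:", closing)
```

The opening messages give the baseline and the closing messages give the end-of-chat reading; both can then be labeled and stored as training data.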

Phase I

Research Methods with ML/Python

A sample of patients in the 25-44 age group receiving support from trained coaches via SMS texting or online chatting is studied for improvement in emotional and mental wellbeing over a period of 12 weeks. An initial pilot study starts with training data in which the opening messages of the SMS chats are annotated, for supervised learning, as one of [‘positive’, ‘negative’, ‘neutral’]. SA is then run on test data using various available methods, either a machine learning algorithm (Naïve Bayes, SVM, etc.) or a lexicon-based method, to classify the texts into the same wellbeing markers. A trend that starts at a lower marker of happiness, like ‘negative’ or ‘neutral’, and moves towards a higher marker is considered progress and is desired both during a chat session and over the weeks. The accuracy of the methods is studied, improved, and compared. The best method is recommended for a detailed study.
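To make the supervised step concrete, here is a minimal from-scratch Naïve Bayes classifier over word counts. It is a stand-in written for this post, not scikit-learn's implementation, and the class name, the smoothing value, and the six training sentences are invented for illustration; a real pilot would train on the annotated chat openings.

```python
import math
import re
from collections import Counter, defaultdict

# Made-up annotated examples, for illustration only.
TRAIN = [
    ("It really sucks!", "negative"),
    ("I feel terrible today", "negative"),
    ("I am doing great", "positive"),
    ("Feeling good and happy", "positive"),
    ("I am okay", "neutral"),
    ("Nothing much, same as usual", "neutral"),
]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    model = TinyNaiveBayes().fit(texts, labels)
    print(model.predict("today sucks"))
```

With so little data the predictions mean nothing in practice; the point is only to show the train-then-classify flow that the accuracy comparison would be built on.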

The Python tooling is as follows: scikit-learn for machine learning; nltk for tokenizing and stemming, to build the word list and finally a bag of words; pandas and numpy for data processing, including checking the training data for skew across ‘positive’, ‘negative’, and ‘neutral’; and Plotly for plotting graphs, among other things to visualize the skew of the data. The data is split 8/2 into training and test data, and the test data is finally classified using different machine learning algorithms. Emoticons, exclamation marks, other special characters, and abbreviations are ignored for now.
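The preprocessing steps above can be approximated with the standard library alone, which is handy for checking the logic before wiring in the real tools. The sketch below is an assumption-laden stand-in: the function names are invented, a regular expression replaces nltk tokenization, stemming and abbreviation handling are left out, and a simple Counter replaces the pandas skew check.

```python
import random
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens only, and count them.

    Emoticons, exclamation marks, and other special characters are dropped.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def label_skew(labels):
    """Fraction of examples per label, to spot class imbalance."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 8/2 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(bag_of_words("It really sucks!!! :("))
    print(label_skew(["positive", "negative", "negative", "neutral"]))
    train, test = split_80_20(range(10))
    print(len(train), len(test))
```

If one label dominates the skew report, plain accuracy becomes misleading, so the skew is worth checking before the methods are compared.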

Phase II

The wellbeing markers are now refined into more specific categories: [‘Easy’, ‘Normal’, ‘Difficult’, ‘At risk’]. SA is done at the beginning and end of chats and is also evaluated over time (12 weeks?) to measure how wellbeing progresses under the online support offered by the coach. A trend that starts at a lower marker of happiness, like ‘At risk’, and moves towards a higher marker, like ‘Easy’, is considered progress and is desired both during a chat session and over time.
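One possible way to turn these markers into a measurable trend is to rank them and compare ranks. The sketch below assumes the ordering ‘At risk’ < ‘Difficult’ < ‘Normal’ < ‘Easy’ (only the two ends are stated above; the middle order is inferred from the list) and uses a least-squares slope as the trend measure, which is a choice made here for illustration rather than part of the proposed study.

```python
# Assumed ordinal ranks for the Phase II markers, lowest wellbeing first.
MARKER_RANK = {"At risk": 0, "Difficult": 1, "Normal": 2, "Easy": 3}


def session_change(start_marker, end_marker):
    """Rank difference within one chat; positive means the chat ended higher."""
    return MARKER_RANK[end_marker] - MARKER_RANK[start_marker]


def weekly_trend(markers):
    """Least-squares slope of marker rank against week index (0, 1, 2, ...).

    A positive slope means the markers moved towards 'Easy' over time.
    """
    ranks = [MARKER_RANK[m] for m in markers]
    n = len(ranks)
    if n < 2:
        raise ValueError("need at least two weeks of markers")
    mean_x = (n - 1) / 2
    mean_y = sum(ranks) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ranks))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    print(session_change("At risk", "Normal"))
    print(weekly_trend(["At risk", "Difficult", "Normal", "Easy"]))
```

Treating the markers as evenly spaced ranks is itself an assumption; the step from ‘At risk’ to ‘Difficult’ may matter more than the step from ‘Normal’ to ‘Easy’.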