Abstract:
Text emotion recognition is a challenging research topic in natural language processing. Work in this field typically builds recognition models from data collected from social media or from open datasets. This study investigates the recent Google-developed dataset "GoEmotions: A Dataset of Fine-Grained Emotions", which consists of text from subreddits labeled with 28 emotion categories. We group these categories into three classes: positive, negative, and ambiguous emotion. The goal is to classify a previously unseen text into one of these classes. Our study suggests that combining the unsupervised learning method LDA with popular text feature vectors such as TF-IDF and Word2Vec can improve emotion recognition accuracy. We present learning curves and model tuning techniques, along with results for various feature vectors and models. In our experiments, XGBoost with Word2Vec features gives the best performance, with 64 percent accuracy. We also built a chatbot to show how the approach can be used in practice.
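
As an illustration of the three-class grouping described above, the following is a minimal sketch of how fine-grained labels could be collapsed into coarse classes. The label sets are an assumption based on the sentiment grouping distributed with the GoEmotions dataset and may differ from the grouping used in this study. The function name `coarse_class` is hypothetical, and the "neutral" label is left unmapped because its placement is not stated here.

```python
from typing import Optional

# Assumed grouping (taken from the sentiment grouping shipped with GoEmotions);
# the study's own grouping may differ.
POSITIVE = {
    "admiration", "amusement", "approval", "caring", "desire", "excitement",
    "gratitude", "joy", "love", "optimism", "pride", "relief",
}
NEGATIVE = {
    "anger", "annoyance", "disappointment", "disapproval", "disgust",
    "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
}
AMBIGUOUS = {"confusion", "curiosity", "realization", "surprise"}


def coarse_class(label: str) -> Optional[str]:
    """Map a fine-grained emotion label to a coarse class (hypothetical helper).

    Returns None for labels outside the three sets, e.g. "neutral".
    """
    key = label.strip().lower()  # tolerate stray whitespace and capitalisation
    if key in POSITIVE:
        return "positive"
    if key in NEGATIVE:
        return "negative"
    if key in AMBIGUOUS:
        return "ambiguous"
    return None


if __name__ == "__main__":
    for name in ("joy", "grief", "curiosity", "neutral"):
        print(name, "->", coarse_class(name))
```

A lookup of this kind would run before feature extraction, so that the classifier is trained on three targets rather than on the full set of fine-grained labels.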