Speech Emotion Recognition using Machine Learning


2025-06-28 16:29:37 - Adil Khan

Project Title

Speech Emotion Recognition using Machine Learning

Project Area of Specialization: Artificial Intelligence

Project Summary

Speech emotion recognition is a challenging task, primarily because of inter- and intra-speaker variability in how emotions are expressed in speech. Identifying discriminative acoustic features that can infer human emotions both accurately and in real time therefore remains an open research problem. Automated real-time emotion recognition has a wide range of applications in fields that make use of human emotional reactions, including (though not limited to) health care, marketing, assisted living, and human-robot interaction. Because of their continuous use, smart devices such as smartwatches and smartphones can potentially act as human sensors: they are capable of capturing real-time speech signals and sending the data to the cloud or to a mobile app for emotion recognition.

In this project, we will research and develop a speech emotion recognition system. The system will recognize emotions such as anger, boredom, disgust, fear, happiness, and sadness, as well as the neutral state; this set of emotional states is widely used for emotion recognition purposes. The system will detect emotions from the voice by extracting meaningful acoustic features and applying machine learning algorithms. Previous research in this field has mostly focused on handcrafted features, with traditional convolutional neural network (CNN) models used to extract high-level features from speech spectrograms to increase recognition accuracy. However, these methodologies have a high computational cost. In our Final Year Project (FYP), we plan to propose a novel lightweight method that can infer emotions from speech in real time. In addition, we aim to record speech samples from individuals using a voice recorder, a smartwatch, and a smartphone simultaneously, which will let us implement and evaluate our proposed approach on a large and diverse dataset.

Moreover, we will also be able to evaluate and compare the accuracy achieved with different speech sampling devices. Our aim is to develop a method that predicts emotions from speech samples collected from smart devices, since these devices act as human sensors and can ubiquitously infer the emotional state of their owners.
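To make the idea of handcrafted acoustic features concrete, the sketch below frames a speech signal and computes three classic per-frame features (short-time energy, zero-crossing rate, and spectral centroid). The frame length, hop size, and feature choices here are illustrative assumptions for exposition, not the project's final feature set.

```python
import numpy as np

def extract_features(signal, sample_rate=16000, frame_len=400, hop=160):
    """Compute simple handcrafted acoustic features for each frame.

    frame_len=400 and hop=160 (25 ms / 10 ms at 16 kHz) are
    illustrative choices, not values fixed by the project.
    """
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]
    features = []
    for frame in frames:
        energy = float(np.mean(frame ** 2))  # short-time energy
        # zero-crossing rate: fraction of adjacent samples with a sign change
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        # spectral centroid: magnitude-weighted mean frequency of the frame
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
        centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10))
        features.append([energy, zcr, centroid])
    return np.array(features)

# Example: one second of a synthetic 440 Hz tone at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
feats = extract_features(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 3): 98 frames, 3 features each
```

In practice such frame-level features are typically aggregated (e.g., mean and variance over an utterance) before being fed to a classifier.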

Project Objectives

Humans convey messages not just through spoken words but also through tone, body language, and facial expressions; the same message spoken in two different manners can have very different meanings. The objective of this FYP is to study and develop a Speech Emotion Recognition (SER) system that finds salient, discriminative features in speech signals and uses them to infer a speaker's emotional state in real time from acoustic signals logged by a smart device, e.g., a smartwatch or smartphone. For comparison purposes, we will collect the same speech samples with a dedicated voice recorder. Such a system can serve various goals in the robotics, health care, independent living, marketing, education, and entertainment industries. To effectively capture emotion and affective health in a way that reflects the richness of everyday life, affective states need to be measured unobtrusively during everyday situations. Smart devices such as smartphones and smartwatches include sensors that have the potential to act as human sensors, and could thus provide rich and accessible information in this respect. A further objective of this project is to collect speech samples, using a voice recorder, from individuals belonging to different demographics, to broaden our dataset along this dimension as well.

Project Implementation Method

The project will be implemented using the following technologies:

To summarize, our goal is to use smart devices to log speech snippets, extract informative features from those snippets, and feed the sensed data to a model running on the device. The model will then classify the emotional state of the speaker. Once classified, the emotional data can benefit a wide range of industries, from retail to healthcare, in achieving their business objectives.
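The classification step above can be sketched with a deliberately lightweight model. The nearest-centroid classifier below (one mean feature vector per emotion) stands in for whatever lightweight model the project ultimately adopts; the `NearestCentroidSER` class, the three-dimensional feature vectors, and the training data are all synthetic placeholders for illustration.

```python
import numpy as np

# Emotion labels targeted by the project
EMOTIONS = ["anger", "boredom", "disgust", "fear", "happiness", "sadness", "neutral"]

class NearestCentroidSER:
    """Tiny nearest-centroid classifier: one mean feature vector per label.

    A placeholder for the project's actual lightweight model; cheap
    enough to run on-device, since prediction is a few vector distances.
    """
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = np.stack(
            [X[np.array(y) == lbl].mean(axis=0) for lbl in self.labels]
        )
        return self

    def predict(self, X):
        # Distance from each sample to each centroid; pick the closest label
        dists = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return [self.labels[i] for i in dists.argmin(axis=1)]

# Synthetic demo: two well-separated emotion clusters in feature space
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(5, 0.1, (20, 3))])
y_train = ["neutral"] * 20 + ["anger"] * 20
model = NearestCentroidSER().fit(X_train, y_train)
print(model.predict(np.array([[0.05, 0.0, 0.1], [4.9, 5.1, 5.0]])))  # ['neutral', 'anger']
```

Real speech features overlap far more than this synthetic demo, which is precisely why finding discriminative features, rather than a heavyweight classifier, is the core of the problem.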

Benefits of the Project

This project would have a broad range of benefits, spanning intelligent human-computer interaction, healthcare, education, retail, gaming, automotive, and even security. Human emotion recognition also plays an important role in interpersonal relationships. The following are the major benefits of the project:

Technical Details of Final Deliverable

The final deliverable will consist of the following implementations:

Final Deliverable of the Project: HW/SW integrated system
Core Industry: IT
Other Industries: Education, Medical, Manufacturing, Media, Health, Security, Others
Core Technology: Artificial Intelligence (AI)
Other Technologies: Wearables and Implantables, Big Data
Sustainable Development Goals: Good Health and Well-Being for People; Quality Education; Industry, Innovation and Infrastructure; Partnerships to achieve the Goal

Required Resources
Item Name               Type       No. of Units  Per Unit Cost (Rs)  Total (Rs)
SmartWatch              Equipment  1             35000               35000
Digital Voice Recorder  Equipment  1             13500               13500
Total (Rs): 48500
