An Audio Analysis Engine: Recognition of Speech, Age and Gender from Audio using Deep Neural Networks
Over the past decades, a great deal of research has applied machine learning to speech processing, especially speech recognition. In the past few years, however, research has shifted towards deep learning for speech-related applications. Deep learning has yielded far better results than earlier approaches in a variety of applications, including speech, and has therefore become a very attractive area of research.
The proposed project aims to provide a suite of tools (a platform) that can be used to perform a comprehensive "Audio Scene Analysis". The system comprises a number of services applied to audio: estimation of (i) age, (ii) gender, (iii) mood (sentiment analysis) and (iv) topic (topic modelling). Audio files serve as the input to the system.
The project aims to explore the use of machine learning in speech processing, to learn recent machine learning concepts, and to deliver a practical implementation. The detailed objectives are:
1. To deliver an "Audio Analysis Engine" with a web interface as well as an Android application.
2. To provide an AI-based module that can estimate the gender of the speaker.
3. To provide an AI-based module that can estimate the age of the speaker.
4. To provide an AI-based module that can estimate the sentiment (mood/emotions) of the speaker.
5. To provide an AI-based module that can estimate the topic of the speech.
(To collect a dataset of audio samples.)
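The objectives above describe one engine that wraps four estimation modules. A minimal sketch of how such an engine could be organised is given below. All names (`AudioProfile`, `AudioAnalysisEngine`, `Estimator`) are hypothetical and not taken from the project, and the estimators are placeholder stubs rather than trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: an estimator maps raw audio samples to a label string.
Estimator = Callable[[List[float]], str]

# The four modules named in the objectives.
MODULES = ("gender", "age", "sentiment", "topic")


@dataclass
class AudioProfile:
    """Result of one analysis run; field names are illustrative only."""
    gender: str
    age: str
    sentiment: str
    topic: str


class AudioAnalysisEngine:
    """Runs every registered module on the same audio input."""

    def __init__(self, estimators: Dict[str, Estimator]):
        missing = set(MODULES) - estimators.keys()
        if missing:
            raise ValueError(f"missing estimators: {sorted(missing)}")
        self.estimators = estimators

    def analyse(self, samples: List[float]) -> AudioProfile:
        # Each module sees the same samples and returns one label.
        return AudioProfile(**{name: self.estimators[name](samples) for name in MODULES})


# Placeholder stubs standing in for trained models (assumption).
stub_estimators = {name: (lambda samples: "unknown") for name in MODULES}
engine = AudioAnalysisEngine(stub_estimators)
profile = engine.analyse([0.0, 0.1, -0.1])
```

Keeping each module behind the same callable signature would let the web and Android front ends share one back end, and lets a module be swapped out without touching the others.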
We aim to provide a user-friendly interface that delivers accurate speech-to-text, fast output, and accurate estimation of gender, age, sentiment and topic.
The project will be implemented with a web interface as well as an Android interface.
To meet these requirements, our team will work on creating an attractive and simple interface, along with machine learning models and deep neural networks for accurate and fast outputs.
The main and expected features of the project are:
The model will operate mainly as a web application and an Android application.
Project Design Assumptions and Dependencies
The working of a general automatic speech recognition (ASR) system is shown in the figure below:

The ML models to be used include language models such as BERT, GPT-2, XLNet and RoBERTa, together with Word2Vec, Dynamic Time Warping and Wav2Vec.
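Of the techniques listed, Dynamic Time Warping (DTW) is compact enough to illustrate directly. The sketch below is the textbook dynamic-programming form for two one-dimensional sequences with absolute difference as the local cost. It is an illustration only: the proposal does not say how DTW will be used, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Textbook DTW cost between two non-empty 1-D numeric sequences.

    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

In speech work the inputs would normally be sequences of feature vectors (for example per-frame spectral features) rather than plain numbers, with a vector distance as the local cost; that extension is left out here.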
The project may depend on a low-noise environment and good hardware specifications.
Android Studio will be used for implementing and testing the project in the Android environment.
External Interface Requirements
The main requirement of the project is the dataset that will be used for training and testing the model. The most natural starting point is our own proprietary speech data, followed by public speech datasets such as the Google and Mozilla voice datasets.
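Because the dataset is used for both training and testing, one common precaution for speaker-attribute models (age, gender) is to keep each speaker entirely in one split, so that test scores are not inflated by voices already seen in training. The sketch below shows one way to do this. It is an assumption, not a method stated in the proposal: the `speaker_id` field, the function names and the 20% test fraction are all hypothetical.

```python
import hashlib


def assign_split(speaker_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a speaker to 'train' or 'test'.

    Hashing the ID means the same speaker always lands in the same split,
    regardless of record order or how many times the script is run.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_dataset(records, test_fraction: float = 0.2):
    """Split records (dicts with a hypothetical 'speaker_id' key) by speaker."""
    splits = {"train": [], "test": []}
    for record in records:
        splits[assign_split(record["speaker_id"], test_fraction)].append(record)
    return splits


# Illustrative records only; file names and IDs are made up.
records = [
    {"speaker_id": "spk-001", "path": "a.wav"},
    {"speaker_id": "spk-001", "path": "b.wav"},
    {"speaker_id": "spk-002", "path": "c.wav"},
    {"speaker_id": "spk-003", "path": "d.wav"},
]
splits = split_dataset(records)
```

With a hash-based assignment the realised test share only approximates `test_fraction`, especially when there are few speakers.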
The system has benefits both in the commercial domain and for Law Enforcement Agencies (LEAs).
In the commercial domain, the system can be deployed in call-centre environments, where an audio profile of the caller/speaker can be produced automatically and used to provide better services.
For LEAs, the system can be used for audio forensics.
The final product should be accurate and fast in processing the audio signal. The interface should be simple and attractive, and should keep GPU usage low. There are no expected risks associated with running the ASR system or modifying the source code of the project.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Multifunctional Portable Bluetooth-compatible Speaker and Mic | Equipment | 2 | 7000 | 14000 |
| Smart Mobile (for Testing) | Equipment | 1 | 35000 | 35000 |
| SSDs to store data | Equipment | 1 | 10000 | 10000 |
| Printing/Stationery | Miscellaneous | 1 | 10000 | 10000 |
| Total (in Rs) | | | | 69000 |