Adil Khan 10 months ago
AdiKhanOfficial #FYP Ideas

An Audio Analysis Engine: Recognition of Speech, Age and Gender from Audio using Deep Neural Networks

Project Title

An Audio Analysis Engine: Recognition of Speech, Age and Gender from Audio using Deep Neural Networks

Project Area of Specialization

Artificial Intelligence

Project Summary

Over the past decades, a tremendous amount of research has been done on the use of machine learning for speech processing applications, especially speech recognition. In the past few years, however, research has focused on utilizing deep learning for speech-related applications. Deep learning has yielded far better results than earlier machine learning approaches in a variety of applications, including speech, and has therefore become a very attractive area of research.

The proposed project aims to provide a suite of tools (a platform) that can be used to perform a comprehensive "Audio Scene Analysis". The system comprises a number of services applied to audio, namely (i) age estimation, (ii) gender estimation, (iii) mood estimation (sentiment analysis) and (iv) topic modelling, where audio files serve as input to the system.

Project Objectives

To explore the utilization of machine learning in speech processing, learn about innovations in machine learning concepts, and implement the project in practice using programming knowledge. The detailed objectives are:

1. To deliver an "Audio Analysis Engine" with a web interface as well as an Android application.

2. To provide an AI-based module that can estimate the gender of the speaker.

3. To provide an AI-based module that can estimate the age of the speaker.

4. To provide an AI-based module that can estimate the sentiment (mood/emotions) of the speaker.

5. To provide an AI-based module that can estimate the topic of the speech.

(To collect a dataset of audio samples.)

Project Implementation Method

Product Perspective

We aim to provide a user-friendly interface that delivers accurate speech-to-text conversion, fast output, and accurate estimation of gender, age, sentiment and topic.

The project will be implemented with a web interface as well as an Android interface.

To meet these requirements, our team will work on creating a simple and attractive interface, along with machine learning models and deep neural networks that produce accurate and fast outputs.

Product Features

The main expected features of the project are:

  • Speech-to-text from an audio file
  • Speech-to-text from real-time audio
  • Gender and age estimation
  • Sentiment analysis and topic modelling

Operating Environment

The system will operate mainly as a web application and an Android application.

Project Design Assumptions and Dependencies

The general working of an automatic speech recognition (ASR) system is shown in the figure below:

The ML models to be used include language models such as BERT, GPT-2, XLNet and RoBERTa, together with Word2Vec, Dynamic Time Warping and wav2vec.
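Of the techniques listed above, Dynamic Time Warping is the easiest to illustrate in a few lines. The following is a minimal sketch of the textbook algorithm, assuming each audio clip has already been reduced to a sequence of one number per frame; the function name dtw_distance is hypothetical and is not taken from the project.

```python
from math import inf


def dtw_distance(a, b):
    """Return the dynamic time warping distance between two numeric sequences.

    Textbook O(len(a) * len(b)) formulation with absolute difference as the
    local cost. Assumes one number per frame (an illustrative simplification).
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because the alignment may stretch either sequence, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is zero even though the two lists differ in length, which is why the technique suits utterances spoken at different speeds.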

The project may depend on a low-noise environment and good hardware specifications.

Android Studio will be used to implement and test the project in the Android environment.

External Interface Requirements

The main requirement of the project is the dataset that will be used to train and test the model. The most natural starting point is our own proprietary speech data, followed by public speech datasets such as the Google and Mozilla voice datasets.

Benefits of the Project

The system has benefits both in the commercial domain and for Law Enforcement Agencies (LEAs).

In the commercial domain, the system can be deployed in call center environments, where an audio profile of the caller/speaker can be produced automatically and used to provide better services.

For LEAs, the system can be used for audio forensics.

The final product should process the audio signal accurately and quickly, with a simple and attractive interface and low GPU usage. There are no expected risks associated with running the ASR system or modifying the source code of the project.

Technical Details of Final Deliverable

Functional Requirements

  1. The system takes real-time audio or prerecorded audio files as input.
  2. The system extracts features from the audio signals.
  3. The audio is converted to text using language models.
  4. Using the extracted features, the system outputs probabilistic estimates of age, gender and emotion, along with topic modelling results.
  5. The ML model will be compatible with Android environments.
  6. The system will also take voice commands for speech-to-text, gender estimation and age estimation. For example, the user gives a voice command such as “Convert audio.wav to text”, “Transcribe audio.wav” or “What is gender of audio.wav?”
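Once a voice command has been transcribed, requirement 6 comes down to mapping the text onto a task and a file name. The sketch below shows one possible way to do this with regular expressions. The names COMMAND_PATTERNS and parse_command are hypothetical, and the age phrasing is an assumption, since the requirements give example phrases only for transcription and gender.

```python
import re

# Hypothetical patterns modelled on the example phrases in requirement 6.
# The "age" phrasing is assumed; the requirements give no example for it.
COMMAND_PATTERNS = [
    ("transcribe", re.compile(r"convert\s+(\S+)\s+to\s+text", re.IGNORECASE)),
    ("transcribe", re.compile(r"transcribe\s+(\S+)", re.IGNORECASE)),
    ("gender", re.compile(r"what\s+is\s+(?:the\s+)?gender\s+of\s+(\S+)", re.IGNORECASE)),
    ("age", re.compile(r"what\s+is\s+(?:the\s+)?age\s+of\s+(\S+)", re.IGNORECASE)),
]


def parse_command(text):
    """Map a transcribed voice command to (task, filename), or None if unrecognised."""
    # Drop surrounding whitespace and trailing sentence punctuation
    cleaned = text.strip().rstrip("?.!").strip()
    for task, pattern in COMMAND_PATTERNS:
        match = pattern.fullmatch(cleaned)
        if match:
            return task, match.group(1)
    return None
```

For instance, parse_command("What is gender of audio.wav?") returns ("gender", "audio.wav"), and a phrase that matches no pattern returns None. Keeping the patterns in a single list means a new command can be supported by appending one entry.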

Non-Functional Requirements

  1. Efficiency. The system should extract features efficiently, even from fairly noisy audio samples.
  2. Accuracy. The system should return the most probable output, with an expected accuracy of at least 90%.
  3. Extensibility. The system can be extended to learn new voice commands.
  4. Usability. A simple interface and added voice operability make the system easy to use and user-friendly.
  5. Performance. The system shall be fast and consume few GPU resources.
  6. Operability. The system also operates in Android environments.
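The accuracy requirement can be made concrete with a small sketch. It assumes that a classifier produces one raw score per class, which the proposal does not specify; the scores are turned into probabilities with a softmax, the most probable label is returned, and accuracy is measured against reference labels. All function names here are hypothetical.

```python
from math import exp


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_probable(labels, scores):
    """Return the (label, probability) pair with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the reference labels."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equally long")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Under this reading, the 90% target is met when accuracy(predicted, expected) >= 0.9 on a held-out test set.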

Final Deliverable of the Project

HW/SW integrated system

Core Industry

Media

Other Industries

IT, Security

Core Technology

Artificial Intelligence (AI)

Other Technologies

Big Data

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources

Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Multifunctional Portable Bluetooth-compatible Speaker and Mic | Equipment | 2 | 7000 | 14000
Smart Mobile (for Testing) | Equipment | 1 | 35000 | 35000
SSDs to store data | Equipment | 1 | 10000 | 10000
Printing/Stationery | Miscellaneous | 1 | 10000 | 10000
Total (in Rs) | | | | 69000