Adil Khan 10 months ago

Project Title

An Audio Analysis Engine: Recognition of Speech, Age and Gender from Audio using Deep Neural Networks

Project Area of Specialization

Artificial Intelligence

Project Summary

Over the past decades, a tremendous amount of research has been done on machine learning for speech processing applications, especially speech recognition. In the past few years, however, research has shifted toward deep learning for speech-related applications. Deep learning has yielded far better results than earlier approaches in a variety of applications, including speech, and has therefore become a very attractive area of research.

The proposed project aims to provide a suite of tools (a platform) for performing a comprehensive "Audio Scene Analysis". The system comprises a number of services applied to audio: (i) age estimation, (ii) gender estimation, (iii) mood estimation (sentiment analysis) and (iv) topic modelling. Audio files serve as input to the system.

Project Objectives

To explore the use of machine learning in speech processing, learn recent innovations in machine learning, and implement the project in practice using our programming knowledge. The detailed objectives are:

1. To deliver an "Audio Analysis Engine" with a web interface as well as an Android application.

2. To provide an AI-based module that can estimate the gender of the speaker.

3. To provide an AI-based module that can estimate the age of the speaker.

4. To provide an AI-based module that can estimate the sentiment (mood/emotions) of the speaker.

5. To provide an AI-based module that can estimate the topic of the speech.

(To collect a dataset of audio samples.)

Project Implementation Method

Product Perspective

We aim to provide a user-friendly interface that delivers accurate speech-to-text conversion, fast output, and accurate estimation of gender, age, sentiment and topic.

The project will be implemented with a web interface as well as an Android interface.

To meet these requirements, our team will work on an attractive and simple interface, and on machine learning models and deep neural networks that give accurate and fast outputs.

Product Features

The main expected features of the project are:

Speech-to-Text from audio file
Speech-to-Text from real-time audio
Gender and age estimation
Sentiment analysis and topic modelling

Operating Environment

The system will run mainly as a web application and an Android application.

Project Design Assumptions and Dependencies

The working of a general ASR system is shown in the figure below:

The ML models to be used include language models such as BERT, GPT-2, XLNet and RoBERTa, together with Word2Vec, Dynamic Time Warping and wav2vec.
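
Of the techniques listed above, Dynamic Time Warping (DTW) is compact enough to illustrate directly. The following is a minimal sketch of the classic DTW distance between two one-dimensional sequences, assuming an absolute-difference local cost. The function name `dtw_distance` is hypothetical and is not taken from the project's code; a real system would normally compare sequences of feature vectors rather than raw numbers.

```python
from math import inf


def dtw_distance(a, b):
    """Return the DTW distance between two 1-D numeric sequences.

    Illustrative only: uses |x - y| as the local cost and no window
    constraint, so it runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because DTW allows one sequence to stretch against the other, the same pattern spoken at a different speed can still align with zero or low cost, which is why it is a common baseline for comparing utterances of different lengths.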

The project may depend on a low-noise recording environment and good hardware specifications.

Android Studio will be used to implement and test the project in the Android environment.

External Interface Requirements

The main requirement of the project is the dataset that will be used for training and testing the model. The most natural way to start is with our own proprietary speech data, followed by public speech datasets such as the Google and Mozilla voice datasets.

Benefits of the Project

The system has benefits both in the commercial domain and for Law Enforcement Agencies (LEAs).

In the commercial domain, the system can be deployed in call centre environments, where an audio profile of the caller can be produced automatically and used to provide better service.

For LEAs, the system can be used for audio forensics.

The final product should process the audio signal accurately and quickly, with a simple and attractive interface and low GPU usage. There are no expected risks associated with running the ASR system or modifying the source code of the project.

Technical Details of Final Deliverable

Functional Requirements

The system takes real-time audio or prerecorded audio files as input.
The system extracts features from the audio signals.
The audio is converted to text using language models.
Using the extracted features, the system outputs probabilistic estimates of age, gender, emotion and topic.
The ML model will be compatible with Android environments.
The system will also take voice commands for Speech-to-Text, gender and age estimation, e.g. the user gives a voice command “Convert audio.wav to text”, “Transcribe audio.wav” or “What is the gender of audio.wav?”
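
The voice-command requirement can be sketched as a small parser that maps an already transcribed command to a task and a target file. This is only an illustration built from the three example phrasings above. The function name `parse_command`, the task labels and the regular expressions are assumptions, not part of the proposal, and a real system would need a far more tolerant grammar.

```python
import re


def parse_command(text):
    """Map a transcribed voice command to (task, filename), or None.

    Hypothetical grammar covering only the example phrasings:
    "Convert <file> to text", "Transcribe <file>",
    "What is (the) gender|age of <file>?".
    """
    # Drop surrounding whitespace and any trailing sentence punctuation.
    cleaned = text.strip().rstrip("?.!")

    match = re.fullmatch(r"convert\s+(\S+)\s+to\s+text", cleaned, re.IGNORECASE)
    if match:
        return ("transcribe", match.group(1))

    match = re.fullmatch(r"transcribe\s+(\S+)", cleaned, re.IGNORECASE)
    if match:
        return ("transcribe", match.group(1))

    match = re.fullmatch(
        r"what\s+is\s+(?:the\s+)?(gender|age)\s+of\s+(\S+)", cleaned, re.IGNORECASE
    )
    if match:
        return (match.group(1).lower(), match.group(2))

    # Unrecognised command: the caller decides how to respond.
    return None
```

For example, `parse_command("Transcribe audio.wav")` yields the task `"transcribe"` with the file `"audio.wav"`, while an unrelated phrase returns `None` so that the interface can ask the user to repeat the command.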

Non-Functional Requirements

Efficiency. The system should extract features efficiently from fairly noisy audio samples.
Accuracy. The system should return the most probable output, with an expected accuracy of at least 90%.
Extensibility. The system is extensible to learn new voice commands.
Usability. A simple interface and added voice operability make the system easy to use and user-friendly.
Performance. The system shall be fast and consume little GPU.
Operability. The system also operates in Android environments.
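
The accuracy requirement above can be made concrete with a short sketch: turn raw model scores into probabilities, pick the most probable label, and compare the resulting accuracy against the 90% target. The function names, the example labels and the assumption that each module emits one score per class are all hypothetical; the proposal does not specify how the models expose their outputs.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_target(predicted, expected, target=0.90):
    """Check the 'at least 90% accuracy' requirement on a test set."""
    return accuracy(predicted, expected) >= target
```

In practice `meets_target` would be run on a held-out test set for each module (gender, age, sentiment, topic) separately, since a single overall figure could hide one weak module.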

Final Deliverable of the Project

HW/SW integrated system

Core Industry

Media

Other Industries

IT, Security

Core Technology

Artificial Intelligence (AI)

Other Technologies

Big Data

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Multifunctional Portable Bluetooth-compatible Speaker and Mic	Equipment	2	7000	14000
Smart Mobile (for Testing)	Equipment	1	35000	35000
SSDs to store data	Equipment	1	10000	10000
Printing/Stationery	Miscellaneous	1	10000	10000
			Total (in Rs)	69000
