"TLATOR" short for translator will provide the platform for Urdu speaking lecturers to make their lectures internationally available by adding English subtitles to it. The reason for selecting this project is that due to pandemic most of the schools and university went online and there are so many l
Urdu Speech to English Text
"TLATOR" short for translator will provide the platform for Urdu speaking lecturers to make their lectures internationally available by adding English subtitles to it. The reason for selecting this project is that due to pandemic most of the schools and university went online and there are so many lectures which can be made internationally available which will benefit both students and the lecturers. Students will get access to learn their desired topics and lecturers can earn from it. Many organizations have implemented subtitles feature on so many different languages but little work has been done for Urdu. The accuracy is not good at all.
Speech-to-text is a mechanism for converting human spoken words into written text. The main objective of this project is to convert spoken Urdu into English text. The project will meet the following objectives:
• The user will be able to upload a lecture or any video in Urdu and create an English transcript from it.
• The user will be able to convert the Urdu video into an English-subtitled version of the video and download it.
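To make the second objective concrete, the transcript segments produced by the model can be written out in the standard SubRip (SRT) subtitle format, which video players accept directly. The segment tuples and function names below are illustrative assumptions, not the project's actual data structures:

```python
# Minimal sketch: turning timed transcript segments into SRT subtitle text.
# Segment format (start_sec, end_sec, text) is an assumption for illustration.

def fmt_time(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Build SRT text from (start_sec, end_sec, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

segments = [(0.0, 2.5, "Welcome to the lecture."),
            (2.5, 5.0, "Today we discuss speech recognition.")]
print(to_srt(segments))
```

The resulting `.srt` file can then be muxed or burned into the video with a tool such as FFmpeg to produce the downloadable subtitled version.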
The end user provides a video as input. The system extracts the audio and performs signal processing to obtain the pitch of the sound, generating phonemes from it. The acoustic model then relates the audio to candidate words, and the language model selects the combination of words that maximizes the mapping probability; this combination is added to the transcript.
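The last step, combining acoustic scores with a language model, can be illustrated with a toy example. All vocabulary and probabilities below are invented for demonstration; a real decoder would use beam search rather than brute force:

```python
from itertools import product

# Toy illustration of acoustic + language model decoding: for each time step
# the acoustic model proposes candidate words with scores, and a bigram
# language model scores word-to-word transitions. The decoder picks the
# sequence whose joint probability is highest.

acoustic = [                       # candidate words and acoustic scores per step
    {"I": 0.6, "eye": 0.4},
    {"sea": 0.5, "see": 0.5},
]
bigram = {("<s>", "I"): 0.5, ("<s>", "eye"): 0.1,
          ("I", "see"): 0.4, ("I", "sea"): 0.05,
          ("eye", "see"): 0.2, ("eye", "sea"): 0.2}

def best_sequence(acoustic, bigram):
    """Brute-force search for the sequence maximizing P(audio|words) * P(words)."""
    best, best_p = None, 0.0
    for words in product(*(step.keys() for step in acoustic)):
        p, prev = 1.0, "<s>"
        for w, step in zip(words, acoustic):
            p *= step[w] * bigram.get((prev, w), 1e-6)
            prev = w
        if p > best_p:
            best, best_p = words, p
    return best

print(best_sequence(acoustic, bigram))  # ('I', 'see')
```

Note how the language model disambiguates the homophones: acoustically "see" and "sea" are equally likely, but the bigram probability of "I see" makes it the winning sequence.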
We are using the Wav2Vec2 acoustic model for training and evaluation on the Urdu voice dataset. For feature extraction we use Wav2Vec2FeatureExtractor, and for tokenization we use Wav2Vec2CTCTokenizer. For training, we use Mozilla Common Voice Urdu audio samples, which amount to 33 MB of audio data in MP3 format.
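The CTC decoding that Wav2Vec2CTCTokenizer ultimately performs can be sketched in a few lines: greedy decoding collapses consecutive repeated labels and then removes the CTC blank token. The label alphabet and frame sequence below are invented for demonstration:

```python
# Illustrative sketch of greedy CTC decoding (not the tokenizer's actual
# implementation): collapse consecutive repeats, then drop the blank token.

BLANK = "_"

def ctc_greedy_decode(frame_labels):
    """Collapse repeated frame labels, then remove CTC blanks."""
    collapsed, prev = [], None
    for label in frame_labels:
        if label != prev:
            collapsed.append(label)
        prev = label
    return "".join(l for l in collapsed if l != BLANK)

frames = ["_", "c", "c", "_", "a", "a", "t", "t", "_"]
print(ctc_greedy_decode(frames))  # "cat"
```

The blank token is what lets CTC represent genuinely repeated characters: "l _ l" decodes to "ll", while "l l" collapses to a single "l".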
The model takes a speech signal in its raw form in any language, which makes it suitable for languages with unlabeled data. The raw audio, a 1-D array, is passed to a multi-layer 1-D CNN that produces audio time frames of 25 ms each. These audio representations are then mapped against a codebook, which selects the most appropriate representation for the data.
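As a rough sketch of the framing step: assuming a 16 kHz sampling rate (the input rate Wav2Vec2 expects), a 25 ms frame is 400 samples. The snippet below uses simple non-overlapping slicing for clarity; the actual Wav2Vec2 feature encoder uses strided convolutions rather than explicit slicing:

```python
# Sketch: slicing a raw 1-D audio array into 25 ms frames at 16 kHz.
# Non-overlapping frames are an illustrative simplification.

SAMPLE_RATE = 16_000
FRAME_MS = 25
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 400 samples per frame

def frame_audio(samples, frame_len=FRAME_LEN):
    """Split a raw sample sequence into full frames of frame_len samples."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

one_second = [0.0] * SAMPLE_RATE
frames = frame_audio(one_second)
print(len(frames), len(frames[0]))  # 40 frames of 400 samples each
```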
After pre-training on unlabeled speech, the model is fine-tuned on labeled data for downstream tasks such as speech recognition or emotion recognition.
This project aims to provide Urdu lectures or videos with transcribed English subtitles generated by a deep-learning model. International students can benefit from this: they can go through recorded Urdu lectures to learn about a desired topic. The project also aims to make our local lectures available worldwide.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| RAM | Equipment | 2 | 6000 | 12000 |
| HDD | Equipment | 1 | 6000 | 6000 |
| Total (in Rs) | | | | 18000 |