"TLATOR" short for translator will provide the platform for Urdu speaking lecturers to make their lectures internationally available by adding English subtitles to it. The reason for selecting this project is that due to pandemic most of the schools and university went online and there are so many l
Urdu Speech to English Text
"TLATOR" short for translator will provide the platform for Urdu speaking lecturers to make their lectures internationally available by adding English subtitles to it. The reason for selecting this project is that due to pandemic most of the schools and university went online and there are so many lectures which can be made internationally available which will benefit both students and the lecturers. Students will get access to learn their desired topics and lecturers can earn from it. Many organizations have implemented subtitles feature on so many different languages but little work has been done for Urdu. The accuracy is not good at all.
Speech-to-text is a mechanism for converting human spoken words into written text. The main objective of this project is to convert spoken Urdu into English text. The project will meet the following objectives:
• The user will be able to upload a lecture or any video in Urdu and create an English transcript from it.
• The user will be able to convert the Urdu video into an English-subtitled version of the video and download it.
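To make the second objective concrete, the transcript segments produced by the model can be written out in the standard SubRip (SRT) subtitle format, which video players accept directly. The segment tuples and function names below are illustrative assumptions, not the project's actual data structures:

```python
# Minimal sketch: turning timed transcript segments into SRT subtitle text.
# Segment format (start_sec, end_sec, text) is an assumption for illustration.

def fmt_time(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Build SRT text from (start_sec, end_sec, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

segments = [(0.0, 2.5, "Welcome to the lecture."),
            (2.5, 5.0, "Today we discuss speech recognition.")]
print(to_srt(segments))
```

The resulting `.srt` file can then be muxed or burned into the video with a tool such as FFmpeg to produce the downloadable subtitled version.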
The end user provides a video as input. The system extracts the audio and performs signal processing to obtain the pitch of the sound, generating phonemes from it. The acoustic model then relates the audio to candidate words, and the language model selects the combination of words that maximizes the mapping probability; this combination is added to the transcript.
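The last step, combining acoustic scores with a language model, can be illustrated with a toy example. All vocabulary and probabilities below are invented for demonstration; a real decoder would use beam search rather than brute force:

```python
from itertools import product

# Toy illustration of acoustic + language model decoding: for each time step
# the acoustic model proposes candidate words with scores, and a bigram
# language model scores word-to-word transitions. The decoder picks the
# sequence whose joint probability is highest.

acoustic = [                       # candidate words and acoustic scores per step
    {"I": 0.6, "eye": 0.4},
    {"sea": 0.5, "see": 0.5},
]
bigram = {("<s>", "I"): 0.5, ("<s>", "eye"): 0.1,
          ("I", "see"): 0.4, ("I", "sea"): 0.05,
          ("eye", "see"): 0.2, ("eye", "sea"): 0.2}

def best_sequence(acoustic, bigram):
    """Brute-force search for the sequence maximizing P(audio|words) * P(words)."""
    best, best_p = None, 0.0
    for words in product(*(step.keys() for step in acoustic)):
        p, prev = 1.0, "<s>"
        for w, step in zip(words, acoustic):
            p *= step[w] * bigram.get((prev, w), 1e-6)
            prev = w
        if p > best_p:
            best, best_p = words, p
    return best

print(best_sequence(acoustic, bigram))  # ('I', 'see')
```

Note how the language model disambiguates the homophones: acoustically "see" and "sea" are equally likely, but the bigram probability of "I see" makes it the winning sequence.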
We are using the Wav2Vec2 acoustic model for training and evaluation on the Urdu voice dataset. For feature extraction we use Wav2Vec2FeatureExtractor, and for tokenization we use Wav2Vec2CTCTokenizer. For training, we use Mozilla Common Voice Urdu audio samples, which amount to 33 MB of audio data in MP3 format.
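The CTC decoding that Wav2Vec2CTCTokenizer ultimately performs can be sketched in a few lines: greedy decoding collapses consecutive repeated labels and then removes the CTC blank token. The label alphabet and frame sequence below are invented for demonstration:

```python
# Illustrative sketch of greedy CTC decoding (not the tokenizer's actual
# implementation): collapse consecutive repeats, then drop the blank token.

BLANK = "_"

def ctc_greedy_decode(frame_labels):
    """Collapse repeated frame labels, then remove CTC blanks."""
    collapsed, prev = [], None
    for label in frame_labels:
        if label != prev:
            collapsed.append(label)
        prev = label
    return "".join(l for l in collapsed if l != BLANK)

frames = ["_", "c", "c", "_", "a", "a", "t", "t", "_"]
print(ctc_greedy_decode(frames))  # "cat"
```

The blank token is what lets CTC represent genuinely repeated characters: "l _ l" decodes to "ll", while "l l" collapses to a single "l".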
The model takes a speech signal in its raw form in any language, which makes it suitable for languages with unlabeled data. The raw audio, a 1-D array, is passed to a multi-layer 1-D CNN that produces audio time frames of 25 ms each. These audio representations are then mapped against a codebook, which selects the most appropriate representation for the data.
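As a rough sketch of the framing step: assuming a 16 kHz sampling rate (the input rate Wav2Vec2 expects), a 25 ms frame is 400 samples. The snippet below uses simple non-overlapping slicing for clarity; the actual Wav2Vec2 feature encoder uses strided convolutions rather than explicit slicing:

```python
# Sketch: slicing a raw 1-D audio array into 25 ms frames at 16 kHz.
# Non-overlapping frames are an illustrative simplification.

SAMPLE_RATE = 16_000
FRAME_MS = 25
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 400 samples per frame

def frame_audio(samples, frame_len=FRAME_LEN):
    """Split a raw sample sequence into full frames of frame_len samples."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

one_second = [0.0] * SAMPLE_RATE
frames = frame_audio(one_second)
print(len(frames), len(frames[0]))  # 40 frames of 400 samples each
```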
After pre-training on unlabeled speech, the model is fine-tuned on labeled data for downstream tasks such as speech recognition or emotion recognition.
This project aims to provide Urdu lectures or videos with transcribed English subtitles generated by a deep-learning model. International students can benefit from this: they can go through recorded Urdu lectures to learn about a desired topic. The project also aims to make our local lectures available worldwide.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| RAM | Equipment | 2 | 6000 | 12000 |
| HDD | Equipment | 1 | 6000 | 6000 |
| Total (in Rs) | | | | 18000 |