Book Prism
2025-06-28 16:25:43 - Adil Khan
Project Area of Specialization: Software Engineering
Project Summary
Building machines that perform human functions, such as reading, is an ancient dream. Over the last five decades, machine learning has turned that dream into reality, and there are now several techniques and algorithms for training a machine to perform tasks the way humans do. The purpose of our project is to provide a platform where children can listen to the audio of a storybook while the corresponding text is highlighted. The system is divided into five modules: image processing, voice processing, removal of garbage text, syncing of text and speech, and video generation. The image processing module converts a page image into text, and the voice processing module converts that text into speech. The garbage text removal module strips out text that is not part of the story, such as page numbers and the book title printed at the top of the page. The syncing module synchronizes the text with the speech and highlights the text. The last module, video generation, produces a video of the synchronized text with its audio.
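The garbage text removal step described in the summary can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `remove_garbage_text`, the page-number pattern, and the rule that the title counts as a header only when it appears before any story text are all assumptions.

```python
import re

# Assumed pattern: a line holding only a page number, e.g. "12", "- 12 -" or "Page 12".
PAGE_NUMBER = re.compile(r"^\W*(?:page\s*)?\d+\W*$", re.IGNORECASE)


def remove_garbage_text(page_text: str, book_title: str) -> str:
    """Drop page-number lines and a running title from one page of OCR text."""
    title = book_title.strip().lower()
    # Ignore blank lines that OCR output often contains.
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    kept = []
    for line in lines:
        if PAGE_NUMBER.match(line):
            continue  # page number anywhere on the page
        if not kept and line.lower() == title:
            continue  # book title at the top of the page, before any story text
        kept.append(line)
    return "\n".join(kept)
```

For example, a page whose OCR output starts with the book title and ends with `- 12 -` would come back with only the story lines. Real scans are noisier, so a production version would likely need fuzzy matching on the title.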
Project Objectives
The main goals and objectives of our project are:
- To apply image processing to convert a given page image into text.
- To apply text processing to analyze, normalize and transcribe the text into a phonetic or other linguistic representation.
- To generate speech from the text produced by OCR.
- To synchronize the text with the speech so that the text being spoken can be highlighted.
- To highlight the text so that it is clearly visible to the user.
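The synchronization and highlighting objectives can be illustrated with a small sketch. It assumes that only the total audio duration is known and splits it across words in proportion to their length. That is a rough heuristic chosen for illustration; the names `WordTiming`, `estimate_word_timings` and `word_at` are hypothetical, and a real system would more likely take timings from the speech engine or from forced alignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the audio
    end: float


def estimate_word_timings(text: str, audio_duration: float) -> List[WordTiming]:
    """Split the audio duration across words in proportion to word length."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(word) for word in words)
    timings = []
    elapsed = 0.0
    for word in words:
        share = audio_duration * len(word) / total_chars
        timings.append(WordTiming(word, elapsed, elapsed + share))
        elapsed += share
    return timings


def word_at(timings: List[WordTiming], playback_time: float) -> Optional[int]:
    """Return the index of the word to highlight at the given time, or None."""
    for index, timing in enumerate(timings):
        if timing.start <= playback_time < timing.end:
            return index
    return None
```

A front end could call `word_at` on each playback tick and apply a highlight style to the word at the returned index.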
Project Implementation Method
Techniques:
- Scrum Technique: Scrum is an agile framework in which the team members work closely with one another. Short meetings are held daily to coordinate the work and to keep the whole team's understanding of the project aligned.
Languages:
- Python: Python is an interpreted, high-level, general-purpose programming language. Its object-oriented approach aims to help programmers write clear, logical code for small and large-scale projects. Because our project is based on artificial intelligence and machine learning, Python is a well-suited platform for implementing the AI concepts.
- CSS, HTML and JavaScript: The front end of our project is built with the web languages HTML, CSS and JavaScript, which are used to give the application a modern look.
Tools:
- Visual Studio Code: It is a code editor made by Microsoft for Windows, Linux and macOS. Its features include support for debugging, syntax highlighting, intelligent code completion and code refactoring.
- PyCharm with Django Framework: PyCharm is an integrated development environment built specifically for the Python language. It provides smart code completion, code inspections and quick-fixes, along with automated code refactoring and rich navigation capabilities. PyCharm creates the directory structure and files required for a Django application and provides the correct settings.
Benefits of the Project
Our project addresses an important need in today’s world. Through this project, we aim to develop children’s interest in books. Reading and listening to different types of books will help children develop creative and innovative skills. These children are the growing youth of our country, so it is necessary to guide them in the right direction and toward purposeful lives.
Technical Details of Final Deliverable
Electronic learning is gaining an educational foothold all over the world. This project aims to develop a web application that lets the user hear the contents of a book while the highlighted text is synchronized with the audio. The application will incorporate Optical Character Recognition (OCR) [3] and Text to Speech (TTS) [4] synthesis.
- Optical Character Recognition (OCR): OCR transforms a two-dimensional image of text, which may contain machine-printed or handwritten text, from its image representation into machine-readable text.
- Text to Speech (TTS): TTS is the conversion of written text into spoken voice. TTS programs can be written in Python, and the quality of the spoken voice depends on the speech engine used.
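How OCR and TTS chain together can be sketched as below. The source does not name the libraries it uses, so `pytesseract` (with Pillow) for OCR and `pyttsx3` for speech are assumptions made for illustration, as are the function names. The engines are passed in as parameters so that either one can be swapped out.

```python
from typing import Callable


def default_ocr(image_path: str) -> str:
    """OCR via pytesseract (assumed choice; needs the Tesseract binary installed)."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def default_tts(text: str, audio_path: str) -> None:
    """Offline speech synthesis via pyttsx3 (assumed choice)."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.save_to_file(text, audio_path)
    engine.runAndWait()  # blocks until the audio file has been written


def page_to_audio(
    image_path: str,
    audio_path: str,
    ocr: Callable[[str], str] = default_ocr,
    tts: Callable[[str, str], None] = default_tts,
) -> str:
    """Convert one page image to text, synthesize it, and return the text."""
    # Collapse the line breaks OCR inserts so the speech engine reads fluently.
    text = " ".join(ocr(image_path).split())
    if text:
        tts(text, audio_path)
    return text
```

A call such as `page_to_audio("page1.png", "page1.wav")` (hypothetical file names) would write the narration for one page. The returned text is what the later syncing step would highlight.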
Our system will allow users to read storybooks, listen to them, delete them, download the generated video, search for a storybook in the category list, mark storybooks as private or public, and rate them.
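The listed features suggest a simple data model, sketched here in plain Python. Every name in it (`StoryBook`, `rate`, `search_by_category`) is hypothetical, and the 1 to 5 rating scale and the rule that users see public books plus their own are assumptions. In the actual application this would presumably be expressed as Django models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoryBook:
    title: str
    category: str
    owner: str
    is_public: bool = False  # private by default (assumption)
    ratings: List[int] = field(default_factory=list)

    def rate(self, stars: int) -> None:
        """Record a rating; the 1 to 5 scale is an assumption."""
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(stars)

    def average_rating(self) -> Optional[float]:
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)


def search_by_category(books: List[StoryBook], category: str, user: str) -> List[StoryBook]:
    """Books in a category that `user` may see: public ones plus the user's own."""
    wanted = category.strip().lower()
    return [
        book
        for book in books
        if book.category.lower() == wanted and (book.is_public or book.owner == user)
    ]
```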
Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries:
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Quality Education
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Resources | Equipment | 50 | 1000 | 50000 |
| Total (in Rs) | | | | 50000 |