Applying semantic search and Natural language inference on Quranic Verses

2025-06-28 16:25:08 - Adil Khan

Project Title

Applying semantic search and Natural language inference on Quranic Verses

Project Area of Specialization: Artificial Intelligence

Project Summary

This research project has two main parts. The first, scheduled for completion in part 1 of the project, is a semantic similarity search that retrieves Quranic verses similar to a query given to the system. The second is an inference task: given an antecedent, the system reports a consequent label indicating whether the antecedent agrees or disagrees with the premise. Although the project is mainly research, it includes a web interface through which users can interact with its features.

The interface is intended to offer three tasks: semantic search and two ways of performing inference. The first task performs a semantic search over Quranic verses and displays the similar verses on the interface. In the first inference mode, the user supplies only an antecedent; the system finds a verse related to the antecedent, uses that verse as the premise, and outputs a consequent label stating whether the antecedent agrees or disagrees with the premise. In the second inference mode, the user supplies both a premise and an antecedent, and the system likewise outputs only a consequent label. The inference system is being researched to help users understand the verses. In practice, it could help an average user shortlist verses, so that they bring specific verses to an expert rather than an ambiguous phrase; the final result may still require an expert's verdict and validation.

Project Objectives

Project Implementation Method

Language Support of the System:

This section discusses which languages we chose to support in this research project, or at least in part 1 of it. We opted for Arabic, since it is the language of the Quran; English, since it is the international medium of communication and supporting it preserves the international scope of the project; and Urdu, our national language, which we include as a gesture of support for it.

Data:

Data for the Quran and its translations was abundantly available thanks to previous community work. We obtained the Tanzil data through our supervisor, Dr. Tafseer Ahmed Khan, and from the OPUS parallel corpus. To evaluate the Semantic Textual Similarity task on the Quran, we used the subject-wise data available on QuranGo. Figure 3.1 shows an example of how the data was available, whereas Figure 3.2 shows an example of the parallel corpus we used, e.g. en-ur.

Pre-processing:

Because the models we used are pre-trained and generally performed well on semantic textual similarity (STS) for English and Urdu, the only pre-processing required was removing diacritics from the Arabic text of the Holy Quran.
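The diacritic removal step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `strip_diacritics` is hypothetical, and the set of characters removed (U+064B to U+0652 plus the superscript alef U+0670) is an assumption, since the text does not say which marks were stripped.

```python
import re

# Assumption: "diacritics" means the Arabic marks U+064B-U+0652
# (tanween, fatha, damma, kasra, shadda, sukun) plus the superscript
# alef U+0670. The project's actual character set is not stated.
_ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return `text` with the Arabic diacritic marks listed above removed."""
    return _ARABIC_DIACRITICS.sub("", text)
```

Base letters, spaces, and non-Arabic text pass through unchanged, so the same function can be applied to a whole verse before it is embedded.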

Implementation:

Once the front end is created, it needs to communicate with our research code, which is written in Python. To set up a back-end server as quickly as possible, and to handle the heavy processing needed to perform the user's tasks, we decided to use FastAPI. It lets us get a fully fledged back-end server up and running very quickly, and it is generally reported to perform better than popular Python frameworks such as Django and Flask.

With the front end and its communication with the processing system in place, we can move on to the engine of the system, i.e. the method we experimented with during our research. The processing block of the system, the "Retrieval of Quranic Verses" part, uses pre-computed embeddings of all 6236 verses of the Holy Quran in all 3 languages, a count that excludes the "BismilLah" verses. Computing the embeddings for all verses in advance saves the cost (in time and responsiveness) of running the model over every verse at query time; the stored embeddings are then compared against the query using our chosen similarity metric, cosine similarity. We intend to store embeddings from two top-performing models and use them to calculate similarity against the query received from the front end. Finally, the system sorts the verses by similarity score and returns the top 10 most related verses to the user's front-end interface through REST APIs.
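The ranking step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the names `cosine_similarity` and `top_k_verses` are hypothetical, the embeddings are taken to be plain lists of floats already produced by some model, and a real system would likely use a vectorized library rather than Python loops.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_verses(
    query_vec: Sequence[float],
    verse_vecs: Sequence[Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (verse_index, score) pairs for the k verses most similar to the query.

    `verse_vecs` stands in for the pre-computed verse embeddings; the index
    is the position of the verse in that list (hypothetical layout).
    """
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(verse_vecs)]
    # Highest similarity first; Python's sort is stable, so ties keep verse order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Only the query has to be embedded at request time; the verse vectors are loaded once and reused, which is the saving the paragraph above describes.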

Benefits of the Project

This research project aims to solve a problem faced by Muslims all over the world. First, an average person does not know which para, section, or verse of the Holy Quran addresses their particular concern. Our method addresses this first problem by using semantic search techniques to find and deliver verses related to the user's query. The second part of the project addresses the other half of the problem: did the user understand the verse correctly? The user can test their own understanding with the inference system, shortlist the ayahs and results for their premises and antecedents, and then consult a professional with the shortlisted results. Used in tandem, the two parts are expected to cut the cost roughly in half in terms of time and effort.

Technical Details of Final Deliverable

Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries: Others
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others
Sustainable Development Goals: Industry, Innovation and Infrastructure

Required Resources
