AUTOMATIC QUESTION GENERATION FOR BS COMPUTER ENGINEERING SUBJECTS USING NLP

Project Title

Project Area of Specialization

Artificial Intelligence

Project Summary

Automatic question generation is a subfield of natural language processing (NLP). It is an active research area: many methods and models have been developed to generate different types of questions automatically, work has been done in many languages, and researchers continue to aim for higher accuracy.
Today, teachers, professors, and tutors spend a lot of time manually creating test papers. Students also spend a lot of time on self-assessment and often rely on an adviser for it. We are therefore working in this area of NLP, which still has considerable room for development. We aim to build a software application that accepts text from reference material as input and presents the user with a set of questions and answers for self-assessment. Advisers can use the same approach to prepare test papers.
The ability to ask questions is a central cognitive element that distinguishes human from animal cognition. A good question is relatively short, clear, and unambiguous. Multiple-choice questions (MCQs) are easy to evaluate; because the evaluation is carried out by a computerized application, results can be published within hours and grading is objective. Online MCQ exams are very popular and include many major exams such as ETS Pakistan, GATE, and CTSP. WH-questions, by contrast, use words such as "why," "when," "who," "where," and "how." Unlike polar (yes/no) questions, they cannot be answered with a simple yes or no.
The system is therefore beneficial for online student exams, especially during a pandemic, because it generates new questions whose answers are not directly available on the internet, thereby reducing student negligence. Teachers can use the generated questions to set tests, and students can use them for self-assessment to gauge their understanding of a particular topic. This automation reduces cost and labor and limits human error, giving users a quick, easy-to-use questioning tool.

Project Objectives

Build a system that helps students assess themselves without depending on an adviser.
Automate the process of question generation.
Reduce the effort teachers spend preparing tests.

Project Implementation Method

Fill-in-the-Blank and MCQ Questions
We use the SQuAD 1.0 dataset, which contains about 100,000 questions written on Wikipedia articles. Intuitively, selecting a probable answer is much like tagging a message as spam or not spam, so we apply binary classification to each input word to decide whether it is an answer. For this task, each non-stop word is extracted from the paragraphs of the SQuAD dataset and annotated with features such as POS tag, shape, word count, and NER tag, together with a label 'isAnswer'. Using this data, we train a model with scikit-learn's Gaussian Naive Bayes algorithm that tags each word as a pivotal answer or not. An advantage of Naive Bayes is that it also gives a probability for each word, which is used to choose the most probable pivotal answer. The distractors, that is, the words offered as the other multiple-choice options, are generated using pre-trained word embeddings and cosine similarity. Once the model is trained, we save it for later use on user inputs. After the user uploads a document, the content is split into sentences and preprocessed to clean the text. These sentences are fed to the saved model to predict the pivotal answers. The results are then formatted and displayed to the user.
Example Input Sentence: "The fourth planet from the Sun in the Earth's solar system is Mars, which is sometimes called the Red Planet."
Pivotal Answer chosen: Mars
Options generated: Moon, Jupiter, Saturn
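The per-word classification described above can be illustrated with a small sketch. It is not the project's code: the proposal uses scikit-learn's Gaussian Naive Bayes, while this version reimplements the same idea in plain Python so it runs without dependencies. The three features, the variance floor of 1e-3, the training words, and all names (`word_features`, `TinyGaussianNB`, `answer_probability`) are assumptions made for illustration.

```python
import math
from collections import defaultdict

def word_features(word):
    """Toy numeric features standing in for POS tag, shape, NER tag, etc."""
    return [
        float(word[0].isupper()),               # capitalised first letter
        float(len(word)),                       # word length
        float(any(c.isdigit() for c in word)),  # contains a digit
    ]

class TinyGaussianNB:
    """Gaussian Naive Bayes: one mean and variance per feature and class."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.params = {}
        for label, members in grouped.items():
            n = len(members)
            means = [sum(col) / n for col in zip(*members)]
            # Assumed floor of 1e-3 so zero-variance features stay usable.
            variances = [
                sum((v - m) ** 2 for v in col) / n + 1e-3
                for col, m in zip(zip(*members), means)
            ]
            self.params[label] = (math.log(n / len(rows)), means, variances)
        return self

    def predict_proba(self, row):
        log_post = {}
        for label, (log_prior, means, variances) in self.params.items():
            log_post[label] = log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(row, means, variances)
            )
        # Normalise in log space to avoid underflow.
        top = max(log_post.values())
        weights = {label: math.exp(lp - top) for label, lp in log_post.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}

# Invented training words: label 1 means isAnswer, 0 means not an answer.
TRAIN = [("Paris", 1), ("Einstein", 1), ("1969", 1), ("Jupiter", 1), ("Python", 1),
         ("capital", 0), ("city", 0), ("called", 0), ("fourth", 0),
         ("sometimes", 0), ("planet", 0)]

model = TinyGaussianNB().fit([word_features(w) for w, _ in TRAIN],
                             [label for _, label in TRAIN])

def answer_probability(word):
    """Probability that a word is a pivotal answer under the toy model."""
    return model.predict_proba(word_features(word))[1]
```

With these toy features every capitalised word scores high, so the sketch cannot separate "Mars" from "Sun"; richer features such as the POS and NER tags named above would be needed for that.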
WH-Type Questions
For WH questions, the entire text cannot be used, so the sentences are first filtered and only the top sentences are used to generate questions. To identify them we use the TextRank algorithm (implemented with NLTK), which extracts the most important sentences in the text. General preprocessing involves tokenizing the uploaded text document; the words are then tagged using POS tagging and NER tagging. The question generation procedures are further classified by sentence structure, each with its own transformation rules and algorithms:

Named-entity-recognition-based algorithm
Discourse-marker-based algorithm
Non-discourse-marker-based algorithm
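The sentence-filtering step described above can be sketched as a small TextRank ranker. This is a minimal, dependency-free illustration and not the project's NLTK-based implementation: it assumes the word-overlap similarity from the original TextRank formulation, a damping factor of 0.85, and a fixed number of iterations. The function names and sample sentences are hypothetical.

```python
import math
import re

def tokenize(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def similarity(a, b):
    """Shared words, normalised by the log lengths of both sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))

def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over the sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores

def top_sentences(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

SAMPLE = [
    "Mars is the fourth planet from the Sun.",
    "Mars is often called the Red Planet.",
    "The Red Planet has two small moons.",
    "I had toast for breakfast.",
]
```

Calling `top_sentences(SAMPLE, 2)` keeps the two sentences that share the most vocabulary with the rest of the text and drops the unrelated breakfast sentence, which no other sentence links to.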

Benefits of the Project

A simple interface that makes it easy to update data.
The system reduces manual effort and saves time and resources.
Users anywhere can build and improve their knowledge and skills by working through a generated questionnaire (multiple-choice questions (MCQs) or WH-questions).
The main aim of the project is to narrow the gap between manpower and technology by automating the task of question paper generation.

Technical Details of Final Deliverable

We will build a system that accepts a text document from the user and generates questions from it. The document may contain a passage or an extract on any topic. The architecture is divided into three modules:
Authentication - The user must sign up and create an account to use the QG system. Authentication is handled by Django authentication, or the user can log in with a Google account.
Question Generator - After logging in, the user uploads a text document. The document must not contain images or non-textual special characters. The user also chooses the maximum number of questions to generate, which cannot exceed the number of sentences in the document. The user can then download or rate the generated questions; depending on the choice, the generated question file or the ratings are saved to the local disk. The questions are of two types: fill in the blank (FIB) and WH-type questions.
The FIB model uses machine learning techniques to generate FIB questions. It identifies the keyword using classification techniques, replaces the keyword with a blank, and uses the remaining sentence as the FIB question. It also uses the keyword to generate several wrong answers, or distractors, so that the question can be used as an MCQ. The WH model first extracts the top sentences, preprocesses the text, and then generates questions according to the type of sentence. Sentences are handled by the NER-based, discourse-marker-based, or non-discourse-marker-based algorithm, each of which applies transformation rules that depend on the structure of the sentence.
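The FIB step described above, blanking the keyword and deriving distractors from word embeddings with cosine similarity, might look like the following sketch. The three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration; a real system would load pre-trained embeddings. The names `cosine`, `distractors`, and `make_fib` are hypothetical.

```python
import math
import re

# Invented toy vectors; real pre-trained embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "Mars": [0.9, 0.8, 0.1],
    "Jupiter": [0.85, 0.75, 0.2],
    "Saturn": [0.8, 0.85, 0.15],
    "Moon": [0.7, 0.6, 0.3],
    "banana": [0.1, 0.0, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def distractors(answer, embeddings, k=3):
    """Return the k words most similar to the answer, excluding the answer."""
    target = embeddings[answer]
    others = [w for w in embeddings if w.lower() != answer.lower()]
    others.sort(key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return others[:k]

def make_fib(sentence, answer):
    """Replace the first whole-word occurrence of the answer with a blank."""
    return re.sub(rf"\b{re.escape(answer)}\b", "_____", sentence, count=1)

SENTENCE = ("The fourth planet from the Sun in the Earth's solar system is Mars, "
            "which is sometimes called the Red Planet.")
question = make_fib(SENTENCE, "Mars")
options = distractors("Mars", TOY_EMBEDDINGS)
```

Here `question` contains a blank where "Mars" stood, and `options` holds the three celestial bodies from the toy table because their vectors lie closest to the answer's; "banana" is left out.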

Evaluation - Once the questions are generated, the user can rate them on criteria such as answerability and grammatical correctness. These ratings help improve the system. In addition, we compute BLEU scores to compare the generated questions with human-written ones.
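To make the BLEU comparison concrete, here is a simplified sentence-level sketch. It assumes a single reference question, whitespace tokenization, n-grams up to length 4, and add-one smoothing of the n-gram precisions; these choices are illustrative and may differ from the project's implementation. In practice a library routine such as NLTK's `sentence_bleu` could be used instead.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU with add-one smoothing (assumed)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

HUMAN = "what is the fourth planet from the sun"
GENERATED = "which planet is fourth from the sun"
score = bleu(GENERATED, HUMAN)
```

A generated question identical to the human one scores 1.0, and the score falls as fewer n-grams are shared or as the generated question gets shorter than the reference.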

Final Deliverable of the Project

Software System

Core Industry

Education

Other Industries

IT, Medical, Others, Health

Core Technology

Artificial Intelligence (AI)

Other Technologies

Augmented & Virtual Reality

Sustainable Development Goals

Quality Education

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Google Colab Pro	Miscellaneous	4	1300	5200
NVIDIA P1000	Equipment	1	70000	70000
Thesis printing and binding	Miscellaneous	4	1200	4800
			Total (in Rs)	80000
