Product Review System


2025-06-28 16:34:36 - Adil Khan

Project Title

Project Area of Specialization: Artificial Intelligence

Project Summary

Text mining is a broad field, and sentiment mining is one of its important constituents: it tries to deduce people's attitudes towards a specific item, such as merchandise, politics or sports, from sources such as social media comments and review sites. Among the many issues in sentiment mining, analysis and classification, one major issue is that reviews and comments can be written in different languages, such as English, Arabic and Urdu, and handling each language according to its own rules is a difficult task. Much research on sentiment analysis and classification has been done for English, but only limited work has been carried out on regional languages such as Arabic, Urdu and Hindi. In this paper, the Waikato Environment for Knowledge Analysis (WEKA) is used as a platform to run different classification models on Roman Urdu text. The reviews dataset was scraped from different automobile sites. The extracted Roman Urdu reviews, 1000 positive and 1000 negative, are saved in the WEKA Attribute-Relation File Format (ARFF) as labeled examples. Training is done on 80% of this data and the remaining 20% is used for testing; each model is tested and its results are analyzed. The results show that Multinomial Naïve Bayes outperformed Bagging, Deep Neural Network, Decision Tree, Random Forest, AdaBoost, k-NN and SVM classifiers in terms of accuracy, precision, recall, and F-measure.
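The four evaluation measures named above can be computed from the counts of a binary confusion matrix. The following is a minimal sketch, assuming "positive" is treated as the target class; the function name `binary_metrics` and the example counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts for a 400-review test set (200 positive, 200 negative).
# These numbers are made up for the example, not taken from the experiments.
scores = binary_metrics(tp=180, fp=30, fn=20, tn=170)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four measures together is useful because accuracy alone can hide a classifier that favors one class.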

Project Objectives

Due to the extensive use of computers, smartphones and high-speed internet, people now use the web for social contact, business correspondence, e-marketing, e-commerce, e-surveys, etc. People share their ideas, suggestions, comments and opinions about particular products, services, political entities and current affairs. So many user-generated opinions are available on the web that it is difficult to judge how many of them are positive and how many are negative (Khushboo et al., 2012). This makes it difficult for people to make the right decision about purchasing a particular product. It is also difficult for manufacturers and service providers to keep track of public opinion about their products or services and to manage those opinions. Similarly, an analyst who wants to gather public feedback on a specific topic may post the topic on a blog in order to analyze people's sentiment about it. The post will attract many opinions, and it will be difficult to know how many of them are positive and how many are negative. A machine can therefore be trained to make such decisions quickly and accurately.

Project Implementation Method

The purpose of preprocessing is to ensure that only relevant features are selected from the dataset. Before the data was forwarded to model training and classification, the following steps were performed.

1) Data Extraction. The extraction task consists of scraping reviews from automobile sites. Users freely post their comments and reviews on these sites, and the posts are mostly multilingual, e.g. “Honda cars ka AC bohut acha Hota hai”, “imported cars k spare parts mehengy milty hn” etc. This is because English has a great influence on the Urdu-speaking community, and because most automobile-related terminology is used as-is in other local languages, including Urdu.

2) Stop-words Removal. Words that are non-semantic in nature are termed stop-words; they usually include prepositions, articles, conjunctions, and pronouns. Because they hold little or no information about the sentiment of a review, they are removed from the data. A list of Urdu stop-words was taken and converted to the Roman Urdu script.

3) Lower-case Words. All word tokens are converted to lower-case before they are added to the corpus, so that every word is in the same format and differently capitalized forms of a word are treated as the same token.

4) Development of Corpus. All the extracted reviews and comments are stored in a text file, which includes 1000 positive and 1000 negative reviews. In this study, 800 positive and 800 negative reviews are used as the training dataset and the remaining 400 reviews (200 positive and 200 negative) are used as the testing dataset.

5) Conversion of Data into ARFF. Attribute-Relation File Format (ARFF) is an ASCII text input file format of WEKA. These files have two important sections, i.e. header information and data information. The dataset text file was converted to ARFF format by using the TextDirectoryLoader command in the Simple CLI mode of WEKA. For example:

> java weka.core.converters.TextDirectoryLoader filename.txt > filename.arff

The elements of the text files are saved as strings with the relevant class labels.
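The preprocessing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pipeline used in the study: the stop-word list `STOP_WORDS` is a small hypothetical sample (the actual list is not given), tokenization is plain whitespace splitting, and `to_arff` writes a simplified ARFF with one string attribute and one nominal class attribute instead of calling WEKA.

```python
import random

# Hypothetical Roman Urdu stop-words; the study's real list is not reproduced here.
STOP_WORDS = {"ka", "ki", "ke", "k", "hai", "hn", "hota", "aur", "se", "ko"}


def preprocess(review):
    """Lower-case a review and drop stop-words (whitespace tokenization)."""
    tokens = review.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


def split_per_class(examples, train_fraction=0.8, seed=0):
    """Split (text, label) pairs so each class keeps the same train/test ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lab for _, lab in examples}):
        group = [ex for ex in examples if ex[1] == label]
        rng.shuffle(group)
        cut = int(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def to_arff(examples, relation="reviews"):
    """Serialize (text, label) pairs as a simplified ARFF document."""
    labels = sorted({lab for _, lab in examples})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for text, label in examples:
        # Escape backslashes and single quotes inside the quoted string value.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"


tokens = preprocess("Honda cars ka AC bohut acha Hota hai")
print(tokens)
print(to_arff([(" ".join(tokens), "pos")]))
```

With 1000 positive and 1000 negative reviews and `train_fraction=0.8`, `split_per_class` yields the 800/800 training and 200/200 testing split described in step 4.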

The objectives of this research are to mine the polarity of public opinions written in Roman Urdu with a blend of English and Urdu, extracted from a blog; to train the machine using a training dataset; to build Naïve Bayesian, Decision Tree and KNN classification models; and to predict the polarity of new opinions by using these classification models.
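To make the idea of training a model and predicting the polarity of a new opinion concrete, here is a minimal from-scratch sketch of a Multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It is not WEKA's implementation; the class name `MultinomialNB`, the choice to ignore unseen tokens, and the toy training reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.token_counts = defaultdict(Counter)   # token frequencies per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add log likelihoods per token.
            score = math.log(doc_count / total_docs)
            class_total = sum(self.token_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # tokens never seen in training carry no evidence
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (class_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training reviews (already tokenized and lower-cased).
train_docs = [["acha", "car"], ["bohut", "acha"], ["bura", "engine"], ["bohut", "mehenga"]]
train_labels = ["pos", "pos", "neg", "neg"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict(["acha"]))
print(model.predict(["bura", "engine"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied for longer reviews.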

This paper is organized into five sections. In the first and second sections, the introduction and previous related work are briefly described. In the third section, the methodology adopted to perform the different experiments is explained. In the fourth section, the experiments are carried out and evaluated, and the results are discussed. In the last section, conclusions are drawn on the basis of the outcomes.

Benefits of the Project

Sentiment analysis is a computational process to identify positive or negative sentiments expressed in a piece of text. In this paper, we present a sentiment analysis system for Roman Urdu. For this task, we gathered Roman Urdu data of 779 reviews for five different domains, i.e., Drama, Movie/Telefilm, Mobile Reviews, Politics, and Miscellaneous (Misc). We selected unigram, bigram and uni-bigram (unigram + bigram) features for this task and used five different classifiers to compute accuracies before and after feature reduction. In total, thirty-six (36) experiments were performed, and they established that Naïve Bayes (NB) and Logistic Regression (LR) performed better than the rest of the classifiers on this task. It was also observed that the overall results were improved after feature reduction.
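The three feature sets mentioned above differ only in which word sequences are counted. A minimal sketch follows, assuming whitespace-tokenized input; the function names `ngrams` and `extract_features` and the sample review are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, mode="uni-bigram"):
    """Count unigram, bigram, or combined (unigram + bigram) features."""
    if mode == "unigram":
        features = ngrams(tokens, 1)
    elif mode == "bigram":
        features = ngrams(tokens, 2)
    elif mode == "uni-bigram":
        features = ngrams(tokens, 1) + ngrams(tokens, 2)
    else:
        raise ValueError(f"unknown feature mode: {mode}")
    return Counter(features)


# Made-up Roman Urdu review used only to show the feature sets.
sample = "mobile ki battery achi hai".split()
for mode in ("unigram", "bigram", "uni-bigram"):
    print(mode, dict(extract_features(sample, mode)))
```

Bigrams keep some word order (for example, a negation next to the word it modifies) at the cost of a larger and sparser feature space, which is one reason feature reduction can help.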

Technical Details of Final Deliverable

Sentiment analysis is a widely discussed and effective technique in e-commerce. E-commerce is a very convenient way to buy things online. It saves much of the time that is usually spent traveling to and visiting shops, and it provides an efficient and effective way to shop from one's computer or mobile at home. For a given product, sentiment analysis captures the users' views, that is, their feelings and opinions related to that product. The reviews are categorized into three basic classes, i.e. negative, positive, and neutral. This paper focuses on Roman Urdu reviews obtained from Daraz.pk, one of the most popular and most visited e-commerce websites in Pakistan. There are 20.286K reviews in total, annotated into three classes by three different experts. A vector space model, also known as the bag-of-words model, is applied for feature extraction, and the resulting features are passed to Support Vector Machines (SVM) for sentiment classification. Experiments are conducted on a MATLAB Linux server. The dataset is kept public for future use and experiments.
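As an illustration of the bag-of-words step, the sketch below maps each tokenized review to a fixed-length count vector over a shared vocabulary; such vectors are the kind of input an SVM would then be trained on. The SVM itself is not shown, and the function names and sample reviews are hypothetical.

```python
def build_vocabulary(tokenized_reviews):
    """Assign a column index to every distinct term in the training reviews."""
    terms = sorted({token for tokens in tokenized_reviews for token in tokens})
    return {term: index for index, term in enumerate(terms)}


def vectorize(tokens, vocabulary):
    """Turn one tokenized review into a term-count vector."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        index = vocabulary.get(token)
        if index is not None:  # terms outside the vocabulary are skipped
            vector[index] += 1
    return vector


# Made-up reviews used only to show the representation.
reviews = [["acha", "product"], ["bura", "product", "bura"]]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(tokens, vocabulary) for tokens in reviews]
print(vocabulary)
print(vectors)
```

Word order is discarded in this representation; only how often each term occurs in a review is kept.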

Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries:
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Quality Education; Industry, Innovation and Infrastructure

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Research Design	Miscellaneous	1	10000	10000
			Total (in Rs)	10000
