Topic modeling for Urdu news.

2025-06-28 16:36:25 - Adil Khan

Project Title

Topic modeling for Urdu news.

Project Area of Specialization

Artificial Intelligence

Project Summary

This project applies topic modeling to Urdu news text, using machine learning to extract the topics. Topic modeling is a technique for identifying the topics present in a large set of text documents. We use Urdu news text as the test case; the approach is intended to work equally well on other genres of text. Topic modeling helps us extract hidden (latent) keywords that can represent complete documents: for example, we process the full text of Urdu news articles with machine learning and extract their keywords.

Project Objectives
  1. Make it possible for people to find topics of interest easily, at a time when social media generates data at an unprecedented rate and it is hard to read all of it.

There has been a lot of research on topic modeling in English but not much in Urdu, even though, according to Ethnologue's 2018 estimates, Urdu is the 11th most widely spoken language in the world, with 170 million total speakers.

Project Implementation Method

Topic modeling is a family of algorithms for extracting the topic or topics of a collection of documents. It is a widely used text mining method in Natural Language Processing for gaining insights into text documents. The approach is analogous to the dimensionality reduction techniques used for numerical data. The main steps are listed below.

Steps
==> Data collection
==> Feature selection
==> Preprocessing (cleaning, lemmatization, stemming) (NLTK, SciPy, gensim)
==> Word embedding (token level) (bag of words, GloVe, word to bag)
==> Model selection (LDA, LSI, HDP, LSA)
==> Comparison
==> Keyword extraction, visualization (word cloud, coherence model)
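
The steps above can be sketched end to end on a toy example. The sketch below is only an illustration under stated assumptions, not the project's actual implementation: the four sample sentences, the stop-word list, and the function names (`tokenize`, `fit_lda`, `top_words`) are all hypothetical, and LDA is fitted here with a small collapsed Gibbs sampler written in plain Python so the example has no dependencies. A real pipeline would more likely rely on a library such as gensim, and would add lemmatization or stemming, which this sketch skips.

```python
import random
from collections import Counter

# Hypothetical mini-corpus standing in for collected Urdu news text.
DOCS = [
    "کرکٹ ٹیم نے میچ جیت لیا",
    "کرکٹ میچ میں ٹیم کی شاندار کارکردگی",
    "حکومت نے نیا بجٹ پیش کیا",
    "بجٹ میں حکومت نے ٹیکس بڑھایا",
]
# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOPWORDS = {"نے", "میں", "کی", "کیا", "لیا"}


def tokenize(text):
    """Preprocessing: strip common punctuation, split on whitespace, drop stop words."""
    for mark in "۔،؟!":
        text = text.replace(mark, " ")
    return [t for t in text.split() if t not in STOPWORDS]


def fit_lda(docs_tokens, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; returns vocab and count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs_tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs_tokens]       # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                            # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs_tokens):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs_tokens):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Keyword extraction: the n most frequently assigned words per topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


tokens = [tokenize(doc) for doc in DOCS]
bag_of_words = [Counter(doc) for doc in tokens]  # per-document word counts
vocab, topic_word, doc_topic = fit_lda(tokens)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"Topic {k}:", " ".join(words))
```

With only four short documents the topics are not meaningful; the point is the shape of the pipeline. The comparison step would repeat the fit with other models (LSI, HDP, LSA) or other values of `num_topics` and compare them with a coherence score.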

Benefits of the Project

Technical Details of Final Deliverable

not set 

Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries:
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Quality Education

Required Resources

Item Name       Type            No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
GTX 1650        Equipment       1              37000                   37000
120GB SSD       Miscellaneous   1              3600                    3600
4GB RAM DDR3    Miscellaneous   1              4000                    4000
Total (in Rs)                                                          44600
