Newsgroup-Dataset-Text-Classification

Training and testing dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/20newsgroups-mld/

The training dataset consists of 20 different classes, each with around 1000 articles. The first part of the project was to create the x dataset (the feature matrix) for training. This was done in the following steps:

1. Each article was parsed one by one.
2. Stop words were removed.
3. A dictionary was created mapping each word to its frequency.
4. The top 2000 words were selected and used as the feature list.
5. A 2D array was initialized with zeros, and the entire dataset was then parsed again to fill this array.
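
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the stop-word list, the tokenizer, the helper names `tokenize` and `build_features`, and the three sample articles are all assumptions made for the example. It also assumes the array stores per-article word counts, which the description does not state explicitly.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; the list used in the project is not given.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_features(articles, top_k=2000):
    """Return (feature_words, x), where x[i][j] counts feature j in article i."""
    # First pass: corpus-wide word frequencies.
    freq = Counter()
    for text in articles:
        freq.update(tokenize(text))

    # Keep the top_k most frequent words as the feature list.
    feature_words = [word for word, _ in freq.most_common(top_k)]
    index = {word: j for j, word in enumerate(feature_words)}

    # Zero-initialised 2D array: one row per article, one column per feature.
    x = [[0] * len(feature_words) for _ in articles]

    # Second pass: fill the array (assumed here to hold word counts).
    for i, text in enumerate(articles):
        for word in tokenize(text):
            j = index.get(word)
            if j is not None:
                x[i][j] += 1
    return feature_words, x


# Made-up sample articles, with top_k shrunk to fit the toy corpus.
articles = [
    "The engine of the car is loud",
    "The car and the engine",
    "Hockey is played on ice",
]
feature_words, x = build_features(articles, top_k=3)
```

With the real dataset, `top_k` would stay at 2000 and `articles` would hold the text of every file in the 20 class folders.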

The second part of the project used the sklearn implementation of the Multinomial Naive Bayes classifier.
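
As a rough illustration of this step, the sketch below fits `MultinomialNB` from `sklearn.naive_bayes` on a tiny count matrix. The data and the variable names (`x_train`, `y_train`, `x_test`) are made up for the example; the notebook's own arrays and any train/test split it uses are not shown here.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count matrix (made-up data): rows are articles, columns are feature words.
x_train = np.array([[3, 0, 1],
                    [2, 1, 0],
                    [0, 4, 1],
                    [0, 3, 2]])
y_train = np.array([0, 0, 1, 1])  # class label of each article

# Default settings use additive (Laplace) smoothing with alpha=1.0.
clf = MultinomialNB()
clf.fit(x_train, y_train)

# Predict the class of two unseen count vectors.
x_test = np.array([[4, 0, 0],
                   [0, 5, 1]])
predictions = clf.predict(x_test)
print(predictions)
```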

The third part of the project used a self-implemented Naive Bayes classifier.
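
One way such a classifier could look is sketched below. It is not the notebook's implementation: the class name `SimpleMultinomialNB`, the multinomial event model, and the Laplace smoothing parameter `alpha` are assumptions chosen to mirror the sklearn classifier from the second part.

```python
import math


class SimpleMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, x, y):
        self.classes = sorted(set(y))
        n_features = len(x[0])
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [row for row, label in zip(x, y) if label == c]
            # log P(class): fraction of training articles with this label.
            self.log_prior[c] = math.log(len(rows) / len(x))
            # Total count of each feature word within the class.
            totals = [sum(row[j] for row in rows) for j in range(n_features)]
            denom = sum(totals) + self.alpha * n_features
            # log P(word | class), smoothed so unseen words keep nonzero mass.
            self.log_likelihood[c] = [
                math.log((t + self.alpha) / denom) for t in totals
            ]
        return self

    def predict(self, x):
        result = []
        for row in x:
            # Score each class by log prior plus count-weighted log likelihoods.
            scores = {
                c: self.log_prior[c]
                + sum(n * lp for n, lp in zip(row, self.log_likelihood[c]))
                for c in self.classes
            }
            result.append(max(scores, key=scores.get))
        return result


# Made-up toy data: rows are articles, columns are feature-word counts.
x_train = [[3, 0, 1], [2, 1, 0], [0, 4, 1], [0, 3, 2]]
y_train = [0, 0, 1, 1]
model = SimpleMultinomialNB().fit(x_train, y_train)
predictions = model.predict([[4, 0, 0], [0, 5, 1]])
print(predictions)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters with 2000 features.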
