This repository introduces an autoencoder architecture for news article clustering, outperforming TF-IDF, GloVe, Word2Vec, and BERT. Using K-Means clustering and evaluation via the Davies-Bouldin and Calinski-Harabasz Indices, it demonstrates the superior ability of autoencoders to effectively categorize news content.

embedding-comparisons-in-clustering-application

  • This repository presents a novel autoencoder architecture designed for news article clustering.
  • It compares this architecture's performance against established embedding methods: TF-IDF, GloVe, Word2Vec, and BERT.
  • Evaluation metrics include Davies-Bouldin and Calinski-Harabasz Indices.
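The evaluation protocol described above can be sketched with scikit-learn: embed the documents (TF-IDF is shown here as one of the compared baselines), cluster with K-Means, then score the clustering with both indices. The toy corpus, cluster count, and random seed below are illustrative stand-ins, not the repository's actual setup.

```python
# Sketch of the evaluation protocol: embed -> K-Means -> score with
# Davies-Bouldin (lower is better) and Calinski-Harabasz (higher is better).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score

docs = [
    "stocks rallied as markets closed higher",
    "the central bank raised interest rates",
    "the team won the championship game",
    "the striker scored twice in the final",
    "new phone released with a faster chip",
    "software update improves battery life",
]

# TF-IDF embeddings; the other methods (GloVe, Word2Vec, BERT, autoencoder)
# would simply supply a different matrix X here.
X = TfidfVectorizer().fit_transform(docs).toarray()

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Davies-Bouldin:", davies_bouldin_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
```

Swapping in a different embedding matrix `X` while keeping the clustering and scoring fixed is what makes the comparison across methods apples-to-apples.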

Notebooks

Comparison of Existing Models

Autoencoder for News Clustering

  • Autoencoder_Clustering.ipynb: Demonstrates an autoencoder-based approach for news article clustering, showcasing its superiority over traditional methods.

Getting Started

Cloning the Repository

  1. Clone the repository:

    git clone https://github.com/KanishkRath/embedding-comparisons-in-clustering-application.git
    cd embedding-comparisons-in-clustering-application
  2. Access the notebooks:

    • Open the notebooks from the cloned repository in Jupyter Notebook to explore and execute them.

Usage

  • Ensure you have Jupyter Notebook installed.
  • Open the notebooks to run and explore the code.
  • Customize code segments or parameters for experimenting with different datasets or settings.
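As a rough illustration of the autoencoder idea compared in this repository, the NumPy sketch below trains a minimal linear autoencoder and uses its bottleneck activations as document embeddings for clustering. The actual architecture is defined in Autoencoder_Clustering.ipynb; the random data, layer sizes, learning rate, and iteration count here are placeholders.

```python
# Minimal linear autoencoder (NumPy only): compress document vectors to a
# low-dimensional bottleneck whose activations serve as embeddings.
# The random matrix X stands in for real document vectors (e.g. TF-IDF rows).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))      # 100 "documents", 50 features
X -= X.mean(axis=0)                 # center the features

d_in, d_hid = X.shape[1], 8         # 8-dimensional bottleneck
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))

lr, losses = 0.01, []
for _ in range(200):
    H = X @ W_enc                   # encode to the bottleneck
    R = H @ W_dec                   # decode (reconstruction)
    err = R - X
    losses.append(float((err ** 2).mean()))
    # gradient descent on mean squared reconstruction error
    g_dec = H.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

Z = X @ W_enc                       # embeddings to feed into K-Means
print("reconstruction loss:", losses[0], "->", losses[-1])
```

A real setup would use a deeper, nonlinear network, but the core idea is the same: the decoder forces the bottleneck to retain the information needed to reconstruct the input, so the bottleneck activations `Z` make compact document embeddings.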

Comparison Summary

Model Comparison Table

| Method | Davies-Bouldin Index (lower is better) | Calinski-Harabasz Index (higher is better) |
| --- | --- | --- |
| TF-IDF | 8.5827 | 14.52 |
| Word2Vec | 1.5521 | 484.77 |
| GloVe | 1.7722 | 369.76 |
| BERT | 3.0890 | 125.49 |
| Autoencoder | 0.8967 | 1250.82 |
| Autoencoder with One-Hot Encoding | 0.8288 | 3298.19 |
