GitHub - anhbaysgalan1/mongolian-nlp: Useful resources for Mongolian NLP

This repo collects useful resources for Mongolian NLP, along with my own experiments, mostly in PyTorch.

Datasets

DATASET LJSpeech-like male-voice TTS dataset created from the Mongolian Bible
- used in tugstugi/pytorch-dc-tts
- use dl_and_preprop_dataset.py to download the audio files
DATASET Eduge news classification dataset
- used to train the Eduge.mn production news classifier
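- 75K news articles in 9 categories: урлаг соёл (arts and culture), эдийн засаг (economy), эрүүл мэнд (health), хууль (law), улс төр (politics), спорт (sports), технологи (technology), боловсрол (education) and байгал орчин (environment)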
DATASET 11-11.mn government agency complaint dataset
- 80K entries in 5 categories: санал хүсэлт (suggestion), гомдол (complaint), шүүмжлэл (criticism), талархал (appreciation) and өргөдөл (petition)
DATASET online news corpus
- 700 million words
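
For a classifier trained on the Eduge dataset, the nine category names listed above need to be mapped to integer class ids. The following is a minimal sketch of such a mapping, not code from this repo: the names `EDUGE_CATEGORIES`, `LABEL_TO_ID`, `ID_TO_LABEL` and `encode_labels` are hypothetical, and the ordering of the ids is an arbitrary assumption (it simply follows the order in which the categories are listed here).

```python
# Assumption: category strings match the list in this README exactly;
# the actual dataset files may spell or order them differently.
EDUGE_CATEGORIES = [
    "урлаг соёл",    # arts and culture
    "эдийн засаг",   # economy
    "эрүүл мэнд",    # health
    "хууль",         # law
    "улс төр",       # politics
    "спорт",         # sports
    "технологи",     # technology
    "боловсрол",     # education
    "байгал орчин",  # environment
]

# Hypothetical lookup tables between category names and integer class ids.
LABEL_TO_ID = {name: idx for idx, name in enumerate(EDUGE_CATEGORIES)}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}


def encode_labels(labels):
    """Convert category names to class ids, ignoring case and outer whitespace."""
    return [LABEL_TO_ID[label.strip().lower()] for label in labels]


if __name__ == "__main__":
    print(encode_labels(["спорт", " Хууль "]))
```

An unknown category raises `KeyError`, which is usually preferable to silently assigning a default class when cleaning a dataset.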

DEMO HMM-based TTS online demo from the National University of Mongolia
- 1x male and 2x female voices
PYTORCH tugstugi/pytorch-dc-tts
- DEMO Colab online demo
- DATASET LJSpeech-like male-voice dataset created from the Mongolian Bible
TF tugstugi/Tacotron-2, a fork of Rayhane-mamah/Tacotron-2 adapted for the Mongolian Bible dataset
- DEMO Colab online demo

DEMO Cyrillic to Mongolian script converter demo from Inner Mongolia University
PYTORCH tugstugi/bichig2cyrillic converter between Mongolian script and Cyrillic (both directions)
- DEMO Cyrillic to Mongolian Colab online demo
PYTORCH Mongolian script OCR (to be released)
