This repo will contain a list of useful resources for Mongolian NLP, as well as my own experiments, mostly with PyTorch.
DATASET
LJSpeech-like male voice TTS dataset created from the Mongolian Bible, used in tugstugi/pytorch-dc-tts
- use dl_and_preprop_dataset.py to download the audio files
DATASET
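Eduge news classification dataset, used to train the Eduge.mn production news classifier
- 75K news articles in 9 categories: урлаг соёл (arts and culture), эдийн засаг (economy), эрүүл мэнд (health), хууль (law), улс төр (politics), спорт (sports), технологи (technology), боловсрол (education), and байгал орчин (environment)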
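For anyone training a classifier on this dataset, the nine category names need to be mapped to integer class ids. The following is a minimal sketch, assuming the labels are stored as the Mongolian category strings listed above; the names `EDUGE_CATEGORIES`, `LABEL_TO_ID`, `ID_TO_LABEL`, and `encode_labels` are hypothetical and not part of the dataset or of any repository mentioned here, and the index order is an arbitrary choice.

```python
# Category names copied from the list above; the index order is arbitrary
# and is an assumption of this sketch, not something the dataset defines.
EDUGE_CATEGORIES = [
    "урлаг соёл",    # arts and culture
    "эдийн засаг",   # economy
    "эрүүл мэнд",    # health
    "хууль",         # law
    "улс төр",       # politics
    "спорт",         # sports
    "технологи",     # technology
    "боловсрол",     # education
    "байгал орчин",  # environment
]

# Forward and reverse lookup tables between category names and class ids.
LABEL_TO_ID = {name: i for i, name in enumerate(EDUGE_CATEGORIES)}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}


def encode_labels(labels):
    """Map raw category strings to integer ids, ignoring surrounding whitespace."""
    return [LABEL_TO_ID[label.strip()] for label in labels]
```

An unknown category string raises `KeyError`, which is usually preferable to silently assigning a wrong class.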
DATASET
11-11.mn government agency complaint dataset
- 80K entries in 5 categories: санал хүсэлт (suggestions and requests), гомдол (complaint), шүүмжлэл (criticism), талархал (gratitude), and өргөдөл (petition)
DATASET
Online news corpus, 700 million words
DEMO
HMM TTS online demo from the National University of Mongolia, with 1 male and 2 female voices
PYTORCH
tugstugi/pytorch-dc-tts
DEMO
Colab online demo
DATASET
LJSpeech-like male voice dataset created from the Mongolian Bible
TF
tugstugi/Tacotron-2, a fork of Rayhane-mamah/Tacotron-2 adapted for the Mongolian Bible dataset
DEMO
Colab online demo
PYTORCH
tugstugi/mongolian-speech-recognition, single voice demo
DEMO
Cyrillic to Mongolian script converter demo from Inner Mongolia University
PYTORCH
tugstugi/bichig2cyrillic, a converter from Mongolian script to Cyrillic and back
PYTORCH
Mongolian script OCR (to be released)