TIXFeniks/neurips2019_intrus


INTRUS

Supplementary code for the NeurIPS submission "Sequence Modeling with Unconstrained Generation Order" (arXiv). This code trains and applies a machine translation model that can generate sequences in arbitrary order.


What do I need to run it?

  • A machine with a few CPU cores (preferably 4+) and at least one GPU
  • Best performance is reached when running on 8 GPUs
  • Some popular 64-bit Linux distribution
    • Tested on Ubuntu 16.04; should work fine on any popular 64-bit Linux and even macOS
    • Windows and 32-bit systems may require heavy wizardry to run
    • When in doubt, use Docker, preferably GPU-enabled (i.e. nvidia-docker)
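
Before setting anything up, it can help to check the machine against the list above. The snippet below is a minimal sketch using only the Python standard library. The helper name `describe_machine` and the constant `MIN_RECOMMENDED_CPUS` are illustrative and not part of this repo, and finding `nvidia-smi` on the PATH is only a hint that an NVIDIA driver is installed, not proof that a GPU is usable.

```python
import os
import shutil

# "Preferably 4+" CPU cores, taken from the requirements list above.
MIN_RECOMMENDED_CPUS = 4


def describe_machine():
    """Return a rough summary of the hardware visible to this process."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return {
        "cpus": cpus,
        "enough_cpus": cpus >= MIN_RECOMMENDED_CPUS,
        # Heuristic only: nvidia-smi ships with the NVIDIA driver.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


report = describe_machine()
print(report)
```

If `nvidia_smi_found` is false on a machine that does have a GPU, the driver is probably missing or not on the PATH.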

How do I run it?

  1. Setup environment
  • Clone or download this repo, then cd to its root directory.
  • Get a python distribution. Anaconda works fine.
  • Install packages from requirements.txt
  2. Prepare data
  • Grab the WMT English-Russian dataset from http://statmt.org/ (or another language pair of your choosing)
  • Tokenize it with mosestokenizer or any other reasonable tokenizer. It is also recommended that you lowercase the data.
  • Learn and apply BPE with subword-nmt
  • You can find example preprocessing pipelines here.
  3. Run Jupyter notebook
  • All the training notebooks are in the ./notebooks/ folder
  • Before you run the first cell, optionally set %env CUDA_VISIBLE_DEVICES=### to the devices that you plan to use.
  • Follow the code as it loads the data, trains the model, and reports training progress.
  • NOTE: The BLEU metric measured in the notebook is not the one used for evaluation. See sacrebleu.
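
The `%env CUDA_VISIBLE_DEVICES=###` step in the list above also has a plain Python equivalent, which is handy if you export a notebook to a script. This is a minimal sketch: the helper `select_gpus` is a hypothetical name, not something this repo provides, and the device indices `0, 1` are placeholders for whichever GPUs you intend to use.

```python
import os


def select_gpus(device_ids):
    """Expose only the given GPU indices to CUDA, e.g. [0, 1] -> "0,1"."""
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Placeholder indices; replace with the devices you plan to use.
selected = select_gpus([0, 1])
print("CUDA_VISIBLE_DEVICES =", selected)
```

The variable is read when CUDA is initialized, so set it before importing or initializing the deep learning framework; changing it afterwards generally has no effect on the running process.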
