5 - Translation

Preface

Here we implement the full Transformer model on the IWSLT 2016 de-en dataset, a much smaller dataset than the WMT dataset used by Vaswani et al., but sufficient to demonstrate the model's capabilities.

Since the IWSLT dataset is much smaller, we can use a smaller set of hyperparameters than the original Transformer model. Specifically, we use 1 encoder layer instead of 6, a hidden dimension of 64 rather than 512, and only 4 heads rather than 8 for multi-head attention. Finally, I use learned positional encodings for both the encoder and decoder, rather than the sinusoidal functions.
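
To give a sense of what learned positional encodings involve, here is a rough PyTorch-style sketch of such a layer; the class and variable names are illustrative and not taken from the actual code, which may also use a different framework.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Token embeddings plus a trainable embedding for each position.

    Illustrative sketch only: hidden=64 and max_len=20 mirror the defaults
    listed below, but the real model may wire this up differently.
    """
    def __init__(self, vocab_size, hidden=64, max_len=20):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.pos = nn.Embedding(max_len, hidden)  # learned, instead of sinusoidal

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.pos(positions)  # (batch, seq_len, hidden)
```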

The training data and pretrained model are available here for testing.

This README is still a work in progress, but the model should work fine.

Description

Plain old translation task, in this case translating German to English using the training set from IWSLT 2016.

For example:

Input:
Der Großteil der Erde ist Meerwasser.
Output:
Most of the planet is ocean water.

I implement the entire Transformer model, with certain changes highlighted in the Preface above. You can try translating immediately with the pretrained model by running the following command:

$ python3 main.py --test --line=10

Change the --line parameter for a different sample.

Commands

Training

$ python3 main.py --train

This trains a Transformer model with default parameters (a sketch of how these flags might be defined follows the lists below):

  • Training steps: --steps=50000
  • Batch size: --batchsize=64
  • Learning rate: --lr=1e-4
  • Savepath: --savepath=models/
  • Encoding dimensions: --hidden=64
  • Encoder layers: --enc_layers=1
  • Decoder layers: --dec_layers=6
  • Number of heads: --heads=4

The model is trained on the translation task with the following default parameter:

  • Max sequence length: --max_len=20
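
For reference, here is a minimal sketch of how these flags could be wired up with argparse; this is only an illustration, and the actual main.py may define them differently.

```python
import argparse

def parse_args():
    # Illustrative only: flags and defaults mirror the lists above,
    # but the repository's main.py may be structured differently.
    p = argparse.ArgumentParser(description="Transformer on IWSLT 2016 de-en")
    p.add_argument("--train", action="store_true", help="train the model")
    p.add_argument("--test", action="store_true", help="translate a test sample")
    p.add_argument("--plot", action="store_true", help="plot attention heatmaps")
    p.add_argument("--line", type=int, default=0, help="test sample to translate (assumed default)")
    p.add_argument("--steps", type=int, default=50000)
    p.add_argument("--batchsize", type=int, default=64)
    p.add_argument("--lr", type=float, default=1e-4)
    p.add_argument("--savepath", type=str, default="models/")
    p.add_argument("--hidden", type=int, default=64)
    p.add_argument("--enc_layers", type=int, default=1)
    p.add_argument("--dec_layers", type=int, default=6)
    p.add_argument("--heads", type=int, default=4)
    p.add_argument("--max_len", type=int, default=20)
    return p.parse_args()
```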

Testing

$ python3 main.py --test
or
$ python3 main.py --test --line=10

This tests the trained model. You can specify a particular line using --line; otherwise it defaults to the first sample.

You can also use the --plot flag to plot the final encoder-decoder attention heatmaps.
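
As an illustration of what such a plot involves, the sketch below draws one head's encoder-decoder attention weights as a heatmap with matplotlib; the function and array names are hypothetical and not taken from the repository.

```python
import matplotlib.pyplot as plt

def plot_attention(weights, src_tokens, tgt_tokens):
    """Heatmap of one head's encoder-decoder attention.

    weights: (tgt_len, src_len) array of attention weights (hypothetical
    model output); src_tokens / tgt_tokens are the German input tokens
    and the English output tokens.
    """
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(src_tokens)))
    ax.set_xticklabels(src_tokens, rotation=90)
    ax.set_yticks(range(len(tgt_tokens)))
    ax.set_yticklabels(tgt_tokens)
    ax.set_xlabel("source (German)")
    ax.set_ylabel("output (English)")
    plt.tight_layout()
    plt.show()
```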

Help

$ python3 main.py --help

Run with the --help flag to get a list of possible flags.

Details

Skip ahead to the Model section for details about attention.

Input and Output

As is typical of translation tasks, we first preprocess the dataset to generate a dictionary mapping each token to an index. Here we use two dictionaries, one for English and one for German, although some papers use a single shared dictionary for both the source and target languages. In the .json files, the index of each token is simply its position in the list. The dictionaries are generated by running make_dict.py.
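
A minimal sketch of building such a token-to-index dictionary is shown below; the tokenization, special tokens, and file names are assumptions for illustration and not necessarily what make_dict.py does.

```python
import json
from collections import Counter

def build_dict(sentences, vocab_size=20000, out_file="dict_en.json"):
    # Count whitespace-separated tokens; the real preprocessing may tokenize differently.
    counts = Counter(tok for sent in sentences for tok in sent.lower().split())
    # Reserve special tokens (assumed set); <UNK> stands in for out-of-vocabulary tokens.
    vocab = ["<PAD>", "<S>", "</S>", "<UNK>"]
    vocab += [tok for tok, _ in counts.most_common(vocab_size - len(vocab))]
    with open(out_file, "w") as f:
        json.dump(vocab, f)  # a token's index is simply its position in this list
    return {tok: i for i, tok in enumerate(vocab)}
```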

The Task class's next_batch method generates three arrays (a sketch follows this list):

  1. The one-hot encoded German sentences
  2. The one-hot encoded English sentences with <S> prepended, i.e. the sentences shifted one token to the right; this serves as the decoder input
  3. The one-hot encoded English sentences; these serve as the labels, i.e. the decoder outputs
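
The sketch below illustrates the shifting and one-hot encoding with NumPy; the function name, arguments, and <S> handling are illustrative, and the actual Task class may differ.

```python
import numpy as np

def make_batch(de_ids, en_ids, de_vocab_size, en_vocab_size, start_id):
    """Build the three arrays described above from index-encoded sentences.

    de_ids, en_ids: (batch, max_len) integer arrays; start_id is the
    index of <S>. Illustrative sketch, not the repository's next_batch.
    """
    def one_hot(ids, vocab_size):
        return np.eye(vocab_size, dtype=np.float32)[ids]

    enc_input = one_hot(de_ids, de_vocab_size)            # 1. German sentences
    shifted = np.concatenate(                              # 2. prepend <S>, drop the last token
        [np.full((en_ids.shape[0], 1), start_id), en_ids[:, :-1]], axis=1)
    dec_input = one_hot(shifted, en_vocab_size)
    labels = one_hot(en_ids, en_vocab_size)                # 3. English sentences as labels
    return enc_input, dec_input, labels
```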

Model

WIP; more details here later.

Some interesting results.

Testing with --line=4

Input :
ich denke das problem ist dass wir das meer für zu selbstverständlich halten

Truth :
and the problem i think is that we take the ocean for <UNK>

Output:
i think the problem is that we take the ocean for <UNK>
[Figure: encoder-decoder attention heatmaps for this example]

Both words in the ocean attend to meer across all four heads. In addition, für zu selbstverständlich halten is a German phrase that means take for granted in English, and here we see that the tokens take and <UNK> attend strongly to selbstverständlich.

Testing with --line=10

Input :
die meisten tiere leben in den ozeanen

Truth :
most of the animals are in the <UNK>

Output:
most animals live in the <UNK>
[Figure: encoder-decoder attention heatmaps for this example]

In this case, we see that the token most attends to meisten and ignores the die at the beginning of the input German sentence.

Testing with --line=153

Input :
übrigens ist das zeug <UNK>

Truth :
this stuff is <UNK> as <UNK> by the way

Output:
by the way that's <UNK> stuff
[Figure: encoder-decoder attention heatmaps for this example]

Here the tokens by the way all attend strongly to übrigens, which is the German equivalent of the English phrase. We also see that the English translation <UNK> stuff correctly flips the order of the German tokens zeug <UNK> (where zeug means stuff).