Language Model

This folder contains code to do the following:

Universal language models are motivated by [1]. Two different "universal" models are explored: a recurrent model (RecLM), based on [2], and a convolutional model (ConvLM), motivated by [3].

1. Preprocessing

spaCy is used to tokenize the WikiText dataset, with parallel processing for efficiency.
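A minimal sketch of what this step could look like, assuming spaCy v3: nlp.pipe is run with several worker processes, and only the tokenizer of a blank English pipeline is used. The file path and function name are illustrative, not taken from the code.

```python
import spacy

def tokenize_lines(lines, n_process=4, batch_size=1000):
    """Tokenize an iterable of raw text lines into lists of token strings."""
    nlp = spacy.blank("en")  # tokenizer only; no tagger or parser is needed
    for doc in nlp.pipe(lines, n_process=n_process, batch_size=batch_size):
        yield [token.text for token in doc]

if __name__ == "__main__":
    # Hypothetical WikiText file path, for illustration only.
    with open("wikitext-103/wiki.train.tokens") as f:
        tokenized = list(tokenize_lines(f))
```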

2a. ConvLM

The convolutional language model is built from residual temporal convolutional blocks, each composed of variationally weight-dropped convolutional layers.

The convolutional layers are weight-dropped variationally to mirror the variational weight drop used in the recurrent language model: the same dropout mask on the weights is reused across timesteps, so every timestep in a convolution's output sequence is processed in the same way.
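A minimal PyTorch sketch of this idea (assuming the repo uses PyTorch; the names WeightDropConv1d and TemporalBlock, and the single-convolution block layout, are illustrative rather than the repo's exact modules): the dropout mask is applied to the filter weights once per forward pass, and the block adds a residual connection around a causal convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropConv1d(nn.Conv1d):
    """Conv1d whose weights get a single dropout mask per forward pass."""

    def __init__(self, *args, weight_dropout=0.5, **kwargs):
        super().__init__(*args, **kwargs)
        self.weight_dropout = weight_dropout

    def forward(self, x):
        # Drop weights rather than activations; the mask is sampled once,
        # so every timestep is convolved with the same dropped filters.
        w = F.dropout(self.weight, p=self.weight_dropout, training=self.training)
        return F.conv1d(x, w, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

class TemporalBlock(nn.Module):
    """Residual block built around a causal, weight-dropped convolution."""

    def __init__(self, channels, kernel_size=3, dilation=1, weight_dropout=0.5):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = WeightDropConv1d(channels, channels, kernel_size,
                                     dilation=dilation,
                                     weight_dropout=weight_dropout)

    def forward(self, x):                       # x: (batch, channels, time)
        out = F.pad(x, (self.left_pad, 0))      # left-pad only -> causal conv
        out = torch.relu(self.conv(out))
        return x + out                          # residual connection
```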

Variational dropout is also used for the embedding layer.
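How the embedding dropout is applied is not spelled out here; one common variant, in the style of [2], zeroes whole rows of the embedding matrix with a single mask per forward pass, so a dropped word is dropped at every position in the batch. The sketch below follows that variant; the module name EmbeddingDropout is illustrative.

```python
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingDropout(nn.Module):
    """Embedding layer that drops whole word vectors with one mask per batch."""

    def __init__(self, num_embeddings, embedding_dim, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(num_embeddings, embedding_dim)
        self.dropout = dropout

    def forward(self, tokens):                  # tokens: (batch, time)
        weight = self.embed.weight
        if self.training and self.dropout > 0:
            # One Bernoulli mask per vocabulary row, shared across all timesteps.
            keep = weight.new_empty((weight.size(0), 1)).bernoulli_(1 - self.dropout)
            weight = weight * keep / (1 - self.dropout)
        return F.embedding(tokens, weight)
```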

Training results are shown in the conv_results plot.

2b. RecLM

The recurrent language model consists of weight-dropped RNNs stacked on top of each other, as in [2].
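A minimal sketch of the weight-drop (DropConnect) idea for one recurrent layer, written with an explicit timestep loop over nn.LSTMCell parameters so the shared mask is visible; the repo's RecLM presumably wraps standard LSTM modules instead, and all names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTMLayer(nn.Module):
    """LSTM layer whose hidden-to-hidden weights get one DropConnect mask per forward pass."""

    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)  # used as a parameter container
        self.weight_dropout = weight_dropout
        self.hidden_size = hidden_size

    def forward(self, x):                         # x: (batch, time, input_size)
        batch, steps, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        # One mask on the recurrent weights, shared across all timesteps.
        w_hh = F.dropout(self.cell.weight_hh, p=self.weight_dropout,
                         training=self.training)
        outputs = []
        for t in range(steps):
            gates = (x[:, t] @ self.cell.weight_ih.t() + self.cell.bias_ih
                     + h @ w_hh.t() + self.cell.bias_hh)
            i, f, g, o = gates.chunk(4, dim=1)    # PyTorch gate order: i, f, g, o
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1)        # (batch, time, hidden_size)

def stacked_reclm(emb_size, hidden_size, n_layers=3, weight_dropout=0.5):
    """Stack several weight-dropped recurrent layers on top of each other."""
    sizes = [emb_size] + [hidden_size] * n_layers
    return nn.Sequential(*[WeightDropLSTMLayer(sizes[i], sizes[i + 1], weight_dropout)
                           for i in range(n_layers)])
```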

As in the ConvLM, variational dropout is used for the embedding layer.

Training results are shown in the rec_results plot.

References

1. Howard and Ruder, "Universal Language Model Fine-tuning for Text Classification", 2018.

2. Merity, Keskar, and Socher, "Regularizing and Optimizing LSTM Language Models", 2017.

3. Bai, Kolter, and Koltun, "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling", 2018.