This folder contains code to do the following:
- Preprocess the wikitext datasets
- Train a universal language model on these datasets. The model can be recurrent or convolutional.
- Fine-tune the language model on a downstream task, specifically the Toxic Comment Classification Challenge.
Universal models are motivated by [1]. Two different "universal" models are explored: a recurrent model (RecLM), based on [2], and a convolutional model (ConvLM), motivated by [3].
spaCy is used to tokenize the wikitext dataset, with parallel processing for efficiency.
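A minimal sketch of what this tokenization step could look like, assuming spaCy's `nlp.pipe` with multiple worker processes; the file path, batch size, and process count are placeholders rather than the repo's actual settings:

```python
# Hedged sketch: tokenize wikitext lines in parallel with spaCy.
# The path, batch size, and process count below are illustrative only.
import spacy

def tokenize_lines(lines, n_process=4, batch_size=1000):
    """Yield each raw line as a list of token strings."""
    nlp = spacy.blank("en")  # tokenizer only; no tagger/parser/NER needed
    for doc in nlp.pipe(lines, n_process=n_process, batch_size=batch_size):
        yield [tok.text for tok in doc]

if __name__ == "__main__":
    with open("data/wikitext-103/wiki.train.tokens", encoding="utf-8") as f:
        for tokens in tokenize_lines(f):
            print(tokens[:10])
            break
```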
The convolutional language model consists of temporal convolutional blocks, which are themselves composed of variationally weight-dropped convolutional layers, and each block has a residual connection.
The convolutional layers are variationally weight-dropped to mimic the variational weight drop used in the recurrent language model: the same weights are dropped across all timesteps, so every timestep in a convolution's output sequence is processed in the same way.
Variational dropout is also used for the embedding layer.
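As a rough illustration of these ideas (class names such as `WeightDropConv1d` and `TemporalBlock` are hypothetical, not the repo's actual modules), a causal, residual temporal block with DropConnect applied to the convolution weights might look like this:

```python
# Hedged sketch of a residual temporal block with variational weight drop.
import torch.nn as nn
import torch.nn.functional as F

class WeightDropConv1d(nn.Conv1d):
    """Conv1d whose weights are DropConnect-ed once per forward pass, so the
    same dropped weights are applied at every timestep of the sequence."""
    def __init__(self, *args, weight_dropout=0.5, **kwargs):
        super().__init__(*args, **kwargs)
        self.weight_dropout = weight_dropout

    def forward(self, x):
        w = F.dropout(self.weight, p=self.weight_dropout, training=self.training)
        return F.conv1d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class TemporalBlock(nn.Module):
    """Residual block of causal, weight-dropped temporal convolutions."""
    def __init__(self, channels, kernel_size=3, dilation=1, weight_dropout=0.5):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad only, so convs are causal
        self.conv1 = WeightDropConv1d(channels, channels, kernel_size,
                                      dilation=dilation, weight_dropout=weight_dropout)
        self.conv2 = WeightDropConv1d(channels, channels, kernel_size,
                                      dilation=dilation, weight_dropout=weight_dropout)

    def forward(self, x):  # x: (batch, channels, time)
        out = F.relu(self.conv1(F.pad(x, (self.pad, 0))))
        out = F.relu(self.conv2(F.pad(out, (self.pad, 0))))
        return x + out  # residual connection
```

Because the dropped weight tensor is sampled once and shared across the whole sequence, every timestep is filtered identically, mirroring the variational weight drop of the recurrent model.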
Training results:
The recurrent language model consists of weight-dropped RNNs stacked on top of each other, as in [2].
Variational dropout is used for the embedding layer.
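A hedged sketch of what such a model might look like (names and hyperparameters such as `WeightDropLSTMLayer`, `RecLM`, and the layer sizes are illustrative, not taken from the repo): embedding dropout zeroes whole word vectors, and each recurrent layer samples one DropConnect mask for its hidden-to-hidden weights per sequence.

```python
# Hedged sketch of a stacked weight-dropped recurrent LM; all names and
# hyperparameters here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def embedding_dropout(embed, tokens, p=0.1, training=True):
    """Variational dropout on the embedding layer: drop whole word vectors,
    using a single mask per word type for the entire batch."""
    if not training or p == 0:
        return embed(tokens)
    keep = torch.bernoulli(
        torch.full((embed.num_embeddings, 1), 1 - p, device=embed.weight.device))
    return F.embedding(tokens, embed.weight * keep / (1 - p), embed.padding_idx)

class WeightDropLSTMLayer(nn.Module):
    """One LSTM layer unrolled by hand so a single DropConnect mask on the
    hidden-to-hidden weights is reused at every timestep (variational weight drop)."""
    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.weight_dropout = weight_dropout

    def forward(self, x, state=None):  # x: (batch, time, input_size)
        batch, time, _ = x.shape
        h, c = state if state is not None else (
            x.new_zeros(batch, self.hidden_size), x.new_zeros(batch, self.hidden_size))
        # Sample the recurrent-weight mask once per sequence, not per timestep.
        w_hh = F.dropout(self.cell.weight_hh, p=self.weight_dropout, training=self.training)
        outputs = []
        for t in range(time):
            gates = (F.linear(x[:, t], self.cell.weight_ih, self.cell.bias_ih)
                     + F.linear(h, w_hh, self.cell.bias_hh))
            i, f, g, o = gates.chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)

class RecLM(nn.Module):
    """Embedding -> stacked weight-dropped LSTM layers -> vocabulary logits."""
    def __init__(self, vocab_size, emb_size=400, hidden_size=1150, n_layers=3,
                 weight_dropout=0.5, emb_dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        sizes = [emb_size] + [hidden_size] * n_layers
        self.rnns = nn.ModuleList(
            WeightDropLSTMLayer(sizes[i], sizes[i + 1], weight_dropout)
            for i in range(n_layers))
        self.emb_dropout = emb_dropout
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):  # tokens: (batch, time)
        out = embedding_dropout(self.embed, tokens, self.emb_dropout, self.training)
        for rnn in self.rnns:
            out, _ = rnn(out)
        return self.decoder(out)  # (batch, time, vocab_size) logits
```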
Training results: