Tackling Kaggle's Quora question pairs competition
Blog post describing my exploration of the dataset: https://medium.com/@gabrieltseng/natural-language-processing-with-quora-9737b40700c8
In this file, I test various ways of manipulating the data before inputting it into my neural network, including cleaning and adding leaky features.
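
As a rough illustration of what a cleaning step might look like, here is a minimal sketch. The function name `clean_question` and the specific rules (lowercasing, replacing punctuation with spaces, collapsing whitespace) are assumptions chosen for the example, not necessarily the steps used in the notebook.

```python
import re

def clean_question(text):
    """Hypothetical cleaning step: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into a single space and trim the ends
    return re.sub(r"\s+", " ", text).strip()
```

Both questions in a pair would be passed through the same function before tokenization, so that the network sees them in a consistent form.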
Training_Cleaned_NeuralNetwork.ipynb
I then train a neural network to convergence (without leaky features).
I briefly explore spaCy's capabilities on this dataset, for potential future use.
- Ensembling, particularly if I train a neural network with word2vec instead of GloVe embeddings.
- Identifying 'legal' leaky features (such as question lengths).
- Further optimization of hyperparameters.
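
The question-length idea from the list above could be sketched roughly as follows. The function name `length_features` and the exact set of features are hypothetical; this only shows the kind of values that can be computed from the question text alone.

```python
def length_features(q1, q2):
    """Hypothetical helper: length-based features for one question pair."""
    chars1, chars2 = len(q1), len(q2)
    words1, words2 = len(q1.split()), len(q2.split())
    return {
        "q1_chars": chars1,
        "q2_chars": chars2,
        "char_diff": abs(chars1 - chars2),
        "q1_words": words1,
        "q2_words": words2,
        "word_diff": abs(words1 - words2),
    }

features = length_features(
    "How do I learn Python?",
    "How can I learn Python quickly?",
)
```

The resulting values could be concatenated with the network's other inputs or passed to a separate model when ensembling.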