ScratchFormers

implementing transformers from scratch.

Attention is all you need.

Starters

Models

  • simple Vision Transformer

  • GPT2

  • OpenAI CLIP

    • implemented the ViT-B/32 variant
    • for process, check building_clip.ipynb
    • inference requires the clip package for tokenization and preprocessing: pip install git+https://github.com/openai/CLIP.git
    • model implementation
    • zero-shot inference code (see the zero-shot sketch after the model list below)
    • built so that it supports loading the pretrained OpenAI weights, and it works!
    • my lighter implementation of this, built from existing image and language models and trained on the Flickr8k dataset
  • Encoder Decoder Transformer

    • for process, check building_encoder-decoder.ipynb
    • model implementation
    • src_mask for the encoder is optional but nice to have: it masks out the pad tokens so attention is not computed over those positions (see the padding-mask sketch after the model list below).
    • used learned embeddings for position instead of the sin/cos encodings from the original paper.
    • I trained a model for multilingual machine translation.
      • Translates English to Hindi and Telugu.
      • change: a single embedding layer shared by the encoder and decoder, since I used a single tokenizer.
  • BERT - MLM

    • for process of masked language modeling, check masked-language-modeling.ipynb
    • model implementation
    • simplification: no [CLS] & [SEP] tokens during pre-training, since I built the model only for masked language modeling and not for next sentence prediction (see the masking sketch after the model list below).
    • I trained an entire model on the Wikipedia dataset.
    • once pretrained, the MLM head can be replaced with any other downstream task head.
  • ViT MAE
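
A minimal zero-shot classification sketch for the CLIP item above. It uses the OpenAI clip package only for tokenization and image preprocessing, as noted in that section; the loaded model here is the reference ViT-B/32, and the from-scratch implementation is assumed to expose the same encode_image/encode_text interface once the pretrained weights are loaded. The image path and caption candidates are placeholders.

```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# The clip package supplies the tokenizer and image preprocessing; swap `model`
# for the from-scratch ViT-B/32 in this repo (assumed to expose the same
# encode_image/encode_text methods after loading the OpenAI weights).
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("sumo.jpg")).unsqueeze(0).to(device)   # placeholder image path
labels = ["a photo of a puppy", "a photo of a cat", "a photo of a car"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # cosine similarity -> softmax over the candidate captions
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print({label: round(p.item(), 3) for label, p in zip(labels, probs[0])})
```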
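A small sketch of the two encoder-decoder details mentioned above: the optional src_mask that hides pad tokens from attention, and learned positional embeddings in place of the sinusoidal table. The pad id, mask shape, and vocabulary/model sizes are assumptions for illustration; the repo's implementation may use a different convention.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumed pad token id

def make_src_mask(src_ids: torch.Tensor) -> torch.Tensor:
    # True at real tokens, False at [PAD]; shape (batch, 1, 1, src_len) so it
    # broadcasts over attention scores of shape (batch, heads, q_len, k_len).
    return (src_ids != PAD_ID).unsqueeze(1).unsqueeze(2)

def apply_mask(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Padded keys get -inf before the softmax, so they receive zero attention weight.
    return scores.masked_fill(~mask, float("-inf"))

# Learned positional embeddings instead of the sin/cos table from the paper:
tok_emb = nn.Embedding(32000, 256)   # assumed vocab size / model dim
pos_emb = nn.Embedding(512, 256)     # one learned vector per position

src = torch.tensor([[5, 7, 9, PAD_ID, PAD_ID]])        # last two positions are padding
positions = torch.arange(src.size(1)).unsqueeze(0)     # (1, src_len)
x = tok_emb(src) + pos_emb(positions)                  # input to the encoder stack
print(make_src_mask(src))                              # [[[[True, True, True, False, False]]]]
```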
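A sketch of the standard 80/10/10 masking step used for masked language modeling, as described in the BERT - MLM item above. The [MASK] id, vocabulary size, and -100 ignore index are assumptions; the repo's tokenizer and loss setup may differ.

```python
import torch

MASK_ID, VOCAB_SIZE = 103, 30522   # assumed ids; depends on the tokenizer used
IGNORE_INDEX = -100                # label value the loss should skip

def mask_tokens(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Pick ~15% of tokens; of those, 80% become [MASK], 10% become a random
    token, and 10% are left unchanged. Returns (corrupted inputs, labels)."""
    labels = input_ids.clone()
    picked = torch.rand(input_ids.shape) < mlm_prob
    labels[~picked] = IGNORE_INDEX                     # loss only on picked positions

    inputs = input_ids.clone()
    replace = picked & (torch.rand(input_ids.shape) < 0.8)
    inputs[replace] = MASK_ID                          # 80% -> [MASK]

    random_tok = picked & ~replace & (torch.rand(input_ids.shape) < 0.5)
    inputs[random_tok] = torch.randint(VOCAB_SIZE, (int(random_tok.sum()),))  # 10% -> random token
    return inputs, labels                              # remaining 10% stay unchanged

batch = torch.randint(5, VOCAB_SIZE, (2, 16))          # toy batch of token ids
inputs, labels = mask_tokens(batch)
```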

Requirements

einops
torch
torchvision
numpy
matplotlib
pandas

Here's my puppy's picture: sumo


God is our refuge and strength, a very present help in trouble.
Psalm 46:1
