A dead-simple image search and image-text matching system for Bangla using CLIP (Contrastive Language–Image Pre-training)
Python >= 3.9
pip install -r requirements.txt
- Download the model weights and place them inside the models folder.
The model consists of an EfficientNet / ResNet image encoder and a BERT text encoder, and was trained on multiple datasets from the Bangla image-text domain.
To run the app:
streamlit run app.py
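
For intuition, CLIP-style search typically works by embedding the text query and every image into the same vector space, then ranking images by cosine similarity to the query. The snippet below is a minimal sketch of that ranking step only, not the code used in app.py. It assumes the embeddings have already been produced by the two encoders; the function names, file names, and 3-dimensional vectors are made up for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_images(text_embedding, image_embeddings, top_k=3):
    """Return (image_name, score) pairs sorted by descending cosine similarity."""
    query = l2_normalize(text_embedding)
    scored = []
    for name, emb in image_embeddings.items():
        unit = l2_normalize(emb)
        score = sum(q * v for q, v in zip(query, unit))
        scored.append((name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings, invented for this example. Real encoder outputs are
# typically much higher-dimensional.
image_embeddings = {
    "river_boat.jpg": [0.9, 0.1, 0.0],
    "street_market.jpg": [0.1, 0.8, 0.3],
    "tea_garden.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of a Bangla caption describing a boat on a river.
text_embedding = [0.8, 0.2, 0.1]

results = rank_images(text_embedding, image_embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Image-text matching can reuse the same score in the other direction: embed one image, embed several candidate captions, and pick the caption with the highest similarity.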
- Live Demo: HuggingFace Space
- Training Code: bangla-CLIP
- Article: medium