In metaclip/pipeline.py, I found that the function shard_text_loader parses the .tar format data, including locating the .jpeg and .json files. I want to know how these .tar files are organized, and why the .jpeg image data has already been downloaded before sub_matching?
Thanks very much!
It's supposed to be similar to WebDataset.
To allow 100% transparency, our sample dataloader reads it via the regular Python tar API. The tar files are organized as <dataset_dir>/{shard_id % 100}/{shard_id}.tar.
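For readers who want to inspect a shard themselves, here is a minimal sketch of that layout using only the standard `tarfile` module. It is not the code from metaclip/pipeline.py: the helper names `shard_path` and `iter_shard` are hypothetical, and it assumes the WebDataset-style convention that files sharing a basename (for example `000.json` and `000.jpeg`) belong to the same sample.

```python
import json
import os
import tarfile


def shard_path(dataset_dir: str, shard_id: int) -> str:
    # Layout quoted in the reply: <dataset_dir>/{shard_id % 100}/{shard_id}.tar
    return os.path.join(dataset_dir, str(shard_id % 100), f"{shard_id}.tar")


def iter_shard(path: str):
    """Yield (key, metadata, jpeg_bytes) for every sample in one shard.

    Assumption: members are grouped by basename, as in WebDataset;
    metadata or jpeg_bytes is None when that member is missing.
    """
    samples = {}
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            key, ext = os.path.splitext(member.name)
            data = tar.extractfile(member).read()
            samples.setdefault(key, {})[ext.lower()] = data

    for key, entry in samples.items():
        meta = json.loads(entry[".json"]) if ".json" in entry else None
        yield key, meta, entry.get(".jpeg")
```

For example, `shard_path("data", 1234)` points at `data/34/1234.tar`. The sketch holds a whole shard in memory for simplicity; a streaming reader would yield each sample as soon as its members have been read.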
Thank you for your excellent work!