no way to set msgpack max_bin_len limits use of cache to small files #200
Comments
The cachecontrol library can't cache large images [1], which we work around here by using our own serializer to allow msgpack to load big files. It's quite likely that this caching is the wrong approach and misses some important details, but it is a useful way to speed up experimentation. [1] psf/cachecontrol#200
It appears that this got fixed somehow.
bob $ pip3 freeze | egrep -i 'requests|msgpack|cache'
CacheControl==0.12.6
msgpack==1.0.2
requests==2.25.1
alice $ (
printf 'HTTP/1.0 200 OK\n'
printf 'Date: '; LC_ALL=C date -u '+%a, %d %b %Y %X %Z'
printf 'Content-Length: 500000000\n'
printf 'Cache-Control: max-age=6000\n\n'
yes | dd iflag=count_bytes count=500MB
) | nc -l 8000
bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000
bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000
The second request is definitely served from cache because nc stops listening after the first client disconnects. 500000000 is enough to exceed the default max_bin_len:
bob $ MSGPACK_PUREPYTHON=1 python -c '
import msgpack, sys
with open(sys.argv[1], "rb") as f:
    f.read(5)
    u = msgpack.Unpacker(f)
    u.unpack()
' ./cache/5/c/a/8/b/5ca8b7d8184924c60c5c454a874bf5ed7b4741d0660cb7d295185d63
Traceback (most recent call last):
File "<string>", line 6, in <module>
File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 723, in unpack
ret = self._unpack(EX_CONSTRUCT)
File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
ret[key] = self._unpack(EX_CONSTRUCT)
File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
ret[key] = self._unpack(EX_CONSTRUCT)
File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 625, in _unpack
typ, n, obj = self._read_header(execute)
File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 467, in _read_header
raise ValueError("%s exceeds max_bin_len(%s)" % (n, self._max_bin_len))
ValueError: 500000000 exceeds max_bin_len(104857600)
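The traceback shows the error being raised while the header is read, before any payload bytes are consumed. The following is a simplified, standard-library-only model of that check, not msgpack's actual code: the function names are hypothetical, and only the MessagePack bin 8/16/32 header layout and the 104857600 limit from the traceback are taken as given.

```python
import struct

# 100 MiB, the limit reported in the traceback above.
DEFAULT_MAX_BIN_LEN = 100 * 1024 * 1024


def declared_bin_len(header: bytes) -> int:
    """Return the payload length declared by a MessagePack bin header."""
    tag = header[0]
    if tag == 0xC4:  # bin 8: one length byte
        return header[1]
    if tag == 0xC5:  # bin 16: two-byte big-endian length
        return struct.unpack(">H", header[1:3])[0]
    if tag == 0xC6:  # bin 32: four-byte big-endian length
        return struct.unpack(">I", header[1:5])[0]
    raise ValueError("not a bin header")


def check_bin_header(header: bytes, max_bin_len: int = DEFAULT_MAX_BIN_LEN) -> int:
    """Reject a payload based on its declared length alone (hypothetical helper)."""
    n = declared_bin_len(header)
    if n > max_bin_len:
        raise ValueError("%s exceeds max_bin_len(%s)" % (n, max_bin_len))
    return n


# A bin 32 header announcing a 500000000-byte body, as in the test above.
header = b"\xc6" + struct.pack(">I", 500_000_000)
try:
    check_bin_header(header)
except ValueError as exc:
    message = str(exc)
print(message)
```

Raising the limit (for example `check_bin_header(header, max_bin_len=2**32 - 1)`) lets the same header through, which is the effect the issue asks to be able to configure.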
I've tried reproducing a variant of this as part of #336, but failed to. I'm going to close this out and track any follow-ups there. Thanks all! (If anybody has a reproducer for this, it would be greatly appreciated.)
When trying to use cachecontrol with very large files (disk images in the case I'm considering), there's no easy way to pass a max_bin_len to msgpack.loads to say "yeah, I really do want to be able to load huge files".
cachecontrol will write the huge files, but then when it comes round to read them, msgpack will produce a ValueError and cachecontrol will return None to the deserialization routines.
It appears that the way to hack around it would be to subclass Serializer and replace loads_v4 to give some args to msgpack.loads.
Is there a better way? Is this something that you'd be interested in seeing as a kwarg passed down from CacheControl?