no way to set msgpack max_bin_len limits use of cache to small files #200

Closed
cdent opened this issue Jan 21, 2019 · 2 comments

cdent commented Jan 21, 2019

When trying to use cachecontrol with very large files (disk images in the case I'm considering), there's no easy way to pass a max_bin_len to msgpack.loads to say "yeah, I really do want to be able to load huge files".

cachecontrol will write the huge files, but then when it comes round to read them, msgpack will produce a ValueError and cachecontrol will return None to the deserialization routines.

It appears that the way to hack around it would be to subclass Serializer and replace loads_v4 to give some args to msgpack.loads.

Is there a better way? Is this something that you'd be interested in seeing as a kwarg passed down from CacheControl?
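
A minimal sketch of the subclassing workaround described above, assuming the private method is named `_loads_v4` on `cachecontrol.serialize.Serializer` and that it takes the request and the raw data (plus, in some releases, extra positional arguments that are passed through here). The factory name `make_large_body_serializer` and the default limit are hypothetical, chosen only for illustration; this relies on a private method and may not match every cachecontrol release.

```python
def make_large_body_serializer(max_bin_len=2**31 - 1):
    """Build a Serializer whose v4 loader passes max_bin_len to msgpack.loads.

    Imports are deferred so that loading this snippet does not itself
    require cachecontrol or msgpack to be installed.
    """
    import msgpack
    from cachecontrol.serialize import Serializer

    class LargeBodySerializer(Serializer):
        # Assumption: _loads_v4 is the private hook that decodes v4 cache entries.
        def _loads_v4(self, request, data, *extra):
            try:
                cached = msgpack.loads(data, raw=False, max_bin_len=max_bin_len)
            except ValueError:
                # Same fallback the issue describes: treat the entry as a miss.
                return None
            return self.prepare_response(request, cached, *extra)

    return LargeBodySerializer()
```

The resulting object would then be handed to the adapter, for example `cachecontrol.CacheControlAdapter(cache, serializer=make_large_body_serializer())`, assuming the adapter accepts a `serializer` keyword in the installed version.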

cdent added a commit to cdent/etcd-compute that referenced this issue Jan 21, 2019
The cachecontrol library can't cache large images [1], which we
work around here by using our own serializer to allow msgpack
to load big files.

It's quite likely this caching is the wrong way to go, and
missing some important details, but it is a useful way to
speed up the experimentation.

[1] psf/cachecontrol#200

hexagonrecursion (Contributor) commented

It appears that this got fixed somehow.

bob $ pip3 freeze | egrep -i 'requests|msgpack|cache'
CacheControl==0.12.6
msgpack==1.0.2
requests==2.25.1

alice $ (
printf 'HTTP/1.0 200 OK\n'
printf 'Date: '; LC_ALL=C date -u '+%a, %d %b %Y %X %Z'
printf 'Content-Length: 500000000\n'
printf 'Cache-Control: max-age=6000\n\n'
yes | dd iflag=count_bytes count=500MB
) | nc -l 8000

bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000

bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000

The second request is definitely served from cache because nc stops listening after the first client disconnects.

500000000 bytes is enough to exceed the default max_bin_len of a bare msgpack.Unpacker (104857600, i.e. 100 MiB):

bob $ MSGPACK_PUREPYTHON=1 python -c '
import msgpack, sys
with open(sys.argv[1], "rb") as f:
    f.read(5)
    u = msgpack.Unpacker(f)
    u.unpack()
' ./cache/5/c/a/8/b/5ca8b7d8184924c60c5c454a874bf5ed7b4741d0660cb7d295185d63 
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 723, in unpack
    ret = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 625, in _unpack
    typ, n, obj = self._read_header(execute)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 467, in _read_header
    raise ValueError("%s exceeds max_bin_len(%s)" % (n, self._max_bin_len))
ValueError: 500000000 exceeds max_bin_len(104857600)
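
To illustrate what that error means, here is a stdlib-only sketch of the kind of length check involved. It is not msgpack's actual code: `read_bin_length` and `DEFAULT_MAX_BIN_LEN` are hypothetical names, and only the bin 8/16/32 markers (0xc4, 0xc5, 0xc6) from the MessagePack format and the 104857600 limit from the traceback are taken as given.

```python
import struct

# 100 MiB, the limit reported in the traceback above.
DEFAULT_MAX_BIN_LEN = 100 * 1024 * 1024

# MessagePack "bin" markers mapped to the struct format of the
# big-endian length field that follows the marker byte.
_BIN_HEADERS = {0xC4: ">B", 0xC5: ">H", 0xC6: ">I"}


def read_bin_length(buf, max_bin_len=DEFAULT_MAX_BIN_LEN):
    """Return the declared payload length of a bin value at the start of buf."""
    fmt = _BIN_HEADERS.get(buf[0])
    if fmt is None:
        raise TypeError("not a msgpack bin header: 0x%02x" % buf[0])
    (n,) = struct.unpack_from(fmt, buf, 1)
    if n > max_bin_len:
        # The limit is checked against the declared length, before any payload is read.
        raise ValueError("%s exceeds max_bin_len(%s)" % (n, max_bin_len))
    return n


# A bin 32 header declaring a 500000000-byte body, as in the cached response.
header = b"\xc6" + struct.pack(">I", 500000000)

try:
    read_bin_length(header)
except ValueError as exc:
    print(exc)  # 500000000 exceeds max_bin_len(104857600)

print(read_bin_length(header, max_bin_len=2**31 - 1))  # 500000000
```

Raising the limit lets the same header through, which is the effect the original request for a configurable max_bin_len was after.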

woodruffw (Member) commented

I tried to reproduce a variant of this as part of #336 but could not. I'm going to close this out and track any follow-ups there. Thanks all!

(If anybody has a reproducer for this, it would be greatly appreciated.)
