Python 2.7 - Requests issue decoding gzip
I'm trying to pull a large number of text files from a website using the requests package. Some of the files are available as plain text, others are compressed text files.
tmphtml = 'https://website.com/csv/pwr/somedata.dat.gz'
tmpreq = requests.get(tmphtml, proxies=proxy_w_auth, auth=(usr, pw))
When I pull the uncompressed files, it works, but when I pull one of the compressed files I get lots of the following:
'\x1f\x8b\x08\x08\xe5\xc6\xd9a\x00\x03somedata.dat\x00\xa5\x9d\xcbn\x1c\xb9\x19\x85\xf7\x01\xf2\x0e\xfd\x00q\xa9x,^j\xa9\xc8\x9a\xb1\x9dx\x16dm\x12/\r\x8c\x0712\x19\x0f\xb2\t\x02\xf4\xc3\xa7\xba\xeem\x9e\x9f<\xa46s\x93\xf1\r\x8b\xfd\x7fl\x9e\xe2e/\xcfwo\x1eno\xee^\x1e\xceo\x7f\xfa\xf3\xf9\xe9\xf9\xe3\x9b\x9f\xee_\xce\x9f^\x9e\xdf=\x9d\xef?>\xbe<\xdf\x8d\xff\xba\xfe\xc3\xe9\xe5\xf3\xd3\xc3\xf4\xc3\xbf\x8c\x7f{xy\xf9\xeb\xc3\x87\x87\xc7\x97\xd3\xd3\xf3\xbb\xfb\x87\xf3\xe3\xc3\xcb\xe9\xfe\xed\xdd\xe3\x8f\x0f\xe7\x87\x7f<\xbd{\xbe{y\xf7\xf1qb\xff\xf1\x0f\xeav\xdfvmk\xce\xf7\xdf~;\xff\xf0\xed\xb7\xd3\xa7\xff~\xf9\xfd\xe6\xe9\xeb\x97\x7f\xfd\xe9\xf4\xc3\xd3\xe9\x97\xef\xff9]\x10\xeav-\x7f\xec\xdd\xe3\xf9\x87\xf3\xb9w\x8d\xf6\xe7\x1b\xd3\xf4n\xfc\x99\x9e\x7fh\xd3\xba\x90f\x1ak\xce7\xbaq\xe3\x8f:_\x06\xd31ldu\xe3_tq\xc3z\x91\xd5\xdfvc\x19\xcb\x84,\xdd\xb8\x11\xa6\x9a\xce\x8c?+m\x99\ri\xf6\xc2\xb9i\xc7\xa6\xd9[\xdd\x96\xc1\\\x003vn\xda\xf8\x83\xd2\xa7\xf4\x12\xca\x17?\xe2\x10u\xd8\xe5\xf9\xc6\xa7\x1c\x8a\x1fp\xb5
I can see the file name at the beginning of the string that is returned, but I'm not sure how I can extract the content. According to the requests documentation, shouldn't it be automatically decompressing gz files?
http://requests.readthedocs.org/en/latest/community/faq/
The response object looks like it has gzip in its headers as well:
{'accept': '*/*', 'connection': 'keep-alive', 'accept-encoding': 'gzip, deflate', 'user-agent': 'python-requests/2.7.0 cpython/2.7.10 windows/7'}
Any suggestions would be appreciated.
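Sometimes web clients ask the server to compress a file before sending it. Not .gz files, mind you, since you wouldn't compress something twice. This cuts down the transfer size, especially for large text files. The client then decompresses the data automatically before displaying it to the user. That is what the requests docs linked in the question describe. You do not have to worry about that use case here: the .gz file is delivered exactly as it is stored on the server, so requests hands you the compressed bytes.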
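One way to tell the two cases apart is to look at the first two bytes of the body: a gzip stream always starts with 1f 8b, which is exactly how the dump in the question begins. The following is only a rough sketch; the helper name looks_like_gzip is made up for illustration, and the sample bytes are just the start of the output quoted in the question.

```python
GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream

def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC

# Start of the payload shown in the question (truncated on purpose).
sample = b'\x1f\x8b\x08\x08\xe5\xc6\xd9a\x00\x03somedata.dat\x00'

print(looks_like_gzip(sample))         # True
print(looks_like_gzip(b'plain text'))  # False
```

With a real response you would pass the raw body, i.e. tmpreq.content, rather than the decoded text.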
To decompress a gzipped file, you have to either decompress it in memory using gzip (part of the standard library) or write it to disk in 'wb' mode and then use the gzip utility.
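Both options could look roughly like this. It is a sketch, not the only way to do it: the helper names gunzip_bytes and save_then_read are hypothetical, the second one reads the saved file back with the gzip module instead of the command-line utility, and the compressed payload is built locally as a stand-in for tmpreq.content so the example runs without a network connection.

```python
import gzip
import io
import os
import tempfile

def gunzip_bytes(data):
    """Decompress a gzip-compressed byte string entirely in memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        return f.read()

def save_then_read(data, path):
    """Write the compressed bytes to disk in 'wb' mode, then read them back decompressed."""
    with open(path, 'wb') as out:
        out.write(data)
    with gzip.open(path, 'rb') as f:
        return f.read()

# Stand-in for tmpreq.content: build a small gzip payload locally.
buf = io.BytesIO()
with gzip.GzipFile(filename='somedata.dat', mode='wb', fileobj=buf) as f:
    f.write(b'col1,col2\n1,2\n')
compressed = buf.getvalue()

# Option 1: decompress in memory.
in_memory = gunzip_bytes(compressed)

# Option 2: save the .gz file first, then decompress from disk.
tmp_path = os.path.join(tempfile.mkdtemp(), 'somedata.dat.gz')
on_disk = save_then_read(compressed, tmp_path)
```

In the question's code this would be gunzip_bytes(tmpreq.content). Use .content (raw bytes) rather than .text, since decoding compressed data as text can corrupt it.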