C++ - Boost 1.59 not decompressing all bzip2 streams
I've been trying to decompress .bz2 files on the fly, line by line so to speak. The files I'm dealing with are massive (in the region of 100 GB uncompressed), so I wanted to add a solution that saves disk space.
I have no problems decompressing files compressed with vanilla bzip2, but with files compressed by pbzip2 it only decompresses the first bz2 stream it finds. This bug tracker ticket relates to the problem: https://svn.boost.org/trac/boost/ticket/3853, which led me to believe it had been fixed in versions after 1.41. I've checked the bzip2.hpp file and it contains the 'fixed' version, and I've also checked that the version of Boost used in the program is 1.59.
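pbzip2 writes its output as a sequence of independent bzip2 streams, while plain bzip2 normally writes a single stream, which matches the "only the first stream" symptom. To check whether a particular file is multi-stream, here is a minimal sketch of a heuristic (not part of Boost): every bzip2 stream starts on a byte boundary with "BZh", a block-size digit '1' to '9', and then either the 48-bit block magic 0x314159265359 or, for an empty stream, the end-of-stream magic 0x177245385090. The helper names `read_prefix` and `find_bzip2_stream_offsets` are hypothetical, and only a prefix of the file is scanned because the real files are very large.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: reads at most max_bytes from the start of a file.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_prefix(const std::string& path, std::size_t max_bytes) {
    std::ifstream f(path, std::ios::binary);
    if (!f || max_bytes == 0) return {};
    std::vector<unsigned char> buf(max_bytes);
    f.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(max_bytes));
    buf.resize(static_cast<std::size_t>(f.gcount()));  // keep only what was actually read
    return buf;
}

// Hypothetical heuristic: returns the byte offsets where a bzip2 stream header
// appears. A header is "BZh", a digit '1'..'9', then the block magic
// 31 41 59 26 53 59 or the end-of-stream magic 17 72 45 38 50 90.
// A false match inside compressed data is possible but very unlikely.
std::vector<std::size_t> find_bzip2_stream_offsets(const std::vector<unsigned char>& buf) {
    static const unsigned char block_magic[6] = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
    static const unsigned char eos_magic[6] = {0x17, 0x72, 0x45, 0x38, 0x50, 0x90};
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 10 <= buf.size(); ++i) {
        if (buf[i] != 'B' || buf[i + 1] != 'Z' || buf[i + 2] != 'h') continue;
        if (buf[i + 3] < '1' || buf[i + 3] > '9') continue;
        const unsigned char* magic = &buf[i + 4];
        if (std::equal(block_magic, block_magic + 6, magic) ||
            std::equal(eos_magic, eos_magic + 6, magic)) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

For example, if `find_bzip2_stream_offsets(read_prefix("test.bz2", 8u << 20))` returns more than one offset, the file holds concatenated streams; a file written by plain bzip2 should normally report exactly one.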
Here's the code:
    cout << "warning: bzip2 support is a little buggy!" << endl;

    // open file here
    trans_file.open(files[i].c_str(), std::ios_base::in | std::ios_base::binary);

    // set up Boost bzip2 decompression
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::bzip2_decompressor());
    in.push(trans_file);

    std::string str;
    // begin reading
    while (std::getline(in, str))
    {
        std::stringstream stream(str);
        stream >> id_f >> id_i >> aif;
        /* stuff values here */
    }
Any suggestions would be great. Thanks!
You're right.
It seems changeset #63057 only fixes part of the issue.
The corresponding unit test does work, though. It uses the copy algorithm (also on a composite<> instead of a filtering_istream, if that's relevant).
I'd open this as a defect or a regression. Include a file that exhibits the problem, of course. For me it's reproduced using /etc/dictionaries-common/words compressed with pbzip2 (default options).
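If pbzip2 isn't available, a multi-stream test file with the same layout can also be produced by concatenating complete .bz2 files, since a bzip2 file may consist of several streams back to back. A minimal sketch of that concatenation in C++, where `concat_files` is a hypothetical helper name:

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: writes the raw bytes of each input file, in order, to output.
// Returns false if an input cannot be opened or the output cannot be written.
bool concat_files(const std::vector<std::string>& inputs, const std::string& output) {
    std::ofstream out(output, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    std::vector<char> buf(1 << 16);
    for (const std::string& path : inputs) {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        // read() fails on the last partial chunk, but gcount() still reports its size
        while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
            out.write(buf.data(), in.gcount());
        }
    }
    out.flush();
    return static_cast<bool>(out);
}
```

Usage, with placeholder file names for two files compressed by plain bzip2: `concat_files({"part1.bz2", "part2.bz2"}, "multi.bz2")`. From a shell, `cat part1.bz2 part2.bz2 > multi.bz2` does the same. The bzip2 command-line tool decompresses such a file to the concatenation of both originals, so any missing output when reading it through the Boost filter points at the multi-stream handling.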
I have the test.bz2 here: http://7f0d2fd2-af79-415c-ab60-033d3b494dc9.s3.amazonaws.com/test.bz2
Here's the test program:
    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/iostreams/filter/bzip2.hpp>
    #include <boost/iostreams/stream.hpp>
    #include <fstream>
    #include <iostream>
    #include <string>

    namespace io = boost::iostreams;

    void multiple_member_test(); // unit test from changeset #63057

    int main()
    {
        //multiple_member_test();
        //return 0;

        std::ifstream trans_file("test.bz2", std::ios::binary);

        // set up Boost bzip2 decompression
        io::filtering_istream in;
        in.push(io::bzip2_decompressor());
        in.push(trans_file);

        // begin reading
        std::string str;
        while (std::getline(in, str))
        {
            std::cout << str << "\n";
        }
    }

    #include <boost/iostreams/compose.hpp>
    #include <boost/iostreams/copy.hpp>
    #include <boost/iostreams/device/array.hpp>
    #include <boost/iostreams/device/back_inserter.hpp>
    #include <boost/range/iterator_range.hpp>
    #include <algorithm>
    #include <cassert>
    #include <sstream>
    #include <vector>

    void multiple_member_test() // unit test from changeset #63057
    {
        std::string data(20ul << 20, '*');
        std::vector<char> temp, dest;

        // write compressed data to temp, twice in succession
        io::filtering_ostream out;
        out.push(io::bzip2_compressor());
        out.push(io::back_inserter(temp));
        io::copy(boost::make_iterator_range(data), out);
        out.push(io::back_inserter(temp));
        io::copy(boost::make_iterator_range(data), out);

        // read compressed data from temp into dest
        io::filtering_istream in;
        in.push(io::bzip2_decompressor());
        in.push(io::array_source(&temp[0], temp.size()));
        io::copy(in, io::back_inserter(dest));

        // check that dest consists of 2 copies of data
        assert(data.size() * 2 == dest.size());
        assert(std::equal(data.begin(), data.end(), dest.begin()));
        assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size() / 2));

        dest.clear();
        io::copy(
            io::array_source(&temp[0], temp.size()),
            io::compose(io::bzip2_decompressor(), io::back_inserter(dest)));

        // check that dest consists of 2 copies of data
        assert(data.size() * 2 == dest.size());
        assert(std::equal(data.begin(), data.end(), dest.begin()));
        assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size() / 2));
    }