c++ - Boost 1.59 not decompressing all bzip2 streams


I've been trying to decompress .bz2 files on the fly, line by line so to speak, because the files I'm dealing with are massive when uncompressed (in the region of 100 GB), so I wanted to add a solution that saves disk space.

I have no problems decompressing files compressed with vanilla bzip2, but for files compressed with pbzip2 only the first bz2 stream found is decompressed. This bug tracker entry relates to the problem: https://svn.boost.org/trac/boost/ticket/3853 but it led me to believe the issue was fixed after version 1.41. I've checked the bzip2.hpp file and it contains the 'fixed' version, and I've also checked that the version of Boost used in the program is 1.59.

The code is here:

    cout << "warning bzip2 support little buggy!" << endl;

    // open the file here
    trans_file.open(files[i].c_str(), std::ios_base::in | std::ios_base::binary);

    // set up Boost bzip2 decompression
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::bzip2_decompressor());
    in.push(trans_file);
    std::string str;

    // begin reading
    while (std::getline(in, str))
    {
        std::stringstream stream(str);
        stream >> id_f >> id_i >> aif;
        /* do stuff with the values here */
    }

Any suggestions would be great. Thanks!

You are right.

It seems that changeset #63057 only fixes part of the issue.

The corresponding unit test does sort of work, though. But it uses the copy algorithm (and also a composite<> instead of a filtering_istream, if that is relevant).

I'd open this as a defect or a regression. Include a file that exhibits the problem, of course. For me it's reproduced using just /etc/dictionaries-common/words compressed with pbzip2 (default options).

I have the test.bz2 here: http://7f0d2fd2-af79-415c-ab60-033d3b494dc9.s3.amazonaws.com/test.bz2

Here's my test program:

    #include <boost/iostreams/filtering_stream.hpp>
    #include <boost/iostreams/filter/bzip2.hpp>
    #include <boost/iostreams/stream.hpp>
    #include <fstream>
    #include <iostream>
    #include <string>

    namespace io = boost::iostreams;

    void multiple_member_test(); // unit tests from changeset #63057

    int main()
    {
        //multiple_member_test();
        //return 0;

        std::ifstream trans_file("test.bz2", std::ios::binary);

        // set up Boost bzip2 decompression
        io::filtering_istream in;
        in.push(io::bzip2_decompressor());
        in.push(trans_file);

        // begin reading
        std::string str;
        while (std::getline(in, str))
        {
            std::cout << str << "\n";
        }
    }

    #include <boost/iostreams/compose.hpp>
    #include <boost/iostreams/copy.hpp>
    #include <boost/iostreams/device/array.hpp>
    #include <boost/iostreams/device/back_inserter.hpp>
    #include <boost/range/iterator_range.hpp>
    #include <algorithm>
    #include <cassert>
    #include <sstream>
    #include <vector>

    void multiple_member_test() // unit tests from changeset #63057
    {
        std::string       data(20ul << 20, '*');
        std::vector<char> temp, dest;

        // write compressed data to temp, twice in succession
        io::filtering_ostream out;
        out.push(io::bzip2_compressor());
        out.push(io::back_inserter(temp));
        io::copy(boost::make_iterator_range(data), out);
        out.push(io::back_inserter(temp));
        io::copy(boost::make_iterator_range(data), out);

        // read compressed data from temp into dest
        io::filtering_istream in;
        in.push(io::bzip2_decompressor());
        in.push(io::array_source(&temp[0], temp.size()));
        io::copy(in, io::back_inserter(dest));

        // check that dest consists of 2 copies of data
        assert(data.size() * 2 == dest.size());
        assert(std::equal(data.begin(), data.end(), dest.begin()));
        assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size() / 2));

        dest.clear();
        io::copy(
                io::array_source(&temp[0], temp.size()),
                io::compose(io::bzip2_decompressor(), io::back_inserter(dest)));

        // check that dest consists of 2 copies of data
        assert(data.size() * 2 == dest.size());
        assert(std::equal(data.begin(), data.end(), dest.begin()));
        assert(std::equal(data.begin(), data.end(), dest.begin() + dest.size() / 2));
    }
