java - Is there a uniform ExcelExtractor class and a factory for both xls and xlsx files?


Is there a common class, or an implementation of the ExcelExtractor interface, that uniformly handles extraction of text from both xls and xlsx sources? Maybe in the ss package.

I am looking for something that would allow me to do things like getting the right implementation from a factory, based on the file type.

Right now, I have to explicitly use org.apache.poi.hssf.extractor.ExcelExtractor for xls files and org.apache.poi.xssf.extractor.XSSFExcelExtractor for xlsx.

For example, the explicit approach for xls:

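    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.poi.hssf.extractor.ExcelExtractor;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    InputStream inp = new FileInputStream(path);
    HSSFWorkbook wb = new HSSFWorkbook(new POIFSFileSystem(inp));
    ExcelExtractor extractor = new ExcelExtractor(wb);

    extractor.setFormulasNotResults(true);
    extractor.setIncludeSheetNames(false);
    String text = extractor.getText();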

I can implement my own factory, but before that I thought I would ask to see if there is a common approach that handles both formats (that is what the ss package is for).

two options

First, if you really want to stick with the old Apache POI text extractors, use the ExtractorFactory class. It will identify the type and create the extractor for you.
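To illustrate the "identify the type" step, here is a minimal sketch in plain Java that tells the two containers apart by their leading bytes: xls files are OLE2 compound documents, and xlsx files are ZIP packages. The class ExcelFormatSniffer and its methods are hypothetical names made up for this example, not part of POI, and this is not a claim about how ExtractorFactory is written internally. Other Office formats share these signatures, so a real factory has to look inside the container as well.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Apache POI.
final class ExcelFormatSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (used by xls, but also doc and ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature (used by xlsx, but also any other ZIP).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Classifies the container from the first bytes of the file.
    static Container detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    // Reads up to 8 bytes from the stream and classifies them.
    static Container detect(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int filled = 0;
        while (filled < header.length) {
            int read = in.read(header, filled, header.length - filled);
            if (read < 0) break; // stream ended before 8 bytes were available
            filled += read;
        }
        return detect(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExcelFormatSniffer <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(detect(in));
        }
    }
}
```

A hand-rolled factory could switch on the returned value to choose between the HSSF and XSSF extractors, but letting ExtractorFactory do the detection avoids maintaining this logic yourself.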

However, the better option is Apache Tika. Tika builds on top of POI (and lots of other libraries), and gives you plain text extraction (plus detection, XHTML output and more) for a wide range of file formats. You'd just call Tika, ask for the text, and get it no matter the type. See the Tika examples to get started.

