前言:
现在看官们对“apachepoi接口列表”可能比较关切,看官们都想要分析一些“apachepoi接口列表”的相关内容。那么小编同时在网上网罗了一些对于“apachepoi接口列表””的相关知识,希望姐妹们能喜欢,咱们一起来了解一下吧!HSSF是POI对Excel-97(-2007)文件操作的纯Java实现。XSSF是POI对Excel 2007 OOXML(.xlsx)文件操作的纯Java实现。从POI 3.8版本开始,提供了一种基于XSSF的低内存占用的API(SXSSF)。
POI提供两种读写API模型:事件模型和用户模型。事件模型是基于流的方式实现,使用sax(simple api for XML)模型进行内容解析,对CPU和内存的消耗小,但无法进行写操作。用户模型基于内存树的方式实现,使用DOM进行excel的解析,对CPU和内存的消耗大,但能够以面向对象的方式进行操作,可读可写。
Event API (HSSF Only)
它可以以比较小的内存来读一个xls文件,为了使用这个API,你需要:
(1)创建一个org.apache.poi.hssf.eventmodel.HSSFRequest的实例
(2)实现org.apache.poi.hssf.eventmodel.HSSFListener接口的类。
(3)HSSFRequest.addListener(yourlistener,recordid)注册(2)中实现的类,recordid应该是org.apache.poi.hssf.record的属性。或者也可认使用HSSFRequest.addListerForAllRecords(yourlistener)。
(4)创建一个org.apache.poi.poifs.filesystem.FileSystem实例, 并把XLS文件通过输入流方式输入。
(5)可以把(4)中创建的实例通过HSSFEventFactory.processWorkbookEvents(request,Filesystem)方法或HSSFEventFactory.processEvents(request,inputstream)与绑定。
(6)这样listener就会调用processRecord(Record)方法,直到整个文件读完。
/**
* This example shows how to use the event API for reading a file.
*/
public class EventExample
implements HSSFListener
{
private SSTRecord sstrec;
/**
* This method listens for incoming records and handles them as required.
* @param record The record that was found while reading.
*/
public void processRecord(Record record)
{
switch (record.getSid())
{
// the BOFRecord can represent either the beginning of a sheet or the workbook
case BOFRecord.sid:
BOFRecord bof = (BOFRecord) record;
if (bof.getType() == bof.TYPE_WORKBOOK)
{
System.out.println("Encountered workbook");
// assigned to the class level member
} else if (bof.getType() == bof.TYPE_WORKSHEET)
{
System.out.println("Encountered sheet reference");
}
break;
case BoundSheetRecord.sid:
BoundSheetRecord bsr = (BoundSheetRecord) record;
System.out.println("New sheet named: " + bsr.getSheetname());
break;
case RowRecord.sid:
RowRecord rowrec = (RowRecord) record;
System.out.println("Row found, first column at "
+ rowrec.getFirstCol() + " last column at " + rowrec.getLastCol());
break;
case NumberRecord.sid:
NumberRecord numrec = (NumberRecord) record;
System.out.println("Cell found with value " + numrec.getValue()
+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
break;
// SSTRecords store a array of unique strings used in Excel.
case SSTRecord.sid:
sstrec = (SSTRecord) record;
for (int k = 0; k < sstrec.getNumUniqueStrings(); k++)
{
System.out.println("String table value " + k + " = " + sstrec.getString(k));
}
break;
case LabelSSTRecord.sid:
LabelSSTRecord lrec = (LabelSSTRecord) record;
System.out.println("String cell found with value "
+ sstrec.getString(lrec.getSSTIndex()));
break;
}
}
/**
* Read an excel file and spit out what we find.
*
* @param args Expect one argument that is the file to read.
* @throws IOException When there is an error processing the file.
*/
public static void main(String[] args) throws IOException
{
// create a new file input stream with the input file specified
// at the command line
FileInputStream fin = new FileInputStream(args[0]);
// create a new org.apache.poi.poifs.filesystem.Filesystem
POIFSFileSystem poifs = new POIFSFileSystem(fin);
// get the Workbook (excel part) stream in a InputStream
InputStream din = poifs.createDocumentInputStream("Workbook");
// construct out HSSFRequest object
HSSFRequest req = new HSSFRequest();
// lazy listen for ALL records with the listener shown above
req.addListenerForAllRecords(new EventExample());
// create our event factory
HSSFEventFactory factory = new HSSFEventFactory();
// process our events based on the document input stream
factory.processEvents(req, din);
// once all the events are processed close our file input stream
fin.close();
// and our document input stream (don't want to leak these!)
din.close();
System.out.println("done.");
}
}
Event API(XSSF和SAX)
使用较少的内存读取XLSX文件。
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class ExampleEventUserModel {
public void processOneSheet(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
// To look up the Sheet Name / Sheet Order / rID,
// you need to process the core Workbook stream.
// Normally it's of the form rId# or rSheet#
InputStream sheet2 = r.getSheet("rId2");
InputSource sheetSource = new InputSource(sheet2);
parser.parse(sheetSource);
sheet2.close();
}
public void processAllSheets(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
Iterator<inputstream> sheets = r.getSheetsData();
while(sheets.hasNext()) {
System.out.println("Processing new sheet:\n");
InputStream sheet = sheets.next();
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
System.out.println("");
}
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
XMLReader parser =
XMLReaderFactory.createXMLReader(
"org.apache.xerces.parsers.SAXParser"
);
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
return parser;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*/
private static class SheetHandler extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
if(name.equals("c")) {
// Print the cell reference
System.out.print(attributes.getValue("r") + " - ");
// Figure out if the value is an index in the SST
String cellType = attributes.getValue("t");
if(cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// Clear contents cache
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if(name.equals("v")) {
System.out.println(lastContents);
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
lastContents += new String(ch, start, length);
}
}
public static void main(String[] args) throws Exception {
ExampleEventUserModel example = new ExampleEventUserModel();
example.processOneSheet(args[0]);
example.processAllSheets(args[0]);
}
}</inputstream>
SXSSF
SXSSF位于org.apache.poi.xssf.streaming包下,用于spreadsheets比较大时,api和XSSF兼容。SXSSF通过一个滑动窗口来限制访问Row的数量从而达到低内存占用,不在窗口的rows不可再访问。
可以在创建SXSSFWorkbook(int windowSize)指定窗口大小,或者针对sheet设置窗口大小
?
1
SXSSFSheet#setRandomAccessWindowSize(int windowSize)
当通过createRow()创建一个新Row时,窗口中的行数已经超过了固定的大小,索引最小的会被flush。
默认的窗口大小是100,当窗口大小是-1时,表示窗口无限大。SXSSF会生成临时文件,所以必须通过dispose方法进行删除。
SXSSFWorkbook 默认使用inline strings而不是一个共享的string table,这样就不用在内存中保存数据,但这样可能会使产生的文档与客户端不兼容。如果开启String table可能会占用大量内存,这种权衡需要用户决定。
另外是否会消耗内存取决于你要使用的特性,比如:merged region , hyperlinks, comments等。
import junit.framework.Assert;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
public static void main(String[] args) throws Throwable {
SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
Sheet sh = wb.createSheet();
for(int rownum = 0; rownum < 1000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
// Rows with rownum < 900 are flushed and not accessible
for(int rownum = 0; rownum < 900; rownum++){
Assert.assertNull(sh.getRow(rownum));
}
// ther last 100 rows are still in memory
for(int rownum = 900; rownum < 1000; rownum++){
Assert.assertNotNull(sh.getRow(rownum));
}
FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
wb.write(out);
out.close();
// dispose of temporary files backing this workbook on disk
wb.dispose();
}
SXSSF会把sheet数据放在临时(一个sheet一个临时文件),所以临时文件可能很大,例如一个20MB的csv数据,xml临时文件可能多在1G,所以如果临时文件的大小是一个问题,可以使SXSSF使用gzip压缩。
SXSSFWorkbook wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); // temp files will be gzipped
标签: #apachepoi接口列表