2008-04-16
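Trouble building an index after upgrading Lucene to 2.3
Keywords: lucene
To speed up indexing, I buffer documents in memory and flush to disk once the buffer reaches a certain size.
With Lucene 2.2 the code looked like this (excerpt):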
IndexWriter fsWriter = new IndexWriter(fsDir, analyzer, true);
addDocument(fsWriter, s); // add one record to the index as a Document
// Key point: check memory usage and flush to disk once it exceeds the configured limit.
if (fsWriter.ramSizeInBytes() > IParaConf.MAXMEMERY) {
    System.out.println("flush...");
    fsWriter.flush();
}
Reading the Lucene 2.3 API, I noticed a new method:
public void setRAMBufferSizeMB(double mb)
Determines the amount of RAM that may be used for buffering added documents before they are flushed as a new Segment. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.
When this is set, the writer will flush whenever buffered documents use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB
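So I figured that with this method available, I could simply set the amount of memory I wanted and drop the manual memory check shown above. But after making the switch, building the index (make index) became extremely slow. Looking at the index directory, I saw that the writer was constantly writing merged segment files. It looks like the setting had no effect. With no better option, I switched back to Lucene 2.2.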
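
The "whichever comes first" rule in the quoted Javadoc is easy to misjudge, so here is a minimal sketch of it as a toy model. This is not Lucene code: the class `FlushPolicyModel`, its `simulate` helper, and the fixed per-document size are all hypothetical, and only the name `DISABLE_AUTO_FLUSH` is borrowed from the Javadoc. It shows that if a small document-count limit is active alongside the RAM limit, the document count decides how often a flush happens. Whether that is what happened in the case above is not established here.

```java
/** Toy model of a buffered writer's auto-flush rule. NOT Lucene's implementation. */
final class FlushPolicyModel {
    /** Sentinel meaning "this trigger is switched off" (name borrowed from the Javadoc). */
    static final int DISABLE_AUTO_FLUSH = -1;

    private final double ramBufferSizeMB; // RAM trigger, in megabytes
    private final int maxBufferedDocs;    // document-count trigger
    private long bufferedBytes = 0;
    private int bufferedDocs = 0;
    private int flushCount = 0;

    FlushPolicyModel(double ramBufferSizeMB, int maxBufferedDocs) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers one document of the given size and flushes if either trigger fires. */
    void addDocument(long docBytes) {
        bufferedBytes += docBytes;
        bufferedDocs++;
        boolean ramFull = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && bufferedBytes >= ramBufferSizeMB * 1024 * 1024;
        boolean docsFull = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        if (ramFull || docsFull) {
            flush(); // whichever trigger comes first wins
        }
    }

    /** Models writing the buffered documents out as one new segment. */
    void flush() {
        if (bufferedDocs == 0) {
            return; // nothing buffered, nothing to write
        }
        flushCount++;
        bufferedBytes = 0;
        bufferedDocs = 0;
    }

    int flushCount() {
        return flushCount;
    }

    /** Feeds numDocs equally sized documents through the model and returns the flush count. */
    static int simulate(double ramMB, int maxDocs, int numDocs, long bytesPerDoc) {
        FlushPolicyModel model = new FlushPolicyModel(ramMB, maxDocs);
        for (int i = 0; i < numDocs; i++) {
            model.addDocument(bytesPerDoc);
        }
        return model.flushCount();
    }
}
```

With 10,000 documents of 1,024 bytes each, `FlushPolicyModel.simulate(4.0, FlushPolicyModel.DISABLE_AUTO_FLUSH, 10000, 1024)` counts 2 flushes, while `FlushPolicyModel.simulate(4.0, 10, 10000, 1024)` counts 1,000, because the 10-document limit is always reached long before 4 MB is buffered.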


Comments
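In 2.2, as long as I set setMaxBufferedDocs high enough, nothing gets flushed automatically, so I can check memory usage myself and flush manually once it reaches a certain amount.
The 2.3 docs say a flush to disk happens only when the buffer reaches the size given to setRAMBufferSizeMB. I set that parameter and it had no effect, and I don't know why!
Also, when I swap the 2.2 jar for the 2.3 jar without changing my code, it still flushes to disk frequently and completely ignores the value passed to setMaxBufferedDocs.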