[java-user] Re: Batch Indexing - best practice?

Subject:
Re: Batch Indexing - best practice?
From:
Erick Erickson <erickerickson@...>
What's a document? What's indexing?

Here's what I'd do as a very first step. Time the actual
indexing and report it out. By that I mean how long does
IndexWriter.addDocument() take? If you first fetch the
document from wherever, then add all the fields, and then
add the document, I'd time adding the fields too. The point
is to separate the Lucene stuff from whatever else you do
before trying to fix anything.
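
A minimal sketch of that first step, assuming nothing about the real application: it accumulates wall-clock time separately for building each document (adding the fields) and for handing it to the index. The class IndexTimingSketch, its Timings holder, and the buildDocument and addDocument callbacks are hypothetical names for illustration. In a real run the second callback would wrap the IndexWriter.addDocument() call; the in-memory list below merely stands in for the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical harness: times document building and document adding separately. */
final class IndexTimingSketch {

    /** Accumulated nanoseconds per phase, plus the number of documents processed. */
    static final class Timings {
        long buildNanos;
        long addNanos;
        int docs;

        /** Overall throughput across both phases; 0.0 if no time was recorded. */
        double docsPerSecond() {
            long total = buildNanos + addNanos;
            return total == 0 ? 0.0 : docs / (total / 1e9);
        }
    }

    /**
     * buildDocument stands in for "add all the fields";
     * addDocument stands in for the call to IndexWriter.addDocument().
     */
    static <S, D> Timings timeIndexing(Iterable<S> sources,
                                       Function<S, D> buildDocument,
                                       Consumer<D> addDocument) {
        Timings t = new Timings();
        for (S source : sources) {
            long start = System.nanoTime();
            D doc = buildDocument.apply(source);
            long built = System.nanoTime();
            addDocument.accept(doc);
            long added = System.nanoTime();
            t.buildNanos += built - start;
            t.addNanos += added - built;
            t.docs++;
        }
        return t;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("alpha", "beta", "gamma");
        List<String> fakeIndex = new ArrayList<>(); // assumption: stand-in for the real index
        Function<String, String> build = row -> "doc:" + row;
        Consumer<String> add = fakeIndex::add;

        Timings t = timeIndexing(rows, build, add);
        System.out.printf("docs=%d build=%d ns add=%d ns (%.1f docs/sec)%n",
                t.docs, t.buildNanos, t.addNanos, t.docsPerSecond());
    }
}
```

If most of the time lands in the build phase, the bottleneck is upstream of Lucene and tuning the writer is unlikely to help much.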

The first point of the link Ian provided has the easily-overlooked
phrase "and the slowness is indeed inside Lucene"...

Best
Erick



On Mon, Mar 15, 2010 at 11:02 AM, Murdoch, Paul <PAUL.B.MURDOCH@saic.com> wrote:

Thanks. I'll try lowering the merge factor and see if speed increases. The indexing is threaded....similar to the utility class in Listing 10.1 from Lucene in Action. Search speed is great once the index is built....close to real time. So my main problem is getting the indexing speed fixed. I do use the StandardAnalyzer for most of my fields. What type of performance level should I be trying to hit for indexing (docs/sec)...just to give me an idea of what to shoot for?

Paul

-----Original Message-----
From: java-user-return-45433-PAUL.B.MURDOCH=saic.com@lucene.apache.org [mailto:java-user-return-45433-PAUL.B.MURDOCH=saic.com@lucene.apache.org] On Behalf Of Mark Miller
Sent: Monday, March 15, 2010 10:48 AM
To: java-user@lucene.apache.org
Subject: Re: Batch Indexing - best practice?

On 03/15/2010 10:41 AM, Murdoch, Paul wrote:

Hi,

I'm using Lucene 2.9.2. Currently, when creating my index, I'm calling indexWriter.addDocument(doc) for each Document I want to index. The Documents aren't large and I'm averaging indexing about 500 documents every 90 seconds. I'd like to try and speed this up....unless 90 seconds for 500 Documents is reasonable. I have the merge factor set to 1000. Do you have any suggestions for batch indexing? Is there something like indexWriter.addDocuments(Document[] docs) in the API?

Thanks.
Paul

You should lower that merge factor - that's *really* high. You shouldn't really need much more than 50 or so ... and for search speed you're going to want fewer segments anyway - if you're just going to end up optimizing at the end, there is no reason for such a large merge factor - you will pay for most of what you saved when you optimize.

That is very slow by the way. Should be much faster - especially if you are using multiple threads.

--
- Mark

http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
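
For scale, 500 documents in 90 seconds works out to roughly 5.6 documents per second. The sketch below shows that arithmetic and one way to feed a single shared writer from a fixed pool of threads, which relies on IndexWriter being safe to share across threads. ThreadedIndexingSketch, docsPerSecond, indexInParallel and the sharedSink callback are hypothetical names; a thread-safe queue stands in for the writer, and this is not the Listing 10.1 utility mentioned in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch: several threads build documents and add them to one shared sink. */
final class ThreadedIndexingSketch {

    /** Throughput for a batch, e.g. docsPerSecond(500, 90.0) for the numbers in the thread. */
    static double docsPerSecond(int docs, double seconds) {
        return docs / seconds;
    }

    /**
     * Submits one task per source to a fixed-size pool. sharedSink stands in for a
     * call to addDocument() on a single shared IndexWriter and must be thread-safe.
     * Returns the number of documents added.
     */
    static int indexInParallel(List<String> sources, int threads, Consumer<String> sharedSink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger added = new AtomicInteger();
        for (String source : sources) {
            pool.execute(() -> {
                String doc = "doc:" + source; // assumption: placeholder for building the fields
                sharedSink.accept(doc);
                added.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return added.get();
    }

    public static void main(String[] args) {
        Queue<String> fakeIndex = new ConcurrentLinkedQueue<>(); // stand-in for the writer
        int n = indexInParallel(Arrays.asList("a", "b", "c", "d"), 2, fakeIndex::add);
        System.out.printf("added %d docs; 500 docs in 90 s = %.2f docs/sec%n",
                n, docsPerSecond(500, 90.0));
    }
}
```

Threads only help if the time is actually spent building and adding documents, so it is worth measuring the per-phase cost before changing the pool size.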
