Lucid Imagination


[solr-user] Re: TermInfosReader.get ArrayIndexOutOfBoundsException

Subject: Re: TermInfosReader.get ArrayIndexOutOfBoundsException
From: Michael McCandless <lucene@...>

Yes, the term count reported by CheckIndex is the total number of unique terms.

It indeed looks like you are exceeding the unique term count limit --
16777214 * 128 (= the default term index interval) is 2147483392, which
is mighty close to the max signed 32-bit int value (2147483647).  This
makes sense, because CheckIndex steps through the terms in order, one
by one.  So the first term just over the limit triggered the exception.

Hmm -- can you try a patched Lucene in your area?  I have one small
change to try that may increase the limit to termIndexInterval
(default 128) * 2.1 billion.
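
For anyone following the arithmetic, here is a minimal sketch of the numbers above. It is not Lucene code: the class and method names are made up for illustration, and it only assumes that a term position is held in a signed 32-bit int, which is what the exception suggests rather than something confirmed here.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
final class TermLimitSketch {
    // Default term index interval mentioned in the message above.
    static final int TERM_INDEX_INTERVAL = 128;

    // Term position implied by an index entry number, computed in 64 bits.
    static long impliedTermPosition(int indexEntry) {
        return (long) indexEntry * TERM_INDEX_INTERVAL;
    }

    // How many more terms fit before a signed 32-bit position runs out.
    static long headroom(long termPosition) {
        return Integer.MAX_VALUE - termPosition;
    }

    // What a 64-bit position turns into when squeezed into a Java int.
    static int wrapTo32Bits(long termPosition) {
        return (int) termPosition;
    }

    // Upper bound if only the index entry number, not the term position,
    // had to fit in an int (the "interval * 2.1 billion" figure).
    static long limitWithIntIndexEntries() {
        return (long) Integer.MAX_VALUE * TERM_INDEX_INTERVAL;
    }

    public static void main(String[] args) {
        long position = impliedTermPosition(16777214);
        System.out.println("implied term position: " + position);
        System.out.println("headroom below Integer.MAX_VALUE: " + headroom(position));
        System.out.println("position + 256 as an int: " + wrapTo32Bits(position + 256));
        System.out.println("limit with int index entries: " + limitWithIntIndexEntries());
    }
}
```

The point of the sketch is only that a position a few hundred terms past 2147483392 no longer fits in an int and comes out negative, which would be consistent with a negative array index.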

Mike

On Tue, Feb 9, 2010 at 12:23 PM, Tom Burton-West <tburtonwest@gmail.com> wrote:
Thanks Lance and Michael,

We are running Solr 1.3.0.2009.09.03.11.14.39 (complete version info from the Solr admin panel appended below).

I tried running CheckIndex (with the -ea: switch) on one of the shards. CheckIndex also produced an ArrayIndexOutOfBoundsException on the larger segment containing 500K+ documents. (Complete CheckIndex output appended below.)

Is it likely that all 10 shards are corrupted? Is it possible that we have simply exceeded some Lucene limit?

I'm wondering if we could have exceeded the Lucene limit of 2.1 billion unique terms, as mentioned towards the end of the Lucene Index File Formats document. If the small 731-document index has nine million unique terms as reported by CheckIndex, then even though many terms are repeated, it is conceivable that the 500,000-document index could have more than 2.1 billion terms.

Do you know if the number of terms reported by CheckIndex is the number of unique terms?

On the other hand, we previously optimized a 1 million document index down to 1 segment and had no problems. That was with an earlier version of Solr and did not include CommonGrams, which could conceivably increase the number of terms in the index by 2 or 3 times.

Tom

-----------------------------------------------------------------------------------

Solr Specification Version: 1.3.0.2009.09.03.11.14.39
Solr Implementation Version: 1.4-dev 793569 - root - 2009-09-03 11:14:39
Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 779312 - 2009-05-27 17:19:55

[tburtonw@slurm-4 ~]$ java -Xmx4096m -Xms4096m -cp /l/local/apache-tomcat-serve/webapps/solr-sdr-search/serve-10/WEB-INF/lib/lucene-core-2.9-dev.jar:/l/local/apache-tomcat-serve/webapps/solr-sdr-search/serve-10/WEB-INF/lib -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /l/solrs/1/.snapshot/serve-2010-02-07/data/index

Opening index @ /l/solrs/1/.snapshot/serve-2010-02-07/data/index

Segments file=segments_zo numSegments=2 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 2: name=_29dn docCount=554799
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=267,131.261
    diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.18-164.6.1.el5, os=Linux, mergeDocStores=true, lucene.version=2.9-dev 779312 - 2009-05-27 17:19:55, source=merge, os.arch=amd64, java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_29dn_7.del]
    test: open reader.........OK [184 deleted docs]
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.ArrayIndexOutOfBoundsException: -16777214
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218)
        at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:474)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:715)

  2 of 2: name=_29im docCount=731
    compound=false
    hasProx=true
    numFiles=8
    size (MB)=421.261
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.18-164.6.1.el5, os=Linux, mergeDocStores=true, lucene.version=2.9-dev 779312 - 2009-05-27 17:19:55, source=merge, os.arch=amd64, java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...OK [9504552 terms; 34864047 terms/docs pairs; 144869629 tokens]
    test: stored fields.......OK [3550 total field count; avg 4.856 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

WARNING: 1 broken segments (containing 554615 documents) detected
WARNING: would write new segments file, and 554615 documents would be lost, if -fix were specified

[tburtonw@slurm-4 ~]$

The index is corrupted. In some places ArrayIndex and NPE are not wrapped as CorruptIndexException. Try running your code with the Lucene assertions on. Add this to the JVM arguments: -ea:org.apache.lucene...

--
View this message in context: http://old.nabble.com/TermInfosReader.get-ArrayIndexOutOfBoundsException-tp27506243p27518800.html
Sent from the Solr - User mailing list archive at Nabble.com.
