Enterprise Search support for Apache Lucene and Solr by Lucid Imagination

Search Results for

Found 29,395 results in 0.126 seconds. Displaying page 6 of 2,940, sorted by

  1. [nutch-user] Update on ignoring menu divs

    Sent 2010-02-28 by "Ian M. Evans" <ianevans@...>

    Using Nutch as a crawler for Solr. I've been digging around the nutch-user archives a bit and have seen some people discussing how to ignore menu items or other unnecessary div areas like common footers, etc. I still haven't come across a full answer yet. Is there a way to define a div by id tha...

  2. [nutch-user] Summary

    Sent 2010-02-27 by QueroVc <yuri.gopfert@...>

    Hello, I have a problem. Can you configure how Nutch (crawl or search?) creates the summaries? -- View this message in context: http://old.nabble.com/Summary-tp27731301p27731301.html Sent from the Nutch - User mailing list archive at Nabble.com.

  3. [nutch-dev] Hudson build is back to normal : Nutch-trunk #1080

    Sent 2010-02-27 by Apache Hudson Server <hudson@...>

    See

  4. [nutch-user] Re: can't load class error

    Sent 2010-02-27 by Ted Yu <yuzhihong@...>

    Please disregard my previous email - the command was launched from incorrect directory. I don't see improvement for my latest run: [root@snv-qa-lin-domain-crawler1 software]# hfs -text /user/tomcatadmin/lpm/15-100226111258118-tomcatadmin/parse/0/part-m-00000 10/02/27 07:36:28 INFO util.NativeCod...

  5. [nutch-user] Re: can't load class error

    Sent 2010-02-27 by Ted Yu <yuzhihong@...>

    Now I see this in the log: [root@snv-qa-lin-domain-crawler1 webmap_workflow]# hfs -text /user/tomcatadmin/lpm/15-100226111258118-tomcatadmin/generate/0/part-r-00000 2010-02-27 07:25:08,062 WARN [main] conf.Configuration DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml...

  6. [nutch-user] Re: can't load class error

    Sent 2010-02-27 by Julien Nioche <lists.digitalpebble@...>

    Look at the Hadoop option -libjars and use it to point to the nutch-1.0.jar, that should work J. On 27 February 2010 13:08, Ted Yu wrote: > Hi, > We use nutch to perform domain crawl but I see strange 'can't load class' > error: > > [root@snv-qa-lin-domain-crawler1 softwar...

  7. [nutch-user] can't load class error

    Sent 2010-02-27 by Ted Yu <yuzhihong@...>

    Hi, We use nutch to perform domain crawl but I see strange 'can't load class' error: [root@snv-qa-lin-domain-crawler1 software]# hfs -text /user/tomcatadmin/lpm/12-100226111258118-tomcatadmin/parse/0/part-m-00000 10/02/27 04:45:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library 10/0...

  8. [nutch-user] recover from hadoop.tmp.dir?

    Sent 2010-02-27 by Patricio Galeas <pgaleas@...>

    Hello, Two weeks ago, we started a web crawl (depth=6, threads=10) and today the process aborted because our hard disk is full. We defined a 100GB partition for the hadoop.tmp.dir. Yesterday (night), I checked the size of hadoop.tmp.dir for the last crawl and it was 23GB. Some hours later ...

  9. [nutch-user] Problem with specialchars when dumping segments.

    Sent 2010-02-26 by Felix Zimmermann <felizimm@...>

    Hi, when dumping segments with "bin/nutch readseg -dump ...", special characters of non-utf8 encoded pages are lost. For example the "ö" (ö) is replaced by a "?"... I am really in need of the dumped files with correct representation of special chars. How can I deal with this problem? Tha...

  10. [nutch-user] Text.encode failing during de-duplication

    Sent 2010-02-25 by Eddie Drapkin <oorza2k5@...>

    Hello, I'm trying to upgrade from Nutch 0.9 to Nutch 1.0 and I've solved all of the issues that I seem to be having, except for one. When I run a web crawl, everything fetches fine until it gets to dedup, at which point I get this stack trace: 2010-02-25 14:31:46,592 WARN mapred.LocalJobRunner ...

Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.