[nutch-user] Text.encode failing during de-duplication

Subject:
Text.encode failing during de-duplication
From:
Eddie Drapkin <oorza2k5@...>
Date:
2010-02-25 16:18
Hello,

I'm trying to upgrade from Nutch 0.9 to Nutch 1.0, and I've solved all of
the issues I seemed to be having except for one.

When I run a web crawl, everything fetches fine until it gets to dedup, at
which point I get this stack trace:


2010-02-25 14:31:46,592 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
2010-02-25 14:31:47,328 FATAL indexer.DeleteDuplicates - DeleteDuplicates: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1250)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:448)
        at org.apache.nutch.indexer.DeleteDuplicates.run(DeleteDuplicates.java:515)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:499)
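
One reading of the top two frames, offered as an assumption rather than a
diagnosis, is that Text.set was handed a null String and the failure surfaced
inside the UTF-8 encode step. The sketch below does not use Hadoop or Nutch at
all: NullEncodeSketch, encode, encodeOrEmpty and throwsNpe are hypothetical
names, and encode is only a plain-JDK stand-in for that kind of encode step.
It shows that encoding a null String fails with a NullPointerException, and
what a null guard in front of it might look like.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class NullEncodeSketch {

    // Plain-JDK stand-in for a UTF-8 encode step (an assumption, not Hadoop's code).
    // Calling toCharArray() on a null String throws NullPointerException.
    static ByteBuffer encode(String s) {
        try {
            return Charset.forName("UTF-8").newEncoder()
                    .encode(CharBuffer.wrap(s.toCharArray()));
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical guard: treat a missing value as the empty string.
    static ByteBuffer encodeOrEmpty(String s) {
        return encode(s == null ? "" : s);
    }

    // Reports whether encode(s) fails with a NullPointerException.
    static boolean throwsNpe(String s) {
        try {
            encode(s);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null input throws NPE: " + throwsNpe(null));
        System.out.println("non-null input throws NPE: " + throwsNpe("http://example.com/"));
        System.out.println("guarded null encodes to " + encodeOrEmpty(null).remaining() + " bytes");
    }
}
```

If that reading is right, the thing to check would be which value
DeleteDuplicates.java:191 passes to Text.set and whether it can be null for
some indexed documents; the trace alone does not say.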


I'm running on a 1.5 JVM and can't upgrade to 1.6.  I've tried with a
version of Hadoop that's old enough to run on 1.5 (0.18.3) and with a
version of Hadoop (0.20.2) that a co-worker modified to build and run on
1.5.  Is it possible that I can't upgrade Nutch until I can upgrade my JVM,
or is it something else?  If there's any more information you need, let me
know.

Thanks,
Eddie

PS. Sorry if this gets sent twice; I tried to send it before I subscribed to
this list.
