Lucid Imagination


  From | Date
  1. Chris Hostetter | 2009-12-07 15:51
  2. Grant Ingersoll | 2009-12-07 18:29
  3. Noble Paul നോബിള്‍ नोब्ळ् | 2009-12-08 00:22
  4. Grant Ingersoll | 2009-12-08 06:17
  5. Noble Paul നോബിള്‍ नोब्ळ् | 2009-12-08 10:03
  6. Zacarias | 2010-01-05 10:48
  7. Zacarias | 2010-01-05 10:49
  8. Zacarias | 2010-01-05 13:53
  9. Grant Ingersoll | 2010-01-05 13:58
  10. Chris Hostetter | 2010-01-05 15:18
  11. Jan Høydahl / Cominvent | 2010-01-22 17:37
  12. Jan Høydahl / Cominvent | 2010-02-08 15:48

[solr-dev] Solr Cell revamped as an UpdateProcessor?

Subject: Re: Solr Cell revamped as an UpdateProcessor?
From: Grant Ingersoll <gsingers@...>
Date: 2010-01-05 13:58
On Jan 5, 2010, at 1:53 PM, Zacarias wrote:

I attached a file to the previous mail. Is there a filter for PDF files, or is there some other reason?
The mailer strips attachments, although you might be able to get a zip through. Perhaps send a pointer to somewhere else or just describe it here.
On Tue, Jan 5, 2010 at 12:49 PM, Zacarias <zacarias@linebee.com> wrote:
Here is my proposal.

Regards

On Tue, Jan 5, 2010 at 12:48 PM, Zacarias <zacarias@linebee.com> wrote:
Hi, I'm developing a directory monitor to add to a Solr implementation. Let me know if it could be of interest to you; we would be glad to share it with the community. I would also like your opinion on the proposal: whether it looks OK to you, and any changes or questions are very welcome.

Regards
Zacarias
http://www.linebee.com

2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् <noble.paul@corp.aol.com>
I was referring to SOLR-1358. Anyway, SolrCell as an UpdateProcessor is a good idea.

On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll <gsingers@apache.org> wrote:
On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
Integrating extraction with DIH is a better option; DIH makes it easier to map fields, etc.
Which comment is this directed at? I'm lacking context here.
On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll <gsingers@apache.org> wrote:
On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:
As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering if ExtractingRequestHandler would make more sense as an extractingUpdateProcessor, where it could be configured to take either binary fields (or string fields containing URLs) out of the documents, parse them with Tika, and add the various XPath-matching hunks of text back into the document as new fields.
Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams, adds them as binary data fields, and adds the other literal params as fields.
Wouldn't that make things like SOLR-1358, and using Tika with URLs/file paths in XML- and CSV-based updates, fairly trivial?
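A minimal plain-Java sketch of the split described above, with no Solr or Tika dependencies. `buildDoc` plays the slimmed-down handler (content streams become binary fields, literal params become ordinary fields), and `ExtractingProcessor` replaces each binary field with extracted text fields before passing the document down the chain. Assumptions: documents are modeled as `Map<String, Object>`, the extractor is a plain `Function` standing in for Tika, XPath-based selection of text hunks is not modeled, and every class, method and field name (including the `stream0_content` naming) is hypothetical, not Solr's `UpdateRequestProcessor` API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of "extraction as an update processor". All names are hypothetical. */
public class ExtractingProcessorSketch {

    /** One link in a processing chain; a document is field name -> value. */
    public interface Processor {
        void processAdd(Map<String, Object> doc);
    }

    /** Replaces each byte[] field with the text fields the extractor produces. */
    public static class ExtractingProcessor implements Processor {
        private final Function<byte[], Map<String, String>> extractor; // stand-in for Tika
        private final Processor next;

        public ExtractingProcessor(Function<byte[], Map<String, String>> extractor, Processor next) {
            this.extractor = extractor;
            this.next = next;
        }

        @Override
        public void processAdd(Map<String, Object> doc) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : doc.entrySet()) {
                if (e.getValue() instanceof byte[]) {
                    // Binary field: drop the raw bytes, keep the extracted fields,
                    // prefixed with the source field name.
                    Map<String, String> extracted = extractor.apply((byte[]) e.getValue());
                    for (Map.Entry<String, String> f : extracted.entrySet()) {
                        out.put(e.getKey() + "_" + f.getKey(), f.getValue());
                    }
                } else {
                    out.put(e.getKey(), e.getValue()); // ordinary field passes through
                }
            }
            next.processAdd(out);
        }
    }

    /** The slimmed-down handler: streams become binary fields, params become literal fields. */
    public static Map<String, Object> buildDoc(List<byte[]> streams, Map<String, String> literals) {
        Map<String, Object> doc = new LinkedHashMap<>(literals);
        for (int i = 0; i < streams.size(); i++) {
            doc.put("stream" + i, streams.get(i));
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> indexed = new ArrayList<>(); // end of chain: "the index"
        // Trivial extractor: decode bytes as UTF-8 and report the text length.
        Function<byte[], Map<String, String>> extractor = bytes -> {
            String text = new String(bytes, StandardCharsets.UTF_8);
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("content", text);
            fields.put("length", Integer.toString(text.length()));
            return fields;
        };
        Processor chain = new ExtractingProcessor(extractor, indexed::add);
        chain.processAdd(buildDoc(List.of("hello".getBytes(StandardCharsets.UTF_8)), Map.of("id", "doc1")));
        System.out.println(indexed.get(0)); // prints {id=doc1, stream0_content=hello, stream0_length=5}
    }
}
```

Because extraction happens in the chain rather than in the handler, any update path that can put bytes into a field could reuse the same processor, which is the motivation given in the question.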
It probably could, but I'm not sure how it would work in a processor chain.
Then again, I'm not sure I understand how processor chains work all that well either. BTW, I also plan on adding a SolrJ client for Tika that does the extraction on the client. The ExtrReqHandler is really only designed for lighter-weight extraction cases, since in many cases one would simply not want to send that much rich content over the wire.
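A rough sketch of the trade-off behind client-side extraction, assuming the client parses the file locally and sends only the resulting text. `toTextDoc`, `wireSize` and the fake extractor are hypothetical stand-ins (this is not the SolrJ or Tika API), and `wireSize` counts only the UTF-8 bytes of field names and values, ignoring any request encoding overhead.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: extract on the client, ship text instead of the raw file. Names are hypothetical. */
public class ClientSideExtraction {

    /** Builds the document the client would send instead of the raw file. */
    public static Map<String, String> toTextDoc(String id, byte[] rawFile,
                                                Function<byte[], String> extractText) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("content", extractText.apply(rawFile)); // only the text leaves the client
        return doc;
    }

    /** Approximate payload: UTF-8 bytes of every field name and value. */
    public static int wireSize(Map<String, String> doc) {
        int n = 0;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            n += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            n += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[10_000];                          // pretend 10 KB rich document
        Function<byte[], String> fakeTika = b -> "quarterly report"; // pretend extracted text
        Map<String, String> doc = toTextDoc("doc1", raw, fakeTika);
        System.out.println(wireSize(doc) + " bytes sent vs " + raw.length + " raw");
        // prints "29 bytes sent vs 10000 raw"
    }
}
```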
--
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
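Zacarias's attached proposal did not make it through the mailer, so its actual design is unknown. As an illustration only, here is one simple way a directory monitor could find files to index: poll the directory and compare each file's last-modified time with the previous scan. The class and method names are hypothetical; deleted files are not reported, and submitting the changed files to Solr is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical polling directory monitor (not Zacarias's design, which was not shared). */
public class DirectoryMonitor {
    private final Path dir;
    // Last-modified time (millis) of each file seen in previous scans.
    private final Map<Path, Long> lastSeen = new HashMap<>();

    public DirectoryMonitor(Path dir) {
        this.dir = dir;
    }

    /** Returns regular files that are new or modified since the previous call. */
    public List<Path> scan() throws IOException {
        List<Path> changed = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) { // close the directory stream
            Iterator<Path> it = files.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                long mtime = Files.getLastModifiedTime(p).toMillis();
                Long previous = lastSeen.put(p, mtime);
                if (previous == null || previous != mtime) {
                    changed.add(p); // new or modified: a candidate for (re)indexing
                }
            }
        }
        return changed;
    }
}
```

A caller would invoke `scan()` periodically (for example from a `ScheduledExecutorService`) and send each returned file to Solr for indexing. The JDK's `java.nio.file.WatchService` is an event-driven alternative to polling.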


Apache Solr, Apache Lucene, ApacheCon and their logos are trademarks of the Apache Software Foundation.

© 2010 Lucid Imagination. All rights reserved.