Sent 2010-02-08 by Ted Yu <yuzhihong@...>
Hi,
I was reading http://wiki.apache.org/nutch/LanguageIdentifier and tried to
access the EncodingDetectorPlugin link,
but the page isn't there:
http://wiki.apache.org/nutch/EncodingDetectorPlugin
Can someone provide more information?
Thanks
Sent 2010-02-08 by Julien Nioche <lists.digitalpebble@...>
Hi,
You'd need to filter the URLs from the segments as well before you
index. Removing the entries from the linkDB will just prevent them
from getting anchor fields - they'll still be added to the index.
Look at the class IndexerMapReduce for more details.
An option would be to add support for ...
Sent 2010-02-08 by Stefano Cherchi <stefanocherchi@...>
Is there nobody out there who can provide some kind of hint?
I'm really stuck with this problem and I cannot figure out what else I can do.
Thanks
S
----- Original message -----
> From: Stefano Cherchi
> To: nutch-user@lucene.apache.org
> Sent: Thu 4 Feb...
Sent 2010-02-08 by Esteve Schouten <eschouten@...>
Hi,
I need help with Nutch. I have a Lucene indexer in my project and I need
to add documents with the content of the URLs crawled by Nutch to my
Lucene index. How can I do it?
Steve
--
-----------------------------------------------------------------------
Esteve Schouten Ginard
Àrea d'...
Sent 2010-02-08 by Ryan Smith <ryan.justin.smith@...>
FWIW, there is a plugin for heritrix to write to hbase as a back end store.
Maybe it will help for making a nutch plugin?
http://code.google.com/p/hbase-writer
-Ryan
On Mon, Feb 8, 2010 at 4:32 AM, Hua Su wrote:
> Hi all,
>
> Any recent progress on HBase integration? There...
Sent 2010-02-08 by Hua Su <huas.su@...>
Hi all,
Any recent progress on HBase integration? There is a filed issue: NUTCH-650.
I really love the idea of using HBase as nutch storage backend. It not only
simplifies nutch storage, but also makes much url/page processing work more
efficient ...
Sent 2010-02-08 by Apache Hudson Server <hudson@...>
See
Sent 2010-02-07 by Sahil Shah <sahilshah2650@...>
Hey Everyone,
I want to write a plugin that generates snippets/summaries based on the query
using an index-based approach. I have read the wiki, but I am still not clear
on how to understand the source code. The API collection is also huge....
There are so many interfaces and classes. Where to s...
Sent 2010-02-04 by Stefano Cherchi <stefanocherchi@...>
Hi everybody. I've been struggling for three days now with what should be a quite trivial problem, without finding a solution.
I need to index a few web sites with the following structure:
Page type 1: List of posts (http://www.website.com/list.html?page=XXx) where XXx is a progressive number from 00 to 999. Each page...
Sent 2010-02-04 by Alexander Aristov <alexander.aristov@...>
Your problem has nothing to do with PDFs. Do you get any messages/exceptions
when you are merging indexes?
Best Regards
Alexander Aristov
On 4 February 2010 12:58, Withanage, Dulip <withanage@asia-europe.uni-heidelberg.de> wrote:
> Thanks for the initial ideas.
> >>do they really corrupt or th...