Sent 2010-02-24 by Yves Petinot <yves@...>
Hi,
I was wondering if someone else on the list has been experiencing an
issue similar to the one below. I'm running 2 independent crawls on a
single hadoop cluster and am regularly getting "reduce copier failed"
errors. Most of the time Nutch is able to recover from these errors, but
every ...
Sent 2010-02-24 by Bradford Stephens <bradfordstephens@...>
The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup
is tonight! We're going to have a guest speaker from MongoDB :)
As always, it's at the University of Washington, Allen Computer
Science building, Room 303 at 6:45pm. You can find a map here:
http://www.washington.edu/home/maps...
Sent 2010-02-24 by Magnús Skúlason <maggias@...>
Hi,
This is actually very easy: just create an indexing plugin, analyse the URL
format, and return null from the indexing plugin if you don't want to index
it.
best regards,
Magnus
On Wed, Feb 24, 2010 at 6:09 PM, Steven Wichers wrote:
> On some of the sites I want to ind...
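The approach in the reply above (check the URL, return null to skip the page) can be sketched roughly as follows. This is a standalone Python illustration of the logic only, not Nutch code: a real plugin would be written in Java against Nutch's IndexingFilter extension point. The pattern, the `/item/` path, and the function name `filter_document` are all hypothetical and chosen just for this example.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption for illustration): only URLs of the
# form http://www.example.com/item/<number> should end up in the index.
INDEXABLE_URL = re.compile(r"^https?://www\.example\.com/item/\d+$")


def filter_document(url: str, doc: dict) -> Optional[dict]:
    """Return the document if its URL should be indexed, otherwise None.

    Mirrors the idea of returning null from an indexing plugin: the page
    is still crawled (so its links are followed), but it is left out of
    the index.
    """
    if INDEXABLE_URL.match(url):
        return doc
    return None


if __name__ == "__main__":
    for candidate in ("http://www.example.com/browse/",
                      "http://www.example.com/item/42"):
        kept = filter_document(candidate, {"url": candidate})
        print(candidate, "-> indexed" if kept is not None else "-> skipped")
```

The listing page is still fetched and parsed, so the crawler can discover the item links; only the indexing step drops it.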
Sent 2010-02-24 by Steven Wichers <steven@...>
On some of the sites I want to index with Nutch, there are only
specific types of pages I would like to be searchable. I need a way to
be able to crawl these sites, but only index pages that match a
certain regular expression.
ex:
www.example.com/browse/ finds links in the form of
www.example.c...
Sent 2010-02-24 by Pedro Bezunartea López <pedro@...>
Hi Ashley,
> Hi,
> I'm looking to reproduce program analysis results based on Nutch v0.4. I
> realize this is a very old release, but is it possible to obtain the source
> from somewhere? I see some of the classes I'm looking for in v0.7, but I
> need the older version to confirm it.
> Thanks,
> A...
Sent 2010-02-24 by Pedro Bezunartea López <pedro@...>
Hi Sami,
> The schema.xml file there is usable only when using Solr as the search
> server. Are you using Solr?
>
Not yet! thanks for clarifying it. Cheers,
Pedro.
> --
> Sami Siren
>
>
> Pedro Bezunartea López wrote:
> > Hi,
>
>>
>> I've developed a web application in lucene that searches...
Sent 2010-02-24 by Ashley Sterritt <ashley.sterritt@...>
Hi,
I'm looking to reproduce program analysis results based on Nutch v0.4. I
realize this is a very old release, but is it possible to obtain the
source from somewhere? I see some of the classes I'm looking for in
v0.7, but I need the older version to confirm it.
Thanks,
Ashley
Sent 2010-02-24 by Sami Siren <ssiren@...>
The schema.xml file there is usable only when using Solr as the search
server. Are you using Solr?
--
Sami Siren
Pedro Bezunartea López wrote:
> Hi,
>
> I've developed a web application in lucene that searches web pages using a
> nutch generated index. I'd like to highlight the query sear...
Sent 2010-02-24 by xiao yang <yangxiao9901@...>
Hi, Dogacan,
I'm quite confused by the Avro design nutchbase is using. The HBase
schema is defined both in /org/apache/nutch/storage/NutchFields.java
(http://github.com/dogacan/nutchbase/blob/master/src/java/org/apache/nutch/storage/NutchFields.java)
and /webtable.json
(http://github.com/dog...
Sent 2010-02-24 by xiao yang <yangxiao9901@...>
There's no good way to do this.
I'm waiting for HBase integration with Nutch, which will make this
operation much easier. The data store structure Nutch is using now is
not suitable for adding a single URL to the index, as far as I know.
Thanks!
Xiao
On Tue, Feb 16, 2010 at 7:47 PM, Ahmad Al-Amri