Sent 2010-02-25 by Eddie Drapkin <oorza2k5@...>
Hello,
I'm trying to upgrade from Nutch 0.9 to Nutch 1.0 and I've solved all of the
issues that I seem to be having, except for one.
When I run a web crawl, everything fetches fine until it gets to dedup, at
which point I get this stack trace:
2010-02-25 14:31:46,592 WARN mapred.LocalJobRunner ...
Sent 2010-02-25 by reinhard schwab <reinhard.schwab@...>
crawl-urlfilter.txt and regex-urlfilter.txt take regular expressions as
input.
if you want to filter out urls which contain "menu", then just add
-.*menu
this rule will filter out any urls which contain "menu".
note that the first matching rule from top wins.
if there is a rule before this rule m...
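
The "first matching rule from top wins" behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Nutch's actual code: the names RULES, parse_rules and accepts are hypothetical, the sketch assumes a pattern counts as a match when it is found anywhere in the URL, and it assumes a URL that matches no rule is rejected.

```python
import re

# Hypothetical rule list in the syntax discussed above:
# a leading '+' accepts, a leading '-' rejects, the rest is a regex.
RULES = [
    "-.*menu",  # reject any URL containing "menu"
    "+.",       # accept everything else
]

def parse_rules(lines):
    """Turn rule lines into (accept, compiled_regex) pairs, skipping comments."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        parsed.append((sign == "+", re.compile(pattern)))
    return parsed

def accepts(url, rules):
    """Apply rules top to bottom; the first rule that matches decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL matching no rule is rejected

rules = parse_rules(RULES)
print(accepts("http://example.com/index.html", rules))
print(accepts("http://example.com/menu/item", rules))
```

Swapping the two rules so that "+." comes first would let the "menu" URL through, because the catch-all rule would then match before the exclusion is ever tried.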
Sent 2010-02-25 by QueroVc <yuri.gopfert@...>
But doesn't crawl-urlfilter.txt accept only characters rather than strings?
If strings are accepted, how do I write the rule?
# Skip URLs containing certain characters as probable queries, etc..
-[?*!@=]
Could be?
# Skip URLs containing certain characters as probable queries, etc..
- [ "menu"]
Thanks
QueroVc wrote:...
Sent 2010-02-25 by "Ian M. Evans" <ianevans@...>
Hi everyone,
Last night I was able to get Solr up and running. I ran it and was able to
access:
http://www.digitalhit.com:8983/solr/admin
This morning, I started on the nutch crawling instructions over at:
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
After adding the following to ...
Sent 2010-02-25 by Ashley Sterritt <ashley.sterritt@...>
Great, thanks!
2010/2/25 Pedro Bezunartea López :
> I was curious about this, and after a little browsing through sourceforge, I
> found the CVS link:
>
> http://nutch.cvs.sourceforge.net/viewvc/nutch/nutch/?pathrev=nutch_0_4
>
> HTH,
>
> Pedro.
>
>
> 2010/2/25 Andrzej Bia...
Sent 2010-02-25 by Pedro Bezunartea López <pedro@...>
I was curious about this, and after a little browsing through sourceforge, I
found the CVS link:
http://nutch.cvs.sourceforge.net/viewvc/nutch/nutch/?pathrev=nutch_0_4
HTH,
Pedro.
2010/2/25 Andrzej Bialecki
> On 2010-02-24 17:34, Pedro Bezunartea López wrote:
>
>> Hi Ashl...
Sent 2010-02-25 by Andrzej Bialecki <ab@...>
On 2010-02-24 17:34, Pedro Bezunartea López wrote:
> Hi Ashley,
>
> Hi,
>> I'm looking to reproduce program analysis results based on Nutch v0.4. I
>> realize this is a very old release, but is it possible to obtain the source
>> from somewhere? I see some of the classes I'm looking for in v0.7,...
Sent 2010-02-25 by "Andreas P. Koenzen" <akoenzen@...>
Replace it with this: -[@!*]
That's it...
Best regards,
---
Andreas P. Koenzen
On 25/02/2010, at 03:06 a.m., Ian M. Evans wrote:
> I suck at regex and in keeping with the Olympic spirit, I probably
> suck
> at giant slalom too.
>
> In the regex-urlfilter.txt there's the suggested probable ...
Sent 2010-02-25 by MilleBii <millebii@...>
You can add a specific rule before that exclusion rule
Something like :
+.*/?page=.*
2010/2/25, Ian M. Evans :
> I suck at regex and in keeping with the Olympic spirit, I probably suck
> at giant slalom too.
>
> In the regex-urlfilter.txt there's the suggested probable q...
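
The effect of placing a specific "+" rule before the exclusion rule, as suggested above, can be checked with a small Python sketch. It is not Nutch's implementation: first_match and the example URLs are hypothetical, the exclusion pattern -[?*!@=] is taken from elsewhere in this thread, and the sketch assumes that a pattern matches when it is found anywhere in the URL and that a URL matching no rule is rejected.

```python
import re

def first_match(url, rules):
    """Return the verdict of the first rule whose regex is found in url."""
    for line in rules:
        accept, pattern = line[0] == "+", line[1:]
        if re.search(pattern, url):
            return accept
    return False  # assumption: a URL matching no rule is rejected

# The specific accept rule placed before the exclusion rule, and the reverse.
SPECIFIC_FIRST = ["+.*/?page=.*", "-[?*!@=]", "+."]
EXCLUSION_FIRST = ["-[?*!@=]", "+.*/?page=.*", "+."]

# Hypothetical URLs for illustration only.
paged = "http://example.com/list?page=2"
query = "http://example.com/search?q=nutch"
plain = "http://example.com/about.html"

for url in (paged, query, plain):
    print(url, first_match(url, SPECIFIC_FIRST), first_match(url, EXCLUSION_FIRST))
```

Note that in regular-expression syntax "/?" means an optional slash, so the pattern as written matches any URL containing "page=". To require a literal question mark, it would need to be escaped as "\?".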
Sent 2010-02-25 by Bradford Stephens <bradfordstephens@...>
Thanks for coming, everyone! We had around 25 people. A *huge*
success, for Seattle. And a big thanks to 10gen for sending Richard.
Can't wait to see you all next month.
On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens
wrote:
> The Seattle Hadoop/Scalability/NoSQL...