Sent 2010-03-11 by Robert Muir <rcmuir@...>
>
> I don't deal with a lot of multi-lingual stuff, but my understanding is
> that this sort of thing gets a lot easier if you can partition your docs
> by language -- and even if you can't, doing some language detection on the
> (dirty) OCRed text to get a language guess (and then partition by l...
Sent 2010-03-11 by Chris Hostetter <hossman_lucene@...>
: Interesting. I wonder though if we have 4 million English documents and 250
: in Urdu, if the Urdu words would score badly when compared to ngram
: statistics for the entire corpus.
Well, it doesn't have to be a strict ratio cutoff; you could look at the
average frequency of all character...
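Hoss's message is cut off, so the exact statistic he had in mind is unknown. The sketch below only illustrates the general idea of scoring a word by the average corpus frequency of its character n-grams. It assumes character bigrams, a per-word average, and a tiny hypothetical corpus; the class and method names are illustrative, not from any library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NgramScore {
    // Counts every character bigram across the corpus words (hypothetical helper).
    static Map<String, Integer> bigramCounts(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            for (int i = 0; i + 2 <= w.length(); i++) {
                counts.merge(w.substring(i, i + 2), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Average corpus count of the word's bigrams; garbled OCR tokens tend to score low.
    static double averageBigramFrequency(String word, Map<String, Integer> counts) {
        if (word.length() < 2) return 0.0;
        long total = 0;
        int n = 0;
        for (int i = 0; i + 2 <= word.length(); i++) {
            total += counts.getOrDefault(word.substring(i, i + 2), 0);
            n++;
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        // Hypothetical five-word corpus, purely for illustration.
        Map<String, Integer> counts = bigramCounts(List.of("the", "then", "there", "other", "them"));
        System.out.println(averageBigramFrequency("the", counts));
        System.out.println(averageBigramFrequency("xqz", counts));
    }
}
```

How the score becomes a clean/dirty decision is not recoverable from the truncated message, so no threshold is shown. Note that this is exactly where Tom's Urdu concern bites: words from a small-language subset have rare bigrams corpus-wide, so computing the counts per language partition (as suggested earlier in the thread) would be one way to avoid penalizing them.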
Sent 2010-03-11 by JavaGuy84 <bbarani@...>
Erik,
That was a wonderful explanation; I hope many folks in this forum will
benefit from it.
Actually, after you mentioned earlier that I could do a leading wildcard
search without hacking the code, I Googled and found the solution.
I found the patch that h...
Sent 2010-03-11 by Grant Ingersoll <gsingers@...>
On Mar 11, 2010, at 6:30 PM, Yonik Seeley wrote:
> Interesting looking stuff Marcus!
> Seems sort of related to stats.facet (calc stats on unique facet values)
> http://wiki.apache.org/solr/StatsComponent
And https://issues.apache.org/jira/browse/SOLR-1622
>
>
> On Thu, Mar 11, 2010 at 5:49 P...
Sent 2010-03-11 by Erick Erickson <erickerickson@...>
Leaving aside some historical reasons, the root of
the issue is that any search has to identify all the
terms in a field that satisfy it. Let's take a normal
non-leading wildcard case first.
Finding all the terms like 'some*' will have to
deal with many fewer terms than 's*'. Just dealing with
t...
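Erick's point can be sketched with a sorted set standing in for the term dictionary (a simplification; this is not Lucene's actual term enumeration code). A trailing wildcard like 'some*' can seek straight to the prefix and stop at the first non-matching term, while a leading wildcard like '*ing' has no seek point and must examine every term. The last method shows the reversed-term idea that Solr 1.4's ReversedWildcardFilterFactory is built on: index each term reversed so a suffix search becomes a prefix search. All class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class WildcardCost {
    // 'pre*': seek into the sorted dictionary and walk only the matching range.
    static List<String> prefixMatches(NavigableSet<String> dict, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : dict.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break; // sorted order: first non-match ends the range
            out.add(t);
        }
        return out;
    }

    // '*suf': no seek point exists, so every term is examined; returns how many were visited.
    static int termsExaminedForSuffix(NavigableSet<String> dict, String suffix, List<String> out) {
        int examined = 0;
        for (String t : dict) {
            examined++;
            if (t.endsWith(suffix)) out.add(t);
        }
        return examined;
    }

    // Reversed-term trick: '*suf' against reversed terms is the prefix query 'fus*'.
    static List<String> suffixViaReversed(NavigableSet<String> reversedDict, String suffix) {
        String revSuffix = new StringBuilder(suffix).reverse().toString();
        List<String> out = new ArrayList<>();
        for (String r : prefixMatches(reversedDict, revSuffix)) {
            out.add(new StringBuilder(r).reverse().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<>(
                List.of("sing", "some", "someday", "something", "string", "thing"));
        NavigableSet<String> reversed = new TreeSet<>();
        for (String t : dict) reversed.add(new StringBuilder(t).reverse().toString());

        System.out.println(prefixMatches(dict, "some").size() + " vs " + prefixMatches(dict, "s").size());
        List<String> hits = new ArrayList<>();
        System.out.println(termsExaminedForSuffix(dict, "ing", hits) + " examined, hits " + hits);
        System.out.println(suffixViaReversed(reversed, "ing"));
    }
}
```

The trade-off of the reversed approach is a larger index (each term is stored in both directions), in exchange for turning a full dictionary scan into a range walk.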
Sent 2010-03-11 by Mike Malloy <mike@...>
I don't mean to turn this into a sales pitch, but there is a tool for Java app
performance management that you may find helpful. It's called New Relic
(www.newrelic.com) and the tool can be installed in 2 minutes. It can give
you very deep visibility inside Solr and other Java apps. (Full disclosur...
Sent 2010-03-11 by Jay Hill <jayallenhill@...>
The fieldNorm is computed like this:
fieldNorm = lengthNorm * documentBoost * documentFieldBoosts
and the lengthNorm is:
lengthNorm = 1/sqrt(numTermsInField)
[note that the value is encoded as a single byte, so there is some precision
loss]
So the values are not pre-set for the lengthNorm, bu...
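A minimal sketch of the two formulas in Jay's message, assuming a single field boost and ignoring the single-byte encoding (so the values here are the exact floats, before the precision loss he mentions). The class name and the sample boost values are hypothetical.

```java
class FieldNorm {
    // lengthNorm = 1 / sqrt(numTermsInField): shorter fields get a larger norm.
    static float lengthNorm(int numTermsInField) {
        return (float) (1.0 / Math.sqrt(numTermsInField));
    }

    // fieldNorm = lengthNorm * documentBoost * fieldBoost, before byte encoding.
    static float fieldNorm(int numTermsInField, float documentBoost, float fieldBoost) {
        return lengthNorm(numTermsInField) * documentBoost * fieldBoost;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 9, 100}) {
            System.out.println(n + " terms -> " + fieldNorm(n, 1.0f, 1.0f));
        }
        // A field boost of 2 exactly offsets quadrupling the field length.
        System.out.println("4 terms, field boost 2 -> " + fieldNorm(4, 1.0f, 2.0f));
    }
}
```

Because the stored norm is a one-byte approximation, two fields whose exact fieldNorm values differ slightly can end up with the same encoded value at query time.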
Sent 2010-03-11 by Yonik Seeley <yonik@...>
Interesting looking stuff Marcus!
Seems sort of related to stats.facet (calc stats on unique facet values)
http://wiki.apache.org/solr/StatsComponent
On Thu, Mar 11, 2010 at 5:49 PM, Marcus Herou
wrote:
> I have now implemented Facet with FunctionQueries it is really...
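For reference, a StatsComponent request with `stats.facet` is an ordinary select request with a few extra parameters (`stats`, `stats.field`, `stats.facet`, as described on the StatsComponent wiki page). The sketch below only builds the URL; the base URL and the field names `price` and `inStock` are hypothetical placeholders, and it requires Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class StatsQuery {
    // Stats (min/max/sum/mean, etc.) for statsField, broken down per value of facetField.
    static String buildStatsUrl(String baseUrl, String statsField, String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("rows", "0");          // only the stats section is of interest
        params.put("stats", "true");
        params.put("stats.field", statsField);
        params.put("stats.facet", facetField);
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildStatsUrl("http://localhost:8983/solr", "price", "inStock"));
    }
}
```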
Sent 2010-03-11 by JavaGuy84 <bbarani@...>
Eric,
Thanks a lot for your reply.
I was able to successfully hack the query parser and enable leading
wildcard search.
So far this is the only reason I have hacked the code; I am not sure how to
make leading wildcard search work without hacking the code, and this
type of search is t...
Sent 2010-03-11 by Tom Burton-West <tburtonwest@...>
We've been thinking about running some kind of a classifier against each book
to select books with a high percentage of dirty OCR for some kind of special
processing. Haven't quite figured out a multilingual feature set yet other
than the punctuation/alphanumeric and character block ideas mentio...