Posted by Kelvin on 07 Jun 2012 at 02:06 am | Tagged as: Lucene / Solr / Elasticsearch / Nutch
There are a number of cases in Solr where it's preferable to retrieve data from an external datastore for boosting purposes, rather than contorting Solr with multiple queries, joins, etc.
Here's a trivial example:
Jobs are stored as documents in Solr. Users of the application can rank a job from 1 to 10. We need to boost each job by the user's rank, if one exists.
Now, trying to model this fully in Solr would be fairly inefficient, especially with a large number of jobs and/or users, since each time a user ranks a job, the searcher has to be reloaded before that data becomes available for searching.
A much more efficient approach is to store the rank data in a NoSQL store like Redis and retrieve the rank at query time, using it to boost the documents accordingly.
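To make the idea concrete, here is a minimal sketch in plain Java. It is not Solr or Redis code: a HashMap stands in for the Redis store, and the class and method names (RankBoostSketch, rank, boostFor, score) are made up for illustration. In a real setup the map lookup would be replaced by a call to a Redis client, and the additive scoring shown here is a simplification of what Solr actually computes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-user job ranks kept outside the index and
// looked up at query time. The in-memory map is a stand-in for Redis.
class RankBoostSketch {
    // userId -> (jobId -> rank); assumed layout, e.g. one Redis hash per user
    private final Map<String, Map<String, Integer>> ranksByUser = new HashMap<>();

    // Record a rank. No index reload is needed, since the index is untouched.
    public void rank(String userId, String jobId, int rank) {
        if (rank < 1 || rank > 10) {
            throw new IllegalArgumentException("rank must be between 1 and 10");
        }
        ranksByUser.computeIfAbsent(userId, k -> new HashMap<>()).put(jobId, rank);
    }

    // Returns the user's rank for the job, or 0 if the user never ranked it.
    public float boostFor(String userId, String jobId) {
        Map<String, Integer> ranks = ranksByUser.get(userId);
        if (ranks == null) {
            return 0f;
        }
        Integer rank = ranks.get(jobId);
        return rank == null ? 0f : rank.floatValue();
    }

    // Simplified scoring: text relevance plus the rank-based boost.
    public float score(float textScore, String userId, String jobId) {
        return textScore + boostFor(userId, jobId);
    }

    public static void main(String[] args) {
        RankBoostSketch sketch = new RankBoostSketch();
        sketch.rank("1001", "job-7", 4);
        System.out.println(sketch.score(1.0f, "1001", "job-7")); // ranked job
        System.out.println(sketch.score(1.0f, "1001", "job-8")); // unranked job
    }
}
```

The point of the design is that a new rank is visible to the very next query, because nothing in the index has to change.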
This can be accomplished using a custom FunctionQuery. I've blogged about how to create custom function queries in Solr before, so this is simply an application of the subject.
Here's the code:
This FunctionQuery accepts 3 arguments:
3. the field to use as an id field
Here's what the salient part of RedisValueSource looks like:
From here, you can use the following Solr query to perform boosting based on the Redis value:
The explain output looks like this:
3.4664698 = (MATCH) sum of:
  1.070082 = (MATCH) weight(cat:electronics in 2), product of:
    0.80067647 = queryWeight(cat:electronics), product of:
      1.3364723 = idf(docFreq=14, maxDocs=21)
      0.59909695 = queryNorm
    1.3364723 = (MATCH) fieldWeight(cat:electronics in 2), product of:
      1.0 = tf(termFreq(cat:electronics)=1)
      1.3364723 = idf(docFreq=14, maxDocs=21)
      1.0 = fieldNorm(field=cat, doc=2)
  2.3963878 = (MATCH) FunctionQuery(redis(id,influence,1001)), product of:
    4.0 = 4.0
    1.0 = boost
    0.59909695 = queryNorm
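The numbers in the explain output can be checked by hand. The sketch below simply redoes the arithmetic in plain Java, using the values printed above; the class and method names (ExplainCheck, termScore, functionScore, total) are made up for illustration. It uses doubles, so the last digit may differ slightly from Lucene's float output.

```java
// Recomputes the explain output above from its leaf values.
class ExplainCheck {
    static final double IDF = 1.3364723;        // idf(docFreq=14, maxDocs=21)
    static final double QUERY_NORM = 0.59909695; // queryNorm

    // weight(cat:electronics) = queryWeight * fieldWeight
    public static double termScore() {
        double queryWeight = IDF * QUERY_NORM;      // about 0.80067647
        double tf = 1.0;
        double fieldNorm = 1.0;
        double fieldWeight = tf * IDF * fieldNorm;  // 1.3364723
        return queryWeight * fieldWeight;           // about 1.070082
    }

    // FunctionQuery(redis(...)) = function value * boost * queryNorm
    public static double functionScore() {
        double functionValue = 4.0; // value returned by the redis function
        double boost = 1.0;
        return functionValue * boost * QUERY_NORM;  // about 2.3963878
    }

    // The top-level score is the sum of the two clauses.
    public static double total() {
        return termScore() + functionScore();
    }

    public static void main(String[] args) {
        System.out.printf("term=%.6f function=%.6f total=%.6f%n",
                termScore(), functionScore(), total());
    }
}
```

Note that the function value of 4.0 is scaled by queryNorm before being added, so the boost the document actually receives is about 2.4 rather than 4.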