Connecting Redis to Solr for boosting documents
Posted by Kelvin on 07 Jun 2012 at 02:06 am | Tagged as: Lucene / Solr / Elastic Search / Nutch
There are a number of instances in Solr where it's desirable to retrieve data from an external datastore for boosting purposes instead of trying to contort Solr with multiple queries, joins etc.
Here's a trivial example:
Jobs are stored as documents in Solr. Users of the application can rank a job from 1-10. We need to boost each job with the user's rank if it exists.
Modeling this entirely in Solr would be fairly inefficient, especially with a large number of jobs and/or users: each time a user ranks a job, the index has to be updated and the searcher reloaded before the new rank is available for searching.
A much more efficient approach is to store the rank data in a NoSQL store like Redis, retrieve the rank at query time, and use it to boost the documents accordingly.
This can be accomplished using a custom FunctionQuery. I've blogged about how to create custom function queries in Solr before, so this is simply an application of the subject.
Here's the code:
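    // Class name is an assumption; the original snippet omitted the declaration.
    public class RedisValueSourceParser extends ValueSourceParser {
        @Override
        public ValueSource parse(FunctionQParser fp) throws ParseException {
            String idField = fp.parseArg();     // Solr field holding the document id
            String redisKey = fp.parseArg();    // key of the Redis hash
            String redisValue = fp.parseArg();  // field to read from that hash
            return new RedisValueSource(idField, redisKey, redisValue);
        }
    }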
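This FunctionQuery accepts 3 arguments, in this order:
1. the field to use as an id field
2. redisKey, the key of the Redis hash
3. redisValue, which despite its name is the field to read from that hash (it is passed as the field argument to HGET)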
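To make the positional order concrete, here is a minimal sketch of how a call such as redis(id,influence,1001) breaks down into its three arguments. It is a plain-Java stand-in, not Solr's FunctionQParser, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; Solr's FunctionQParser does the real parsing.
class RedisFunctionArgsSketch {
    // Splits "name(a,b,c)" into its comma-separated arguments, trimming whitespace.
    static List<String> args(String call) {
        int open = call.indexOf('(');
        int close = call.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a function call: " + call);
        }
        List<String> out = new ArrayList<>();
        String inner = call.substring(open + 1, close).trim();
        if (inner.isEmpty()) {
            return out;
        }
        for (String part : inner.split(",")) {
            out.add(part.trim());
        }
        return out;
    }
}
```

With this input, position 0 is the id field, position 1 the Redis hash key, and position 2 the field within that hash, matching the order in which parse() calls fp.parseArg().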
Here's what the salient part of RedisValueSource looks like:
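    // Method signature assumed from the Solr 3.x ValueSource API;
    // the original excerpt started inside the method body.
    @Override
    public DocValues getValues(Map context, IndexReader reader) throws IOException {
        // Document ids for this reader, indexed by internal doc number.
        final String[] lookup = FieldCache.DEFAULT.getStrings(reader, idField);

        // Fetch the JSON-encoded map of id -> value from the Redis hash.
        final Jedis jedis = new Jedis("localhost");
        final String json;
        try {
            json = jedis.hget(redisKey, redisValue);
        } finally {
            jedis.disconnect(); // release the connection even if the lookup fails
        }
        // Fall back to an empty object so every lookup below simply misses.
        final JSONObject obj = (json != null) ? (JSONObject) JSONValue.parse(json) : new JSONObject();

        return new DocValues() {
            @Override
            public float floatVal(int doc) {
                Object value = obj.get(lookup[doc]);
                if (value != null) {
                    try {
                        return Float.parseFloat(value.toString());
                    } catch (NumberFormatException e) {
                        return 0; // non-numeric value: no boost
                    }
                }
                return 0; // no entry for this document: no boost
            }

            @Override
            public int intVal(int doc) {
                Object value = obj.get(lookup[doc]);
                if (value != null) {
                    try {
                        return Integer.parseInt(value.toString());
                    } catch (NumberFormatException e) {
                        return 0;
                    }
                }
                return 0;
            }

            @Override
            public String strVal(int doc) {
                Object value = obj.get(lookup[doc]);
                return value != null ? value.toString() : null;
            }

            @Override
            public String toString(int doc) {
                return strVal(doc);
            }
        };
    }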
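The lookup above implies a particular layout in Redis: a hash (here keyed by redisKey) whose field (redisValue) holds a JSON object mapping document ids to values. The sketch below mimics that layout and the floatVal fallback rules with plain in-memory maps, so it runs without Redis or Solr. The class name, method names, and sample ids are all hypothetical, and the innermost map stands in for what would be a single JSON string in Redis:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Redis hash; not part of the original code.
class RankLookupSketch {
    // hash key -> hash field -> (document id -> stored value).
    // In Redis the innermost map would be one JSON string per hash field.
    static final Map<String, Map<String, Map<String, String>>> HASHES = new HashMap<>();

    // Stores one value, creating the intermediate maps on demand.
    static void put(String hashKey, String field, String docId, String value) {
        HASHES.computeIfAbsent(hashKey, k -> new HashMap<>())
              .computeIfAbsent(field, f -> new HashMap<>())
              .put(docId, value);
    }

    // Mirrors floatVal(): a missing hash, field, or id, or a non-numeric value, yields 0.
    static float boost(String hashKey, String field, String docId) {
        Map<String, Map<String, String>> fields = HASHES.get(hashKey);
        if (fields == null) {
            return 0f;
        }
        Map<String, String> values = fields.get(field);
        if (values == null) {
            return 0f;
        }
        String raw = values.get(docId);
        if (raw == null) {
            return 0f;
        }
        try {
            return Float.parseFloat(raw);
        } catch (NumberFormatException e) {
            return 0f;
        }
    }
}
```

For example, after put("influence", "1001", "job-42", "4"), boost("influence", "1001", "job-42") returns 4.0, while any id without an entry contributes no boost.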
From here, you can use the following Solr query to perform boosting based on the Redis value:
http://localhost:8983/solr/select?defType=edismax&q=cat:electronics&bf=redis(id,influence,1001)&debugQuery=on
The explain output looks like this:
3.4664698 = (MATCH) sum of:
  1.070082 = (MATCH) weight(cat:electronics in 2), product of:
    0.80067647 = queryWeight(cat:electronics), product of:
      1.3364723 = idf(docFreq=14, maxDocs=21)
      0.59909695 = queryNorm
    1.3364723 = (MATCH) fieldWeight(cat:electronics in 2), product of:
      1.0 = tf(termFreq(cat:electronics)=1)
      1.3364723 = idf(docFreq=14, maxDocs=21)
      1.0 = fieldNorm(field=cat, doc=2)
  2.3963878 = (MATCH) FunctionQuery(redis(id,influence,1001)), product of:
    4.0 = 4.0
    1.0 = boost
    0.59909695 = queryNorm
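As a sanity check, the arithmetic in this explain output can be reproduced by hand. The sketch below simply multiplies and adds the numbers shown above, with 4.0 being the value read from Redis for this document. The class and method names are hypothetical, and it uses doubles, so the last digits may differ slightly from Solr's float output:

```java
// Hypothetical helper that recomputes the explain output from its leaf values.
class ExplainArithmeticSketch {
    static final double IDF = 1.3364723;
    static final double QUERY_NORM = 0.59909695;

    // weight(cat:electronics) = queryWeight * fieldWeight
    static double termScore() {
        double queryWeight = IDF * QUERY_NORM;   // about 0.80067647
        double fieldWeight = 1.0 * IDF * 1.0;    // tf * idf * fieldNorm
        return queryWeight * fieldWeight;        // about 1.070082
    }

    // FunctionQuery score = value from Redis * boost * queryNorm
    static double functionScore(double redisValue) {
        return redisValue * 1.0 * QUERY_NORM;    // about 2.3963878 for 4.0
    }

    // The final score is the sum of the two clauses.
    static double total(double redisValue) {
        return termScore() + functionScore(redisValue);
    }
}
```

Calling total(4.0) gives roughly 3.46647, matching the top-level score. Note that the Redis value is scaled by queryNorm before it is added, so a rank of 4 adds about 2.4 to the score rather than 4.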
