Posted by Kelvin on 09 Sep 2009 at 10:18 pm | Tagged as: programming
There are a number of solutions for geosearching/spatial search in Lucene and Solr.
- LocalLucene and LocalSolr are excellent options.
- LuceneTutorial.com describes a partial solution for doing it the old-school way using FieldCache and a custom Lucene query.
- The new TrieRange feature introduced by Uwe Schindler also offers a new way of performing range searches on numbers.
Here's the twist though: all of the solutions mentioned above work only when there is a single lat/long pair _per_ document.
When would we need to have multiple lat/longs per document?
An example would be an event search engine where a single event can have multiple locations and you'd like all locations to turn up when searching within a zipcode, but for them to appear under their respective events.
Here's an algorithm I came up with that works well enough for smallish indexes:
Given that there are "container" documents with multiple locations,
1. index container and location as separate document types
2. assign uid fields to container and location docs (standard in Solr)
3. location docs have a lat/long field (indexed as a string), and a "reference" to the container id value
4. load all location docs as Point2D.Float into memory
5. when a geo search request comes in, convert its location to a lat/long, then compute a bounding rectangle enclosing the desired radius
6. iterate through the set of Point2D.Floats, saving the points within the bounds
7. collect the container ids that these points reference
8. construct a query filter out of these container ids
9. finally, perform the search on container docs with the query filter
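Steps 5 to 8 can be sketched in plain Java roughly like this. It's only a sketch under a few assumptions: the class and method names are made up for illustration (none of them are Lucene or Solr APIs), the container uid field is assumed to be called `uid`, the radius is in kilometres, and the bounding box uses the rough "111 km per degree of latitude" approximation without handling the poles or the date line.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; nothing in this class is a Lucene or Solr API.
class GeoFilterSketch {

    // One in-memory entry per location doc (step 4):
    // the point plus the container uid it references.
    static final class Location {
        final Point2D.Float point;   // x = longitude, y = latitude, in degrees
        final String containerId;

        Location(float lat, float lon, String containerId) {
            this.point = new Point2D.Float(lon, lat);
            this.containerId = containerId;
        }
    }

    // Rough approximation; good enough for a coarse bounding box.
    static final double KM_PER_DEGREE_LAT = 111.0;

    // Step 5: rectangle enclosing a circle of radiusKm around (lat, lon).
    // Assumption: no handling of the poles or the +/-180 degree meridian.
    static Rectangle2D.Double boundingBox(double lat, double lon, double radiusKm) {
        double dLat = radiusKm / KM_PER_DEGREE_LAT;
        // A degree of longitude shrinks with cos(latitude).
        double dLon = radiusKm / (KM_PER_DEGREE_LAT * Math.cos(Math.toRadians(lat)));
        return new Rectangle2D.Double(lon - dLon, lat - dLat, 2 * dLon, 2 * dLat);
    }

    // Steps 6 and 7: scan every point, keep the container ids of those in the box.
    static Set<String> containerIdsWithin(List<Location> locations, Rectangle2D box) {
        Set<String> ids = new LinkedHashSet<>();
        for (Location loc : locations) {
            if (box.contains(loc.point)) {
                ids.add(loc.containerId);
            }
        }
        return ids;
    }

    // Step 8: turn the ids into a query-parser style filter string.
    // Assumptions: the uid field is named "uid" and ids contain no quote characters.
    static String toFilterQuery(Set<String> containerIds) {
        if (containerIds.isEmpty()) {
            throw new IllegalArgumentException("no matching containers");
        }
        StringBuilder sb = new StringBuilder("uid:(");
        boolean first = true;
        for (String id : containerIds) {
            if (!first) {
                sb.append(" OR ");
            }
            sb.append('"').append(id).append('"');
            first = false;
        }
        return sb.append(')').toString();
    }
}
```

Calling `boundingBox(40.74, -73.98, 10.0)`, passing the result to `containerIdsWithin(...)` and then to `toFilterQuery(...)` gives a string you could use as the filter in step 9. Note that the box is a superset of the true radius, so points in its corners are slightly farther away than requested; an exact distance check could be added if that matters.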
For bonus points in Solr, you can easily add the matching location doc ids to the Solr response so you can see exactly which locations were matched to which container.
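One way to keep track of which locations matched which container is to group the matching location uids by container id during the in-memory scan. Again, just a sketch: `MatchedLocationsSketch` and `LocationDoc` are hypothetical names, and how the resulting map gets written into the Solr response is left out.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not a Solr API.
class MatchedLocationsSketch {

    // In-memory view of a location doc: its own uid, its container's uid, its point.
    static final class LocationDoc {
        final String uid;
        final String containerId;
        final Point2D.Float point;   // x = longitude, y = latitude

        LocationDoc(String uid, String containerId, float lat, float lon) {
            this.uid = uid;
            this.containerId = containerId;
            this.point = new Point2D.Float(lon, lat);
        }
    }

    // Returns container id -> uids of its locations that fall inside the box.
    static Map<String, List<String>> matchesByContainer(List<LocationDoc> docs,
                                                        Rectangle2D box) {
        Map<String, List<String>> matches = new LinkedHashMap<>();
        for (LocationDoc doc : docs) {
            if (box.contains(doc.point)) {
                matches.computeIfAbsent(doc.containerId, k -> new ArrayList<>())
                       .add(doc.uid);
            }
        }
        return matches;
    }
}
```

The keys of the map double as the container ids for the query filter, and each value lists the location uids to attach to that container in the response.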
It's long and feels a bit hackish, but it works, and it's blazing fast because it's all in memory.