Thoughts on Lucene, Solr, crawling and vertical search 

Split wav/flac/ape files with cue

Posted by Kelvin on 07 May 2012 | Tagged as: Ubuntu

If you ever need to split a disc image which has been burned as a single wav/flac/ape file with a corresponding cue file, this will help you out.

Split2flac does all the tedium of splitting, renaming (according to a renaming …

Lucene multi-point spatial search

Posted by Kelvin on 14 Apr 2012 | Tagged as: Lucene / Solr / Elastic Search / Nutch, programming

This post describes a method of augmenting the lucene-spatial contrib package to support multi-point searches. It is quite similar to the method described at http://www.supermind.org/blog/548/multiple-latitudelongitude-pairs-for-a-single-solrlucene-doc with some minor modifications.

The problem is as follows:

A company (mapped as a Lucene


Non-blocking/NIO HTTP requests in Java with Jetty's HttpClient

Posted by Kelvin on 05 Mar 2012 | Tagged as: crawling, programming

Jetty 6/7 contain an HttpClient class that makes it uber-easy to issue non-blocking HTTP requests in Java. Here is a code snippet to get you started.

Initialize the HttpClient object.

    HttpClient client =


Using contextual hints to improve Solr's autocomplete suggester

Posted by Kelvin on 03 Mar 2012 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Context-less multi-term autocomplete is difficult.

Given the term "di", we can look at our index and rank terms starting with "di" by frequency and return the n most frequent terms. Solr's TSTLookup and FSTLookup do this very well. …
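As a rough illustration of the prefix-frequency ranking described above, here is a minimal Python sketch. It is not how Solr's TSTLookup or FSTLookup are implemented (those use ternary search trees and finite state transducers); it is just a linear scan over a made-up term dictionary. The `suggest` function and the `TERM_FREQS` data are hypothetical names invented for this example.

```python
from collections import Counter


def suggest(term_freqs, prefix, n=5):
    """Return up to n terms starting with prefix, most frequent first."""
    matches = [(t, f) for t, f in term_freqs.items() if t.startswith(prefix)]
    # Sort by descending frequency, then alphabetically so ties are stable.
    matches.sort(key=lambda tf: (-tf[1], tf[0]))
    return [t for t, _ in matches[:n]]


# Hypothetical term frequencies standing in for an index's term dictionary.
TERM_FREQS = Counter(
    {"disney": 120, "digital": 95, "dinner": 40, "diet": 40, "dog": 300}
)

print(suggest(TERM_FREQS, "di", n=3))
```

Note that "dog" is the most frequent term overall but never appears for "di", since only the prefix and the frequency are considered. Nothing here knows about the other words in the query, which is the context-less limitation the post is about.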

Solr autocomplete with document suggestions

Posted by Kelvin on 03 Mar 2012 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Solr 3.5 comes with a nice autocomplete/typeahead component that is based on Solr's SpellCheckComponent.

You provide it a query and a field, and the Suggester returns a list of suggestions based on the query. For example:

Continue reading…

Book review of Apache Solr 3 Enterprise Search Server

Posted by Kelvin on 28 Feb 2012 | Tagged as: Lucene / Solr / Elastic Search / Nutch, programming

Apache Solr 3 Enterprise Search Server published by Packt Publishing is the only Solr book available at the moment.

It's a fairly comprehensive book, and discusses many new Solr 3 features. Considering the breakneck pace of Solr development and …

Apache Solr book review coming soon..

Posted by Kelvin on 27 Feb 2012 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Just received my review copy of the only Apache Solr book on the market..

http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

My book review to follow shortly..

Batch convert svg to png in Ubuntu

Posted by Kelvin on 19 Oct 2011 | Tagged as: Ubuntu

sudo apt-get install librsvg2-bin
for i in *.svg; do rsvg-convert -a "$i" -o "${i%.svg}.png"; done

to rasterize the …

Mount a .dmg file in Ubuntu

Posted by Kelvin on 11 Oct 2011 | Tagged as: Ubuntu

sudo apt-get install dmg2img
dmg2img /path/to/image.dmg
sudo modprobe hfsplus
sudo mount -t hfsplus -o loop /path/to/image.img /mnt
 

The .dmg archive is now mounted at /mnt. You …

Download KhanAcademy videos with a PHP crawler

Posted by Kelvin on 08 Oct 2011 | Tagged as: PHP, programming

At the moment (October 2011), there's no simple way to download all the videos in a KhanAcademy.org playlist.

This simple PHP crawler script changes that. :-)

It downloads the videos (from archive.org) …


05/19/2012 | Kelvin Tan | Lucene Solr Crawl Consultant