Blog 

100% height iframes

Posted by Kelvin on 30 Aug 2008 | Tagged as: programming

After trying out several solutions, the one that worked for me was http://brondsema.net/blog/index.php/2007/06/06/100_height_iframe

Using Hadoop IPC/RPC for distributed applications

Posted by Kelvin on 02 Jun 2008 | Tagged as: programming, Lucene / Nutch

Hadoop is growing to be a pretty large framework - release 0.17.0 has 483 classes!
Previously, I wrote about Hadoop SequenceFile. SequenceFile is part of the org.apache.hadoop.io package. Other notable classes in that package are ArrayFile and MapFile, which are persistent array and dictionary data structures respectively.
About Hadoop IPC
Here, I’m going to introduce the […]

TREC 2007 Million Queries Track

Posted by Kelvin on 10 May 2008 | Tagged as: programming

Just read about the IBM Haifa Team’s experiences in tweaking Lucene relevance for TREC.

via Jeff’s Search Engine Caffè

Lucene Tutorial.com

Posted by Kelvin on 25 Apr 2008 | Tagged as: programming

I’ve been maintaining a website dedicated to introducing Lucene to beginners.

Check it out here: http://www.lucenetutorial.com

Feedback is always welcome, including topics you’d like to see written on.

A Collection of JVM Options

Posted by Kelvin on 24 Apr 2008 | Tagged as: programming

Just found this collection of JVM options which might prove handy one day.

Limiting system cache size in Windows Server 2003

Posted by Kelvin on 24 Apr 2008 | Tagged as: programming

On a consulting gig, I was recently asked to investigate a strange problem with a Lucene server on Windows Server 2003.

The Lucene index was periodically refreshed by running a new instance of the app, then killing the old one via “taskkill”. Worked fine, except the available memory displayed by Task Manager somehow steadily decreased with […]

Is Nutch appropriate for aggregation-type vertical search?

Posted by Kelvin on 24 Sep 2007 | Tagged as: programming, Lucene / Nutch

I get pinged all the time by people who tell me they want to build a vertical search engine with Nutch. The part I can’t figure out, though, is why Nutch?
What’s vertical anyway?
So let’s start from basics. Vertical search engines typically fall into 2 categories:

Whole-web search engines which selectively crawl the Internet for webpages […]

Fuzzy string matching

Posted by Kelvin on 03 Jan 2007 | Tagged as: programming

I’ve recently been peripherally involved in a project which attempts to perform a fuzzy match on names in a MySQL database. With Homethinking, we had to do something similar when matching realtor and brokerage names. It’s also related to some of the Lucene consulting I’ve been involved with.

It’s an interesting problem. There’s an article […]
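
To give a feel for what fuzzy matching on names can look like, here is a minimal Python sketch based on Levenshtein edit distance. It is only one of several possible approaches and not necessarily the one used in the projects mentioned above. The function names, the sample names and the 0.8 threshold are all made up for illustration, and it works on an in-memory list rather than a MySQL table.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means equal after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(query, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing clears the
    (hypothetical) threshold."""
    scored = [(similarity(query, c), c) for c in candidates]
    score, name = max(scored, default=(0.0, None))
    return name if score >= threshold else None
```

For example, `best_match("Jon Smith", ["John Smith", "Jane Smythe"])` picks "John Smith", since the two differ by a single missing letter. Comparing every query against every candidate gets slow on large tables, so a real system would probably narrow the candidate set first.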

Exploring Hadoop SequenceFile

Posted by Kelvin on 03 Jan 2007 | Tagged as: Lucene / Nutch

Hadoop’s SequenceFile is at the heart of the Hadoop io package. Both MapFile (disk-backed Map) and ArrayFile (disk-backed Array) are built on top of SequenceFile.

So what exactly is SequenceFile? Its class javadoc tells us: “Support for flat files of binary key/value pairs.” Not very helpful.

Let’s dig through the code and find out more:

supports key/value […]
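
To make “flat files of binary key/value pairs” a little more concrete, here is a toy Python sketch of the general idea: each record is written as two length prefixes followed by the raw key and value bytes, and reading simply walks the records in order. This is not Hadoop’s actual on-disk layout (the real SequenceFile format carries more than this sketch does), and the function names and record layout are assumptions made purely for illustration.

```python
import io
import struct

# Hypothetical record layout: two big-endian 32-bit lengths, then key, then value.
HEADER = ">II"
HEADER_SIZE = struct.calcsize(HEADER)


def write_record(out, key: bytes, value: bytes) -> None:
    """Append one length-prefixed key/value record to a binary stream."""
    out.write(struct.pack(HEADER, len(key), len(value)))
    out.write(key)
    out.write(value)


def read_records(inp):
    """Yield (key, value) pairs in the order they were written."""
    while True:
        header = inp.read(HEADER_SIZE)
        if not header:
            return  # clean end of file
        if len(header) < HEADER_SIZE:
            raise EOFError("truncated record header")
        key_len, value_len = struct.unpack(HEADER, header)
        key = inp.read(key_len)
        value = inp.read(value_len)
        if len(key) < key_len or len(value) < value_len:
            raise EOFError("truncated record body")
        yield key, value


# Usage: an in-memory buffer stands in for a file on disk.
buf = io.BytesIO()
write_record(buf, b"doc-1", b"hello")
write_record(buf, b"doc-2", b"world")
buf.seek(0)
records = list(read_records(buf))
```

A file like this can only be scanned from the start, which hints at why a disk-backed map or array built on top of it would need some kind of index to find a record quickly.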

MySQL Falcon open-sourced

Posted by Kelvin on 02 Jan 2007 | Tagged as: programming

Just read that the MySQL Falcon storage engine has been open-sourced.

http://mike.kruckenberg.com/archives/2006/04/jim_starkey_int.html has a really good, concise brief on Falcon and what it does.


07/04/08 | Kelvin Tan | Lucene Vertical Search Consultant