Thoughts on Lucene, Solr, Nutch and vertical search 

Lucene / Solr / Nutch

Archived Posts from this Category

Average length of a URL

Posted by Kelvin on 06 Nov 2009 | Tagged as: Lucene / Solr / Nutch, crawling, programming

I’ve always been curious what the average length of a URL is, mostly for estimating how much memory it takes to store URLs in RAM.

Well, I took a dump of the DMOZ URLs, then sorted and uniq-ed the list.

I ended up with 4074300 unique URLs weighing in at 139406406 bytes, which works out to roughly 34 bytes per URL.
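For anyone who wants to repeat the measurement on their own URL list, here is a rough Python sketch of the same arithmetic. The function names (`url_stats`, `estimate_raw_bytes`) are made up for illustration, and the memory estimate only counts raw bytes: it ignores whatever per-string overhead your language runtime adds. If the byte total above was taken from the file size, it probably includes one newline per URL, but that is a guess.

```python
def url_stats(urls):
    """Return (unique_count, total_bytes, average_bytes) for an iterable of URLs.

    Blank lines are skipped, surrounding whitespace is stripped, and
    duplicates are dropped, roughly what `sort | uniq` would do.
    """
    unique = {u.strip() for u in urls if u.strip()}
    total = sum(len(u.encode("utf-8")) for u in unique)
    count = len(unique)
    return count, total, (total / count if count else 0.0)


# Figures reported in the post for the DMOZ dump.
UNIQUE_URLS = 4074300
TOTAL_BYTES = 139406406
AVG_BYTES = TOTAL_BYTES / UNIQUE_URLS  # a little over 34 bytes per URL


def estimate_raw_bytes(n_urls, avg_bytes=AVG_BYTES):
    """Back-of-the-envelope raw byte count for n_urls URLs (no object overhead)."""
    return round(n_urls * avg_bytes)
```

Going by the DMOZ average, a million URLs would need somewhere around 34 MB of raw character data, before any overhead from the data structure holding them.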

Idea: 2-stage recovery of corrupt Solr/Lucene indexes

Posted by Kelvin on 09 Sep 2009 | Tagged as: Lucene / Solr / Nutch, programming

Using Hadoop IPC/RPC for distributed applications

Posted by Kelvin on 02 Jun 2008 | Tagged as: Lucene / Solr / Nutch, programming

Is Nutch appropriate for aggregation-type vertical search?

Posted by Kelvin on 24 Sep 2007 | Tagged as: Lucene / Solr / Nutch, programming

Exploring Hadoop SequenceFile

Posted by Kelvin on 03 Jan 2007 | Tagged as: Lucene / Solr / Nutch

PHP + Lucene integration

Posted by Kelvin on 01 Jan 2007 | Tagged as: Lucene / Solr / Nutch, programming

A simple API-friendly crawler

Posted by Kelvin on 01 Dec 2006 | Tagged as: Lucene / Solr / Nutch, programming

Search and crawling internship

Posted by Kelvin on 31 Oct 2006 | Tagged as: Lucene / Solr / Nutch, programming

Nutch 0.8, Map & Reduce, here I come!

Posted by Kelvin on 09 Aug 2006 | Tagged as: Lucene / Solr / Nutch, programming

Lucene scoring for dummies

Posted by Kelvin on 08 Mar 2006 | Tagged as: Lucene / Solr / Nutch


07/04/09 | Kelvin Tan | Lucene Solr Nutch Consultant