Archived Posts from this Category
Posted by Kelvin on 06 Nov 2009 | Tagged as: Lucene / Solr / Nutch, crawling, programming
I’ve always been curious what the average length of a URL is, mostly for estimating how much memory it takes to hold URLs in RAM.
So I took a dump of the DMOZ URLs, then sorted and uniq-ed the list.
I ended up with 4,074,300 unique URLs weighing in at 139,406,406 bytes, which works out to roughly 34 bytes (about 34 characters) per URL.
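For anyone who wants to repeat the measurement on their own URL list, here is a minimal Python sketch of the same arithmetic. The function names `url_length_stats` and `estimate_raw_bytes` are made up for illustration and are not from the original experiment, which used sort and uniq on a dump. The sketch counts only the bytes of the URLs themselves, whereas the post does not say whether its byte total included newline separators, so the two figures may differ by about one byte per URL.

```python
def url_length_stats(urls):
    """Return (unique_count, total_bytes, average_bytes) for an iterable of URLs.

    Blank lines are skipped and surrounding whitespace (including the
    trailing newline) is stripped before de-duplicating.
    """
    unique = {u.strip() for u in urls if u.strip()}
    total_bytes = sum(len(u.encode("utf-8")) for u in unique)
    count = len(unique)
    average = total_bytes / count if count else 0.0
    return count, total_bytes, average


# Figures reported in the post.
unique_urls = 4_074_300
total_bytes = 139_406_406
average_from_post = total_bytes / unique_urls  # roughly 34.2 bytes per URL


def estimate_raw_bytes(n_urls, avg_bytes=average_from_post):
    """Estimate bytes needed for the raw URL text alone.

    This ignores per-string object overhead, which depends on the
    language and runtime and can easily exceed the text itself.
    """
    return n_urls * avg_bytes
```

To measure a real file, something like `url_length_stats(open("urls.txt", encoding="utf-8"))` would do, where `urls.txt` is a hypothetical file with one URL per line. Keep in mind that the estimate covers raw bytes only; an in-memory structure will need more.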
Posted by Kelvin on 09 Sep 2009 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 02 Jun 2008 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 24 Sep 2007 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 03 Jan 2007 | Tagged as: Lucene / Solr / Nutch
Posted by Kelvin on 01 Jan 2007 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 01 Dec 2006 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 31 Oct 2006 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 09 Aug 2006 | Tagged as: Lucene / Solr / Nutch, programming
Posted by Kelvin on 08 Mar 2006 | Tagged as: Lucene / Solr / Nutch