Thoughts on Lucene, Solr, crawling and vertical search 

Batch convert svg to png in Ubuntu

Posted by Kelvin on 19 Oct 2011 | Tagged as: Ubuntu

sudo apt-get install librsvg2-bin
for i in *.svg; do rsvg-convert -a "$i" -o "${i%.svg}.png"; done
 

to rasterize the ...

Mount a .dmg file in Ubuntu

Posted by Kelvin on 11 Oct 2011 | Tagged as: Ubuntu

sudo apt-get install dmg2img
dmg2img /path/to/image.dmg
sudo modprobe hfsplus
sudo mount -t hfsplus -o loop /path/to/image.img /mnt
 

The .dmg archive is now mounted at /mnt. You ...

Download KhanAcademy videos with a PHP crawler

Posted by Kelvin on 08 Oct 2011 | Tagged as: PHP, programming

At the moment (October 2011), there's no simple way to download all the videos in a KhanAcademy.org playlist.

This simple PHP crawler script changes that. :-)

It downloads the videos (from archive.org) ...

Painless CRUD in PHP via AjaxCrud

Posted by Kelvin on 08 Oct 2011 | Tagged as: PHP, programming

I recently discovered an Ajax CRUD library which makes CRUD operations positively painless: AjaxCRUD

Its features include:

- displays the list in an inline-editable table
- generates a create form
- handles all operations (add, edit, delete) via Ajax
...

What's new in Solr 3.4.0

Posted by Kelvin on 06 Oct 2011 | Tagged as: Lucene / Solr / Nutch

If you are already using Apache Solr 3.1, 3.2 or 3.3, it's strongly recommended that you upgrade to 3.4.0. Earlier releases have a bug that can corrupt the index after an OS crash, computer crash or power loss (LUCENE-3418); it is fixed in 3.4.0.

Solr 3.4.0 release ...

Introducing SolrTutorial.com

Posted by Kelvin on 02 Oct 2011 | Tagged as: Lucene / Solr / Nutch

Just launched a Solr tutorial website, a site styled after my LuceneTutorial.com but tailored towards Solr users.

It also includes high-level overviews of Solr for non-programmers, such as Solr for Managers and Solr for SysAdmins.

Delete directories older than x days

Posted by Kelvin on 04 Aug 2011 | Tagged as: Ubuntu

Great for cleaning up log directories.

find . -mindepth 1 -maxdepth 1 -mtime +14 -type d -exec rm -fr {} \;
 

Change 14 to the required age in days.
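
Since rm -fr is irreversible, it can help to preview the matches first. Here is a minimal sketch, assuming GNU find as shipped with Ubuntu. The function name list_old_dirs is hypothetical and just for illustration. It only prints the directories that match and deletes nothing.

```shell
# Hypothetical helper (not part of the original tip): print, without deleting,
# the top-level directories under $1 (default: current directory) that were
# last modified more than $2 days ago (default: 14).
list_old_dirs() {
    # -mindepth 1 keeps the starting directory itself out of the results.
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -mtime +"${2:-14}" -print
}
```

Run list_old_dirs . 14 and check the output before running the delete command.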

HOWTO: Collect WebDriver HTTP Request and Response Headers

Posted by Kelvin on 22 Jun 2011 | Tagged as: crawling, Lucene / Solr / Nutch, programming

WebDriver is a fantastic Java API for web application testing. It has recently been merged into the Selenium project to provide a friendlier API for programmatic simulation of web browser actions. Its unique property is that of executing web pages ...

Solr 3.2 released!

Posted by Kelvin on 22 Jun 2011 | Tagged as: crawling, Lucene / Solr / Nutch, programming

I'm a little slow off the blocks here, but I just wanted to mention that Solr 3.2 has been released!

Get your download here: http://www.apache.org/dyn/closer.cgi/lucene/solr

Solr 3.2 release highlights include

  • Ability to specify overwrite and commitWithin

...

Classical learning curves for some editors

Posted by Kelvin on 20 Jun 2011 | Tagged as: programming


02/06/2012 | Kelvin Tan | Lucene Solr Crawl Consultant