Thoughts on Lucene, Solr, crawling and vertical search 

Reading ElasticSearch server book...

Posted by Kelvin on 23 May 2013 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Just got my hands on a review copy of PacktPub's ElasticSearch Server book, which I believe is the first ES book on the market.

Review to follow shortly.

New file formats in Lucene 4.1+ index

Posted by Kelvin on 30 Apr 2013 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Lucene 4.1 introduces new files in the index.

Here's a link to the documentation: https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html

The different types of files are:

.tim: Term Dictionary
.tip: Term Index
.doc: Frequencies and Skip Data
.pos: Positions
.pay: Payloads and Offsets

Permission filtering in Solr using an ACL permissions string

Posted by Kelvin on 03 Apr 2013 | Tagged as: Lucene / Solr / Elastic Search / Nutch

For an app I'm working on, the permissions ACL is stored in a string of the form:

category1=100|category2=300|category3=300

Both users and documents have an ACL string.

The number represents the access level for that category. Bigger numbers mean ...
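As a rough illustration of the string format only, here is a minimal Java sketch that parses such an ACL string into a category-to-level map. The class and method names (AclStringParser, parse) are hypothetical and not from the original post, and the sketch deliberately stops short of comparing user and document levels, since the comparison rule is cut off in this excerpt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not from the original post: parses an ACL string
// such as "category1=100|category2=300" into category -> access level.
final class AclStringParser {

    static Map<String, Integer> parse(String acl) {
        // LinkedHashMap keeps the categories in the order they appear.
        Map<String, Integer> levels = new LinkedHashMap<>();
        if (acl == null || acl.isEmpty()) {
            return levels;
        }
        for (String entry : acl.split("\\|")) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed ACL entry: " + entry);
            }
            String category = entry.substring(0, eq).trim();
            int level = Integer.parseInt(entry.substring(eq + 1).trim());
            levels.put(category, level);
        }
        return levels;
    }

    public static void main(String[] args) {
        // prints {category1=100, category2=300, category3=300}
        System.out.println(parse("category1=100|category2=300|category3=300"));
    }
}
```

With both the user's and the document's ACL parsed this way, a permission check becomes a per-category comparison of two integers.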

Solr DateField java dateformat

Posted by Kelvin on 03 Apr 2013 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Grrrr... keep forgetting the Solr DateField dateformat, so here it is for posterity.

new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
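One thing to watch: the 'Z' in that pattern is a quoted literal, so the formatter does not convert anything to UTC by itself. A minimal sketch of how it might be used, assuming you want the UTC timestamps Solr expects (the SolrDates class and its format method are hypothetical names for illustration):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; only the pattern comes from the post.
final class SolrDates {

    static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // The trailing 'Z' is a literal, so the time zone must be set
        // explicitly or the output will be local time mislabeled as UTC.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // The epoch formats as 1970-01-01T00:00:00Z
        System.out.println(format(new Date(0L)));
    }
}
```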

My favorite open-source license of all...

Posted by Kelvin on 30 Mar 2013 | Tagged as: programming

WTFPL - Do What the Fuck You Want to Public License

http://www.wtfpl.net/

Installing mosh on Dreamhost

Posted by Kelvin on 26 Mar 2013 | Tagged as: programming

Here's a gist which helps you install mosh on Dreamhost: https://gist.github.com/andrewgiessel/4486779

Generating HMAC MD5/SHA1/SHA256 etc in Java

Posted by Kelvin on 26 Nov 2012 | Tagged as: programming

There are a number of examples online which show how to generate HMAC MD5 digests in Java.

Unfortunately, most of them generate digests that don't match the examples provided on the HMAC Wikipedia page.

HMAC_MD5("key", "The quick brown

...
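For reference, here is a minimal sketch of one way to compute an HMAC hex digest with the standard javax.crypto API. It is not the code from the full post, and the HmacHex class name is hypothetical. Two common causes of mismatched digests are relying on the platform default charset and using a hex conversion that drops leading zeros; the sketch avoids both, though these may not be the problems the post has in mind.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration, not the code from the original post.
final class HmacHex {

    // algorithm is a JCA name such as "HmacMD5", "HmacSHA1" or "HmacSHA256".
    static String hmacHex(String algorithm, String key, String message) {
        try {
            Mac mac = Mac.getInstance(algorithm);
            // Encode explicitly as UTF-8 rather than the platform default.
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), algorithm));
            byte[] digest = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // Two hex digits per byte, so leading zeros are never dropped.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        // "key" and "message" are placeholder inputs.
        System.out.println(hmacHex("HmacMD5", "key", "message"));
        System.out.println(hmacHex("HmacSHA1", "key", "message"));
        System.out.println(hmacHex("HmacSHA256", "key", "message"));
    }
}
```

Switching algorithms is just a matter of passing a different JCA algorithm name, which is why the same helper covers MD5, SHA1 and SHA256.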

Interesting PHP and apache/nginx links

Posted by Kelvin on 25 Nov 2012 | Tagged as: PHP, programming

http://code.google.com/p/rolling-curl/
A more efficient implementation of curl_multi()

https://github.com/krakjoe/pthreads
http://docs.php.net/manual/en/book.pthreads.php
POSIX threads in PHP. Whoa!

http://www.underhanded.org/blog/2010/05/05
Installing Apache Worker over prefork.

http://www.wikivs.com/wiki/Apache_vs_nginx
I stumbled on this page when researching the pros/cons of Apache + ...

Java port of Quicksilver-style Live Search

Posted by Kelvin on 19 Nov 2012 | Tagged as: Lucene / Solr / Elastic Search / Nutch, programming

Here's a straight Java port of the quicksilver algo, found here: http://orderedlist.com/blog/articles/live-search-with-quicksilver-style-for-jquery/

quicksilver.js contains the actual algorithm in JavaScript.

It uses the same input strings as the demo page at http://static.railstips.org/orderedlist/demos/quicksilverjs/jquery.html

import java.io.IOException;

...

The easiest way of converting a MySQL DB from latin1 to UTF8

Posted by Kelvin on 16 Nov 2012 | Tagged as: programming

There are *numerous* pages online describing how to fix those awful junk characters in a latin1 column caused by Unicode characters.

After spending over 2 hours trying out different methods, I found one that's dead simple and actually works: ...


05/23/2013 | Kelvin Tan | Lucene Solr ElasticSearch Consultant