Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

Phrase-based Out-of-order Solr Autocomplete Suggester

Posted by Kelvin on 16 Sep 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Solr has a number of autocomplete implementations which are great for most purposes. However, a client of mine recently had some fairly specific requirements for autocomplete:

1. phrase-based substring matching
2. out-of-order matches ('foo bar' should match 'the bar is foo')
3. fallback matching to a secondary field when substring matches on the primary field […]
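To make requirements 1 and 2 concrete, here is a minimal plain-Java sketch of the matching rule only. It assumes a candidate matches when every whitespace-separated query term occurs somewhere in it as a substring, in any order. OutOfOrderMatcher is a hypothetical name; this is not the Solr suggester from the post, and the fallback field (requirement 3) is left out.

```java
import java.util.Locale;

/**
 * Hypothetical helper: a candidate matches when every whitespace-separated
 * query term occurs in it as a substring, regardless of term order.
 * Illustrates the matching rule only; it is not a Solr suggester.
 */
final class OutOfOrderMatcher {

    private OutOfOrderMatcher() {}

    static boolean matches(String query, String candidate) {
        String haystack = candidate.toLowerCase(Locale.ROOT);
        String[] terms = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (String term : terms) {
            if (term.isEmpty()) {
                continue;                 // a blank query splits into a single "" term
            }
            if (!haystack.contains(term)) {
                return false;             // one missing term rejects the candidate
            }
        }
        return true;                      // every term found; order is ignored
    }
}
```

With this rule, matches("foo bar", "the bar is foo") is true while matches("foo baz", "the bar is foo") is false. Lowercasing with Locale.ROOT keeps the comparison independent of the JVM's default locale.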

Guava Tables

Posted by Kelvin on 13 Sep 2013 | Tagged as: programming

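Just discovered Guava's Table data structure. Whoa..!

https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained

    import com.google.common.collect.HashBasedTable;
    import com.google.common.collect.Table;

    // Vertex and v1..v3 stand in for your own vertex type and instances
    Table<Vertex, Vertex, Double> weightedGraph = HashBasedTable.create();
    weightedGraph.put(v1, v2, 4.0); // Double values need a double literal; a bare int 4 won't compile
    weightedGraph.put(v1, v3, 20.0);
    weightedGraph.put(v2, v3, 5.0);

    weightedGraph.row(v1); // returns a Map mapping v2 to 4.0, v3 to 20.0
    weightedGraph.column(v3); // returns a Map mapping v1 to 20.0, v2 to 5.0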
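Conceptually, a Table behaves like a map of maps keyed first by row and then by column. The plain-JDK sketch below makes the row()/column() behaviour concrete without pulling in Guava. NestedMapTable is a hypothetical name and this is not Guava's implementation: Guava returns views from both methods, whereas this sketch builds column() as a fresh copy by scanning every row.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, minimal stand-in for a Guava Table: rowKey -> (columnKey -> value).
 * Only put(), row() and column() are sketched; this is not Guava's code.
 */
final class NestedMapTable<R, C, V> {
    private final Map<R, Map<C, V>> rows = new LinkedHashMap<>();

    void put(R row, C column, V value) {
        // Create the inner map for a row the first time that row is seen
        rows.computeIfAbsent(row, r -> new LinkedHashMap<>()).put(column, value);
    }

    /** Read-only view of every cell in one row, keyed by column (empty if unknown). */
    Map<C, V> row(R row) {
        return Collections.unmodifiableMap(rows.getOrDefault(row, Collections.emptyMap()));
    }

    /** Every cell in one column, keyed by row; built by scanning all rows. */
    Map<R, V> column(C column) {
        Map<R, V> result = new LinkedHashMap<>();
        for (Map.Entry<R, Map<C, V>> entry : rows.entrySet()) {
            V value = entry.getValue().get(column);
            if (value != null) {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }
}
```

Loading the same three edges (with String vertices "v1", "v2", "v3") gives row("v1") = {v2=4.0, v3=20.0} and column("v3") = {v1=20.0, v2=5.0}, mirroring the Guava example.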

Custom Solr QueryParsers for fun and profit

Posted by Kelvin on 09 Sep 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

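In this post, I'll show you what you need to do to implement a custom Solr QueryParser.

Step 1: Extend QParserPlugin.

    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    public class TestQueryParserPlugin extends QParserPlugin {
      // No init-time configuration is needed for this parser
      public void init(NamedList namedList) { }

      // Solr calls this to obtain a parser for each query string handled by this plugin
      @Override
      public QParser createParser(String s, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
        return new TestQParser(s, localParams, params, req);
      }
    }

This […]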

High-level overview of Latent Semantic Analysis / LSA

Posted by Kelvin on 09 Sep 2013 | Tagged as: programming, Lucene / Solr / Elasticsearch / Nutch

I've just spent the last couple of days wrapping my head around implementing Latent Semantic Analysis, and after wading through a number of research papers and quite a bit of linear algebra, I've finally emerged on the other end. I thought I'd write something about it to lock the knowledge in. I'll do my best to […]
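The core step of LSA is a truncated singular value decomposition (SVD) of a term-document matrix, keeping only the top k singular values as "concepts". As a toy illustration, the sketch below cuts this down to k = 1: power iteration on AᵀA finds the largest singular value and its left (term) and right (document) singular vectors. RankOneLsa is a hypothetical name, and this is not the post's implementation; a real setup would keep k > 1 and use a linear algebra library.

```java
import java.util.Arrays;

/**
 * Toy illustration of LSA's core step, truncated to k = 1: power iteration
 * on A^T A yields the largest singular value of the term-document matrix A
 * and its singular vectors. Not a substitute for a real SVD routine.
 */
final class RankOneLsa {
    final double sigma;      // largest singular value of A
    final double[] termVec;  // left singular vector u: one weight per term (row)
    final double[] docVec;   // right singular vector v: one weight per document (column)

    RankOneLsa(double[][] a, int iterations) {
        int n = a[0].length;                         // number of documents
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));          // unit-length starting vector
        for (int i = 0; i < iterations; i++) {
            // v <- (A^T A v) / ||A^T A v||
            v = normalize(timesTransposed(a, times(a, v)));
        }
        double[] av = times(a, v);
        sigma = norm(av);                            // ||A v|| = sigma for the top unit vector v
        termVec = normalize(av);                     // u = A v / sigma
        docVec = v;
    }

    /** A x, for an m-by-n matrix A and a length-n vector x. */
    private static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < x.length; c++) {
                y[r] += a[r][c] * x[c];
            }
        }
        return y;
    }

    /** A^T y, for an m-by-n matrix A and a length-m vector y. */
    private static double[] timesTransposed(double[][] a, double[] y) {
        double[] x = new double[a[0].length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < x.length; c++) {
                x[c] += a[r][c] * y[r];
            }
        }
        return x;
    }

    private static double norm(double[] x) {
        double sum = 0;
        for (double d : x) {
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static double[] normalize(double[] x) {
        double len = norm(x);
        if (len == 0) {
            return x.clone();                        // zero matrix: nothing to normalize
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] / len;
        }
        return out;
    }
}
```

In practice the matrix entries would typically be weighted term counts (e.g. tf-idf), and documents are then compared by cosine similarity in the reduced concept space.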

Naive Solr Did You Mean re-searcher SearchComponent

Posted by Kelvin on 05 Sep 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Solr makes spellcheck easy. Super-easy, in fact. All you need to do is change a few settings in solrconfig.xml, and voilà, spellcheck suggestions! However, that's not how Google does spellchecking. What Google does is determine whether the query has a misspelling and, if so, transparently correct the misspelled term for you and perform the search, […]
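The Google-style flow described here (detect a misspelling, correct it, search again) can be sketched in plain Java. This is a hypothetical illustration, not the post's SearchComponent: the zero-hit trigger is an assumption, and the two functions passed in stand in for Solr's query execution and its spellcheck suggestions.

```java
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical "re-searcher": when the original query finds nothing and the
 * spellchecker offers a different query, transparently search with that instead.
 * The zero-hit trigger and both function parameters are assumptions.
 */
final class ReSearcher {

    /** Which query actually ran, how many hits it got, and whether it was a correction. */
    static final class Result {
        final String executedQuery;
        final long hits;
        final boolean corrected;

        Result(String executedQuery, long hits, boolean corrected) {
            this.executedQuery = executedQuery;
            this.hits = hits;
            this.corrected = corrected;
        }
    }

    private final Function<String, Long> search;                  // query -> hit count (assumed)
    private final Function<String, Optional<String>> spellcheck;  // query -> suggested query (assumed)

    ReSearcher(Function<String, Long> search, Function<String, Optional<String>> spellcheck) {
        this.search = search;
        this.spellcheck = spellcheck;
    }

    Result run(String query) {
        long hits = search.apply(query);
        if (hits > 0) {
            return new Result(query, hits, false);                // query worked; leave it alone
        }
        Optional<String> suggestion = spellcheck.apply(query);
        if (suggestion.isPresent() && !suggestion.get().equals(query)) {
            String fixed = suggestion.get();
            return new Result(fixed, search.apply(fixed), true);  // re-search with the correction
        }
        return new Result(query, hits, false);                    // nothing better to offer
    }
}
```

The corrected flag lets the response say "Showing results for …" the way Google does, while still exposing the original query.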

Reading ElasticSearch server book…

Posted by Kelvin on 23 May 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Just got my hands on a review copy of PacktPub's ElasticSearch Server book, which I believe is the first ES book on the market. Review to follow shortly.

New file formats in Lucene 4.1+ index

Posted by Kelvin on 30 Apr 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Lucene 4.1 introduces new files in the index. Here's a link to the documentation: https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html

The different types of files are:

- .tim: Term Dictionary
- .tip: Term Index
- .doc: Frequencies and Skip Data
- .pos: Positions
- .pay: Payloads and Offsets

Permission filtering in Solr using an ACL permissions string

Posted by Kelvin on 03 Apr 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

For an app I'm working on, the permissions ACL is stored in a string of the form:

    category1=100|category2=300|category3=300

Both users and documents have an ACL string. The number represents the access level for that category; bigger numbers mean higher access. In the previous Lucene-based iteration, to perform permission filtering, I just loaded the entire field into […]
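A minimal sketch of parsing that ACL format and comparing a user against a document. The excerpt doesn't spell out the exact filtering rule, so the rule here is an assumption: a user may see a document if, in at least one category the document lists, the user's level is greater than or equal to the document's. AclFilter, parse and canSee are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers for ACL strings like "category1=100|category2=300". */
final class AclFilter {

    private AclFilter() {}

    /** Parses "category1=100|category2=300" into {category1=100, category2=300}. */
    static Map<String, Integer> parse(String acl) {
        Map<String, Integer> levels = new HashMap<>();
        if (acl == null || acl.isEmpty()) {
            return levels;
        }
        for (String entry : acl.split("\\|")) {      // split() takes a regex, so escape '|'
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                continue;                              // skip malformed entries
            }
            levels.put(entry.substring(0, eq).trim(),
                       Integer.parseInt(entry.substring(eq + 1).trim()));
        }
        return levels;
    }

    /**
     * ASSUMED rule: visible if, in at least one category the document lists,
     * the user's level is >= the document's level for that category.
     */
    static boolean canSee(Map<String, Integer> user, Map<String, Integer> doc) {
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer userLevel = user.get(e.getKey());
            if (userLevel != null && userLevel >= e.getValue()) {
                return true;
            }
        }
        return false;
    }
}
```

Escaping the pipe matters: a bare split("|") is an empty regex alternation and would split the string between every character. If the real rule requires access in every category, canSee becomes an all-match loop instead.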

Solr DateField java dateformat

Posted by Kelvin on 03 Apr 2013 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

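Grrrr… keep forgetting the Solr DateField date format, so here it is for posterity:

    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // the quoted 'Z' is only a literal, so format in UTC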
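Because the quoted 'Z' in that pattern is just a literal character, the formatter only produces valid Solr dates when it formats in UTC. The sketch below wraps the pattern in a UTC-pinned helper and shows the java.time equivalent (Java 8+) for comparison; SolrDates and its methods are hypothetical names.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical helpers producing Solr DateField strings such as 2013-04-03T00:00:00Z. */
final class SolrDates {

    private SolrDates() {}

    /** Legacy API. SimpleDateFormat is not thread-safe, so build one per call. */
    static String format(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));   // 'Z' is a literal, so force UTC
        return fmt.format(date);
    }

    /** java.time equivalent: DateTimeFormatter is immutable and safe to share. */
    private static final DateTimeFormatter SOLR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    static String format(Instant instant) {
        return SOLR_FORMAT.format(instant);
    }
}
```

For example, format(new Date(0L)) yields 1970-01-01T00:00:00Z whatever the JVM's default time zone is.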

My favorite open-source license of all…

Posted by Kelvin on 30 Mar 2013 | Tagged as: programming

WTFPL – Do What the Fuck You Want to Public License http://www.wtfpl.net/
