Supermind Search Consulting Blog 
Solr - ElasticSearch - Big Data

Power browsing proggit + HN + lobste.rs + dzone news

Posted by Kelvin on 20 Jan 2016 | Tagged as: programming

Disclaimer: this uses Erudite, a tool I wrote in Django. Here's how I speed-read programming-related news. Open https://erudite.supermind.org/news/headlines/#tab_Programming in your browser. Press ` (the backtick key) to page the entire row forward, and shift+` to page the entire row back. Press 1 2 3 4 to page each respective column. Press shift+1 for the previous page of column 1, shift+2 on […]

Erudite – a text-only, keyboard-friendly news reader

Posted by Kelvin on 12 Jan 2016 | Tagged as: programming

Something I've been working on for a bit: https://erudite.supermind.org A keyboard-friendly, text-only news reader. Somewhat mobile-friendly. Hit '?' for keyboard shortcuts.

Embed custom JavaScript and HTML in a Kibana 4.x visualization

Posted by Kelvin on 11 Jan 2016 | Tagged as: Lucene / Solr / Elastic Search / Nutch

The embarrassingly simple answer to embedding ANY JavaScript and HTML into a Kibana vis is to hack the markdown_vis plugin so that it skips Markdown rendering and displays the HTML as-is. In src/plugins/markdown_vis/public/markdown_vis_controller.js, comment out $scope.html = $sce.trustAsHtml(marked(html)); and replace it with $scope.html = $sce.trustAsHtml(html); You'll need to recreate the bundles (just install […]

Lucene 5 NRT Example

Posted by Kelvin on 16 Dec 2015 | Tagged as: Lucene / Solr / Elastic Search / Nutch

I just added an NRT search example for Lucene 5.x to lucenetutorial.com. Check it out here: http://www.lucenetutorial.com/lucene-nrt-hello-world.html

Pain-free Solr replication

Posted by Kelvin on 02 Dec 2015 | Tagged as: Lucene / Solr / Elastic Search / Nutch

Here's a setup I use for pain-free Solr replication. It lets you switch masters and slaves quickly without editing config files. Add this to solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <str name="maxNumberOfBackups">1</str>
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    […]
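The ${enable.master:false} placeholder is what makes the role switchable. Solr substitutes the named property if one is supplied (for example as a JVM system property such as -Denable.master=true) and otherwise falls back to the default after the colon. The sketch below is not Solr code; it is a small Python imitation of that substitution rule, and the function name resolve_placeholders is made up for illustration.

```python
import re

# Matches ${name} or ${name:default}, the placeholder form seen in solrconfig.xml.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve_placeholders(text, props):
    """Imitate Solr-style property substitution (illustrative helper, not a Solr API)."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]      # an explicitly supplied property wins
        if default is not None:
            return default          # otherwise use the value after the colon
        raise KeyError(f"no value or default for property {name!r}")

    return _PLACEHOLDER.sub(_substitute, text)


line = '<str name="enable">${enable.master:false}</str>'

# No property supplied: the default "false" is used.
default_line = resolve_placeholders(line, {})

# Property supplied, as if the node were started with -Denable.master=true.
master_line = resolve_placeholders(line, {"enable.master": "true"})

print(default_line)
print(master_line)
```

Because the role comes from a property rather than from the file, the same solrconfig.xml can be deployed to every node, and a node changes role by being restarted with a different property value.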

[SOLVED] Frequent disconnects on Ubuntu 12.04 iwlwifi Centrino 2200

Posted by Kelvin on 20 Nov 2015 | Tagged as: Ubuntu

On certain wireless routers, I was getting the dreaded "wlan0: deauthenticating from … by local choice", resulting in constant disconnects (every 30 seconds or less). I tried a whole bunch of options (disabling 802.11n, disabling hardware scanning, etc.) and the only thing that eventually worked was disabling IPv6. sudo gedit /etc/sysctl.conf #Add these lines at […]

Monier-Williams Sanskrit-English-IAST search engine

Posted by Kelvin on 17 Sep 2015 | Tagged as: Lucene / Solr / Elastic Search / Nutch, programming, Python

I just launched a search application for the Monier-Williams dictionary, which is the definitive Sanskrit-English dictionary. See it in action here: http://sanskrit.supermind.org The app is built in Python and uses the Whoosh search engine. I chose Whoosh instead of Solr or ElasticSearch because I wanted to try building a search app which didn't depend on […]

An HTML5 ElasticSearch Query DSL Builder

Posted by Kelvin on 16 Sep 2015 | Tagged as: Lucene / Solr / Elastic Search / Nutch, programming

TL;DR: I parsed the ElasticSearch source and generated an HTML app that allows you to build ElasticSearch queries using its JSON Query DSL. You can see it in action here: http://supermind.org/elasticsearch/query-dsl-builder.html I really like ElasticSearch's JSON-based Query DSL – it lets you create fairly complex search queries in a relatively painless fashion. I do not, […]
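For readers who haven't seen the Query DSL, here is a minimal sketch of the kind of JSON body it involves, built as a plain Python dict. It is unrelated to how the builder app itself works. The bool, match, range, and term query types are standard ElasticSearch ones; the field names (title, published, status) and the helper bool_query are assumptions made up for the example.

```python
import json


def bool_query(must=None, should=None, must_not=None):
    """Assemble a bool query body, leaving out any clause list that is empty.

    Illustrative helper only; it is not part of any ElasticSearch client.
    """
    clauses = {}
    if must:
        clauses["must"] = must
    if should:
        clauses["should"] = should
    if must_not:
        clauses["must_not"] = must_not
    return {"query": {"bool": clauses}}


# Hypothetical fields: "title", "published", and "status".
query = bool_query(
    must=[
        {"match": {"title": "lucene"}},
        {"range": {"published": {"gte": "2015-01-01"}}},
    ],
    must_not=[{"term": {"status": "draft"}}],
)

# This JSON string is what would be sent as the body of a _search request.
body = json.dumps(query, indent=2)
print(body)
```

Nesting is the part that gets tedious by hand: each clause is itself a one-key object naming the query type, which is the sort of structure a form-based builder can generate for you.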

Properly unit testing scrapy spiders

Posted by Kelvin on 20 Nov 2014 | Tagged as: crawling, Python

Scrapy, being based on Twisted, introduces an incredible host of obstacles to easily and efficiently writing self-contained unit tests:
1. You can't call reactor.run() multiple times
2. You can't stop the reactor multiple times, so you can't blindly call "crawler.signals.connect(reactor.stop, signal=signals.spider_closed)"
3. Reactor runs in its own thread, so your failed assertions won't make it […]

Definitive guide to routing Android and Genymotion traffic through a SOCKS proxy

Posted by Kelvin on 16 Nov 2014 | Tagged as: android, programming

If you only need to route traffic on Android through an SSH tunnel (not a proxy), just use http://code.google.com/p/sshtunnel/ If all you need to do is inspect network traffic, you can use Wireshark on Genymotion. If, however, you're on Genymotion and/or need to get Android traffic through a proxy, especially if you're trying to conduct […]
