Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

Pain-free Solr replication

Posted by Kelvin on 02 Dec 2015 | Tagged as: Lucene / Solr / Elasticsearch / Nutch

Here's a setup I use for totally pain-free Solr replication. It also lets you switch masters and slaves quickly without messing with config files. Add this to solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <str name="maxNumberOfBackups">1</str>
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">solrconfig.xml,schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${replication.master}:8983/solr/corename</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
[…]
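The master/slave switch works because Solr replaces `${name:default}` placeholders in solrconfig.xml with Java system properties and falls back to the value after the colon when a property isn't set. As a rough model of that substitution (a hypothetical Python helper, not Solr's own code), this shows how a node started with `-Denable.slave=true -Dreplication.master=...` becomes a slave without the file changing:

```python
import re

# Matches ${name} or ${name:default}; the name stops at the first ':' or '}'.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")

def resolve_props(text, props):
    """Replace each placeholder with props[name], else with its default.

    Hypothetical model of Solr's property substitution. In this model a
    property with neither a value nor a default raises KeyError.
    """
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")
    return PLACEHOLDER.sub(substitute, text)

config = (
    '<str name="enable">${enable.master:false}</str>\n'
    '<str name="enable">${enable.slave:false}</str>\n'
    '<str name="masterUrl">http://${replication.master}:8983/solr/corename</str>'
)

# Properties a slave might be started with; "solr-master.example" is made up.
slave_props = {"enable.slave": "true", "replication.master": "solr-master.example"}
print(resolve_props(config, slave_props))
```

A master would instead be started with `-Denable.master=true`, so promoting or demoting a node is a matter of restarting it with different properties.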

[SOLVED] Frequent disconnects on Ubuntu 12.04 iwlwifi Centrino 2200

Posted by Kelvin on 20 Nov 2015 | Tagged as: Ubuntu

On certain wireless routers, I was getting the dreaded "wlan0: deauthenticating from … by local choice" message, resulting in constant disconnects (every 30 seconds or less). I tried a whole bunch of options (disabling 11n, disabling hardware scanning, etc.), and the only thing that eventually worked was disabling IPv6:

sudo gedit /etc/sysctl.conf
#Add these lines […]
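After editing /etc/sysctl.conf and reloading it (for example with `sudo sysctl -p`), you can confirm that IPv6 is really off by reading the kernel's per-interface `disable_ipv6` flags under /proc/sys/net/ipv6/conf/. A minimal Python sketch, assuming a Linux machine; `ipv6_status` is a hypothetical helper, and its `root` parameter exists only so it can be pointed at a test directory:

```python
from pathlib import Path

def ipv6_status(root="/proc/sys/net/ipv6/conf"):
    """Map each interface name (including 'all' and 'default') to True
    if its disable_ipv6 flag is 1, i.e. IPv6 is disabled on it."""
    status = {}
    base = Path(root)
    if not base.is_dir():
        # No ipv6 tree: not Linux, or IPv6 isn't available in this kernel.
        return status
    for flag in sorted(base.glob("*/disable_ipv6")):
        status[flag.parent.name] = flag.read_text().strip() == "1"
    return status

if __name__ == "__main__":
    for iface, disabled in ipv6_status().items():
        print(f"{iface}: IPv6 {'disabled' if disabled else 'ENABLED'}")
```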

Monier-Williams Sanskrit-English-IAST search engine

Posted by Kelvin on 17 Sep 2015 | Tagged as: programming, Lucene / Solr / Elasticsearch / Nutch, Python

I just launched a search application for the Monier-Williams dictionary, which is the definitive Sanskrit-English dictionary. See it in action here: http://sanskrit.supermind.org The app is built in Python and uses the Whoosh search engine. I chose Whoosh instead of Solr or ElasticSearch because I wanted to try building a search app which didn't depend on […]
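One thing an IAST search has to cope with is users typing Sanskrit words without diacritics (krsna for kṛṣṇa). The excerpt doesn't say whether or how the app handles this; one common approach, sketched here in plain Python as an assumption, is to fold both the indexed text and the query by decomposing to Unicode NFD and dropping the combining marks. `fold_iast` is a hypothetical helper, and it would need to be applied at both index time and query time:

```python
import unicodedata

def fold_iast(text):
    """Lower-case IAST text and strip diacritics, so that 'Kṛṣṇa' and
    'krsna' produce the same search key."""
    # NFD splits e.g. 'ṛ' into 'r' + U+0323 COMBINING DOT BELOW.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop the combining marks (Unicode category 'Mn'), keeping base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold_iast("Kṛṣṇa"))   # krsna
print(fold_iast("śāstra"))  # sastra
```

The trade-off is that folding merges words that differ only in diacritics, so an exact-match field alongside the folded one is often kept for ranking.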

An HTML5 ElasticSearch Query DSL Builder

Posted by Kelvin on 16 Sep 2015 | Tagged as: programming, Lucene / Solr / Elasticsearch / Nutch

TL;DR: I parsed the ElasticSearch source and generated an HTML app that lets you build ElasticSearch queries using its JSON Query DSL. You can see it in action here: http://supermind.org/elasticsearch/query-dsl-builder.html I really like ElasticSearch's JSON-based Query DSL: it lets you create fairly complex search queries in a relatively painless fashion. I do not, […]
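For anyone who hasn't used it, the Query DSL is just nested JSON, which is what makes it practical to generate from a UI. A small Python sketch of a `bool` query built by hand; the field names `title` and `posted` are made up for illustration. It uses the `bool`/`filter` form from Elasticsearch 2.0 onwards, whereas 1.x versions expressed the same thing with the `filtered` query:

```python
import json

def build_query(text, since):
    """Build a Query DSL body: full-text match on a (hypothetical) 'title'
    field, restricted to documents whose (hypothetical) 'posted' date is
    on or after `since`."""
    return {
        "query": {
            "bool": {
                # 'must' clauses have to match and contribute to the score.
                "must": [{"match": {"title": text}}],
                # 'filter' clauses have to match but are not scored.
                "filter": [{"range": {"posted": {"gte": since}}}],
            }
        }
    }

print(json.dumps(build_query("solr replication", "2015-01-01"), indent=2))
```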

Properly unit testing scrapy spiders

Posted by Kelvin on 20 Nov 2014 | Tagged as: crawling, Python

Scrapy, being based on Twisted, introduces an incredible host of obstacles to writing self-contained unit tests easily and efficiently:

1. You can't call reactor.run() multiple times.
2. You can't stop the reactor multiple times, so you can't blindly call "crawler.signals.connect(reactor.stop, signal=signals.spider_closed)".
3. The reactor runs in its own thread, so your failed assertions won't make it […]
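One way to sidestep all three problems for parsing logic is to keep the reactor out of the unit test entirely: call the spider's callback directly on a canned response and assert on what it yields. This is a stand-alone sketch, not necessarily the approach the post goes on to describe; `FakeResponse` and `TitleSpider` are hypothetical stand-ins so that it runs without Scrapy installed. In a real test you would build the response with Scrapy's `HtmlResponse(url=..., body=..., encoding="utf-8")` instead:

```python
import re
import unittest

class FakeResponse:
    """Hypothetical minimal stand-in for a Scrapy response: url + text."""
    def __init__(self, url, text):
        self.url = url
        self.text = text

class TitleSpider:
    """Hypothetical spider; parse() is an ordinary generator, as in Scrapy."""
    name = "titles"

    def parse(self, response):
        for title in re.findall(r"<h2>(.*?)</h2>", response.text):
            yield {"url": response.url, "title": title}

class TitleSpiderTest(unittest.TestCase):
    def test_parse_yields_one_item_per_heading(self):
        html = "<h2>Pain-free Solr replication</h2><h2>Whoosh</h2>"
        response = FakeResponse("http://example.com/", html)
        # No reactor is involved: the callback is called like any other
        # function, so a failed assertion fails the test directly.
        items = list(TitleSpider().parse(response))
        self.assertEqual([item["title"] for item in items],
                         ["Pain-free Solr replication", "Whoosh"])
```

Run it with `python -m unittest` like any other test module. This only covers callbacks; tests that need real downloads still run into the reactor issues listed above.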

Definitive guide to routing Android and Genymotion traffic through a socks proxy

Posted by Kelvin on 16 Nov 2014 | Tagged as: android, programming

If you only need to route traffic on Android through an SSH tunnel (not a proxy), just use sshtunnel (http://code.google.com/p/sshtunnel/). If all you need to do is inspect network traffic, you can use Wireshark on Genymotion. If, however, you're on Genymotion and/or need to get Android traffic through a proxy, especially if you're trying to conduct […]

Send response to client in PHP and continue processing

Posted by Kelvin on 03 Feb 2014 | Tagged as: PHP

Here's one way to send a response, close the connection to the client, and let the PHP script keep running, presumably to perform some time-consuming processing:

<?php
ob_end_clean();
header("Connection: close\r\n");
header("Content-Encoding: none\r\n");
ignore_user_abort(true); // optional
ob_start();
echo ('Text user will see');
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush(); // Strange behaviour, will not work […]
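The snippet sends Content-Length together with Connection: close because, once the client has received that many bytes, it knows the response is complete and stops waiting, even though the script is still running. The length is counted in bytes, which is what ob_get_length() returns, not in characters. A rough Python illustration of such a response being assembled; `build_response` is a hypothetical helper, not part of any framework:

```python
def build_response(body, status="200 OK"):
    """Assemble a raw HTTP/1.1 response whose Content-Length matches the body.

    Content-Length counts bytes, not characters, so the body is encoded first.
    """
    payload = body.encode("utf-8")
    headers = [
        f"HTTP/1.1 {status}",
        "Connection: close",
        f"Content-Length: {len(payload)}",
    ]
    # The header block ends with a blank line (CRLF CRLF); the body follows.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + payload

print(build_response("Text user will see").decode("utf-8"))
```

For a body of `café`, the header says 5, not 4, because é takes two bytes in UTF-8.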

Mapping alt-pgup and alt-pgdown to home and end in ubuntu

Posted by Kelvin on 12 Nov 2013 | Tagged as: Ubuntu

On my Lenovo T530 laptop, the PgUp and PgDown keys are right next to the arrow keys, which makes for very smooth code navigation. Unfortunately, the Home and End keys are far away, above the Backspace key to be precise. Here's how to map Alt + PgUp -> Home and Alt + PgDown -> End […]

[SOLVED] gedit Invalid byte sequence in conversion input

Posted by Kelvin on 07 Nov 2013 | Tagged as: Ubuntu

I've been tearing my hair out lately trying to open UTF-8 encoded text files in gedit (Ubuntu 12.04). For some reason, the auto charset detection mechanism is broken. Opening the same files using gvim or leafpad just works. Googling for a solution didn't help either. Well, I found the fix. What you need to do […]
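Before blaming gedit, it's worth confirming that the file really is valid UTF-8, since an error like this can also come from genuinely mixed or non-UTF-8 bytes. A small Python check (a hypothetical helper, not part of gedit) that reports the byte offset of the first invalid sequence:

```python
import sys

def first_invalid_utf8(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start
    return None

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            offset = first_invalid_utf8(f.read())
        print(f"{path}: " + ("valid UTF-8" if offset is None
                             else f"invalid byte sequence at offset {offset}"))
```

If every file comes back valid but gedit still refuses to open it, the problem is the charset detection rather than the file.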

[solved] Tomcat 6 UTF-8 encoding issue

Posted by Kelvin on 08 Oct 2013 | Tagged as: programming

If, after following all the instructions in the Tomcat docs for enabling UTF-8 support (http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8), you still run into UTF-8 issues, and your webapp involves reading and displaying the contents of files, give this a whirl. In catalina.sh, either at the top of the file or after the long comments, insert this:

export CATALINA_OPTS="$CATALINA_OPTS […]
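Files get special mention because code that reads file bytes without naming a charset falls back to a platform default (Java's `FileReader`, for instance, uses the JVM's default charset), and if that default isn't UTF-8 the text is silently mis-decoded. A Python illustration of the failure mode, offered as an analogy rather than Tomcat code:

```python
text = "Café naïve"
raw = text.encode("utf-8")   # the bytes actually stored on disk

# Decoding with the wrong charset doesn't raise; it silently yields mojibake.
wrong = raw.decode("latin-1")
right = raw.decode("utf-8")  # naming the charset explicitly

print(wrong)  # CafÃ© naÃ¯ve
print(right)  # Café naïve
```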
