Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

Posts about programming

Application-wide keyboard shortcuts in Swing

Posted by Kelvin on 21 Apr 2011 | Tagged as: programming

In Swing, keyboard events are delivered to whichever component currently has focus.

One way of implementing application-wide keyboard shortcuts is to add the shortcut to _every_ component that is created. (yes, it's as ridonkulous as it sounds)

Here's another way, using KeyboardFocusManager:

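    // Needs java.awt.KeyboardFocusManager, java.awt.KeyEventDispatcher,
    // java.awt.event.KeyEvent and java.awt.event.InputEvent.
    // Add Ctrl-W listener to quit application
    KeyboardFocusManager.getCurrentKeyboardFocusManager().addKeyEventDispatcher(new KeyEventDispatcher() {
      @Override
      public boolean dispatchKeyEvent(KeyEvent e) {
        // Only react to the key press; the matching KEY_RELEASED event carries the same key code.
        if (e.getID() == KeyEvent.KEY_PRESSED
            && e.getKeyCode() == KeyEvent.VK_W
            && e.getModifiersEx() == InputEvent.CTRL_DOWN_MASK) {
          System.exit(0);
          return true;  // consumed: stop further dispatching
        }
        return false;   // not ours: let Swing dispatch as usual
      }
    });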
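If you end up with more than one global shortcut, the same dispatcher idea can be driven by a lookup table. Here's a rough sketch of that; the class name GlobalShortcuts and its bind/install methods are made up for illustration, and only KeyboardFocusManager, KeyEventDispatcher and KeyStroke are real JDK classes.

```java
import java.awt.KeyEventDispatcher;
import java.awt.KeyboardFocusManager;
import java.awt.event.KeyEvent;
import java.util.HashMap;
import java.util.Map;
import javax.swing.KeyStroke;

/** Hypothetical helper: one dispatcher that serves many application-wide shortcuts. */
class GlobalShortcuts implements KeyEventDispatcher {
  private final Map<KeyStroke, Runnable> actions = new HashMap<>();

  /** Binds a key press (key code plus InputEvent.*_DOWN_MASK modifiers) to an action. */
  GlobalShortcuts bind(int keyCode, int modifiers, Runnable action) {
    actions.put(KeyStroke.getKeyStroke(keyCode, modifiers), action);
    return this;
  }

  @Override
  public boolean dispatchKeyEvent(KeyEvent e) {
    // A KeyStroke built from a key press only equals a stroke registered for a press,
    // so key-release and key-typed events fall through untouched.
    Runnable action = actions.get(KeyStroke.getKeyStrokeForEvent(e));
    if (action == null) {
      return false; // not a registered shortcut: normal dispatching continues
    }
    action.run();
    return true; // consumed
  }

  /** Registers this dispatcher with the current KeyboardFocusManager. */
  void install() {
    KeyboardFocusManager.getCurrentKeyboardFocusManager().addKeyEventDispatcher(this);
  }
}
```

Usage would look something like new GlobalShortcuts().bind(KeyEvent.VK_W, InputEvent.CTRL_DOWN_MASK, () -> System.exit(0)).install();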

Working MySQL 5.1+ Levenshtein Stored Procedure

Posted by Kelvin on 13 Apr 2011 | Tagged as: programming

Update: Changed 0x00 to '\0' as per Jan-Hendrik's comment below.

There are a number of MySQL functions for calculating Levenshtein distance floating around StackOverflow and other forums. They all seem to be based off http://codejanitor.com/wp/2007/02/10/levenshtein-distance-as-a-mysql-stored-function/ (broken link).

Anyway, I couldn't get them to work for me. MySQL complained:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 4

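Well, it turns out that the mysql client splits its input at the default delimiter ;, so the function body gets cut off at its first semicolon. You need to set a different delimiter before creating the function. So here's a working version of the Levenshtein distance function, courtesy of CodeJanitor.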

DELIMITER //
CREATE FUNCTION LEVENSHTEIN (s1 VARCHAR(255), s2 VARCHAR(255))
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
  DECLARE s1_char CHAR;
  DECLARE cv0, cv1 VARBINARY(256);
  SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = '\0', j = 1, i = 1, c = 0;
  IF s1 = s2 THEN
    RETURN 0;
  ELSEIF s1_len = 0 THEN
    RETURN s2_len;
  ELSEIF s2_len = 0 THEN
    RETURN s1_len;
  ELSE
    WHILE j <= s2_len DO
      SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
    END WHILE;
    WHILE i <= s1_len DO
      SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
      WHILE j <= s2_len DO
        SET c = c + 1;
        IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF;
        SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
        IF c > c_temp THEN SET c = c_temp; END IF;
        SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
        IF c > c_temp THEN SET c = c_temp; END IF;
        SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
      END WHILE;
      SET cv1 = cv0, i = i + 1;
    END WHILE;
  END IF;
  RETURN c;
END//
DELIMITER ;
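The function above only ever keeps two rows of the edit-distance matrix (cv1 is the previous row, cv0 the one being built). For sanity-checking its results outside MySQL, here's a minimal sketch of the same two-row approach in Java. The class and method names are my own, and the comparison here is an exact character match, whereas MySQL compares according to the collation in effect (often case-insensitive), so results can differ for mixed-case input.

```java
/** Two-row Levenshtein distance, a sketch mirroring the stored function above. */
class Levenshtein {
  static int distance(String s1, String s2) {
    if (s1.equals(s2)) return 0;
    if (s1.isEmpty()) return s2.length();
    if (s2.isEmpty()) return s1.length();

    int[] prev = new int[s2.length() + 1]; // plays the role of cv1
    int[] curr = new int[s2.length() + 1]; // plays the role of cv0
    for (int j = 0; j <= s2.length(); j++) {
      prev[j] = j; // distance from the empty prefix of s1
    }
    for (int i = 1; i <= s1.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= s2.length(); j++) {
        int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
        int insertion = curr[j - 1] + 1;
        int deletion = prev[j] + 1;
        int substitution = prev[j - 1] + cost;
        curr[j] = Math.min(Math.min(insertion, deletion), substitution);
      }
      int[] swap = prev; // the finished row becomes the previous row
      prev = curr;
      curr = swap;
    }
    return prev[s2.length()];
  }
}
```

For example, Levenshtein.distance("kitten", "sitting") gives 3, which is what SELECT LEVENSHTEIN('kitten', 'sitting') should return too.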

Name parser links

Posted by Kelvin on 13 Apr 2011 | Tagged as: programming

I'm about to write some code to normalize names, e.g. split out firstName, middleName, lastName etc.

Here are some links on the topic:

http://search.cpan.org/dist/Lingua-EN-NameParse/lib/Lingua/EN/NameParse.pm
http://alphahelical.com/code/misc/nameparse/nameparse.php.txt
http://jasonpriem.com/human-name-parse/
http://code.google.com/p/php-name-parser/
http://www.onlineaspect.com/2009/08/17/splitting-names/
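For reference, the naive baseline that the parsers above improve on looks roughly like this. It's a sketch only: the class name is made up, it assumes Western "first [middle...] last" order, and it knows nothing about titles, suffixes, or multi-word surnames.

```java
import java.util.Arrays;

/** Naive whitespace-based name splitter; a baseline sketch, not a real name parser. */
class NaiveNameSplitter {
  /** Returns {firstName, middleName, lastName}; missing parts are empty strings. */
  static String[] split(String fullName) {
    String trimmed = fullName.trim();
    if (trimmed.isEmpty()) {
      return new String[] {"", "", ""};
    }
    String[] parts = trimmed.split("\\s+");
    String first = parts[0];
    String last = parts.length > 1 ? parts[parts.length - 1] : "";
    // Everything between the first and last token is treated as the middle name.
    String middle = parts.length > 2
        ? String.join(" ", Arrays.copyOfRange(parts, 1, parts.length - 1))
        : "";
    return new String[] {first, middle, last};
  }
}
```

Inputs like "Dr. John Smith Jr." or "Ludwig van Beethoven" are exactly where this falls over, which is why the libraries linked above exist.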

Preventing Java XML Parsers from resolving external DTDs

Posted by Kelvin on 07 Apr 2011 | Tagged as: programming

With some SAX parsers you can disable loading of external DTDs with this:

xmlReader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

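Not all parsers support this feature, however. Piccolo, for one, does not.

You can get the same effect regardless of the parser by installing an EntityResolver that returns an empty document instead of fetching the DTD: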

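// Assuming SAXReader is dom4j's org.dom4j.io.SAXReader;
// EntityResolver, InputSource and SAXException come from org.xml.sax.
SAXReader reader = new SAXReader();
reader.setEntityResolver(new EntityResolver() {
  @Override
  public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
    // Hand the parser an empty document in place of the external DTD.
    return new InputSource(new StringReader(""));
  }
});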
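The same resolver trick can be tried with nothing but the JDK's built-in SAX parser. Below is a minimal sketch of that: the class name NoExternalDtd, the method textOf and the example.invalid URL are all made up for illustration. Note that the resolver is asked about every external entity, not just the DTD, so this blanks them all.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: parse XML while answering every external entity lookup with an empty document. */
class NoExternalDtd {
  /** Returns the concatenated character data of the document. */
  static String textOf(String xml) {
    try {
      XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      // Never fetch the DTD (or any other external entity): return an empty source instead.
      reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
      StringBuilder text = new StringBuilder();
      reader.setContentHandler(new DefaultHandler() {
        @Override
        public void characters(char[] ch, int start, int length) {
          text.append(ch, start, length);
        }
      });
      reader.parse(new InputSource(new StringReader(xml)));
      return text.toString();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      // Wrapped to keep the sketch easy to call; real code may prefer checked exceptions.
      throw new IllegalStateException(e);
    }
  }
}
```

Calling NoExternalDtd.textOf on a document whose DOCTYPE points at an unreachable URL should parse without any network access.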

10 things you should know about life at Google as an engineer

Posted by Kelvin on 30 Mar 2011 | Tagged as: programming

Slacy has a fantastic post about what Larry Page really needs to do to return Google to its startup roots, but what I really learnt from it was what life at Google is like as an engineer. 🙂

If you're too lazy to read the article, here are the bullet points:

1. Lotsa meetings (duh)
2. Lotsa time spent compiling and fixing other people's code (for C++ devs)
3. Open-source software (or pretty much anything not invented at Google) frowned upon
4. Shitty cluster management system for scheduling jobs
5. Datacenter mayhem for deploying apps
6. If your product isn’t a billion-dollar idea, then it’s not worth Google’s time.
7. “unGoogly” system designs get shot down because they didn’t use Bigtable, GFS, Colossus, Spanner, MegaStore, BlobStore, or any of the other internal systems.
8. 20% time is a lie
9. Ignore the good ole 'Premature optimization is the root of all evil'
10. “Google Scale” is a myth (Google Search (the product) requires vast resources. Almost nothing else does, and yet is constrained and forced to run “at Google scale” when it’s completely unnecessary.)

Anything else you want to add to the list?

[SOLVED] Unknown initial character set index 'num' received from server

Posted by Kelvin on 13 Mar 2011 | Tagged as: programming

Recently when migrating from one server to another, my Java apps using an old version of Connector/J failed with this error:

java.sql.SQLException: Unknown initial character set index '192' received from server.
Initial client character set can be forced via the 'characterEncoding' property.

No changes were made in the apps, so it had to do with MySQL.

The offending lines in my.cnf were:

 
[mysqld]
character_set_server=utf8
collation_server=utf8_unicode_ci

Commenting them out fixes the problem.

 
[mysqld]
;character_set_server=utf8
;collation_server=utf8_unicode_ci
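If you can't touch my.cnf, the error message itself points at a client-side option: forcing the encoding through the characterEncoding connection property. I haven't verified this against every old Connector/J release, so treat the following as a sketch. The class name, the method, and the JDBC URL in the note below are placeholders; user, password, useUnicode and characterEncoding are Connector/J connection properties.

```java
import java.util.Properties;

/** Sketch: connection properties that force the client character set for Connector/J. */
class LegacyConnectorProps {
  static Properties forcedEncoding(String user, String password) {
    Properties props = new Properties();
    props.setProperty("user", user);
    props.setProperty("password", password);
    // Named in the error message: tells the driver which encoding to use
    // instead of deriving it from the server's character set index.
    props.setProperty("useUnicode", "true");
    props.setProperty("characterEncoding", "UTF-8");
    return props;
  }
}
```

The properties would then be passed to something like DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", props), with the host and database name adjusted to your setup.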

Great thread on Akka use cases

Posted by Kelvin on 08 Mar 2011 | Tagged as: programming

Akka is a Scala-based framework which promises "Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Actors"

SO has a great thread on use-cases for Akka here: http://stackoverflow.com/questions/4493001/good-use-case-for-akka

A clear-headed evaluation of MongoDB vs Redis, TokyoCabinet and BerkeleyDB

Posted by Kelvin on 01 Mar 2011 | Tagged as: programming

http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart has a lucid comparison of MongoDB, Redis, TokyoCabinet and BerkeleyDB.

What's nice about the evaluation is that it says which use-cases each solution is likely to be a good fit for.

While we're on this topic, how about a recap of Brewer's CAP Theorem (pun intended)?

And to round things off, check out this Visual Guide to NoSQL Systems.

Recap: The Fallacies of Distributed Computing

Posted by Kelvin on 01 Mar 2011 | Tagged as: programming, Lucene / Solr / Elasticsearch / Nutch, crawling

Just so no-one forgets, here's a recap of the Fallacies of Distributed Computing:

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Hide 'Uncategorized' category in WordPress

Posted by Kelvin on 26 Nov 2010 | Tagged as: programming

The solution to this is somewhat surprising. It doesn't involve any PHP code or modifications to functions.

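It's a single CSS rule! WordPress gives each entry in the category list a class of the form cat-item-<ID>, and the default 'Uncategorized' category has ID 1, so hiding that class hides the category: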

.cat-item-1 {display:none;}
