Thoughts on Lucene, Solr, crawling and vertical search 

Posts about Ubuntu

Batch convert svg to png in Ubuntu

Posted by Kelvin on 19 Oct 2011 | Tagged as: Ubuntu

This entry is part 18 of 18 in the Bash-whacking series
sudo apt-get install librsvg2-bin
for i in *.svg; do rsvg-convert -a "$i" -o "${i%.svg}.png"; done
 

To rasterize the SVGs at 300 dpi while shrinking their dimensions by 50%:

for i in *.svg; do rsvg-convert -z 0.5 -d 300 -p 300 -a "$i" -o "${i%.svg}.png"; done
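If the shell loop feels fragile, the same batch job can be scripted in Python. This is a minimal sketch, assuming rsvg-convert is installed and on the PATH; it passes only the -z, -d, -p, -a and -o flags used above, and the helper names build_command and convert_all are hypothetical. Passing an argument list to subprocess.run avoids the shell entirely, so file names containing spaces need no quoting.

```python
import subprocess
import sys
from pathlib import Path


def build_command(svg, zoom=None, dpi=None):
    """Return the rsvg-convert argument list for one SVG (hypothetical helper)."""
    svg = Path(svg)
    cmd = ["rsvg-convert"]
    if zoom is not None:
        cmd += ["-z", str(zoom)]  # zoom factor, e.g. 0.5 halves the output size
    if dpi is not None:
        cmd += ["-d", str(dpi), "-p", str(dpi)]  # x and y resolution
    # -a keeps the aspect ratio; the output name swaps only the .svg suffix
    cmd += ["-a", str(svg), "-o", str(svg.with_suffix(".png"))]
    return cmd


def convert_all(directory, zoom=None, dpi=None):
    """Run rsvg-convert on every *.svg file directly inside `directory`."""
    for svg in sorted(Path(directory).glob("*.svg")):
        # A list argument means no shell is involved, so names need no quoting
        subprocess.run(build_command(svg, zoom, dpi), check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        convert_all(sys.argv[1])
    else:
        print("Usage: python svg2png.py <directory>", file=sys.stderr)
```

For example, convert_all("icons", zoom=0.5, dpi=300) mirrors the 300 dpi, 50% loop for an illustrative icons directory.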
 

Mount a .dmg file in Ubuntu

Posted by Kelvin on 11 Oct 2011 | Tagged as: Ubuntu

This entry is part 17 of 18 in the Bash-whacking series
sudo apt-get install dmg2img
dmg2img /path/to/image.dmg image.img
sudo modprobe hfsplus
sudo mount -t hfsplus -o loop image.img /mnt
 

The .dmg image is now mounted at /mnt. You can browse it from the command line or in Nautilus.

Courtesy of http://iremedy.net/blog/2010/11/how-to-mount-a-dmg-file-in-ubuntu-linux/

Delete directories older than x days

Posted by Kelvin on 04 Aug 2011 | Tagged as: Ubuntu

This entry is part 16 of 18 in the Bash-whacking series

Great for cleaning up log directories.

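find . -mindepth 1 -maxdepth 1 -mtime +14 -type d -exec rm -fr {} \;

Change 14 to the required age in days. find ignores the fractional part of a file's age, so -mtime +14 matches directories last modified at least 15 full days ago. The -mindepth 1 option stops find from matching the starting directory (.) itself. Note that a directory's modification time changes only when entries are added to, removed from, or renamed within it, not when existing files inside it are written to.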
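Here is a Python sketch of the same cleanup, useful when you want a dry run first. It assumes the goal is to remove immediate subdirectories only, never the starting directory itself, and it copies find's rounding of -mtime +N. The function name delete_old_dirs and its dry_run flag are hypothetical.

```python
import os
import shutil
import sys
import time

SECONDS_PER_DAY = 24 * 60 * 60


def delete_old_dirs(root, days=14, dry_run=False):
    """Remove immediate subdirectories of `root` older than `days` days.

    Returns the matched paths, sorted. With dry_run=True nothing is deleted.
    """
    now = time.time()
    matched = []
    # Snapshot the listing before deleting anything
    for entry in list(os.scandir(root)):
        # Like find -type d without -L: symlinks to directories are skipped
        if not entry.is_dir(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        # find -mtime +N drops the fractional day, then requires more than N
        if int(age // SECONDS_PER_DAY) > days:
            matched.append(entry.path)
    matched.sort()
    if not dry_run:
        for path in matched:
            shutil.rmtree(path)
    return matched


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for path in delete_old_dirs(sys.argv[1], int(sys.argv[2])):
            print("removed", path)
    else:
        print("Usage: python delete_old_dirs.py <root> <days>", file=sys.stderr)
```

Calling delete_old_dirs("/var/log/myapp", 14, dry_run=True) first lists what would be removed without deleting anything. The path here is only an illustration.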

Determine if a server supports Gzip compression

Posted by Kelvin on 06 Jun 2011 | Tagged as: Ubuntu

This entry is part 15 of 18 in the Bash-whacking series
echo "Size WITHOUT accepting gzip"
curl http://www.supermind.org --silent --write-out "size_download=%{size_download}\n" --output /dev/null
echo "Size WITH accepting gzip"
curl http://www.supermind.org --silent -H "Accept-Encoding: gzip,deflate"  --write-out "size_download=%{size_download}\n" --output /dev/null
 

Substitute a different URL to test another server.

On my site, this is what I get:

$ curl http://www.supermind.org --silent --write-out "size_download=%{size_download}\n" --output /dev/null
size_download=10560
$ curl http://www.supermind.org --silent -H "Accept-Encoding: gzip,deflate"  --write-out "size_download=%{size_download}\n" --output /dev/null
size_download=4345

Accepting gzip shrinks the download from 10,560 to 4,345 bytes, about 59% smaller.
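The same before/after comparison can be made from Python using only the standard library. This sketch assumes that counting response body bytes is a fair stand-in for curl's size_download. Python's urllib does not decompress responses, so with the gzip header the count is the compressed size whenever the server compresses. The function name download_size is hypothetical.

```python
import sys
import urllib.request


def download_size(url, accept_gzip=False):
    """Return how many body bytes the server sent, similar to curl's %{size_download}."""
    headers = {"Accept-Encoding": "gzip,deflate"} if accept_gzip else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        # urllib does not decompress, so this is the body size as sent
        return len(response.read())


if __name__ == "__main__":
    if len(sys.argv) == 2:
        plain = download_size(sys.argv[1])
        packed = download_size(sys.argv[1], accept_gzip=True)
        print("Size WITHOUT accepting gzip:", plain)
        print("Size WITH accepting gzip:", packed)
        if plain:
            print("Reduction: %.1f%%" % (100.0 * (plain - packed) / plain))
    else:
        print("Usage: python gzipcheck.py <url>", file=sys.stderr)
```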

 

HOWTO: Add gzip support to Squid 3.1 in Ubuntu

Posted by Kelvin on 06 Jun 2011 | Tagged as: Ubuntu

The squid3 deb available in the apt repos doesn't come configured with ecap support, which is required to serve gzip-compressed pages to clients.

In a network environment where most traffic is wireless (like where I live), reducing the payload of internal network requests noticeably improves performance.

Follow the instructions at http://code.google.com/p/squid-ecap-gzip/wiki/Installation precisely.

Use Squid 3.1.11 and ecap 0.03, even though more recent versions are available. I tried compiling 3.1.12.2 and ran into a bunch of make errors, whereas 3.1.11 compiled cleanly.

The one step where I deviated from the instructions was configuring Squid. Instead, I used configure options closer to those of the original Ubuntu release:

./configure '--build=i686-linux-gnu' '--prefix=/usr' \
  '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' \
  '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=${prefix}/lib/squid3' \
  '--disable-maintainer-mode' '--disable-dependency-tracking' '--disable-silent-rules' \
  '--srcdir=.' '--datadir=/usr/share/squid3' '--sysconfdir=/etc/squid3' '--mandir=/usr/share/man' \
  '--enable-inline' '--enable-async-io=8' '--enable-storeio=ufs,aufs,diskd' \
  '--enable-removal-policies=lru,heap' '--enable-delay-pools' '--enable-cache-digests' \
  '--enable-underscores' '--enable-icap-client' '--enable-follow-x-forwarded-for' \
  '--enable-arp-acl' '--enable-esi' '--disable-translation' \
  '--with-logdir=/var/log/squid3' '--with-pidfile=/var/run/squid3.pid' \
  '--with-filedescriptors=65536' '--with-large-files' '--with-default-user=proxy' \
  '--enable-linux-netfilter' \
  --enable-ecap
 

The last switch, --enable-ecap, adds ecap support.

Using sed to delete lines from a file

Posted by Kelvin on 21 May 2011 | Tagged as: Ubuntu

This entry is part 14 of 18 in the Bash-whacking series

Delete every line containing foo:

sed -i '/foo/d' filename.txt
 

Delete the last line:

sed -i '$d' filename.txt
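For scripts that already run in Python, both deletions can be done without sed. This minimal sketch reads the whole file into memory and overwrites it, so it suits modest file sizes. Patterns use Python's re syntax rather than sed's basic regular expressions, and the function names are hypothetical.

```python
import re


def _rewrite_lines(path, keep):
    """Read `path`, pass its lines through `keep`, and write the result back."""
    # newline="" preserves the file's original line endings
    with open(path, encoding="utf-8", newline="") as f:
        lines = f.readlines()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(keep(lines))


def delete_matching_lines(path, pattern):
    """Rough equivalent of sed -i '/pattern/d' (Python regex syntax, not sed's BRE)."""
    regex = re.compile(pattern)
    _rewrite_lines(path, lambda lines: [line for line in lines if not regex.search(line)])


def delete_last_line(path):
    """Rough equivalent of sed -i '$d'."""
    _rewrite_lines(path, lambda lines: lines[:-1])
```

delete_matching_lines("filename.txt", "foo") corresponds to the first command and delete_last_line("filename.txt") to the second.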
 

Recursively find the n latest modified files in a directory

Posted by Kelvin on 18 May 2011 | Tagged as: programming, Ubuntu

This entry is part 13 of 18 in the Bash-whacking series

Here's how to find the most recently modified files in a directory tree. It's particularly useful when you've made some changes and can't remember what they were!

find . -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" "
 

For example, replace tail -1 with tail -20 to list the 20 most recently modified files.

Courtesy of StackOverflow: http://stackoverflow.com/questions/4561895/how-to-recursively-find-the-latest-modified-file-in-a-directory
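The find/sort/tail pipeline translates directly to Python. In this sketch (the function name latest_files is hypothetical), only regular files count, as with find -type f, and results come back newest first, which reverses the pipeline's oldest-to-newest order.

```python
import heapq
import os
import stat
import sys


def latest_files(root, n=1):
    """Return the n most recently modified regular files under root, newest first."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            # Like find -type f: only regular files, not symlinks
            if stat.S_ISREG(st.st_mode):
                candidates.append((st.st_mtime, path))
    # nlargest avoids sorting the whole list when n is small
    return [path for _mtime, path in heapq.nlargest(n, candidates)]


if __name__ == "__main__":
    if len(sys.argv) in (2, 3):
        count = int(sys.argv[2]) if len(sys.argv) == 3 else 1
        for path in latest_files(sys.argv[1], count):
            print(path)
    else:
        print("Usage: python latest.py <directory> [n]", file=sys.stderr)
```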

Convert fixed-width file to CSV

Posted by Kelvin on 12 May 2011 | Tagged as: programming, Ubuntu

This entry is part 12 of 18 in the Bash-whacking series

After trying various sed/awk recipes to convert from fixed-width to CSV, I found a Python script that works well.

Here it is, from http://code.activestate.com/recipes/452503-convert-db-fixed-width-output-to-csv-format/

## {{{ http://code.activestate.com/recipes/452503/ (r1)
# Ian Maurer
# http://itmaurer.com/
# Convert a Fixed Width file to a CSV with Headers
#
# Requires following format:
#
# header1      header2 header3
# ------------ ------- ----------------
# data_a1      data_a2 data_a3

def writerow(ofile, row):
    # Wrap every field in double quotes; embedded quotes are dropped, not escaped
    for i in range(len(row)):
        row[i] = '"' + row[i].replace('"', '') + '"'
    data = ",".join(row)
    ofile.write(data)
    ofile.write("\n")

def convert(ifile, ofile):
    # Skip leading blank lines to find the header (stops at end of file)
    header = ifile.readline()
    while header and not header.strip():
        header = ifile.readline()
    header = header.strip()

    # The dashed line under the header gives each column's width
    hticks = ifile.readline().strip()
    csizes = [len(cticks) for cticks in hticks.split()]

    # Emit the header, then each data row, until the first blank line
    line = header
    while line:
        start, row = 0, []
        for csize in csizes:
            column = line[start:start+csize].strip()
            row.append(column)
            start = start + csize + 1  # assumes one space between columns

        writerow(ofile, row)
        line = ifile.readline().strip()

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # "with" closes both files when conversion finishes
        with open(sys.argv[1], "r") as ifile, open(sys.argv[2], "w") as ofile:
            convert(ifile, ofile)
    else:
        print("Usage: python convert.py <input> <output>")
## end of http://code.activestate.com/recipes/452503/ }}}
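The recipe above assumes exactly one space between columns (start + csize + 1) and silently drops double quotes inside fields. A minimal Python 3 alternative, sketched here with the standard csv module, takes each column's position from the runs of dashes, so wider gaps are tolerated and embedded quotes are escaped rather than removed. It assumes the same header, dashes, data layout; the function name fixed_width_to_csv is hypothetical.

```python
import csv
import re
import sys


def fixed_width_to_csv(lines, out):
    """Write header/dash-rule/data lines from `lines` to `out` as quoted CSV."""
    lines = iter(lines)
    # Skip leading blank lines to reach the header
    header = next((line for line in lines if line.strip()), None)
    if header is None:
        return
    rule = next(lines, "")
    # Each run of dashes gives one column's [start, end) character span
    spans = [m.span() for m in re.finditer(r"-+", rule)]
    if spans:
        spans[-1] = (spans[-1][0], None)  # let the last column run to end of line
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")

    def fields(line):
        line = line.rstrip("\r\n")
        return [line[start:end].strip() for start, end in spans]

    writer.writerow(fields(header))
    for line in lines:
        if not line.strip():
            break  # the original recipe also stops at the first blank line
        writer.writerow(fields(line))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w", newline="") as dst:
            fixed_width_to_csv(src, dst)
    else:
        print("Usage: python fixed2csv.py <input> <output>", file=sys.stderr)
```

Slicing the raw, unstripped lines keeps column positions aligned even when the first column of a row is empty, a case the strip-based recipe above mishandles.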
 

MD5 a directory recursively

Posted by Kelvin on 05 May 2011 | Tagged as: Ubuntu

This entry is part 11 of 18 in the Bash-whacking series

Ever need to check if a directory is exactly the same as another (including file contents)?

find . -type f -exec md5sum {} + | awk '{print $1}' | sort | md5sum
 

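This hashes each file, sorts the resulting MD5 hashes, and then takes the MD5 of that sorted list. Because only the hashes are kept, file names and paths don't affect the result; only file contents do.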

And if you need to exclude a directory from the comparison:

find . -type f -exec md5sum {} + | grep -v dirtoexclude | awk '{print $1}' | sort | md5sum
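The same fingerprint can be computed in Python, which is handy on systems without md5sum. This sketch follows the pipeline's approach of hashing the sorted list of per-file MD5s, so file names are ignored. The exclude parameter is a hypothetical stand-in for grep -v and matches only against file paths.

```python
import hashlib
import os
import sys


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 of one file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def directory_md5(root, exclude=None):
    """MD5 of the sorted per-file MD5 hex digests, one per line.

    Only file contents count; names and paths do not. If `exclude` is given,
    files whose path contains that substring are skipped.
    """
    digests = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if exclude and exclude in path:
                continue
            # Like find -type f: regular files only, symlinks are skipped
            if os.path.isfile(path) and not os.path.islink(path):
                digests.append(file_md5(path))
    listing = "".join(d + "\n" for d in sorted(digests))
    return hashlib.md5(listing.encode("ascii")).hexdigest()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(directory_md5(sys.argv[1]))
    else:
        print("Usage: python dirmd5.py <directory>", file=sys.stderr)
```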
 

[solved] checking for shout-config... no while compiling ices

Posted by Kelvin on 05 Apr 2011 | Tagged as: Ubuntu

If you're trying to compile ices and get this error:

checking for pkg-config... /usr/bin/pkg-config
configure: /usr/bin/pkg-config couldn't find libshout. Try adjusting PKG_CONFIG_PATH.
checking for shout-config... no
configure: error: Could not find a usable libshout

If you're sure you've already installed libshout and libshout-devel and still get this error, you also need to install libtheora and libtheora-devel. Yes, the error message is misleading.


02/23/2012 | Kelvin Tan | Lucene Solr Crawl Consultant