At the moment (October 2011), there's no simple way to download all the videos in a playlist from KhanAcademy.org.

This simple PHP crawler script changes that. :-)

It downloads the videos (from archive.org) to a subfolder, numbering each one and naming it with its real title (not the gibberish filename that archive.org has assigned it). Additionally, because it fetches the files with wget --continue, the crawler has auto-resume support: even if your computer crashes in the middle of a crawl, you don't need to start all over again.
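
To make the numbering and the resume behaviour concrete, here is a minimal sketch of how one download could be assembled. The helper names video_filename and wget_command, the replacement of "/" in titles, and the example archive.org URL are my own assumptions for illustration, not something taken from the script itself.

```php
<?php
// Build the local file name for the Nth video in a playlist.
// Hypothetical helper: the name and the "/" handling are assumptions.
function video_filename($counter, $title)
{
    // A "/" in a title would otherwise be read as a directory separator.
    $safeTitle = str_replace('/', '-', trim($title));
    return "$counter. $safeTitle.mp4";
}

// Build a resumable wget command line for one video.
function wget_command($videoUrl, $folder, $filename)
{
    $target = rtrim($folder, '/') . '/' . $filename;
    // --continue picks up a partially downloaded file where it stopped;
    // -O writes to the readable name instead of the name in the URL.
    return 'wget --continue -O ' . escapeshellarg($target)
         . ' ' . escapeshellarg($videoUrl);
}

// Example URL is made up for illustration.
$name = video_filename(3, 'Introduction to Evolution and Natural Selection');
echo wget_command('http://www.archive.org/download/example/example.mp4', 'biology', $name), "\n";
```

Running the same command again after an interruption lets wget carry on from the bytes already on disk, which is where the auto-resume comes from.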

Usage

Usage is like this, assuming the script is named downkhan.php:

 

where folder is the subdirectory to save the videos in, and urls.txt is a list of URLs obtained by running a regex on http://www.khanacademy.org/#browse.

Regex

The regex used was

href="(.*?)".*?><span.*?>(.*?)</span>
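
As a rough sketch of how that regex can be applied, the snippet below runs it over a small piece of markup and prints one line per video. The sample HTML is invented for illustration, and the url|title output format is an assumption on my part; adjust both to whatever the real page and your urls.txt look like.

```php
<?php
// Sample markup shaped like what the regex expects; invented for illustration.
$html = <<<HTML
<li><a href="/video/introduction-to-evolution"><span class="title">Introduction to Evolution</span></a></li>
<li><a href="/video/natural-selection"><span class="title">Natural Selection</span></a></li>
HTML;

// Same regex as above, wrapped in ~ delimiters so "/" needs no escaping.
$pattern = '~href="(.*?)".*?><span.*?>(.*?)</span>~';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$lines = array();
foreach ($matches as $m) {
    // $m[1] is the link target, $m[2] the video title.
    // The "|" separator is an assumed format for urls.txt.
    $lines[] = $m[1] . '|' . trim($m[2]);
}
echo implode("\n", $lines), "\n";
```

For the sample above this prints /video/introduction-to-evolution|Introduction to Evolution and /video/natural-selection|Natural Selection, one per line. Because . does not match a newline by default, each match stays within a single line of the page.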
 

URLs

Here are a few lines of a urls.txt file:

 

Here's a list of what I've created so far:

http://www.supermind.org/code/history.txt
http://www.supermind.org/code/biology.txt
http://www.supermind.org/code/finance.txt
http://www.supermind.org/code/cosmology.txt
http://www.supermind.org/code/healthcare.txt
http://www.supermind.org/code/linearalgebra.txt
http://www.supermind.org/code/statistics.txt

Script code

And here's the script:

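<?php
// Assumed argument order: output folder first, then the URL list.
$folder = $_SERVER['argv'][1];
$lines  = explode("\n", file_get_contents($_SERVER['argv'][2]));
if (!is_dir($folder)) mkdir($folder);

$counter = 1;
foreach ($lines as $line) {
  if (trim($line) == '') continue;
  // Assumed line format: url|title
  list($url, $title) = explode("|", trim($line), 2);
  $url = rel2abs($url, "http://www.khanacademy.org/");
  echo "Fetching $url\n";
  $html = get_url($url);
  if (preg_match("/<a href=\"(http:\/\/www.archive.org.*?)\"/", $html, $m)) {
    $outfile = $folder . "/" . "$counter. $title.mp4";
    // Assumed wget invocation; --continue gives the auto-resume behaviour.
    system("wget --continue -O " . escapeshellarg("$outfile") . " " . escapeshellarg($m[1]));
  }
  $counter++;
}

function get_url($url) {
  $ch = curl_init();                            // initialize curl handle
  curl_setopt($ch, CURLOPT_URL, $url);          // set url to post to
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // allow redirects
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return into a variable
  curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: */*', 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows)'));
  $result = curl_exec($ch);                     // run the whole process
  if (curl_errno($ch)) echo 'Curl error: ' . curl_error($ch);
  curl_close($ch);
  return $result;
}

function rel2abs($rel, $base) {
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

    /* queries and anchors */
    if ($rel[0] == '#' || $rel[0] == '?') return $base . $rel;

    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    extract(parse_url($base));

    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);

    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';

    /* dirty absolute URL */
    $abs = "$host$path/$rel";

    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}

    /* absolute URL is ready! */
    return $scheme . '://' . $abs;
}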