At the moment (October 2011), there's no simple way to download every video in a KhanAcademy.org playlist.
This simple PHP crawler script changes that.
It downloads the videos (from archive.org) to a subfolder, numbering each one and naming it with its real title (not the gibberish name that archive.org has assigned it). Additionally, because it calls wget --continue, the crawler has auto-resume support, so even if your computer crashes in the middle of a crawl, you don't need to start all over again.
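To make the numbering and auto-resume idea concrete, here is a minimal sketch of how an output name and a wget call could be put together. The helper names build_outfile and build_wget_command are made up for illustration, the title clean-up is my own assumption, and pairing -O with --continue is a guess at how the download is invoked rather than a quote from the script.

```php
<?php
// Hypothetical helper: build "<folder>/<counter>. <title>.mp4".
function build_outfile(string $folder, int $counter, string $title): string
{
    // Assumption: swap characters that are unsafe in file names for spaces.
    $safeTitle = trim(preg_replace('#[\\\\/:*?"<>|]+#', ' ', $title));
    return rtrim($folder, '/') . "/$counter. $safeTitle.mp4";
}

// Hypothetical helper: build the shell command for one download.
function build_wget_command(string $url, string $outfile): string
{
    // --continue picks up a partially downloaded file instead of restarting;
    // -O writes to the chosen file name instead of the one in the URL.
    return 'wget --continue -O ' . escapeshellarg($outfile)
         . ' ' . escapeshellarg($url);
}
```

Running the same command again after a crash lets wget carry on from the bytes already on disk, which is all the resume support amounts to.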
Usage is like this, assuming the script is named downkhan.php:
where folder is the subdirectory to save the videos in, and urls.txt is a list of URLs obtained by running a regex on http://www.khanacademy.org/#browse.
The regex used was
Here are a few lines of a urls.txt file:
Here's a list of what I've created so far:
And here's the script:
    $html = ""
    ...
    "/<a href=\"(http:\/\/www.archive.org.*?)\"/"
    ...
    "$counter. $title.mp4"
    ...
    "$outfile"
    ...
    // initialize curl handle
    // set url to post to
    // allow redirects
    // return into a variable
    ... 'Accept: */*', 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows)' ...
    // run the whole process
    ... 'Curl error: ' ...

    /* return if already absolute URL */
    ... '' ...
    /* queries and anchors */
    ... '#' || $rel == '?' ...
    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    ...
    /* remove non-directory element from path */
    ... '#/[^/]*$#', '', $path);
    /* destroy path if relative url points to root */
    ... '/') $path = '';
    /* dirty absolute URL */
    ... "$host$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' */
    ... '#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#' ... '/' ...
    /* absolute URL is ready! */
    ... '://' ...
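Since only pieces of the listing are readable, here is a rough sketch of how the helpers it hints at might look. Only the regex, the two header strings, the 'Curl error: ' message, and the comments are taken from the listing; the function names fetch_page and extract_archive_link, the variable names inside them, and the exact cURL options are my assumptions. rel2abs is filled in by following its comments step by step, so treat it as a plausible reconstruction rather than the exact original.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init();                               // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url);             // set url to fetch
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return into a variable
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'Accept: */*',
        'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows)',
    ]);
    $result = curl_exec($ch);                        // run the whole process
    if ($result === false) {
        fwrite(STDERR, 'Curl error: ' . curl_error($ch) . "\n");
        return null;
    }
    return $result;
}

// Hypothetical helper: return the first archive.org link in $html, or null.
function extract_archive_link(string $html): ?string
{
    // Pattern copied from the listing.
    if (preg_match("/<a href=\"(http:\/\/www.archive.org.*?)\"/", $html, $m)) {
        return $m[1];
    }
    return null;
}

// Resolve a relative URL against a base URL.
function rel2abs(string $rel, string $base): string
{
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

    /* queries and anchors */
    if ($rel !== '' && ($rel[0] == '#' || $rel[0] == '?')) return $base . $rel;

    /* parse base URL into $scheme, $host, $path */
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $path   = $parts['path'] ?? '';

    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);

    /* destroy path if relative url points to root */
    if ($rel !== '' && $rel[0] == '/') $path = '';

    /* dirty absolute URL */
    $abs = "$host$path/$rel";

    /* replace '//' or '/./' or '/foo/../' with '/' until nothing changes */
    $re = ['#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#'];
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}

    /* absolute URL is ready! */
    return $scheme . '://' . $abs;
}
```

In this sketch a crawl step would be fetch_page on a video page, extract_archive_link on the result, and rel2abs for any link that turns out to be relative.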