Supermind Search Consulting Blog 
Solr - Elasticsearch - Big Data

Posts about PHP

Send response to client in PHP and continue processing

Posted by Kelvin on 03 Feb 2014 | Tagged as: PHP

Here's one way to send a response, close the connection to the client, and let the PHP script keep running, for example to do some time-consuming processing:

<?php
while (ob_get_level() > 0) ob_end_clean(); // discard any buffers PHP already opened
header("Connection: close");
header("Content-Encoding: none");
ignore_user_abort(true); // optional
ob_start();
echo ('Text user will see');
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush();     // hands PHP's buffer to the web server...
flush();            // ...and makes the server send it; both calls are needed
 
//do processing here
sleep(5);
 
echo('Text user will never see');
//do some processing
//do some processing

Note that some Stack Overflow answers which say you must use ignore_user_abort are mistaken. It's not required at all. You do, however, need the Content-Encoding: none header; otherwise it won't work properly with clients that accept gzip encoding, for example, because compression changes the body length so it no longer matches the Content-Length you sent.
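The same idea can be wrapped in a reusable function. This is a minimal sketch, not the author's code: close_headers() and send_and_continue() are hypothetical names, and it assumes that under PHP-FPM the built-in fastcgi_finish_request() is the simpler way to release the client, falling back to the header/flush approach elsewhere.

```php
<?php
// Hypothetical helpers sketching the "respond, then keep working" pattern.

/** Headers that let the client treat $body as the complete response. */
function close_headers(string $body): array
{
    return [
        'Connection: close',
        'Content-Encoding: none',           // keep compression from changing the length
        'Content-Length: ' . strlen($body), // strlen() counts bytes, as HTTP requires
    ];
}

/** Send $body to the client, then return so the caller can keep processing. */
function send_and_continue(string $body): void
{
    ignore_user_abort(true); // optional, as in the post

    // Drop any output buffers PHP has already opened (e.g. via output_buffering).
    while (ob_get_level() > 0) {
        ob_end_clean();
    }

    if (!headers_sent()) {
        foreach (close_headers($body) as $line) {
            header($line);
        }
    }
    echo $body;

    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request(); // PHP-FPM: end the request, keep the script alive
    } else {
        flush();                  // other SAPIs: push the output to the client now
    }
}
```

Usage would be `send_and_continue('Text user will see');` followed by the slow work.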

Interesting PHP and apache/nginx links

Posted by Kelvin on 25 Nov 2012 | Tagged as: programming, PHP

http://code.google.com/p/rolling-curl/
A more efficient implementation of curl_multi()

https://github.com/krakjoe/pthreads
http://docs.php.net/manual/en/book.pthreads.php
POSIX threads in PHP. Whoa!

http://www.underhanded.org/blog/2010/05/05
Installing the Apache worker MPM instead of prefork.

http://www.wikivs.com/wiki/Apache_vs_nginx
I stumbled on this page when researching the pros and cons of Apache + mod_php vs. nginx + php5-fpm.

http://barry.wordpress.com/2008/04/28/load-balancer-update/
A nice post about WordPress.com's use of nginx for load balancing.
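For context on the rolling-curl link: it builds on PHP's curl_multi_* functions. Below is a minimal sketch of plain curl_multi fetching several URLs in parallel. It is not rolling-curl's API, fetch_all() is a hypothetical name, and unlike rolling-curl (which, as the name suggests, keeps a rolling window of requests) it starts every request at once.

```php
<?php
// Hypothetical helper: fetch all $urls in parallel, returning bodies keyed like the input.
function fetch_all(array $urls, int $timeout = 20): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep bodies for curl_multi_getcontent()
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are running, sleeping in select() between rounds.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Because the array keys are preserved, callers can match each body back to its URL without relying on completion order.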

Download KhanAcademy videos with a PHP crawler

Posted by Kelvin on 08 Oct 2011 | Tagged as: PHP, programming

At the moment (October 2011), there's no simple way to download all the videos in a playlist on KhanAcademy.org.

This simple PHP crawler script changes that. 🙂

The script downloads the videos (from archive.org) into a subfolder, numbering each one and naming it with its KhanAcademy title (rather than the gibberish file names archive.org has assigned). Because it uses wget --continue, the crawler also supports auto-resume: even if your computer crashes in the middle of a crawl, you don't need to start all over again.

Usage

Usage is like this, assuming the script is named downkhan.php:

php downkhan.php {folder} {urls.txt}
php downkhan.php history history.txt

where {folder} is the subdirectory to save the videos in, and {urls.txt} is a list of URLs obtained by running a regex on the page source of http://www.khanacademy.org/#browse. Each line has the form url|title.

Regex

The regex used was:

href="(.*?)".*?><span.*?>(.*?)</span>

urls

Here are a few lines from a urls.txt file:

http://www.khanacademy.org/video/scale-of-earth-and--sun?playlist=Cosmology+and+Astronomy|Scale of Earth and  Sun
http://www.khanacademy.org/video/scale-of-solar-system?playlist=Cosmology+and+Astronomy|Scale of Solar System
http://www.khanacademy.org/video/scale-of-distance-to-closest-stars?playlist=Cosmology+and+Astronomy|Scale of Distance to Closest Stars

Here's a list of what I've created so far:

http://www.supermind.org/code/history.txt
http://www.supermind.org/code/biology.txt
http://www.supermind.org/code/finance.txt
http://www.supermind.org/code/cosmology.txt
http://www.supermind.org/code/healthcare.txt
http://www.supermind.org/code/linearalgebra.txt
http://www.supermind.org/code/statistics.txt

script code

And here's the script:

<?php
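$args = $_SERVER['argv'];
if (count($args) < 3) {
  fwrite(STDERR, "Usage: php {$args[0]} {folder} {urls.txt}\n");
  exit(1);
}
$folder = $args[1];
$file = $args[2];
 
// each line of urls.txt has the form "url|title"
$arr = explode("\n", trim(file_get_contents(getcwd()."/".$file)));
$urls = array();
foreach($arr as $k) {
  $split = explode("|", trim($k), 2);
  if (count($split) < 2) continue; // skip blank or malformed lines
  $urls[$split[0]] = $split[1];
}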
 
 
if (!is_dir($folder)) mkdir($folder, 0777, true); // don't fail when resuming into an existing folder
chdir($folder);
$counter = 0;
 
foreach($urls as $url=>$title) {
  $counter++;
 
  echo "Fetching $url\n";
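  $html = '';
  $tries = 0;
  while (!$html && $tries++ < 5) $html = fetch_url($url); // retry a few times instead of forever
  $vid = get_match('/<a href="(http:\/\/www\.archive\.org.*?)"/', (string) $html);
  if (!$vid) {
    echo "No archive.org link found, skipping\n";
    continue;
  }
  // slashes in a title would otherwise be treated as directories
  $outfile = "$counter. " . str_replace(array('/', '\\'), '-', $title) . ".mp4";
 
  // escape both arguments so titles with quotes or spaces can't break the shell command
  passthru('wget --continue ' . escapeshellarg($vid) . ' -O ' . escapeshellarg($outfile));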
}
 
function get_match($pattern, $s) {
  preg_match($pattern, $s, $matches);
  if($matches) {
    return $matches[1];
  } else return NULL;
}
 
function fetch_url($url)
{
    $curl_handle = curl_init(); // initialize curl handle
    curl_setopt($curl_handle, CURLOPT_URL, $url); // set url to fetch
    curl_setopt($curl_handle, CURLOPT_FAILONERROR, 1);
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($curl_handle, CURLOPT_TIMEOUT, 20); // overall limit; CURLINFO_TOTAL_TIME is a read-only info constant, not an option
    curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($curl_handle, CURLOPT_HTTPHEADER, array('Accept: */*', 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows)'));
    $result = curl_exec($curl_handle); // run the request once
    if ($result === false) {
        echo 'Curl error: ' . curl_error($curl_handle) . "\n";
    }
    curl_close($curl_handle);
    return $result;
}
 
function rel2abs($rel, $base)
{
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
 
    /* queries and anchors */
    if ($rel[0] == '#' || $rel[0] == '?') return $base . $rel;
 
    /* parse base URL and convert to local variables:
 $scheme, $host, $path */
    extract(parse_url($base));
    if (!isset($path)) $path = ''; // a base such as http://example.com has no path
 
    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);
 
    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';
 
    /* dirty absolute URL */
    $abs = "$host$path/$rel";
 
    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {
    }
 
    /* absolute URL is ready! */
    return $scheme . '://' . $abs;
}

Painless CRUD in PHP via AjaxCrud

Posted by Kelvin on 08 Oct 2011 | Tagged as: programming, PHP

I recently discovered an Ajax CRUD library that makes CRUD operations positively painless: AjaxCRUD.

Its features include:

– displays records in an inline-editable table
– generates a create form
– handles all operations (add, edit, delete) via Ajax
– supports 1:many relations
– only one class to include!

I highly recommend you try it out!

Here is the example code:

# the code for the class
include ('ajaxCRUD.class.php');
 
# this one line of code is how you implement the class
$tblCustomer = new ajaxCRUD("Customer",
                             "tblCustomer", "pkCustomerID");
 
# don't show the primary key in the table
$tblCustomer->omitPrimaryKey();
 
# my db fields all have prefixes;
# display headers as reasonable titles
$tblCustomer->displayAs("fldFName", "First");
$tblCustomer->displayAs("fldLName", "Last");
$tblCustomer->displayAs("fldPaysBy", "Pays By");
$tblCustomer->displayAs("fldDescription", "Customer Info");
 
# set the height for my textarea
$tblCustomer->setTextareaHeight('fldDescription', 100);
 
# define allowable fields for my dropdown fields
# (this can also be done for a pk/fk relationship)
$values = array("Cash", "Credit Card", "Paypal");
$tblCustomer->defineAllowableValues("fldPaysBy", $values);
 
# add the filter box (above the table)
$tblCustomer->addAjaxFilterBox("fldFName");
 
# actually show the table
$tblCustomer->showTable();

PHP function to send an email with file attachment

Posted by Kelvin on 11 Jun 2011 | Tagged as: programming, PHP

Courtesy of http://www.finalwebsites.com/forums/topic/php-e-mail-attachment-script

function mail_attachment($filename, $path, $mailto, $from_mail, $from_name, $replyto, $subject, $message) {
    $file = $path.$filename;
    $content = file_get_contents($file);
    if ($content === false) {
        echo "mail send ... ERROR! (cannot read $file)";
        return;
    }
    $content = chunk_split(base64_encode($content));
    $uid = md5(uniqid('', true)); // MIME boundary
    $eol = "\r\n";

    // Headers only: newer PHP versions reject blank lines inside the additional
    // headers, so the MIME body must go in mail()'s message argument instead.
    $header  = "From: ".$from_name." <".$from_mail.">".$eol;
    $header .= "Reply-To: ".$replyto.$eol;
    $header .= "MIME-Version: 1.0".$eol;
    $header .= "Content-Type: multipart/mixed; boundary=\"".$uid."\"";

    $body  = "This is a multi-part message in MIME format.".$eol;
    $body .= "--".$uid.$eol;
    $body .= "Content-Type: text/plain; charset=iso-8859-1".$eol;
    $body .= "Content-Transfer-Encoding: 7bit".$eol.$eol;
    $body .= $message.$eol.$eol;
    $body .= "--".$uid.$eol;
    $body .= "Content-Type: application/octet-stream; name=\"".$filename."\"".$eol; // use different content types here
    $body .= "Content-Transfer-Encoding: base64".$eol;
    $body .= "Content-Disposition: attachment; filename=\"".$filename."\"".$eol.$eol;
    $body .= $content.$eol;
    $body .= "--".$uid."--";
    if (mail($mailto, $subject, $body, $header)) {
        echo "mail send ... OK"; // or use booleans here
    } else {
        echo "mail send ... ERROR!";
    }
}

Prettyprint xml in PHP

Posted by Kelvin on 04 Dec 2010 | Tagged as: PHP

Ever wanted to format your XML nicely? Use the SimpleDOM class.

Usage is like so:

include "SimpleDOM.php";
 
$xml = "<foo><bar>car</bar></foo>";
$dom = simpledom_load_string($xml);
$xml = $dom->asPrettyXML();
echo $xml;

Produces:

<?xml version="1.0"?>
<foo>
  <bar>car</bar>
</foo>
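If you'd rather not pull in SimpleDOM, PHP's bundled DOM extension can do the same thing with DOMDocument's formatOutput flag. This is a swapped-in technique, not SimpleDOM's API, and pretty_xml() is a hypothetical helper name; for the input above it should print the same output.

```php
<?php
// Hypothetical helper: pretty-print an XML string with the built-in DOM extension.
function pretty_xml(string $xml): string
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // drop existing whitespace so indentation is redone cleanly
    $dom->formatOutput = true;        // ask libxml to indent nested elements on output
    if (!$dom->loadXML($xml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    return $dom->saveXML();
}
```

Usage mirrors the SimpleDOM example: `echo pretty_xml("<foo><bar>car</bar></foo>");`.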

URLizer: a WordPress plugin to automatically linkify URLs

Posted by Kelvin on 12 Oct 2010 | Tagged as: programming, PHP

Am I the only guy using WordPress who is too lazy to type out anchors?

Well, I've been using a WordPress plugin I wrote to automagically linkify URLs for a number of years now, and finally decided to add it to Google Code.

So here it is! http://code.google.com/p/urlizer/
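Purely to illustrate the idea, here is a minimal PHP sketch of linkifying bare URLs. It is not URLizer's code: linkify() is a hypothetical name, the URL pattern is a simplification that leaves trailing punctuation outside the link, and it does not skip URLs that already sit inside an a tag.

```php
<?php
// Hypothetical helper: wrap bare http(s) URLs in anchor tags.
function linkify(string $text): string
{
    return preg_replace_callback(
        // Match up to whitespace, < > or ", but never end on sentence punctuation.
        '~\bhttps?://[^\s<>"]*[^\s<>".,;:!?)]~i',
        function (array $m): string {
            $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8'); // escape & and quotes
            return '<a href="' . $url . '">' . $url . '</a>';
        },
        $text
    );
}
```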

Run php from html files on Dreamhost

Posted by Kelvin on 10 Oct 2010 | Tagged as: programming, PHP

Modify .htaccess to include this:

Correct

AddType php-cgi .html .htm

WRONG

AddType application/x-httpd-php .php .htm .html

or

AddHandler application/x-httpd-php .html

[SOLVED] Howto build the PHP rrdtool extension

Posted by Kelvin on 09 Oct 2010 | Tagged as: PHP, programming, Ubuntu

The definitive answer is here: http://www.samtseng.liho.tw/~samtz/blog/2009/03/11/howto-build-the-php-rrdtool-extension/

If you're on Ubuntu, do this first:

sudo apt-get install rrdtool librrd-dev php5-dev

Then follow the steps in the post linked above.

[SOLVED] curl: (56) Received problem 2 in the chunky parser

Posted by Kelvin on 09 Oct 2010 | Tagged as: programming, crawling, PHP

The problem is described here:

http://curl.haxx.se/mail/lib-2006-04/0046.html

I successfully tracked the problem to the "Connection:" header. It seems that if the "Connection: keep-alive" request header is not sent, the server will respond with data which is not chunked. It will still reply with a "Transfer-Encoding: chunked" response header though. I don't think this behavior is normal and it is not a cURL problem. I'll consider the case closed but if somebody wants to make something about it I can send additional info and test it further.

The workaround is simple: have curl use HTTP version 1.0 instead of 1.1.

In PHP, add this:

curl_setopt($curl_handle, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0 );
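To apply the workaround only when it is needed, a fetch can try HTTP/1.1 first and retry with HTTP/1.0 when curl reports error 56, the code shown in the error message above. This is a sketch, not the post's code: fetch_with_fallback() is a hypothetical name, and it assumes error 56 is the only failure worth retrying.

```php
<?php
// Hypothetical helper: fetch $url, falling back to HTTP/1.0 on curl error 56.
function fetch_with_fallback(string $url)
{
    foreach ([CURL_HTTP_VERSION_1_1, CURL_HTTP_VERSION_1_0] as $version) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTP_VERSION, $version);
        $body = curl_exec($ch);
        $errno = curl_errno($ch);
        curl_close($ch);

        if ($body !== false) {
            return $body;  // success on this HTTP version
        }
        if ($errno !== 56) {
            return false;  // some other failure: retrying with HTTP/1.0 won't help
        }
    }
    return false;          // failed with error 56 on both versions
}
```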
