Stream-Based Compression

PHP’s request-based architecture does not necessarily make it the best choice for processing very large amounts of data. Stream-based data processing, however, works around this limitation by not requiring all data to be loaded into memory before processing it. Two examples of stream-based data processing that have been available since PHP 5.1.2 are XMLReader and XMLWriter. Instead of requiring a whole XML document to be loaded into memory first, they are capable of consuming (or producing, respectively) an XML document in a stream-based fashion.

Compressing and un-compressing, however, which potentially also have to deal with large amounts of data, did not work on streams. This has changed with PHP 7. To make use of this new feature, you need to have the zlib extension installed.

PHP 7 adds four new functions: deflate_init(), deflate_add(), inflate_init(), and inflate_add(). Let us try to compress some data:

$context = deflate_init(ZLIB_ENCODING_GZIP);

$data = '';

$data .= deflate_add($context, 'data ...', ZLIB_NO_FLUSH);
$data .= deflate_add($context, '... more data ...', ZLIB_NO_FLUSH);
$data .= deflate_add($context, '... even more data', ZLIB_FINISH);

var_dump($data);

To compress (or un-compress) data, we need a context, which in PHP 7 is technically just a PHP resource. When creating the context, we need to specify the encoding of the compressed data. We do this by passing one of the following constants: ZLIB_ENCODING_GZIP (the gzip format), ZLIB_ENCODING_DEFLATE (the zlib format), or ZLIB_ENCODING_RAW (a raw deflate stream without header or checksum). All three use the same deflate algorithm and differ only in how the compressed data is wrapped.
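To illustrate how the three encodings differ, the following sketch compresses the same input with each constant and inspects the result. The helper compressWith() and the sample input are our own illustration and not part of PHP; the header bytes shown in the comments follow from the gzip and zlib formats with default settings.

```php
<?php
// Hypothetical helper (not part of PHP): compress $input in a single
// step using the given encoding constant.
function compressWith(int $encoding, string $input): string
{
    $context = deflate_init($encoding);

    return deflate_add($context, $input, ZLIB_FINISH);
}

$input = str_repeat('data ... ', 100);

$gzip = compressWith(ZLIB_ENCODING_GZIP, $input);
$zlib = compressWith(ZLIB_ENCODING_DEFLATE, $input);
$raw  = compressWith(ZLIB_ENCODING_RAW, $input);

// A gzip stream starts with the magic bytes 1f 8b
print bin2hex(substr($gzip, 0, 2)) . PHP_EOL;

// A zlib stream starts with a two-byte header; with the default
// window size its first byte is 78
print bin2hex(substr($zlib, 0, 1)) . PHP_EOL;

// The raw stream carries neither header nor checksum,
// which makes it the shortest of the three
printf(
    "gzip: %d bytes, zlib: %d bytes, raw: %d bytes\n",
    strlen($gzip),
    strlen($zlib),
    strlen($raw)
);
```

Whichever constant is used for deflate_init() must also be passed to inflate_init() when the data is un-compressed.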

Optionally, we can pass an array of options as the second parameter to deflate_init() if we need more control over how the data is processed and compressed, for example to set the compression level.
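As a sketch of what such additional parameters can look like: deflate_init() accepts an array of options as its second argument. The example below only sets the level option; the sample input and the variable names are illustrative assumptions.

```php
<?php
$input = str_repeat('The quick brown fox jumps over the lazy dog. ', 1000);

// 'level' ranges from 0 (no compression) to 9 (best compression);
// -1 selects the zlib default
$fastContext = deflate_init(ZLIB_ENCODING_DEFLATE, ['level' => 1]);
$bestContext = deflate_init(ZLIB_ENCODING_DEFLATE, ['level' => 9]);

$fastData = deflate_add($fastContext, $input, ZLIB_FINISH);
$bestData = deflate_add($bestContext, $input, ZLIB_FINISH);

printf("level 1: %d bytes\n", strlen($fastData));
printf("level 9: %d bytes\n", strlen($bestData));

// The level only affects the compression side; un-compressing
// works the same way regardless of the level that was used
$unCompressContext = inflate_init(ZLIB_ENCODING_DEFLATE);

var_dump(inflate_add($unCompressContext, $bestData, ZLIB_FINISH) === $input);
```

Other documented option keys are memory, window, strategy, and dictionary.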

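The second parameter passed to deflate_add() is the data to compress. The third parameter, the flush mode, tells PHP when to emit the compressed output. To achieve the most efficient compression results, you should use ZLIB_NO_FLUSH in all but the last call, which lets zlib decide how much data to buffer before producing output. With the last call, you need to pass ZLIB_FINISH so that all remaining data is written and the stream is terminated properly.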

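The output of the first example is just some binary data that is, obviously, illegible. Compressed data can, however, be un-compressed again. The following example compresses the same input once more, this time using ZLIB_ENCODING_DEFLATE, and then un-compresses it with a context that uses the same encoding: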

$compressContext = deflate_init(ZLIB_ENCODING_DEFLATE);

$compressedData = '';

$compressedData .= deflate_add(
    $compressContext, 'data ...', ZLIB_NO_FLUSH
);

$compressedData .= deflate_add(
    $compressContext, '... more data ...', ZLIB_NO_FLUSH
);

$compressedData .= deflate_add(
    $compressContext, '... even more data', ZLIB_FINISH
);

$unCompressContext = inflate_init(ZLIB_ENCODING_DEFLATE);

$unCompressedData = inflate_add(
    $unCompressContext, $compressedData, ZLIB_NO_FLUSH
);

$unCompressedData .= inflate_add(
    $unCompressContext, '', ZLIB_FINISH
);

print $unCompressedData;

Luckily – or unsurprisingly – the result is:

data ...... more data ...... even more data
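The incremental API pays off when the input is too large to hold in memory comfortably. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name gzipFile(), the chunk size, and the temporary file names are our own. It reads a file in fixed-size chunks, passes each chunk to deflate_add(), and writes the compressed output immediately, so only one chunk is held in memory at a time.

```php
<?php
// Hypothetical helper (not part of PHP): gzip $source into $target
// chunk by chunk.
function gzipFile(string $source, string $target, int $chunkSize = 8192)
{
    $in  = fopen($source, 'rb');
    $out = fopen($target, 'wb');

    if ($in === false || $out === false) {
        throw new RuntimeException('Unable to open source or target file');
    }

    $context = deflate_init(ZLIB_ENCODING_GZIP);

    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);

        // ZLIB_NO_FLUSH lets zlib buffer data, so a call
        // may return an empty string
        fwrite($out, deflate_add($context, $chunk, ZLIB_NO_FLUSH));
    }

    // A final call with empty input and ZLIB_FINISH writes
    // whatever is still buffered and terminates the stream
    fwrite($out, deflate_add($context, '', ZLIB_FINISH));

    fclose($in);
    fclose($out);
}

// Usage with a temporary file filled with sample data
$source = tempnam(sys_get_temp_dir(), 'src');
$target = $source . '.gz';

file_put_contents($source, str_repeat("a line of text\n", 10000));

gzipFile($source, $target);

printf(
    "%d bytes compressed to %d bytes\n",
    filesize($source),
    filesize($target)
);
```

Un-compressing a large file works the same way in the other direction: read the compressed file in chunks and pass each chunk to inflate_add().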