The Road to PHP 7
When you plan to upgrade PHP 5 to the next version, you need to look for PHP 7. Confused? Well, even Microsoft did it with Windows 10. While we don’t know the real reasons why Microsoft skipped Windows 9, we can give you the background on why there is no PHP 6. Actually, there was a time when PHP 6 existed. To understand the story, we must travel back in time a few years.
PHP 5 had just been released. Critics say that PHP 5.0 was more of an alpha version, PHP 5.1 was more of a beta version, and PHP 5.2 could be considered a stable release. Still, adding full-blown support for object-oriented programming to a language that until PHP 4 was a purely procedural scripting language was a giant leap towards making PHP a solid platform for professional software development for large enterprise applications.
Soon after the release of PHP 5, in November 2005, most of the core developers that were active at the time got together in Paris and discussed what the next version of PHP could (or should) look like. The meeting minutes can still be found online.
At the time, Yahoo was probably one of the largest platforms built using PHP. The original author of PHP, Rasmus Lerdorf, was working for Yahoo, as were other core developers such as Sara Golemon and Andrei Zmievski. They suggested that the next PHP version, PHP 6, should be Unicode-based. From the viewpoint of a company that runs web portals in multiple languages and different countries, having Unicode support in PHP must have been a very appealing idea.
The big problem with Unicode, in short, is that strings no longer consist of single-byte characters. In ASCII, one character equals one byte. In Unicode, there are over 128,000 characters (to be exact, they are called code points). This requires at least some of those characters to be represented by more than one byte. In fact, there are different ways of encoding Unicode, among them UTF-16, where each character (well, code point) is represented by two or four bytes, and UTF-8, a variable-length encoding that is backwards-compatible with ASCII but represents all other characters with two to four bytes. UTF-8 is a good compromise when it comes to the length of the encoded string. This is the reason why UTF-8 is today the default encoding on the internet, the one that should be assumed whenever no particular encoding is specified.
PHP, at the time, was not Unicode-based. Still, you could of course use PHP to generate or process UTF-8 content. The problem was (or is) that when working with a variable-length encoding, internal functions like strlen() do not work as expected on non-ASCII strings, because they count bytes rather than code points. There are PHP extensions like mbstring that allow you to work around this problem by providing Unicode-aware string functions. The PHP language itself, even today, does not support Unicode out of the box.
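The difference between counting bytes and counting code points is easy to see in a few lines. The following is a minimal sketch, assuming the source file is saved as UTF-8 and the mbstring extension is loaded; the sample string is our own choice for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$greeting = 'Grüße'; // 5 characters; "ü" and "ß" each occupy two bytes in UTF-8

echo strlen($greeting), "\n";             // 7: strlen() counts bytes
echo mb_strlen($greeting, 'UTF-8'), "\n"; // 5: mb_strlen() counts code points

// Converted to UTF-16, each of these five characters occupies two bytes.
$utf16 = mb_convert_encoding($greeting, 'UTF-16BE', 'UTF-8');
echo strlen($utf16), "\n";                // 10
```

For plain ASCII input, all of these functions would agree on the length, which is why the problem often goes unnoticed until the first non-ASCII character shows up.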
The question was how to get from an ASCII-based language to a Unicode-based language. PHP 5 was seeing very slow adoption and about 90% of all PHP users were still on PHP 4 in late 2006. There are various reasons for this. First, most of the important software, such as WordPress, Drupal, or TYPO3, had been written in PHP 4. This, by the way, is the reason why many projects still struggle with their procedural past even today, and have a hard time moving to a real object-oriented development approach.
Most well-known applications and frameworks had been written in PHP 4, so this version was the first choice for hosting companies. At the time, using a shared hosting service (running multiple virtual hosts that belong to multiple users on one physical machine) was common, and an accepted best practice. Today, thanks to virtualization and containers, shared hosting does not have any significance anymore.
Since hosting companies offered PHP 4 support by default, and the most widely used applications worked with PHP 4, most users just did not bother to upgrade to PHP 5, or even ask their hosting company to provide PHP 5. The major PHP software projects, in turn, saw no real reason to switch to PHP 5, because most of their users were still running PHP 4. This was a truly vicious circle that took many years to break.
In 2007, the “Go PHP 5” initiative ultimately managed to get the most relevant projects on board. Imagine: PHP 4 still had to be supported, PHP 5 was the current version, and PHP 6 was already under development. It was a hard time for the PHP core developers, because they had to work on three major versions at the same time, without an awful lot of core developers at hand.
To make the migration from PHP 5 to PHP 6 as smooth as possible, and to avoid repeating the slow adoption of PHP 5 that was still going on, the decision was made to add a configuration switch to php.ini that would configure PHP to work either in Unicode mode or in ASCII mode. On an implementation level, this meant checking for this switch in every built-in PHP function, and executing one code path for ASCII and another for Unicode.
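To get a feeling for what such a switch implies, consider the following sketch. It is purely illustrative and not how PHP 6 was implemented: the real work happened in C inside the engine, whereas here the switch is modeled as a hypothetical $unicodeMode parameter and the function names are made up. The sketch assumes PHP 7 or later (for the type declarations), the mbstring extension, and valid UTF-8 input.

```php
<?php
// Hypothetical illustration only; names and the $unicodeMode flag are invented.

function sketch_strlen(string $s, bool $unicodeMode): int
{
    if ($unicodeMode) {
        return mb_strlen($s, 'UTF-8'); // Unicode path: count code points
    }
    return strlen($s);                 // byte path: count bytes
}

function sketch_strrev(string $s, bool $unicodeMode): string
{
    if ($unicodeMode) {
        // Unicode path: split into code points (assumes valid UTF-8), then reverse
        $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
        return implode('', array_reverse($chars));
    }
    return strrev($s);                 // byte path: reverse bytes
}

echo sketch_strlen('Grüße', false), "\n"; // 7
echo sketch_strlen('Grüße', true), "\n";  // 5
echo sketch_strrev('Grüße', true), "\n";  // eßürG
```

Every function needs both branches, and both branches need tests and bug fixes, which is where the duplication comes from.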
As work on PHP 6 progressed, it became more and more apparent that this approach would not be feasible. PHP’s codebase would pretty much double in size, and as we all know, redundant code is hard to maintain and prone to errors. The plan might have been to help the users to transition to Unicode, and then get rid of the ASCII branches as quickly as possible. But what if adoption was slow, just as it had been with PHP 5? The work on Unicode support in PHP 6 dragged on, though other really cool features started to find their way into what was designated to become PHP 6. Some of those features were namespaces, autoloading, late static binding, and anonymous functions.
At some point, PHP 6 development, at least the Unicode part, had pretty much stalled, so eventually PHP 6 was shelved. The decision was made to release all the new features as PHP 5.3. The expression “PHP 5.3 = PHP 6 - Unicode” was commonly used to explain what the PHP 5.3 release was about.
Printed books were pretty much the state of the publishing art back in 2009. And since PHP had gained a lot of market share, being the predominant programming language on the world wide web, many book publishers were eager to have a book about PHP 6 as soon as possible.
With an open source project, however, you have a moving target. The release date is usually defined as “it’s ready when it’s ready”, and the scope may … vary. You basically never know when and if a certain feature will make it into PHP until the final release. Namespaces, for example, had already been implemented for PHP 5.0, but had been removed shortly before its final release. Back then, this caused some major issues for Sebastian, by the way, because he was in the process of writing a book on professional software development with PHP 5, and had to continuously update his manuscript to adapt it to the changes to the software.
Some time later Stefan, who had been contracted to write a book on the new features of PHP 6, was clever enough to not put a fixed date into the contract, but used the actual release date of PHP 6 as a target to publish the book. However, with PHP 6 gone, Stefan’s book got renamed to PHP 5.3, which did not help sales, because who spends money on a book about a minor version of some software?
Some publishers understand how things work, but others are clueless. Not noticing that PHP 6 had become PHP 5.3, some of them released multiple books with PHP 6 in the title. Some of these were just re-releases of older books with an updated title. We actually know of one case where one of the authors found out about the “updated” book release when the publisher sent a free copy. Imagine having your name on the front of a book about some software that does not even exist!
So when the next effort to bring a new major version of PHP into the world was started in 2014, it was decided not to name it PHP 6, because the name was considered “burned”. Plus, nobody should support untalented publishers.
Following PHP 5.3, there was a period where further development of the language was dragging a bit. The PHP project has always been special among open source projects, because there is no real leadership. PHP is a very democratic project that has improved its processes quite a bit over the years. During the work on PHP 5.4, for example, an RFC process to suggest (or request) new features was established. The decisions about those RFCs are made by voting. Before PHP 5.4, it was more about who had the loudest voice.
Around that time, some people started to doubt the future of PHP. For a few years, there had been no major progress, aside from some nice new features here and there. A new major version was nowhere in sight. The future of PHP seemed at stake. Believe it or not, it was Facebook that brought PHP development back to life.
Over the years, Facebook had grown from a fun project of a Harvard student to the biggest social network in the world. And Facebook is written in PHP. The size of Facebook’s codebase is not publicly known, but according to sources on the web it is between 10 and 30 million lines of code. That is definitely a lot of code.
Following its massive growth, Facebook realized that the execution of PHP bytecode is a rather time-consuming and resource-intensive task. This does not come as a surprise for an interpreted language. Note: we are not talking about a problem that can be solved by introducing a bytecode cache that prevents PHP from compiling the same source files over and over. We are talking about the execution of the compiled bytecode.
Should Facebook have re-implemented their software in another language? With millions of lines of code, this is hardly an option. So if the execution of PHP code is too slow, why not compile PHP down to machine code? A small team of engineers, led by Haiping Zhao, developed a cross-compiler that would convert PHP source code into C++ source code. This source code could be compiled to machine code using a highly optimizing C++ compiler.
Basically, Facebook had built a new runtime environment for PHP. They called it HipHop. HipHop outperformed PHP 5 and used far fewer CPU cycles to execute the PHP code. Facebook used HipHop in production from 2009 to 2013 and reported a 50% reduction in CPU usage when serving web traffic, as well as being able to serve twice the API traffic using 30% less CPU. We assume that switching to HipHop has saved Facebook millions of dollars in yearly operating cost.
The PHP project now faced a challenge: suddenly, there was an alternative to PHP, backed by a major company. The PHP project accepted the challenge. The significant performance improvements achieved between PHP 5.4 and PHP 5.6 were the result.
Things then went rather silent at Facebook for a few years. When Sebastian and Stefan were in Silicon Valley in 2013 to speak at a conference, Facebook invited us over to look at “something they had done”. We were eager to go, not only because Facebook offers free ice cream and has a BBQ smokehouse on campus. What they showed us (under NDA at the time) was mind-blowing. It was HHVM, the HipHop Virtual Machine. Facebook had taken the idea of a new PHP runtime further, and created a just-in-time (JIT) compiler that would compile PHP source code directly to machine code. In addition, they had solved some challenges that the original HipHop implementation had confronted them with, such as having to use BitTorrent for deployment after compiling the source code into a 1 GB binary.
But Facebook had not even stopped there. They had created their own new programming language, called Hack, on top of PHP. Basically they did this to become more efficient when compiling the PHP code, because statically typed source code can be compiled to far more efficient code than dynamically typed source code. Following some promotion work by Facebook, some large PHP shops like Box.com and Wikipedia temporarily switched to HHVM in late 2014.
Back then, Facebook was interested in keeping HHVM compatible with PHP. This is not surprising: Facebook’s software was written in PHP and they needed HHVM to be able to run it. The more PHP code was replaced with Hack code at Facebook, the less Facebook was interested in keeping HHVM compatible with PHP. In late 2016 and early 2017, popular open source projects such as Symfony or PHPUnit were burdened by having to support PHP 5, PHP 7, and HHVM within the same codebase. When HHVM did not behave exactly like the official PHP runtime, these projects could not expect help from Facebook. Instead, they had to add workarounds for HHVM to their software. One popular project after another decided to drop support for HHVM. In September 2018, Facebook made their intentions clear when they announced the end of PHP support in HHVM. HHVM 4, released in February 2019, exclusively supports Hack.
At the time of writing, we know of only one (major) company besides Facebook that uses HHVM and Hack in production: Slack.