Strings

Changes to String Handling

String handling changed in two important areas in PHP 7. First, strings that resemble hexadecimal numbers are no longer considered numeric:

var_dump(is_numeric("0x123"));

Executing the above sample now prints bool(false), whereas PHP 5 printed bool(true). This has drastic effects on comparisons and arithmetic operations:

var_dump("0x123" == "291");

Since no casting or interpretation of the hexadecimal value is performed, the two strings are no longer considered equal:

bool(false)

Even mathematical calculations do not lead to the previously expected result:

var_dump("0xe" + "0x1");

As neither string is interpreted as a hexadecimal value, the result is no longer 15 (14 + 1) but 0:

int(0)

To still handle these types of strings, the function filter_var() can be used. Combined with the FILTER_VALIDATE_INT filter and the FILTER_FLAG_ALLOW_HEX flag, it explicitly converts the string, if valid, to an integer:

$int = filter_var("0xff", FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);
var_dump($int);

Running this code will produce the following output:

int(255)
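
As a minimal sketch of how legacy hexadecimal input could be handled, the helper below wraps filter_var() and returns null for invalid input. The name hex_string_to_int() is hypothetical and not part of PHP, and the nullable return type assumes PHP 7.1 or later. With it, the earlier addition evaluates to 15 (14 + 1):

```php
<?php
// Hypothetical helper, not part of PHP: converts a decimal or "0x"-prefixed
// hexadecimal string to an integer, or returns null if it is not valid.
function hex_string_to_int(string $value): ?int
{
    $result = filter_var($value, FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);

    // filter_var() signals invalid input with false.
    return $result === false ? null : $result;
}

var_dump(hex_string_to_int('0xe') + hex_string_to_int('0x1')); // int(15)
var_dump(hex_string_to_int('0xzz'));                            // NULL
```

Returning null instead of false keeps a failed conversion clearly apart from a valid result of 0.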

The second change to string handling is due to the addition of the Unicode Codepoint Escape Syntax for double-quoted strings and heredocs. As a result of this addition, strings containing "\u{" followed by an invalid sequence now trigger a fatal error. Strings that merely contain "\u" are not affected.
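
The following sketch shows the escape syntax in use and two ways to keep a literal backslash, u, and opening brace in a string without triggering the error. The code point U+00E9 is chosen only for illustration:

```php
<?php
// "\u{...}" expands to the UTF-8 encoding of the given code point.
var_dump("\u{e9}" === "\xC3\xA9"); // bool(true): U+00E9 is the letter é

// To keep the three literal characters backslash, u, and brace,
// escape the backslash or use a single-quoted string.
var_dump("\\u{" === '\u{');        // bool(true)
var_dump(strlen('\u{'));           // int(3)
```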

Return Value of substr() Changed in an Edge Case

Due to PHP’s loosely typed nature, return values are often misinterpreted. A built-in function like strpos(), for instance, can either return 0, meaning the needle has been found at position 0 of the haystack, or false, meaning the needle was not found anywhere in the haystack. To differentiate between the two, the strict comparison === rather than == has to be used on the result of strpos().

One of the backwards compatibility breaks that PHP 7 introduces might in fact not even be noticed unless strict comparison is used. The function substr() now returns an empty string rather than false when asked for a substring that starts exactly at the end of the string:

$string = 'string';

var_dump(substr($string, 5, 1));
var_dump(substr($string, 6, 1));
var_dump(substr($string, 7, 1));

The result is:

string(1) "g"
string(0) ""
bool(false)

In the unlikely event that your software relies on PHP 5’s behavior, you can either adjust your code, or create a wrapper function around substr() that mimics the old behavior. Unless you explicitly checked the return value’s type, however, your application has probably never noticed anyway.
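
Such a wrapper could look like the sketch below. The name legacy_substr() is hypothetical, the nullable parameter type assumes PHP 7.1 or later, and the function only covers the edge case described here: it returns false whenever the start offset lies at or beyond the end of the string.

```php
<?php
/**
 * Hypothetical wrapper, not part of PHP: returns false, as PHP 5 did,
 * when $start lies at or beyond the end of the string.
 *
 * @return string|false
 */
function legacy_substr(string $string, int $start, ?int $length = null)
{
    if ($start >= strlen($string)) {
        return false;
    }

    // Only pass $length when it was given, so that "no length" keeps
    // meaning "up to the end of the string".
    return $length === null
        ? substr($string, $start)
        : substr($string, $start, $length);
}

var_dump(legacy_substr('string', 5, 1)); // string(1) "g"
var_dump(legacy_substr('string', 6, 1)); // bool(false)
var_dump(legacy_substr('string', 7, 1)); // bool(false)
```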

Operators and Invalid Strings

In addition to numbers, the +, -, *, /, **, %, <<, >>, |, &, and ^ operators also work on strings in PHP. When used with one of these operators, a string is automatically cast to a number before the operation is performed. This is why "1" + "1" evaluates to 2 in PHP.

Prior to PHP 7.1, when a string began with a number but also contained letters, PHP would silently perform an operation such as + on the leading numeric part of the string:

var_dump('1 elePHPant' + '1 elePHPant');

Executing the code shown above with PHP 5 or PHP 7.0 will print the output shown below:

int(2)

What happens here is that PHP automatically casts the string '1 elePHPant' to an integer, and the result is 1. We get the same result when we explicitly perform this operation:

var_dump((int) '1 elePHPant');

Executing the code shown above will print the output shown below:

int(1)

PHP 7.1 and later still automatically cast strings that begin with a number to an integer when an operator such as + is used, but they trigger a notice when this happens:

var_dump('1 elePHPant' + '1 elePHPant');

Executing the code shown above with PHP 7.1 or later will print the output shown below:

Notice: A non well formed numeric value encountered in ...
Notice: A non well formed numeric value encountered in ...
int(2)

Prior to PHP 7.1, when a string did not begin with a number, PHP would silently perform an operation such as + using 0 as the “value” of the string:

var_dump(1 + 'string');

Executing the code shown above with PHP 5 or PHP 7.0 will print the output shown below:

int(1)

What happens here is that PHP automatically casts the string to an integer and the result is 0. We get the same result when we explicitly perform this operation:

var_dump((int) 'string');

Executing the code shown above will print the output shown below:

int(0)

PHP 7.1 and later trigger an E_WARNING when an operator such as + is used with non-numeric string operands:

var_dump(1 + 'string');

Executing the code shown above with PHP 7.1 or later will print the output shown below:

Warning: A non-numeric value encountered in ...
int(1)

The introduction of these E_NOTICE and E_WARNING errors may create backwards-incompatibility issues when custom error handlers are used. However, at least the E_WARNING error brings to light code errors that should be fixed.
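
To illustrate the custom error handler problem, here is a minimal sketch of a handler that converts every PHP error into an ErrorException, a common pattern in frameworks. The variable names are made up for this example. On PHP 7.0 the addition completes silently. On PHP 7.1 and later the diagnostic raised for the leading-numeric string reaches the handler, so the same line now throws:

```php
<?php
// Convert every error the handler receives into an exception.
set_error_handler(function (int $severity, string $message, string $file, int $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$quantity = '1 elePHPant'; // stands in for a value from a request variable
$caught   = false;

try {
    $total = $quantity + 1; // silent before PHP 7.1
    var_dump($total);
} catch (ErrorException $e) {
    $caught = true;
    echo 'Caught: ', $e->getMessage(), PHP_EOL;
} finally {
    restore_error_handler();
}
```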

To make code that uses the operators listed above on strings work correctly with PHP 7.1 and later, you need to ensure that each string can safely be interpreted as a number, for instance as an integer. This is especially important for values that come from a request variable.

Unfortunately, this is not as easy as using the built-in is_int() function, as this returns false for a string such as '123'. The example below shows what a function that performs such a check could look like:

function string_is_integer($variable)
{
    return (string) (int) $variable === (string) $variable;
}

var_dump(string_is_integer('abc'));
var_dump(string_is_integer('123'));
var_dump(string_is_integer(123));

Executing the code shown above will print the output shown below:

bool(false)
bool(true)
bool(true)
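
The sketch below shows how such a check might guard an arithmetic operation, and how the round-trip comparison treats a few edge cases. The helper add_quantities() is hypothetical. string_is_integer() is repeated only so that the snippet runs on its own:

```php
<?php
// Same check as in the example above.
function string_is_integer($variable)
{
    return (string) (int) $variable === (string) $variable;
}

// Hypothetical helper: adds two values only if both pass the check.
function add_quantities($a, $b): int
{
    if (!string_is_integer($a) || !string_is_integer($b)) {
        throw new InvalidArgumentException('Both operands must be integers');
    }

    return (int) $a + (int) $b;
}

var_dump(add_quantities('1', '41'));        // int(42)
var_dump(string_is_integer('1 elePHPant')); // bool(false)
var_dump(string_is_integer('-5'));          // bool(true)
var_dump(string_is_integer('0123'));        // bool(false): the leading zero is lost in the round trip
var_dump(string_is_integer('1.0'));         // bool(false)
```

Note that the check is deliberately strict: strings with leading zeros, surrounding whitespace, or a decimal point are rejected even though PHP could cast them to an integer.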

Case Mapping and Folding in mbstring

Changing the case of a string using strtoupper() and strtolower() simply replaces each lower case character with its upper case counterpart, and vice versa. In real life, however, changing case is not always that simple. In German, for example, there exists a special character ß (a ligature of s and z, thus the corresponding named HTML entity is called &szlig;). ß is a lower case character which used to have no upper case counterpart. Coincidentally, an upper case ß was made official in 2017, so the given example is somewhat outdated, but even after asking a German teacher we could not come up with a better one.

var_dump(mb_strtoupper('straße'));

When converting the lower case ß to upper case, PHP 7.3 and above will replace ß with SS:

string(7) "STRASSE"

This is considered a backward incompatible change because the number of characters in the string changes (from six to seven here), which did not happen up to and including PHP 7.2.

If you want more control over the case conversion, use the function mb_convert_case(), which allows you to specify a conversion mode. The following modes are available: MB_CASE_UPPER, MB_CASE_LOWER, MB_CASE_TITLE, MB_CASE_FOLD, MB_CASE_UPPER_SIMPLE, MB_CASE_LOWER_SIMPLE, MB_CASE_TITLE_SIMPLE, MB_CASE_FOLD_SIMPLE.
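
As a brief sketch of how the modes differ, the snippet below converts the same word with the full and the simple upper case mapping, and with case folding. It assumes PHP 7.3 or later with the mbstring extension loaded. The variables stay null when that assumption does not hold:

```php
<?php
// "straße", written with an escape so the file encoding does not matter.
$word = "stra\u{df}e";

$full = $simple = $folded = null;

if (function_exists('mb_convert_case') && defined('MB_CASE_UPPER_SIMPLE')) {
    // Full mapping: ß becomes SS, so the character count grows.
    $full = mb_convert_case($word, MB_CASE_UPPER, 'UTF-8');

    // Simple mapping: every character maps to exactly one character, ß is kept.
    $simple = mb_convert_case($word, MB_CASE_UPPER_SIMPLE, 'UTF-8');

    // Case folding: intended for caseless comparison, ß becomes ss.
    $folded = mb_convert_case($word, MB_CASE_FOLD, 'UTF-8');
}

var_dump($full, $simple, $folded);
```

Code that relies on the character count staying the same can use the _SIMPLE modes, while MB_CASE_FOLD is the better fit for comparing strings regardless of case.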

Named Captures in mb_ereg_replace()

A few years ago, somebody said that regular expressions “look like you threw a cat onto your keyboard”. While regular expressions can be quite useful, we feel that things can get out of control pretty quickly with regard to their readability. Granted, current IDEs offer great tooling that helps us understand and test regular expressions, but

The mb_ereg_replace() function now allows you to assign a name to a group of characters (so-called “named captures”), which can make it a little easier to reference a pattern in the replacement string, or make the whole regular expression yet more difficult to read – you decide.

This addition can be considered a backward incompatible change because some patterns might be interpreted differently now, potentially leading to undesired results.
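
A minimal sketch of a named capture in use is shown below. It assumes PHP 7.3 or later with mbstring and its regular expression support available, and the pattern and group names are made up for this example. In the replacement string, the notation \k<name> refers to the group declared as (?<name>...):

```php
<?php
$swapped = null;

if (function_exists('mb_ereg_replace') && PHP_VERSION_ID >= 70300) {
    // Swap the two named groups "year" and "month" in the replacement.
    $swapped = mb_ereg_replace(
        '(?<year>\d{4})-(?<month>\d{2})',
        '\k<month>/\k<year>',
        '2018-12'
    );
}

var_dump($swapped);
```

Single-quoted strings are used so that the backslashes reach the regular expression engine unchanged.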