Strings
Changes to String Handling
String handling in PHP 7 changed in two important areas. First, strings that resemble hexadecimal numbers are no longer considered to be numeric:
var_dump(is_numeric("0x123"));
Executing the sample above now prints bool(false), whereas PHP 5 printed bool(true). This has drastic effects on comparisons and arithmetic operations:
var_dump("0x123" == "291");
Since no casting or interpretation of the hexadecimal value is performed, the two strings are no longer considered equal:
bool(false)
Even mathematical calculations do not lead to the previously expected result:
var_dump("0xe" + "0x1");
As neither string is interpreted as a hexadecimal value, the result is no longer 15 (14 + 1) but 0:
int(0)
To still handle these types of strings, the function filter_var() can be used. Combined with the FILTER_VALIDATE_INT filter and the FILTER_FLAG_ALLOW_HEX flag, the string, if valid, is explicitly converted to an integer:
$int = filter_var("0xff", FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);
var_dump($int);
Running this code will produce the following output:
int(255)
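Because filter_var() returns false for input it cannot validate, the result should be compared strictly before it is used. The following is a minimal sketch of that idea; the helper name parse_hex_string() is made up for this illustration and is not part of PHP:

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the integer value of
// a "0x..." string, or null when the input is not a valid integer.
function parse_hex_string($value)
{
    $result = filter_var($value, FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);

    // Strict comparison, so that a valid 0 is not mistaken for a failure.
    return $result === false ? null : $result;
}

var_dump(parse_hex_string('0xff')); // int(255)
var_dump(parse_hex_string('0xzz')); // NULL
var_dump(parse_hex_string('291'));  // int(291), decimal input is still accepted
```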
The second change to string handling is due to the addition of Unicode Codepoint Escape Syntax for double-quoted strings and heredocs. As a result of this addition, strings containing "\u{" followed by an invalid sequence will now trigger an error. Strings that merely contain "\u" are not affected.
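As an illustration of the escape syntax, the sketch below uses two arbitrarily chosen code points and assumes that the output is meant to be UTF-8, which is what PHP emits for this escape:

```php
<?php
// "\u{...}" takes a hexadecimal code point and yields its UTF-8 encoding.
$eAcute   = "\u{e9}";    // U+00E9, two bytes in UTF-8
$elephant = "\u{1F418}"; // U+1F418, four bytes in UTF-8

var_dump($eAcute === "\xC3\xA9"); // bool(true)
var_dump(strlen($elephant));      // int(4)

// A "\u" that is not followed by an opening brace is left untouched,
// so this string still consists of six separate characters.
var_dump(strlen("\u0041"));       // int(6)
```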
Return Value of substr() Changed in an Edge Case
Due to PHP’s loosely typed nature, return values are often misinterpreted. A built-in function like strpos(), for instance, can either return 0, meaning the needle has been found at position 0 of the haystack, or false, meaning the needle was not found anywhere in the haystack. To differentiate between the two, the strict comparison === rather than == has to be used on the result of strpos().
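The difference between the two comparisons can be seen in a short sketch; the haystack and needle are arbitrary example values:

```php
<?php
$haystack = 'elePHPant';

// strpos() returns int(0) here: the needle sits at the very start.
$position = strpos($haystack, 'ele');

// Loose comparison treats 0 like false and wrongly reports "not found".
$looseSaysMissing = ($position == false);

// Strict comparison only matches a real boolean false.
$strictSaysMissing = ($position === false);

var_dump($looseSaysMissing);  // bool(true)
var_dump($strictSaysMissing); // bool(false)
```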
One of the backwards compatibility breaks that PHP 7 introduces might in fact not even be noticed unless strict comparison is used. The function substr() now returns an empty string rather than false when asked for a substring that starts exactly at the end of the string:
$string = 'string';
var_dump(substr($string, 5, 1));
var_dump(substr($string, 6, 1));
var_dump(substr($string, 7, 1));
The result is:
string(1) "g"
string(0) ""
bool(false)
In the unlikely event that your software relies on PHP 5’s behavior, you can either adjust your code, or create a wrapper function around substr() that mimics the old behavior. Unless you explicitly checked the return value’s type, however, your application has probably never noticed anyway.
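Such a wrapper could look roughly like the sketch below. The name legacy_substr() is invented for this example, and the function only restores the boundary case described above (a non-negative start position at or beyond the end of the string); it makes no attempt to reproduce every PHP 5 edge case:

```php
<?php
// Hypothetical wrapper (not part of PHP): returns false, as PHP 5 did,
// when the start position is at or beyond the end of the string.
function legacy_substr(string $string, int $start, ?int $length = null)
{
    if ($start >= strlen($string)) {
        return false;
    }

    // Only pass $length when it was given; PHP 7 would treat null as 0.
    return $length === null
        ? substr($string, $start)
        : substr($string, $start, $length);
}

var_dump(legacy_substr('string', 5, 1)); // string(1) "g"
var_dump(legacy_substr('string', 6, 1)); // bool(false)
var_dump(legacy_substr('string', 7, 1)); // bool(false)
```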
Operators and Invalid Strings
In addition to numbers, the +, -, *, /, **, %, <<, >>, |, &, and ^ operators also work on strings in PHP. When used with one of these operators, a string is automatically cast to a number before the operation is performed. This is why "1" + "1" evaluates to 2 in PHP.
Prior to PHP 7.1, when a string contained digits as well as letters and began with a number, PHP silently performed an operation such as + on the numeric part of the string:
var_dump('1 elePHPant' + '1 elePHPant');
Executing the code shown above with PHP 5 or PHP 7.0 will print the output shown below:
int(2)
What happens here is that PHP automatically casts the string '1 elePHPant' to an integer, and the result is 1. We get the same result when we explicitly perform this operation:
var_dump((int) '1 elePHPant');
Executing the code shown above will print the output shown below:
int(1)
PHP 7.1 and later still automatically cast strings that begin with a number to an integer when an operator such as + is used, but they trigger a notice when this happens:
var_dump('1 elePHPant' + '1 elePHPant');
Executing the code shown above with PHP 7.1 or later will print the output shown below:
Notice: A non well formed numeric value encountered in ...
Notice: A non well formed numeric value encountered in ...
int(2)
Prior to PHP 7.1, when a string did not begin with a number, PHP silently performed an operation such as + using 0 as the “value” of the string:
var_dump(1 + 'string');
Executing the code shown above with PHP 5 or PHP 7.0 will print the output shown below:
int(1)
What happens here is that PHP automatically casts the string to an integer and the result is 0. We get the same result when we explicitly perform this operation:
var_dump((int) 'string');
Executing the code shown above will print the output shown below:
int(0)
PHP 7.1 and later trigger an E_WARNING when an operator such as + is used with non-numeric string operands:
var_dump(1 + 'string');
Executing the code shown above with PHP 7.1 or later will print the output shown below:
Warning: A non-numeric value encountered in ...
int(1)
The introduction of these E_NOTICE and E_WARNING errors may create backwards-incompatibility issues when custom error handlers are used. However, at least the E_WARNING error brings to light code errors that should be fixed.
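To illustrate the kind of incompatibility meant here, consider an application whose error handler turns every reported error into an exception. The handler and the input value below are assumptions made for this sketch; the exact message text depends on the PHP version in use:

```php
<?php
// Hypothetical strict handler: every reported error becomes an exception.
set_error_handler(function (int $severity, string $message, string $file, int $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$input  = '1 elePHPant'; // stands in for a value taken from a request
$caught = false;

try {
    // Silent on PHP 7.0; reported as an error on PHP 7.1 and later.
    $sum = $input + 1;
    echo "Sum: $sum\n";
} catch (ErrorException $e) {
    // With this handler, code that used to run now ends up here.
    $caught = true;
    echo 'Aborted: ', $e->getMessage(), "\n";
}

restore_error_handler();
```

Code that ran without complaint on PHP 7.0 is interrupted by such a handler as soon as it runs on PHP 7.1 or later.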
To make code that uses the operators listed above on strings work correctly with PHP 7.1 and later, you need to ensure that each string can safely be interpreted as a number, an integer for instance. This matters especially for values that come from a request variable.
Unfortunately, this is not as easy as using the built-in is_int() function, as this returns false for a string such as '123'. The example below shows what a function that performs such a check could look like:
function string_is_integer($variable)
{
    return (string) (int) $variable === (string) $variable;
}
var_dump(string_is_integer('abc'));
var_dump(string_is_integer('123'));
var_dump(string_is_integer(123));
Executing the code shown above will print the output shown below:
bool(false)
bool(true)
bool(true)
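A similar check can also be built on filter_var() with FILTER_VALIDATE_INT, which validates and converts in one step. This is only a sketch of an alternative; the function name string_to_int_or_null() is invented here, and the filter is slightly more lenient than the cast-based check (surrounding whitespace, for example, is accepted):

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the integer value of
// $variable, or null when it cannot be read as an integer.
function string_to_int_or_null($variable)
{
    $result = filter_var($variable, FILTER_VALIDATE_INT);

    // Strict comparison, so that a valid '0' is not mistaken for a failure.
    return $result === false ? null : $result;
}

var_dump(string_to_int_or_null('123'));         // int(123)
var_dump(string_to_int_or_null('0'));           // int(0)
var_dump(string_to_int_or_null('abc'));         // NULL
var_dump(string_to_int_or_null('1 elePHPant')); // NULL
```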
Case Mapping and Folding in mbstring
Changing the case of a string using strtoupper() and strtolower() simply replaces each lower case character with its upper case counterpart, and vice versa. In real life, however, changing case is not always that simple. In German, for example, there is a special character ß (a ligature of s and z, hence the corresponding named HTML entity &szlig;). ß is a lower case character that used to have no upper case counterpart. Coincidentally, an upper case ß was made official in 2017, so the example is somewhat outdated, but even after asking a German teacher we could not come up with a better one.
var_dump(mb_strtoupper('straße'));
When converting the lower case ß to upper case, PHP 7.3 and above will replace ß with SS:
string(7) "STRASSE"
This is considered a backward incompatible change because the case conversion can now change the length of the string, which could not happen in PHP 7.2 and earlier.
If you want more control over the case conversion, use the function mb_convert_case(), which allows you to specify a conversion mode. The following modes are available: MB_CASE_UPPER, MB_CASE_LOWER, MB_CASE_TITLE, MB_CASE_FOLD, MB_CASE_UPPER_SIMPLE, MB_CASE_LOWER_SIMPLE, MB_CASE_TITLE_SIMPLE, MB_CASE_FOLD_SIMPLE.
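The difference between a full and a simple mode can be sketched as follows. The example assumes PHP 7.3 or later with the mbstring extension loaded and a source file saved as UTF-8:

```php
<?php
$word = 'straße';

// Full case mapping: ß is expanded to SS, so the string gains a character.
$full = mb_convert_case($word, MB_CASE_UPPER, 'UTF-8');

// Simple case mapping: one character in, one character out, so ß stays.
$simple = mb_convert_case($word, MB_CASE_UPPER_SIMPLE, 'UTF-8');

var_dump($full, mb_strlen($full, 'UTF-8'));     // "STRASSE", int(7)
var_dump($simple, mb_strlen($simple, 'UTF-8')); // "STRAßE", int(6)
```

The simple modes are the ones to reach for when code depends on the number of characters staying the same.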
Named Captures in mb_ereg_replace()
A few years ago, somebody said that regular expressions “look like you threw a cat onto your keyboard”. While regular expressions can be quite useful, we feel that things can get out of control pretty quickly with regards to their readability. Granted, current IDEs offer great tooling that helps us understand and test regular expressions, but
The mb_ereg_replace() function now allows you to assign a name to a group of characters (so-called “named captures”), which can make it a little easier to reference a pattern in the replacement string, or make the whole regular expression yet more difficult to read – you decide.
This addition can be considered a backward incompatible change because some patterns might be interpreted differently now, potentially leading to undesired results.
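A named capture is written as (?<name>...) in the pattern and referenced as \k<name> in the replacement string. The sketch below assumes PHP 7.3 or later with mbstring's regular expression support enabled; the date format and group names are arbitrary choices for this illustration:

```php
<?php
// Two named groups capture the year and the month of the input.
$pattern = '(?<year>\d{4})-(?<month>\d{2})';

// \k<name> inserts the text matched by the group of that name.
// Single quotes keep the backslashes intact for mb_ereg_replace().
$replacement = '\k<month>/\k<year>';

$result = mb_ereg_replace($pattern, $replacement, '2019-03');

var_dump($result); // string(7) "03/2019"
```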