New Functions mb_chr() and mb_ord()
The PHP function chr() creates a string with an ASCII character of
the given value. The value 65 is the capital A, for example:
var_dump(chr(65));
string(1) "A"
The ord() function converts a single-character string back to its
numeric ASCII value:
var_dump(ord('A'));
int(65)
Both functions, just like PHP's other byte-oriented string functions,
do not work correctly on multibyte strings such as UTF-8 encoded text.
Let us try a German special character, the ß:
$char = 'ß';
var_dump($char);
var_dump(ord($char));
As mentioned elsewhere in this book, ß is a non-ASCII character that
is represented by two bytes in UTF-8, as the first var_dump() shows:
string(2) "ß"
int(195)
The string length is two, but the result of ord() is just one byte
(a value between 0 and 255). That does not really make sense, does it?
The reason is that PHP only looks at the first byte of the string,
because ord() does not know about multibyte strings. You can also try
var_dump(chr(ord('ß')));
to see PHP fail.
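The failure becomes easier to see when the raw bytes are inspected. The following is a minimal sketch, assuming the source file is saved as UTF-8; bin2hex() is used here only to make the bytes visible, and the variable names are chosen for illustration:

```php
<?php
// Assumes this file is saved as UTF-8, so 'ß' consists of the bytes 0xC3 0x9F.
$char = 'ß';

// Show the raw bytes of the string in hexadecimal notation.
$bytes = bin2hex($char);

// ord() only reads the first byte; chr() turns that value back into one byte.
$roundTrip = chr(ord($char));

var_dump($bytes);                 // string(4) "c39f"
var_dump(strlen($roundTrip));     // int(1), the second byte is lost
var_dump(bin2hex($roundTrip));    // string(2) "c3"
var_dump($roundTrip === $char);   // bool(false)
```

The round trip returns only the first byte, which on its own is not a valid UTF-8 sequence.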
To remedy this shortcoming, the multibyte equivalents of ord() and
chr(), namely mb_ord() and mb_chr(), have been introduced as part of
the mbstring extension. Both accept the encoding as an optional second
parameter, but since UTF-8 is assumed by default, we can also leave the
second parameter out:
var_dump(mb_ord('ß'));
var_dump(mb_chr(mb_ord('ß')));
The result makes far more sense now:
int(223)
string(2) "ß"
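The value 223 is a Unicode code point (U+00DF), not a byte value. As a short sketch, again assuming a UTF-8 source file, the encoding can also be passed explicitly, and the code point can be formatted in the usual hexadecimal notation:

```php
<?php
// Assumes this file is saved as UTF-8.
// Pass the encoding explicitly instead of relying on the default.
$codePoint = mb_ord('ß', 'UTF-8');

// Format the code point in the common U+XXXX notation.
$notation = sprintf('U+%04X', $codePoint);

// mb_chr() builds the encoded string for a given code point.
$char = mb_chr(0x00DF, 'UTF-8');

var_dump($codePoint);      // int(223)
var_dump($notation);       // string(6) "U+00DF"
var_dump($char === 'ß');   // bool(true)
```

Passing the encoding explicitly keeps the result independent of the configured internal encoding.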
And while we are at it:
var_dump('ß');
var_dump(mb_strlen('ß'));
This will result in:
string(2) "ß"
int(1)
Here you can directly see the difference between counting bytes and counting code points.
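The same difference shows up in longer strings. The following sketch, assuming a UTF-8 source file and using the illustrative word 'Straße', compares strlen() with mb_strlen() and collects the code point of each character:

```php
<?php
// Assumes this file is saved as UTF-8.
$word = 'Straße';

$byteCount = strlen($word);                    // counts bytes
$codePointCount = mb_strlen($word, 'UTF-8');   // counts code points

// Collect the code point of each character, one character at a time.
$codePoints = [];
for ($i = 0; $i < $codePointCount; $i++) {
    $codePoints[] = mb_ord(mb_substr($word, $i, 1, 'UTF-8'), 'UTF-8');
}

var_dump($byteCount);        // int(7)
var_dump($codePointCount);   // int(6)
var_dump($codePoints);       // 83, 116, 114, 97, 223, 101
```

Only the ß takes up two bytes, which is why the byte count is one higher than the number of code points.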