New Functions `mb_chr()` and `mb_ord()`

The PHP function chr() creates a one-character string from the given ASCII value. For example, 65 is the capital letter A:

var_dump(chr(65));

string(1) "A"

The ord() function converts a single string character back to its numeric ASCII value:

var_dump(ord('A'));

int(65)

Both functions, just like the other byte-oriented string functions in PHP, do not work correctly on strings in a multibyte encoding such as UTF-8. Let us try a German special character, the ß:

$char = 'ß';

var_dump($char);
var_dump(ord($char));

As mentioned elsewhere in this book, ß is a non-ASCII character that UTF-8 represents with two bytes, as the first var_dump() shows:

string(2) "ß"
int(195)

The string is two bytes long, yet ord() returns a single byte value (a number between 0 and 255). That does not really make sense, does it? The reason is that ord() only looks at the first byte of the string, because it knows nothing about multibyte encodings. You can also try

var_dump(chr(ord('ß')));

to see PHP fail: the result is a one-byte string that contains only the first half of ß and is not valid UTF-8.
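To make the failure visible, the following sketch looks at the raw bytes. It assumes that the mbstring extension is loaded (for mb_check_encoding()), and it writes ß as an explicit byte sequence so that the result does not depend on the encoding the script file is saved in:

```php
<?php
// ß written as its two UTF-8 bytes, independent of the file's encoding.
$char = "\xC3\x9F";

// The two bytes that make up the character.
$hex = bin2hex($char);
var_dump($hex);                                 // string(4) "c39f"

// ord() only sees the first byte, 0xC3 = 195.
$first = ord($char);
var_dump($first);                               // int(195)

// chr() turns that value back into a single byte: half of ß is lost.
$broken = chr($first);
var_dump(strlen($broken));                      // int(1)
var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)
```

The lone byte 0xC3 only announces a two-byte sequence, so on its own it is not a valid UTF-8 string.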

To remedy this shortcoming, the multibyte equivalents of ord() and chr(), namely mb_ord() and mb_chr(), have been introduced as part of the mbstring extension. Both accept the encoding as an optional second parameter. If it is omitted, the internal encoding is used, which is UTF-8 by default, so we can leave the second parameter out:

var_dump(mb_ord('ß'));
var_dump(mb_chr(mb_ord('ß')));

The result makes far more sense now:

int(223)
string(2) "ß"
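As a rough sketch of what the encoding parameter does, the next example passes it explicitly. It assumes that mbstring is loaded and supports ISO-8859-1, and the variable names are chosen for illustration only:

```php
<?php
$utf8   = "\xC3\x9F"; // ß in UTF-8 (two bytes)
$latin1 = "\xDF";     // ß in ISO-8859-1 (one byte)

// The code point is the same, no matter how the character is stored.
$fromUtf8   = mb_ord($utf8, 'UTF-8');
$fromLatin1 = mb_ord($latin1, 'ISO-8859-1');
var_dump($fromUtf8, $fromLatin1);              // int(223) int(223)

// mb_chr() encodes the code point in the requested encoding.
$asUtf8   = mb_chr(223, 'UTF-8');
$asLatin1 = mb_chr(223, 'ISO-8859-1');
var_dump(bin2hex($asUtf8));                    // string(4) "c39f"
var_dump(bin2hex($asLatin1));                  // string(2) "df"

// Code points above 255 work as well, here the euro sign U+20AC.
$euro = mb_chr(0x20AC, 'UTF-8');
var_dump(strlen($euro));                       // int(3)
var_dump(mb_ord($euro, 'UTF-8'));              // int(8364)
```

Passing the encoding explicitly keeps the code independent of the configured internal encoding, which may be worth the extra typing in library code.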

And while we are at it:

var_dump('ß');
var_dump(mb_strlen('ß'));

This will result in:

string(2) "ß"
int(1)

Here you can directly see the difference between counting bytes and counting code points.
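The same difference shows up as soon as a string is cut into pieces. The following sketch, which assumes mbstring is loaded and uses the word Straße purely as an example, compares the byte-based functions with their multibyte counterparts:

```php
<?php
// "Straße" with ß written as its two UTF-8 bytes.
$word = "Stra\xC3\x9Fe";

$bytes      = strlen($word);              // counts bytes
$codePoints = mb_strlen($word, 'UTF-8');  // counts code points
var_dump($bytes, $codePoints);            // int(7) int(6)

// substr() cuts after five bytes, right through the middle of ß.
$cut = substr($word, 0, 5);
var_dump(mb_check_encoding($cut, 'UTF-8'));   // bool(false)

// mb_substr() cuts after five code points and keeps ß intact.
$safe = mb_substr($word, 0, 5, 'UTF-8');
var_dump(bin2hex($safe));                     // string(12) "53747261c39f"
```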
