Custom Object Serialization
A new mechanism for custom object serialization has been implemented. Before we can discuss __serialize() and __unserialize(), though, we should recap the existing mechanisms it aims to replace.
When an object is serialized, either explicitly using the serialize() function or implicitly through session management, all its properties are serialized:
class C
{
    private string $x = 'a';
    private int $y = 1;
}

print serialize(new C);
Executing the code shown above will print the output shown below:
O:1:"C":2:{s:4:"Cx";s:1:"a";s:4:"Cy";i:1;}
With a trained eye it is easy to spot that the string shown above encodes information about an object of the C class ("C") and its properties $x ("Cx") and $y ("Cy"). $x contains a string of length 1 with value "a" (s:1:"a") and $y contains an integer with value 1 (i:1).
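The length of 4 reported for "Cx" may look off by two. The name of a private property is stored with the class name wrapped in NUL bytes as a prefix, and those bytes are invisible when the string is printed. The following is a minimal sketch that makes them visible; the $visible variable and the "\0" marker are choices made for this illustration only:

```php
<?php
declare(strict_types=1);

class C
{
    private string $x = 'a';
    private int $y = 1;
}

$serialized = serialize(new C);

// Replace each NUL byte with the two visible characters "\0".
$visible = str_replace("\0", '\0', $serialized);

print $visible . PHP_EOL;
// O:1:"C":2:{s:4:"\0C\0x";s:1:"a";s:4:"\0C\0y";i:1;}
```

Counting the two NUL bytes, the class name and the property name gives the four bytes announced by s:4.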
The default behaviour explained above is usually sufficient. However, sometimes you need to customize how an object is serialized. Resources such as database connections or file handles cannot be serialized, for instance, so you may want to exclude properties that contain a resource from the serialization of an object.
The oldest mechanism for customizing the serialization of objects is that of __sleep() and __wakeup(). When an object has a __sleep() method, it is called automatically when the object is serialized. Its counterpart, __wakeup(), is called automatically after the object has been unserialized. __sleep() returns an array with the names of the properties that should be considered for serialization:
class C
{
    private string $x = 'a';
    private int $y = 1;

    public function __sleep(): array
    {
        return ['x', 'y'];
    }

    public function __wakeup(): void
    {
        // ...
    }
}

print serialize(new C);
Executing the code shown above will print the output shown below:
O:1:"C":2:{s:4:"Cx";s:1:"a";s:4:"Cy";i:1;}
It should come as no surprise that this output is exactly the same as before, because __sleep() returns the names of both properties. To make it clear how __sleep() works, let us change our example so that only $x is considered for serialization:
class C
{
    private string $x = 'a';
    private int $y = 1;

    public function __sleep(): array
    {
        return ['x'];
    }

    public function __wakeup(): void
    {
        // ...
    }
}

print serialize(new C);
Executing the code shown above will print the output shown below:
O:1:"C":1:{s:4:"Cx";s:1:"a";}
It should go without saying that when you customize object serialization you need to make sure that you always end up with a valid object during unserialization.
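As a sketch of what keeping the object valid can look like, consider a class that holds a file handle. The LogFile class, its properties and its methods are hypothetical and exist only for this illustration: __sleep() leaves the handle out, and __wakeup() reopens the file so that the unserialized object is usable again.

```php
<?php
declare(strict_types=1);

// Hypothetical class, not taken from the text: wraps a file handle (a resource).
final class LogFile
{
    private string $path;

    /** @var resource|null */
    private $handle;

    public function __construct(string $path)
    {
        $this->path = $path;
        $this->open();
    }

    public function __sleep(): array
    {
        // Only the path is serialized; the handle is left out.
        return ['path'];
    }

    public function __wakeup(): void
    {
        // The constructor is not called during unserialization,
        // so the handle has to be restored here.
        $this->open();
    }

    public function isOpen(): bool
    {
        return is_resource($this->handle);
    }

    private function open(): void
    {
        $this->handle = fopen($this->path, 'a');
    }
}

$path = tempnam(sys_get_temp_dir(), 'log');
$copy = unserialize(serialize(new LogFile($path)));

var_dump($copy->isOpen()); // bool(true)
```

Without __wakeup(), $handle would simply be null after unserialization, and every method that relies on it would have to cope with that.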
__sleep() can only be used to exclude properties from serialization and is not convenient to work with when the serialized representation should be significantly different from the object’s in-memory representation. This is a shortcoming that the Serializable interface, which was introduced in PHP 5.1, was supposed to solve:
interface Serializable
{
    public function serialize(): string;

    public function unserialize(string $serialized): void;
}
Classes that implement the Serializable interface no longer support __sleep() and __wakeup(), meaning these methods will no longer be automatically called during serialization and unserialization.
When an object that implements the Serializable interface is to be serialized, the object’s serialize() method will be called automatically. serialize() must then return a string, and unserialize() must be able to reconstruct the object’s state from such a string:
class C implements Serializable
{
    private string $x = 'a';
    private int $y = 1;

    public function serialize(): string
    {
        return serialize([$this->x, $this->y]);
    }

    public function unserialize($serialized): void
    {
        [$this->x, $this->y] = unserialize($serialized);
    }
}

print serialize(new C);
Executing the code shown above will print the output shown below:
C:1:"C":26:{a:2:{i:0;s:1:"a";i:1;i:1;}}
Custom object serialization using the Serializable interface is, unfortunately, fundamentally broken. The problems show, for instance, when both a child class and its parent class implement the interface. The child’s implementation of serialize() cannot simply delegate to the parent’s implementation because that only returns a string representation of the parent’s state, which the child then has to embed in its own string. Once state is nested in this way, multiple references to the same object in the object graph cannot be properly detected. It is outside the scope of this book to cover all the intricacies of the design shortcomings and implementation flaws of Serializable. These are explained in detail in the RFC that led to __serialize() and __unserialize(), which comprise the new mechanism for custom object serialization that we are now ready to discuss.
__serialize() is automatically called when an object is serialized and must return an array representation of the object’s state that should be serialized. Analogously, __unserialize() is automatically called when an object is to be unserialized. The array representation returned by __serialize() is passed as an argument to __unserialize():
class C
{
    private string $x = 'a';
    private int $y = 1;

    public function __serialize(): array
    {
        return ['x' => $this->x, 'y' => $this->y];
    }

    public function __unserialize(array $data): void
    {
        $this->x = $data['x'];
        $this->y = $data['y'];
    }
}

print serialize(new C);
Executing the code shown above will print the output shown below:
O:1:"C":2:{s:1:"x";s:1:"a";s:1:"y";i:1;}
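Because __serialize() returns an arbitrary array, the serialized representation does not have to mirror the properties of the object. The following sketch illustrates this with a hypothetical TagSet class (the class, its $index property and its has() method are assumptions made for this example) that keeps a lookup table in memory but serializes a plain list:

```php
<?php
declare(strict_types=1);

// Hypothetical class: a lookup table in memory, a plain list when serialized.
final class TagSet
{
    /** @var array<string, true> */
    private array $index = [];

    public function __construct(string ...$tags)
    {
        foreach ($tags as $tag) {
            $this->index[$tag] = true;
        }
    }

    public function has(string $tag): bool
    {
        return isset($this->index[$tag]);
    }

    public function __serialize(): array
    {
        // Store only the tag names, not the lookup table itself.
        return ['tags' => array_keys($this->index)];
    }

    public function __unserialize(array $data): void
    {
        // Rebuild the lookup table from the stored list.
        $this->index = array_fill_keys($data['tags'], true);
    }
}

$serialized = serialize(new TagSet('php', 'rfc'));
print $serialized . PHP_EOL;
// O:6:"TagSet":1:{s:4:"tags";a:2:{i:0;s:3:"php";i:1;s:3:"rfc";}}

$copy = unserialize($serialized);
var_dump($copy->has('rfc')); // bool(true)
```

Note that the keys of the returned array are used as they are, which is why no NUL bytes appear in the output even though $index is private.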
This overcomes the limitations of __sleep() and __wakeup() without suffering from the same problems as Serializable. Let us hope that the old mechanisms for custom object serialization will be deprecated and subsequently removed at some point in the future. Having three mechanisms for solving the same problem, one that is the best practice and two legacy ones, is confusing for PHP developers and makes PHP itself harder to maintain than it has to be.
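To illustrate how the array-based mechanism behaves in a class hierarchy, here is a minimal sketch in which a child class delegates to its parent. The Base and Child classes and the 'parent' array key are hypothetical and chosen only for this example:

```php
<?php
declare(strict_types=1);

// Hypothetical classes used only for illustration.
class Base
{
    private string $x = 'a';

    public function __serialize(): array
    {
        return ['x' => $this->x];
    }

    public function __unserialize(array $data): void
    {
        $this->x = $data['x'];
    }
}

final class Child extends Base
{
    private int $y = 1;

    public function __serialize(): array
    {
        // Nest the parent's array rather than an already serialized string.
        return ['y' => $this->y, 'parent' => parent::__serialize()];
    }

    public function __unserialize(array $data): void
    {
        parent::__unserialize($data['parent']);
        $this->y = $data['y'];
    }
}

$serialized = serialize(new Child);
print $serialized . PHP_EOL;
// O:5:"Child":2:{s:1:"y";i:1;s:6:"parent";a:1:{s:1:"x";s:1:"a";}}

$copy = unserialize($serialized);
```

The parent's state stays an ordinary array inside the child's array, so the whole structure is serialized in a single pass instead of as a string embedded in another string.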