Custom Object Serialization

A new mechanism for custom object serialization has been implemented. Before we can discuss __serialize() and __unserialize(), though, we should recap the existing mechanisms it aims to replace.

When an object is serialized, either explicitly using the serialize() function or implicitly through session management, all of its properties are serialized:

class C
{
    private string $x = 'a';
    private int $y = 1;
}

print serialize(new C);

Executing the code shown above will print the output shown below:

O:1:"C":2:{s:4:"Cx";s:1:"a";s:4:"Cy";i:1;}

With a trained eye it is easy to spot that the string shown above encodes information about an object of the C class ("C") and its properties $x ("Cx") and $y ("Cy"). $x contains a string of length 1 with value "a" (s:1:"a") and $y contains an integer with value 1 (i:1).
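One detail of this output can be puzzling: "Cx" is announced as a string of length 4 although only two characters are visible. PHP stores the name of a private property as the class name enclosed in NUL bytes, followed by the property name, and NUL bytes are invisible when printed. The following minimal sketch, which reuses the class C from above, replaces each NUL byte with the visible marker \0 purely for illustration:

```php
<?php
class C
{
    private string $x = 'a';
    private int $y = 1;
}

// Replace each (invisible) NUL byte with the two visible characters \0.
// The marker is only for display; the real serialized string contains NUL bytes.
print str_replace("\0", '\0', serialize(new C));
```

Executing this sketch prints:

O:1:"C":2:{s:4:"\0C\0x";s:1:"a";s:4:"\0C\0y";i:1;}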

The default behaviour explained above is usually sufficient. However, sometimes you need to customize how an object is serialized. Resources such as database connections or file handles cannot be serialized, for instance, so you may want to exclude properties that contain a resource from the serialization of an object.

The oldest mechanism for customizing the serialization of objects is that of __sleep() and __wakeup(). If an object has a __sleep() method, it is called automatically when the object is serialized. Its counterpart, __wakeup(), is called automatically after the object has been unserialized. __sleep() returns an array with the names of the properties that should be considered for serialization:

class C
{
    private string $x = 'a';
    private int $y = 1;

    public function __sleep(): array
    {
        return ['x', 'y'];
    }

    public function __wakeup(): void
    {
        // ...
    }
}

print serialize(new C);

Executing the code shown above will print the output shown below:

O:1:"C":2:{s:4:"Cx";s:1:"a";s:4:"Cy";i:1;}

It should come as no surprise that this output is exactly the same as before, because __sleep() returns the names of both properties. To make it clear how __sleep() works, let us change our example so that only $x is considered for serialization:

class C
{
    private string $x = 'a';
    private int $y = 1;

    public function __sleep(): array
    {
        return ['x'];
    }

    public function __wakeup(): void
    {
        // ...
    }
}

print serialize(new C);

Executing the code shown above will print the output shown below:

O:1:"C":1:{s:4:"Cx";s:1:"a";}

It should go without saying that when you customize object serialization you need to make sure that you always end up with a valid object during unserialization.
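A minimal sketch of what this can look like for the resource case mentioned earlier: a property that holds a file handle is excluded in __sleep() and re-created in __wakeup(). The class Log, its members, and the use of the php://memory stream are assumptions made for this illustration only:

```php
<?php
// Hypothetical class; the name Log and its members are illustrative only.
class Log
{
    private string $path;

    /** @var resource|null */
    private $handle = null;

    public function __construct(string $path)
    {
        $this->path = $path;
        $this->open();
    }

    public function __sleep(): array
    {
        // Leave out $handle: a resource cannot be serialized.
        return ['path'];
    }

    public function __wakeup(): void
    {
        // Re-create the resource so that the unserialized object is valid.
        $this->open();
    }

    public function isOpen(): bool
    {
        return is_resource($this->handle);
    }

    private function open(): void
    {
        $this->handle = fopen($this->path, 'r+');
    }
}

// php://memory is used here only to keep the sketch self-contained.
$copy = unserialize(serialize(new Log('php://memory')));

var_dump($copy->isOpen());
```

Without the __wakeup() method, $handle would still hold its default value of null after unserialization, and the object would not be usable.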

__sleep() can only be used to exclude properties from serialization and is not convenient to work with when the serialized representation should be significantly different from the object’s in-memory representation. This is a shortcoming that the Serializable interface, which was introduced in PHP 5.1, was supposed to solve:

interface Serializable
{
    public function serialize(): string;
    public function unserialize(string $serialized): void;
}

Classes that implement the Serializable interface no longer support __sleep() and __wakeup(), meaning these methods will no longer be automatically called during serialization and unserialization.

When an object that implements the Serializable interface is to be serialized, the object’s serialize() method will be called automatically. serialize() must then return a string, and unserialize() must be able to reconstruct the object’s state from such a string:

class C implements Serializable
{
    private string $x = 'a';
    private int $y = 1;

    public function serialize(): string
    {
        return serialize([$this->x, $this->y]);
    }

    public function unserialize($serialized): void
    {
        [$this->x, $this->y] = unserialize($serialized);
    }
}

print serialize(new C);

Executing the code shown above will print the output shown below:

C:1:"C":26:{a:2:{i:0;s:1:"a";i:1;i:1;}}

Custom object serialization using the Serializable interface is, unfortunately, fundamentally broken. The problems surface, for instance, when both a child class and its parent class implement the interface. The child’s implementation of serialize() cannot simply delegate to the parent’s implementation, because that only returns a string representation of the parent’s state, which the child then has to embed in its own serialized string. As a result, multiple references to the same object in the object graph cannot be detected properly. It is outside the scope of this book to cover all the intricacies of the design shortcomings and implementation flaws of Serializable. These are explained in detail in the RFC that led to __serialize() and __unserialize(), which comprise the new mechanism for custom object serialization that we are now ready to discuss.

__serialize() is automatically called when an object is serialized and must return an array representation of the object’s state that should be serialized. Analogously, __unserialize() is automatically called when an object is to be unserialized. The array representation returned by __serialize() is passed as an argument to __unserialize():

class C
{
    private string $x = 'a';
    private int $y = 1;

    public function __serialize(): array
    {
        return ['x' => $this->x, 'y' => $this->y];
    }

    public function __unserialize(array $data): void
    {
        $this->x = $data['x'];
        $this->y = $data['y'];
    }
}

print serialize(new C);

Executing the code shown above will print the output shown below:

O:1:"C":2:{s:1:"x";s:1:"a";s:1:"y";i:1;}

This overcomes the limitations of __sleep() and __wakeup() without suffering from the same problems as Serializable. Let us hope that the old mechanisms for custom object serialization will be deprecated and subsequently removed at some point in the future. Having three mechanisms for solving the same problem, one that is the best practice and two legacy ones, is confusing for PHP developers and makes PHP itself harder to maintain than it has to be.
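To illustrate why the array-based approach composes better in a class hierarchy, here is a minimal sketch in which a child class delegates to its parent. The class names ParentClass and Child and the 'parent' array key are assumptions made for this example, not a prescribed convention:

```php
<?php
// Hypothetical classes; names and array keys are illustrative only.
class ParentClass
{
    private string $x = 'a';

    public function __serialize(): array
    {
        return ['x' => $this->x];
    }

    public function __unserialize(array $data): void
    {
        $this->x = $data['x'];
    }
}

class Child extends ParentClass
{
    private int $y = 1;

    public function __serialize(): array
    {
        // The parent's state is a plain array that can simply be nested.
        return ['parent' => parent::__serialize(), 'y' => $this->y];
    }

    public function __unserialize(array $data): void
    {
        // Hand the nested array back to the parent, then restore our own state.
        parent::__unserialize($data['parent']);
        $this->y = $data['y'];
    }
}

$copy = unserialize(serialize(new Child));

var_dump($copy == new Child);
```

Because each class only returns and receives an array, no class has to call serialize() or unserialize() itself, and the whole object graph is processed in a single pass.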