Serialization

Item85: Prefer alternatives to Java serialization

A fundamental problem with serialization is that its attack surface is too big to protect, and constantly growing: Object graphs are deserialized by invoking the readObject method on an ObjectInputStream. In the process of deserializing a byte stream, this method can execute code from any of these types, so the code for all of these types is part of the attack surface.

The best way to avoid serialization exploits is never to deserialize anything. There is no reason to use Java serialization in any new system you write. There are other mechanisms for translating between objects and byte sequences that avoid many of the dangers of Java serialization, like JSON.

If we can't avoid Java serialization entirely, the next best alternative is to never deserialize untrusted data. If we can't avoid serialization and we aren't absolutely certain of the safety of the data you're deserializing, use the object deserialization filtering added in Java 9 and backported to earlier releases. Accepting classes by default and rejecting a list of potentially dangerous ones is known as blacklisting; rejecting classes by default and accepting a list of those that are presumed safe is known as whitelisting. Prefer whitelisting to blacklisting.

In summary, serialization is dangerous and should be avoided.

Item86: Implement Serializable with great caution

Allowing a class's instances to be serialized can be as simple as adding the words implements Serializable to its declaration. A major cost of implementing Serializable is that it decreases the flexibility to change a class's implementation once it has been released. If we do not make the effort to design a custom serialized form but merely accept the default, the serialized form will forever be tied to the class's original internal representation, which makes the class's private and package-private instance fields become part of its exported API.

A second cost of implementing Serializable is that it increases the likelihood of bugs and security holes. Normally, objects are created with constructors; serialization is an extralinguistic mechanism for creating objects.

A third cost of implementing Serializable is that it increases the testing burden associated with releasing a new version of a class. When a serializable class is revised, it is important to check that it is possible to serialize an instance in the new release and deserialize it in old releases, and vice versa.

Implementing Serializable is not a decision to be undertaken lightly. It is essential if a class is to participate in a framework that relies on Java serialization for object transmission or persistence.

Classes designed for inheritance (Item 19) should rarely implement Serializable, and interfaces should rarely extend it. Classes designed for inheritance that do implement Serializable include Throwable and Component.

There is one caveat regarding the decision not to implement Serializable. If a class designed for inheritance is not serializable, it may require extra effort to write a serializable subclass.

Inner classes (Item 24) should not implement Serializable. The default serialized form of an inner class is ill-defined. A static member class can, however, implement Serializable.

Item87: Consider using a custom serialized form

If a class implements Serializable and uses the default serialized form, we would never be able to escape completely from the throwaway implementation. So do not accept the default serialized form without first considering whether it is appropriate. Generally speaking, we should accept the default serialized form only if it is largely identical to the encoding that we would choose if we were designing a custom serialized form.

The default serialized form of an object is a reasonably efficient encoding of the physical representation of the object graph rooted at the object. It is likely to be appropriate if an object's physical representation is identical to its logical content. Even if we decide that the default serialized form is appropriate, we often must provide a readObject method to ensure invariants and security.

Using the default serialized form when an object's physical representation differs substantially from its logical data content has four disadvantages:

  • It permanently ties the exported API to the current internal representation.
  • It can consume excessive space.
  • It can consume excessive time.
  • It can cause stack overflows.

Whether or not you accept the default serialized form, every instance field that isn't labeled transient will be serialized when the defaultWriteObject method is invoked. Therefore, every instance field that can be declared transient should be. Before deciding to make a field nontransient, convince yourself that its value is part of the logical state of the object. Remember the fields labeled transient would be initialized to their default values when an instance is deserialized.

Whether or not we use the default serialized form, you must impose any synchronization on object serialization that we would impose on any other method that reads the entire state of the object.

private synchronized void writeObject(ObjectOutputStream s) throws IOException {
	s.defaultWriteObject();
}

Regardless of what serialized form we choose, declare an explicit serial version UID in every serializable class we write. It is not required that serial version UIDs be unique. Do not change the serial version UID unless we want to break compatibility with all existing serialized instances of a class.

Item88: Write readObject methods defensively

The problem is that the readObject method is effectively another public constructor, and it demands all of the same care as any other constructor. Just as a constructor must check its arguments for validity (Item 49) and make defensive copies of parameters where appropriate (Item 50), so must a readObject method.

When an object is deserialized, it is critical to defensively copy any field containing an object reference that a client must not possess. Therefore, every serializable immutable class containing private mutable components must defensively copy these components in its readObject method.

There is one other similarity between readObject methods and constructors that applies to nonfinal serializable classes. Like a constructor, a readObject method must not invoke an overridable method, either directly or indirectly (Item 19).

Here, in summary form, are the guidelines for writing a readObject method:

  • For classes with object reference fields that must remain private, defensively copy each object in such a field. Mutable components of immutable classes fall into this category.
  • Check any invariants and throw an InvalidObjectException if a check fails. The checks should follow any defensive copying.
  • If an entire object graph must be validated after it is deserialized, use the ObjectInputValidation interface (not discussed in this book).
  • Do not invoke any overridable methods in the class, directly or indirectly.

Item89: For instance control, prefer enum types to readResolve

Item 3 describes the Singleton pattern and gives the following example of a singleton class:

public class Elvis {
  public static final Elvis INSTANCE = new Elvis();
  private Elvis() { ... }
  
  public void leaveTheBuilding() { ... }
}

This class, however, would no longer be a singleton if the words implements Serializable were added to its declaration. Any readObject method, whether explicit or default, returns a newly created instance, which will not be the same instance that was created at class initialization time.

The readResolve feature allows you to substitute another instance for the one created by readObject. In most uses of this feature, no reference to the newly created object is retained, so it immediately becomes eligible for garbage collection.

private Object readResolve() {
  // Return the one true Elvis and let the garbage collector
  // take care of the Elvis impersonator.
  return INSTANCE;
}

However, if we depend on readResolve for instance control, all instance fields with object reference types must be declared transient. Otherwise, it is possible for a determined attacker to secure a reference to the deserialized object before its readResolve method is run, and then "steal" the deserialized instance, which should be garbage collected.

We can fix this by making the class a single-element enum type. If so, Java guarantees you that there can be no instances besides the declared constants, unless an attacker abuses a privileged method such as AccessibleObject.setAccessible.

The accessibility of readResolve is significant. If we place a readResolve method on a final class, it should be private. If we place a readResolve method on a nonfinal class, we must carefully consider its accessibility.

Item90: Consider serialization proxies instead of serialized instances

The decision to implement Serializable increases the likelihood of bugs and security problems. There is, however, a technique that greatly reduces these risks. This technique is known as the serialization proxy pattern.

  1. Design a private static nested class that concisely represents the logical state of an instance of the enclosing class (serialization proxy).

  2. The nested class should have a single constructor, whose parameter type is the enclosing class. This constructor merely copies the data from its argument.

  3. Both the enclosing class and its serialization proxy must be declared to implement Serializable. By design, the default serialized form of the serialization proxy is the perfect serialized form of the enclosing class. And the writeReplace method translates an instance of the enclosing class to its serialization proxy prior to serialization.

  4. With this writeReplace method in place, the serialization system will never generate a serialized instance of the enclosing class, but an attacker might fabricate one in an attempt to violate the class's invariants. To defend it:

    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
    	throw new InvalidObjectException("Proxy required");
    }
    
  5. For deserialization: provide a readResolve method on the SerializationProxy class that returns a logically equivalent instance of the enclosing class.

The serialization proxy approach is superior to the defensive copying approach:

  • It makes the enclosing classes be truly immutable.
  • It deserves little though.
  • It allows the deserialized instance to have a different class from the originally serialized instance.

The serialization proxy pattern has some limitations:

  • It is not compatible with classes that are extendable by their users.
  • It is not compatible with some classes whose object graphs contain circularities.
  • The added power and safety of the serialization proxy pattern are not free.