Data Deserialization

Computer data is generally organized in data structures such as arrays, records, graphs, or classes for efficiency. When a data structure needs to be stored or transmitted to another location, such as across a network, it goes through a process called serialization. This process converts the data into a linear format suitable for storage or transmission across computing devices.

Using Java as an example platform for serialization, an object of type Address would logically have members of street, city, state, and postal code as shown in the diagram below.


Once serialized, this data is converted into a linear data format (such as the XML text form in the diagram) representing the Address object.

Deserialization is the reverse process: it reads the linear data and instantiates the Address object in memory, as shown in this diagram:
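
The round trip described above can be sketched in a few lines of Java. This is a minimal illustration, not a reference implementation: the Address class, its field names, and the sample values are assumptions made for the example, and it uses Java's built-in binary serialization rather than the XML form shown in the diagram.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class AddressRoundTrip {

    // Hypothetical Address type with the four members named in the text.
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String street;
        final String city;
        final String state;
        final String postalCode;

        Address(String street, String city, String state, String postalCode) {
            this.street = street;
            this.city = city;
            this.state = state;
            this.postalCode = postalCode;
        }
    }

    // Serialization: flatten the in-memory object into a linear byte sequence.
    static byte[] serialize(Address address) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(address);
        }
        return buffer.toByteArray();
    }

    // Deserialization: rebuild an Address object in memory from the bytes.
    static Address deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Address) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Address original = new Address("123 Main St", "Albany", "NY", "12207");
        byte[] bytes = serialize(original);
        Address copy = deserialize(bytes);
        System.out.println(copy.street + ", " + copy.city + ", " + copy.state + " " + copy.postalCode);
    }
}
```

The byte array returned by serialize() is the linear form that would be written to disk or sent over a network. The object returned by deserialize() is a new instance with the same field values as the original.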


What is Serialization Used For?

Serialization dates back at least to the 1980s, when it was used in the Courier RPC application protocol of the Xerox Network Systems protocol suite. Serialization is needed for data storage and for data transport between nodes on a network because computers store and transport data linearly. Serialization is also used to pass information between computers for remote method invocation, where a procedure or function on a remote machine can be invoked as if it were on the local computer. Using serialization, an object can be transferred across domains and through firewalls, and can be used across different languages and platforms.

The formats of serialized objects are standardized so that they can be read by different platforms, if needed. Platforms that support serialization include Python, Perl, PHP, Ruby, and Java. The Microsoft .NET platform also supports serialization through the XmlSerializer and DataContractSerializer classes, as well as the more powerful but more vulnerable BinaryFormatter and NetDataContractSerializer classes. XML, YAML, and JSON are among the most widely used formats for serialized data.

What are the Vulnerabilities?

Vulnerabilities in serialization processes have been discovered as far back as 2003, such as the jsscript.c thaw function vulnerability with Mozilla rev 3.2. Since then, a variety of deserialization bugs have been discovered. MITRE lists 37 CVEs related to deserialization, with the Apache Commons Collections deserialization bug discovered in 2015 garnering the most attention.

A security exploit against a vulnerable deserialization process involves an attacker injecting malicious data into the serialized, linear form of the data before it is deserialized, so that the injected data is instantiated as malicious code upon deserialization, as illustrated below:

[Diagram: deserialization exploit]

Once the tampered data is instantiated in computer memory through deserialization, the attacker's malicious code is in a usable form and can perform further malicious activities at the attacker's command.
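
A harmless sketch can show why this works in Java. Java serialization lets a Serializable class define a private readObject() method, and the runtime calls it while the object is being rebuilt, before the application has a chance to check the object's type. In the example below, the Gadget class, the sideEffectRan flag, and the naiveReadAddress() helper are hypothetical names invented for illustration. A real exploit would rely on classes already present on the victim's classpath rather than on a class the attacker supplies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ReadObjectSideEffect {

    // Stand-in for attacker-controlled behavior; here it only flips a flag.
    static volatile boolean sideEffectRan = false;

    // The type the application expects to receive.
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        String city = "Albany";
    }

    // Hypothetical "gadget": a Serializable class whose readObject() does work.
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // runs during deserialization
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Naive deserialization: rebuilds whatever the stream describes, then casts.
    static Address naiveReadAddress(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            // The cast is evaluated only after readObject() has finished,
            // so Gadget.readObject() has already run by the time it fails.
            return (Address) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] attackerBytes = serialize(new Gadget());
        try {
            naiveReadAddress(attackerBytes);
        } catch (ClassCastException e) {
            System.out.println("cast failed, but side effect ran: " + sideEffectRan);
        }
    }
}
```

The cast to Address fails, yet the flag has already been set. The type check in application code comes too late to stop code that runs during deserialization itself.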

What is the Impact of These Vulnerabilities?

Deserialization vulnerabilities are high impact because they can allow an attacker to perform remote code execution and remotely administer the victim machine. The attack surface is increased by web-accessible servers running software that is vulnerable to deserialization exploits. The most widely used software that was vulnerable to deserialization exploits, prior to being patched, was the Apache Commons Collections library.

The Apache Commons Collections library allows serialization and is used in 70 other software libraries. As long as the Apache Commons Collections library is on the classpath of an application that deserializes untrusted data, an attacker can craft serialized data that uses classes from the library, such as InvokerTransformer, to run malicious code on the machine during deserialization. Apache has since released patches that disable deserialization of the unsafe classes in the Commons Collections library.

MS-ISAC Recommendations

Best practices to protect against deserialization vulnerability exploits include the following measures:

  • Apply all the latest patches after appropriate testing and keep your software up to date. Information on the most recent Apache Commons Collections patch is available in MS-ISAC Cybersecurity Advisory 2015-152.
  • Adhere to the principle of least privilege by minimizing or disabling access to administrative privileges to reduce the impact of an exploit.
  • When developing software, minimize the use of deserialization by reducing unnecessary data transfers across applications and systems and by reducing the number of files written to disk. Also consider developing your own format for data transfer, if needed, to reduce the probability that attackers misuse the data transfer functionality. With a new format, attackers won’t easily know where in the serialized data to insert their code for a successful attack.
  • Follow a secure development lifecycle alongside your software development lifecycle.

Recommendations for securing against Apache Commons Collections deserialization exploits:

  • Search for jar files that contain the class InvokerTransformer.class. Either remove that class from the jar file or delete the jar file, backing it up first, and then confirm that your programs still work properly without the file or class. If that class is not present on your system, you are not vulnerable to the Apache Commons Collections attack.
  • Search network traffic for the hex bytes “AC ED 00 05”, the header of a Java serialization stream, to identify hosts that are exchanging serialized Java objects, and block that traffic if needed. This would block remote Commons Collections deserialization attacks against your hosts.
  • For applications that require serialization, a class can implement a custom readObject() method to perform safety checks during deserialization.
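
The header check and the safety-checking recommendations above can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not a complete defense. The class and method names (SafeDeserialization, AllowListObjectInputStream, UnexpectedType, readAddress) and the five-digit postal code rule are hypothetical. The allow-list technique, which overrides ObjectInputStream.resolveClass(), is a commonly used complement to readObject() validation and is not something the recommendations above prescribe. Set.of requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Set;

public class SafeDeserialization {

    // True if the data begins with the Java serialization header AC ED 00 05.
    static boolean startsWithJavaSerializationHeader(byte[] data) {
        return data.length >= 4
                && (data[0] & 0xFF) == 0xAC && (data[1] & 0xFF) == 0xED
                && (data[2] & 0xFF) == 0x00 && (data[3] & 0xFF) == 0x05;
    }

    // Hypothetical Address type that validates its own state in readObject().
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String street, city, state, postalCode;

        Address(String street, String city, String state, String postalCode) {
            this.street = street;
            this.city = city;
            this.state = state;
            this.postalCode = postalCode;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Assumed rule for this sketch: postal codes are exactly five digits.
            if (postalCode == null || !postalCode.matches("\\d{5}")) {
                throw new InvalidObjectException("invalid postal code");
            }
        }
    }

    // A serializable type that the application does not expect to receive.
    static class UnexpectedType implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Rejects any class that is not on the allow-list before it is instantiated.
    static class AllowListObjectInputStream extends ObjectInputStream {
        private final Set<String> allowedClassNames;

        AllowListObjectInputStream(InputStream in, Set<String> allowedClassNames) throws IOException {
            super(in);
            this.allowedClassNames = allowedClassNames;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
            if (!allowedClassNames.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class is not on the allow-list");
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Address readAddress(byte[] data) throws IOException, ClassNotFoundException {
        Set<String> allowed = Set.of(Address.class.getName());
        try (ObjectInputStream in = new AllowListObjectInputStream(new ByteArrayInputStream(data), allowed)) {
            return (Address) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] good = serialize(new Address("123 Main St", "Albany", "NY", "12207"));
        System.out.println("header present: " + startsWithJavaSerializationHeader(good));
        System.out.println("city: " + readAddress(good).city);
        try {
            readAddress(serialize(new UnexpectedType()));
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The allow-list check runs when the stream resolves a class descriptor, before any object of that class is created, so an unexpected class never reaches its own readObject() method. The readObject() validation in Address then guards against well-typed but malformed data.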