Software Engineering
java versioning storage serialization file-storage
Updated Wed, 27 Jul 2022 16:01:30 GMT

Storing object-graphs with class-evolution in Java with transformation (long time archiving)


A common problem is to store objects (with their object graph) and to load them back later. This is easy as long as the stored object representation matches the executing code. But as time goes by, requirements change and the stored objects no longer match the code. The stored objects should not lose their data, and the client code should work with the latest object models.

So a transformation has to occur somewhere between loading the data and returning an object to the client.

I know that there are libraries such as XStream, gson, protobuf and avro. They can load older objects, but as far as I know they just ignore data that no longer matches the fields in the class (maybe I missed something).
(When I talk about storing and serialization, I do not mean Java's built-in serialization mechanism.)


So what is the question? I have been researching this for some time, and there seems to be no library that addresses this issue (class evolution) without data loss. I hope to find another working solution here, or an idea of how to implement it myself.

I have some requirements:

  • File-based - I want to be able to store the serialized object on disk
  • Appendable - I want to append multiple objects to one file without loading the whole file in memory again and again
  • Support for multiple versions in one file - A file could contain objects of different versions (but only of the same type)
  • Transformation - Data should be accessible by using the same type, even when changed in between.
  • Generic - The mechanism itself has to be generic, so I could use it for different objects (different objects do not get mixed in one file, only different versions of one type).
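The file-based, appendable and multi-version requirements could be met by a very simple container format. The following is only a minimal sketch under assumptions of my own: the class name `VersionedArchive`, the line layout `<version>\t<payload>`, and the rule that a payload (for example JSON text) fits on one line are all hypothetical and not taken from any existing library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.BiConsumer;

/** Hypothetical append-only archive: one record per line, "<version>\t<payload>". */
final class VersionedArchive {

    /** Appends one record; the existing file content is never read or rewritten. */
    static void append(Path file, int version, String payload) {
        if (payload.indexOf('\n') >= 0 || payload.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("payload must be a single line");
        }
        String line = version + "\t" + payload + "\n";
        try {
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams the file line by line and hands (version, payload) to the handler. */
    static void readAll(Path file, BiConsumer<Integer, String> handler) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                int version = Integer.parseInt(line.substring(0, tab));
                handler.accept(version, line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("archive", ".log");
        append(file, 1, "{\"text\":\"hello\"}");    // record written by old code
        append(file, 2, "{\"content\":\"world\"}"); // record written by newer code
        readAll(file, (version, payload) -> System.out.println("v" + version + " " + payload));
        Files.delete(file);
    }
}
```

Because every record carries its own version tag, a reader can decide per record which transformations are still needed, and the file stays human-readable if the payload is.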

It would be nice if the storing format is human-readable.

Already stored objects cannot be updated (at least not without huge effort). Think of a long-term archive.


Here is an example for better illustration. Let's suppose we have two POJOs that we want to serialize.

public class MyPojo {
    String text;
    Long number;
    Integer[] values;
    SubPojo pojo;
}

public class SubPojo {
    List<String> items;
}

In the next version we might have renamed a field (text -> content), changed a type (Integer[] -> List<Integer>), and turned a single field into a list (SubPojo -> List<SubPojo>), where the previous value becomes the first element of the new list (no data is lost, it is only transformed into the new representation).

public class MyPojo {
    String content;
    Long number;
    List<Integer> values;
    List<SubPojo> pojo;
}

public class SubPojo {
    List<String> items;
}
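The three changes between these two versions could be expressed as a single migration on a generic representation of the record. The sketch below assumes the stored object has already been parsed into a `Map<String, Object>` (as a JSON or XML parser might produce). The class name `MyPojoV1ToV2` and the map-based representation are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical migration of a map-based MyPojo record from version 1 to version 2. */
final class MyPojoV1ToV2 {

    static Map<String, Object> migrate(Map<String, Object> v1) {
        // Work on a copy so the loaded V1 record stays untouched.
        Map<String, Object> v2 = new LinkedHashMap<>(v1);

        // Renamed field: text -> content.
        v2.put("content", v2.remove("text"));

        // Changed type: Integer[] -> List<Integer>.
        Object values = v2.get("values");
        if (values instanceof Integer[]) {
            v2.put("values", new ArrayList<>(Arrays.asList((Integer[]) values)));
        }

        // Single SubPojo -> List<SubPojo>: the old value becomes the first element.
        List<Object> pojos = new ArrayList<>();
        Object oldPojo = v2.get("pojo");
        if (oldPojo != null) {
            pojos.add(oldPojo);
        }
        v2.put("pojo", pojos);
        return v2;
    }

    public static void main(String[] args) {
        Map<String, Object> sub = new HashMap<>();
        sub.put("items", Arrays.asList("a", "b"));

        Map<String, Object> v1 = new HashMap<>();
        v1.put("text", "hello");
        v1.put("number", 42L);
        v1.put("values", new Integer[] {1, 2, 3});
        v1.put("pojo", sub);

        System.out.println(migrate(v1));
    }
}
```

Unchanged fields such as `number` are simply carried over, so nothing is dropped along the way.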

Some pseudocode showing how the client might make use of this:

// Write
Serializer ser = new Serializer();
MyPojo pojo = new MyPojo();
// ... set fields ...
ser.save(pojo, file, append); // method name is illustrative
// Read (a version later)
Serializer ser = new Serializer();
ser.registerTransformer(new TransformerV1ToV2());
List<MyPojo> pojos = ser.load(file);

This approach has some drawbacks:

  • The transformers have to work on some kind of intermediate format (this could be the underlying storage format, such as JSON or XML)
  • You don't know what the class looked like at a given point in time, since each transformer only maps from the previous version and only the final class exists in code, which makes tracking down errors hard
  • Performance (depending on how the transformation happens)
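To make the "relative to the previous version" idea concrete, here is a minimal sketch of a transformer chain working on an intermediate `Map<String, Object>` representation. The class `MigrationChain`, its methods, and the version 3 step in `main` (an invented `archived` flag) are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical registry that upgrades a map-based record step by step. */
final class MigrationChain {
    private final Map<Integer, UnaryOperator<Map<String, Object>>> steps = new HashMap<>();
    private final int latestVersion;

    MigrationChain(int latestVersion) {
        this.latestVersion = latestVersion;
    }

    /** Registers the step that turns a record of fromVersion into fromVersion + 1. */
    MigrationChain register(int fromVersion, UnaryOperator<Map<String, Object>> step) {
        steps.put(fromVersion, step);
        return this;
    }

    /** Applies every step from storedVersion up to latestVersion, in order. */
    Map<String, Object> upgrade(int storedVersion, Map<String, Object> record) {
        Map<String, Object> current = new LinkedHashMap<>(record); // keep the input intact
        for (int v = storedVersion; v < latestVersion; v++) {
            UnaryOperator<Map<String, Object>> step = steps.get(v);
            if (step == null) {
                throw new IllegalStateException("no migration registered for version " + v);
            }
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        MigrationChain chain = new MigrationChain(3)
                // V1 -> V2: rename text to content (from the example above)
                .register(1, r -> { r.put("content", r.remove("text")); return r; })
                // V2 -> V3: invented step, adds a flag with a default value
                .register(2, r -> { r.put("archived", Boolean.TRUE); return r; });

        Map<String, Object> stored = new HashMap<>();
        stored.put("text", "hello");
        System.out.println(chain.upgrade(1, stored));
    }
}
```

Since each step yields a complete record of the next version, the intermediate results could be logged or validated one by one, which may soften the debugging drawback. The cost grows with the number of versions a record has to pass through.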


I decided to use the transformation approach. JSON is used as the intermediate format; this way I can optionally verify correctness between versions using JSON Schema.

Conceptually, you can think of it in two ways: as an object mapper, or as a JSON format that is expressed in Java objects.

I open-sourced my solution at

The documentation will be extended in the near future. I also added support for custom serializers/deserializers and polymorphic types.
