JSON serialization caching
Development | Darjan Bogdan

Wednesday, Oct 7, 2015 • 7 min read
Are even the fastest JSON serializers simply not fast enough for the task at hand? Have no fear: there is a way to enhance the overall performance of the serialization process.

Every now and then you will get a requirement in which you have to satisfy quite optimistic API performance criteria. In most cases, the standard JSON serializers will be fast enough. However, there are scenarios in which you need to meet extremely low response times while serving rather complex and large object collections and, to make things more interesting, performing some computations on them. If that's the case, you have to minimize the serialization time as much as possible.

Introduction

In this post, I will not introduce yet another super-fast serializer, and I promise not to rely on quick and dirty techniques to enhance the existing ones. Instead, I will introduce a pretty simple mechanism that allows you to optimize the JSON serialization output. It is important to point out that the proposed solution is built on top of the Json.NET serializer, which is already one of the fastest serializers out there, but the technique should be applicable to any other serializer as well.

As the title indicates, I will introduce a caching infrastructure for serialized objects. What does that mean? Basically, when a serializer serializes an object, the result is stored (cached) in memory. This ensures that every subsequent serialization of that particular object won't need to go through the serialization process again; the cached result is used instead. You could easily say, "but I always serialize new objects created by a database fetch (via an ORM tool or similar)", and you would be right. Unfortunately, if an endpoint always returns newly created objects, you won't gain any benefit from this technique. However, there are still a lot of scenarios in which it shows great potential.

I am pretty sure a lot of you use some sort of in-memory caching provider, or simply store collections or objects somewhere in memory and retrieve them upon request. If that is the case, you will be able to implement a serialization caching solution.

Implementation

The first thing you need to implement is a base class that will be inherited by every object that needs to be serialized with caching.

public abstract class CachingSerializable
{
    private string serializedObject;
    private Action<JsonWriter, JsonSerializer> GetSerializedObject;

    protected CachingSerializable()
    {
        // Until the first serialization, delegate to SerializeInit.
        GetSerializedObject = SerializeInit;
    }

    public void Serialize(JsonWriter writer, JsonSerializer serializer)
    {
        GetSerializedObject(writer, serializer);
    }

    private void SerializeInit(JsonWriter writer, JsonSerializer serializer)
    {
        if (serializedObject == null)
        {
            // Serialize once with the default serializer and cache the result.
            using (StringWriter stringWriter = new StringWriter())
            {
                serializer.Serialize(stringWriter, this);
                serializedObject = stringWriter.ToString();
            }
        }

        // Swap the delegate so all subsequent calls write the cached
        // string directly, skipping the serialization process entirely.
        GetSerializedObject = (wr, sr) =>
        {
            wr.WriteRawValue(serializedObject);
        };

        GetSerializedObject(writer, serializer);
    }
}

As you can see, CachingSerializable is a base class that stores (caches) the result of the object's serialization. To enable caching for an object, you need to derive your own class from it; that class may represent part of your business model (e.g. public class Foo : CachingSerializable {}).
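To make this concrete, here is a sketch of what a cacheable model class might look like, using the weather domain from the test example later in this post (the class and property names are illustrative, not part of the actual implementation):

```csharp
using System;

// A hypothetical DTO that opts into serialization caching simply
// by inheriting the CachingSerializable base class defined above.
public class WeatherReading : CachingSerializable
{
    public DateTime Timestamp { get; set; }
    public double Temperature { get; set; }
    public double Humidity { get; set; }
}
```

Note that the model class itself contains no caching logic at all; everything is handled by the base class and the converter introduced next.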

We also need to implement a custom JSON converter that inherits Json.NET's JsonConverter class.

public class CachingConverter : JsonConverter
{
    private JsonSerializer defaultSerializer;

    public CachingConverter(JsonSerializer serializer)
    {
        this.defaultSerializer = serializer;
    }

    public override bool CanConvert(Type objectType)
    {
        return true;
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        // Deserialize with the default serializer; using the passed-in
        // serializer would route back through this converter and recurse.
        return this.defaultSerializer.Deserialize(reader, objectType);
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        // Delegate writing to the object itself, which caches its own output.
        var cacheValue = (CachingSerializable)value;
        cacheValue.Serialize(writer, this.defaultSerializer);
    }
}

The custom converter holds a reference to the default serializer your system uses, and it only changes the behavior of the WriteJson method. On top of that, objects converted via CachingConverter must derive from CachingSerializable, which should make it clearer why objects need to inherit that abstract base class. As you can see, the base class exposes a Serialize method which, in essence, serializes the object via the default serializer and stores the result in a private field.

To put all the pieces together, you need to implement a custom Json.NET contract resolver. There is no point going into detail about what a contract resolver is, since you can get a broader view by reading the official documentation. The implementation itself is straightforward: inherit the DefaultContractResolver class and override the CreateContract method.

public class CachingContractResolver : DefaultContractResolver
{
    protected override JsonContract CreateContract(Type objectType)
    {
        JsonContract contract = base.CreateContract(objectType);

        if (typeof(CachingSerializable).IsAssignableFrom(objectType))
        {
            //Create default serializer with default contract resolver and converters
            var serializer = JsonSerializer.Create(CachingSettings.Default);
            //Add custom converter to the serializer
            contract.Converter = new CachingConverter(serializer);
        }

        return contract;
    }
}

Inside the overridden method, we first get the base contract for the type that is going to be serialized. Then we check the type: if it derives from CachingSerializable, we replace the default converter with our custom CachingConverter. The effect is that every object which inherits CachingSerializable will use that converter when it gets serialized, while every other type will keep using Json.NET's default converter.

The very last thing you need to do is to set the ContractResolver property to the newly created custom CachingContractResolver in your global JsonSerializer.
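As a sketch, wiring the resolver into Json.NET's global settings might look like this (where exactly you configure the settings depends on your application; the WeatherReading type here stands in for any of your CachingSerializable-derived classes):

```csharp
using Newtonsoft.Json;

// Register the caching resolver globally; from this point on,
// JsonConvert routes every CachingSerializable-derived object
// through the CachingConverter.
JsonConvert.DefaultSettings = () => new JsonSerializerSettings
{
    ContractResolver = new CachingContractResolver()
};

// The first call serializes the object and caches the result;
// subsequent calls for the same instance write the cached string
// via WriteRawValue instead of serializing again.
var reading = new WeatherReading { Temperature = 21.5 };
string first = JsonConvert.SerializeObject(reading);
string second = JsonConvert.SerializeObject(reading); // served from cache
```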

In other words, for every business model object or DTO that inherits CachingSerializable, the serializer will use the custom CachingConverter, which overrides the WriteJson method. That method in turn calls the Serialize method of the object being serialized. As I already mentioned, the serialization is performed with the default serializer (with a default converter), and the result is stored in the object's private field. Thanks to this caching, any further serialization of that very same object won't serialize it again - the cached result is written instead.

Now that you have a complete overview of this caching mechanism, you may wonder how big the performance benefit actually is.

Example and Test Results

In order to test this technique appropriately, I've built a fake weather forecast system which consists of a large number of weather stations. Each weather station has a collection of weather readings, and each reading holds information received from multiple sensors. The actual business model is slightly larger, but you can get a complete overview of it in my GitHub repository.

The system works this way: every 2.5 seconds, each weather station receives up to 5 new or updated (corrected) weather readings. Let's assume we need extremely precise readings, because data correctness matters, so we have to correct the existing readings if needed. Also, let's say that the system needs to broadcast the complete collection of weather readings to the weather stations in real time upon arrival of each new or updated reading.

Please take into account that this is a test example; I won't go into detail on how the system should be reorganized or changed to implement a production-ready service. The base requirement is that the system needs to update weather station data and return from the API endpoint within 2.5 seconds.

The last thing we need to mention is that our server works with an in-memory collection of immutable objects (weather stations). When the server receives a new object (a weather reading), we perform some computation and add it to the collection of readings. Upon receiving an updated reading, we remove the existing one and add the updated object to the collection.
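The add/replace flow described above can be sketched roughly as follows, using System.Collections.Immutable (the store and method names are illustrative; the actual implementation lives in the repository, and WeatherReading stands in for the reading type of the model):

```csharp
using System.Collections.Immutable;

public class WeatherStationStore
{
    // Readings are immutable, so their cached serialization stays valid.
    private ImmutableList<WeatherReading> readings =
        ImmutableList<WeatherReading>.Empty;

    // A new reading is processed and appended to the collection.
    public void Add(WeatherReading reading)
    {
        readings = readings.Add(reading);
    }

    // A corrected reading replaces the existing one (remove + add)
    // rather than being mutated in place, which would leave a stale
    // cached serialization result behind.
    public void Update(WeatherReading existing, WeatherReading corrected)
    {
        readings = readings.Remove(existing).Add(corrected);
    }
}
```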

Because the core business logic must not be modified, serialization caching is a good fit: the server already stores collections of immutable objects, which is enough to meet the implementation requirements. At the same time, we don't touch the core business logic at all, which is a great benefit of the proposed implementation.

Tests

The test scenario is set up so that each weather station receives up to 5 new or updated readings every 2.5 seconds, for a total of 20 iterations (measurements). To get real insight into the overall performance, it is important to mention that the tests were run against three different weather station collection sizes.
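A single measurement iteration boils down to timing one full serialization of the station collection, roughly like this (the names are illustrative and `weatherStations` stands for the in-memory collection described above; the complete harness is in the linked repository):

```csharp
using System;
using System.Diagnostics;
using Newtonsoft.Json;

// Time one serialization pass over the current station collection.
var stopwatch = Stopwatch.StartNew();
string json = JsonConvert.SerializeObject(weatherStations);
stopwatch.Stop();
Console.WriteLine($"Iteration took {stopwatch.ElapsedMilliseconds} ms");
```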

In the first test, we created a collection of 100 weather stations, each prepopulated with 50 readings, and took 20 measurements; after each measurement, the weather station collection gets updated. In Fig. 1 you can see the difference between the default and cached serialization. An interesting thing to notice is that the average time of the default serialization increases linearly as the collection size grows, while the caching technique maintains close to constant average times no matter what the collection size is.

Serialization performance comparison - 100 weather stations Figure 1. Serialization performance comparison - 100 weather stations

If you pay attention to the first measurement, you can see that it was done on the smallest number of objects, yet its serialization time is the highest. This behavior is actually pretty common and expected, but it heavily depends on the serialization library's implementation. Since the tests are made on top of Json.NET, which uses reflection to perform automatic serialization of different types, its authors had to implement optimizations to circumvent the performance cost of reflection itself. As the difference between the first two measurements shows, they did a pretty good job: as a consequence of those optimizations (most likely some form of caching of reflection results), the first serialization will always last longer than the rest.

On top of that, the graph has a few spikes which look strange. After some time spent on the subject, we realized that the spikes are caused by the .NET garbage collector, which kicked in and caused small discrepancies in serialization time.

In the next two figures, you can see the performance results for 1,000 and 10,000 weather stations, with 50,000 and 500,000 weather readings, respectively.

Serialization performance comparison - 1000 weather stations Figure 2. Serialization performance comparison - 1000 weather stations

Serialization performance comparison - 10000 weather stations Figure 3. Serialization performance comparison - 10000 weather stations

In both Fig. 2 and Fig. 3 you will notice that the GC no longer causes visible discrepancies, primarily because the serialization itself takes much longer. In addition, the linear increase of duration with the collection size is even more noticeable.

In conclusion, the performance benefit of the serialization caching mechanism is quite obvious. As I have already pointed out, there are some constraints on the proposed implementation. If you are not able to implement serialization caching in your project but still have performance issues, I advise you to go through the performance tips section of Json.NET's documentation; you will certainly find great tips and tricks there.

A fully functional serialization caching implementation, as well as the complete test model (and results), can be found on my GitHub page.