Recently I needed to implement a client application that would fetch data from a web service. The client had to be written in C++ using the Qt 5.4 libraries, and the server was already running Apache2 with PHP 5.something, but the encoding of the actual data was mine to choose.

“That’s easy!”, I thought. “I will transport it as JSON and…” but then I started to think about it. Why would JSON be the obvious answer? Perhaps XML would be a better choice? Perhaps not. Since I had my doubts, I decided to take a scientific approach: I would decide on a metric, conduct some controlled experiments, and select the technology that scored best on that metric. To spare you the suspense: it did in fact turn out to be JSON that won in this particular case. The rest of this post describes the methodology and the metrics, presents the data, draws a conclusion and offers the entire codebase for download.

Methodology

The data in question would be a long list of flat objects without any relations, quite similar to a directory listing. I therefore wrote a server script that would return a list of “files” over HTTP, encoded in three different ways: as JSON, as flat attribute-based XML, and as XML with the data in nested child nodes.

The JSON would thus look like this:

[
{"name": "File_1.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... },
{"name": "File_2.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... }
...
]

the flat XML like this:

<files>
  <file name="File_1.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" .../>
  <file name="File_2.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" .../>
  ...
</files>

and the nested XML like this:

<files>
  <file>
    <name>File_1.extension</name>
    <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
    ...
  </file>
  <file>
    <name>File_2.extension</name>
    <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
    ...
  </file>
  ...
</files>

The actual appearance of the data would of course depend on the encoder chosen.
For each request, GET parameters told the server how many files to put in the list and which encoding to use. The server script would return the encoded data along with a measurement of how long it took to generate it; this encoding duration was sent back as an HTTP response header.

Upon receiving the data, the client would record the size of the response, how long the download took in total, how long the server reported the encoding took, and how long it took to decode the data back into a QList of objects.

This approach would give me four things to measure:

  1. The time it took to encode the data.
  2. The size of the encoded data.
  3. The duration of the HTTP request.
  4. The time it took to decode it.

Although I expected the duration of the HTTP request to follow directly from the size of the encoded data and the choice of encoder, I would still measure it.

For the server-side encoders, I chose PHP’s built-in JSON encoding and PHP’s SimpleXML generator. On the client side, I would use Qt 5’s JSON library, a third-party JSON library and Qt 5’s XML libraries as decoders. For decoding XML, I would benchmark the QXmlSimpleReader and friends, the now deprecated DOM classes, and the stream readers that are supposed to be the new black in XML parsing with Qt. The code for both the client and the server side can be found on GitHub.

The complete test matrix therefore looked like this:

Table 1: The eight test runs, listed by server-side encoder (columns) and client-side decoder (rows).

Client decoder     JSON   XML w. attributes   XML w. child nodes
Qt’s JSON           1
Flavio’s JSON       2
Qt’s SAX                          3                   4
Qt’s DOM                          5                   6
Qt’s Stream                       7                   8

Metrics

As mentioned earlier, there would be four data points for each test: encoding time E, transfer time T, data size S and decoding time D. I would normalize the data against test run 1 and weigh the data points with the following weights:

E: 40%, because server computational power is expensive to me as a server owner.
T: 20%, because data transfer in my case would be done asynchronously in the background, so a delay in the user experience is acceptable. This is not a low-latency application.
S: 35%, because outgoing data is paid for by the byte, and even if I compressed it on the fly, the uncompressed size would still matter.
D: 5%, because computational power on the client side is cheap for me, and since this is not a low-latency application, a delay is perfectly fine.

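By doing so, I would end up with one number per test, the index, by which to rank the technologies, where lower means better. With N(X) denoting the value of metric X divided by the corresponding value for test run 1, the index is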

$$\mathrm{index} = 0.4\,N(E) + 0.2\,N(T) + 0.35\,N(S) + 0.05\,N(D)$$

Result

I ran the experiments using the code found on GitHub and got the data in Table 2 and Figure 1:

Table 2: Data averaged from 2100 files per request and 100 requests.

Test                    E (μs)     σ   T (μs)   σ   S (bytes)   σ   D (μs)      σ
qt5-json                  3064  1031    11120   0      225691   0    12290   3345
Flavio’s json             3130  1154    11300   0      225691   0    43070  10788
qt5-sax-attributes       24178  6110    35920   0      223624   0    26360   7694
qt5-dom-attributes       24066  5916    36696   0      223624   0    39274   8378
qt5-stream-attributes    23559  5964    35110   0      223624   0    21598   6301
qt5-sax-childnodes       32678  5342    46800   0      278224   0    37650   7199
qt5-dom-childnodes       32359  5142    45950   0      278224   0    49650   6358
qt5-stream-childnodes    32616  6041    46200   0      278224   0    39600   7038

[Figure 1: Averaged samples from 2100 files per request and 100 requests.]

Normalizing the data against test run 1 (qt5-json) and omitting σ gives Table 3 and Figure 2:

Table 3: Test results normalized against test run 1 (qt5-json); the values are dimensionless ratios.

Test                    N(E)   N(T)   N(S)   N(D)
qt5-json                1.00   1.00   1.00   1.00
Flavio’s json           1.02   1.02   1.00   3.50
qt5-sax-attributes      7.89   3.23   0.99   2.14
qt5-dom-attributes      7.86   3.30   0.99   3.20
qt5-stream-attributes   7.69   3.16   0.99   1.76
qt5-sax-childnodes     10.67   4.21   1.23   3.06
qt5-dom-childnodes     10.56   4.13   1.23   4.04
qt5-stream-childnodes  10.65   4.15   1.23   3.22
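As a worked example of the normalization, here is the qt5-stream-attributes row, obtained by dividing each raw average from Table 2 by the corresponding qt5-json value:

$$N(E) = \frac{23559}{3064} \approx 7.69,\quad N(T) = \frac{35110}{11120} \approx 3.16,\quad N(S) = \frac{223624}{225691} \approx 0.99,\quad N(D) = \frac{21598}{12290} \approx 1.76$$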

[Figure 2: Normalized test results.]

Applying the chosen weights gives the following result:

Table 4: Index calculated from the normalized values and weights (W), sorted from lowest (best) to highest.

Test                    N(E)    W    N(T)    W    N(S)    W    N(D)    W    Index
qt5-json                1.00  0.40   1.00  0.20   1.00  0.35   1.00  0.05   1.00
Flavio’s json           1.02  0.40   1.02  0.20   1.00  0.35   3.50  0.05   1.14
qt5-stream-attributes   7.69  0.40   3.16  0.20   0.99  0.35   1.76  0.05   4.14
qt5-sax-attributes      7.89  0.40   3.23  0.20   0.99  0.35   2.14  0.05   4.26
qt5-dom-attributes      7.86  0.40   3.30  0.20   0.99  0.35   3.20  0.05   4.31
qt5-dom-childnodes     10.56  0.40   4.13  0.20   1.23  0.35   4.04  0.05   5.68
qt5-stream-childnodes  10.65  0.40   4.15  0.20   1.23  0.35   3.22  0.05   5.68
qt5-sax-childnodes     10.67  0.40   4.21  0.20   1.23  0.35   3.06  0.05   5.69
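As a check of one row, the best-scoring XML combination, qt5-stream-attributes, works out as follows:

$$\mathrm{index} = 0.4 \cdot 7.69 + 0.2 \cdot 3.16 + 0.35 \cdot 0.99 + 0.05 \cdot 1.76 = 3.076 + 0.632 + 0.3465 + 0.088 \approx 4.14$$

The encoding term alone contributes about 3.08 of the 4.14, which is why the heavily weighted server-side encoding time decides the ranking.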

Discussion

While the data are quite clear, there are several issues that may affect the results. Some of those are:

Error Checking
No error checking was implemented when parsing the XML on the client side. Implementing error checking may change the result, but I believe it would only emphasize it, making XML even slower.
Sanity Checking
The decoded data were not checked for sanity. This was on purpose, as I believed such a check would apply equally to all technologies and just offset the result. However, I may be wrong about that.
Isolated System
The tests were performed on my normal laptop while I used it for my normal activities: YouTube, xkcd.com and casual coding. This is most likely why the encoding times differ slightly between test runs that used the same encoder, where they should have been identical. Running the tests on an isolated, dedicated system might provide more uniform results, and the standard deviation would most likely decrease as well.

These issues aside, I still believe the data are solid and represent an objective evaluation of what I was looking for.

Conclusion

As it happens, the overall best combination of encoder and decoder turned out to be my initial choice: PHP’s built-in JSON encoder and Qt 5’s built-in JSON decoder. It was best on encoding time and transfer time, essentially tied on data size (attribute-based XML came out about 1% smaller), and, as a nice bonus, it was also best on decoding time, the least important metric.

So my first instinct was right, but now I KNOW it IS right. Therefore: JSON it is.

All code, data, gnuplot commands and graphs can be found on GitHub.