Written by Morten Winkler Jørgensen
2016/10/03
Recently I needed to implement a client application that would fetch data from a web service. The client had to be written in C++ using the Qt 5.4 libraries, and the server was already running Apache2 with PHP 5.*something*, but the encoding of the actual data was free for me to choose.
“That’s easy!”, I thought. “I will transport it as JSON and…” but then I started to think about it. Why would JSON be the obvious answer? Perhaps XML would be a better choice? Perhaps not. Since I had my doubts, I decided to take a scientific approach: I would decide on a metric, conduct some controlled experiments, and select the technology with the best score. To spare you the suspense: it did in fact turn out to be JSON that won for this particular case. The rest of this post describes the metrics and methodology, presents the data, draws a conclusion, and offers the entire codebase for download.
Methodology
The data in question would be a long list of flat objects without any relations, quite similar to a directory listing. I therefore wrote a server script that would return a list of “files” over HTTP, encoded in three different ways: as JSON, as flat attribute-based XML, and as XML with the data in nested child nodes.
The JSON would therefore look like this:
[
{"name": "File_1.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... },
{"name": "File_2.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... }
...
]
The flat XML would look like this (element and attribute names reconstructed for illustration):
<files>
  <file name="File_1.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" ... />
  <file name="File_2.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" ... />
  ...
</files>
While the nested XML would look like this (again with reconstructed element names):
<files>
  <file>
    <name>File_1.extension</name>
    <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
    ...
  </file>
  <file>
    <name>File_2.extension</name>
    <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
    ...
  </file>
  ...
</files>
The actual appearance of the data would of course depend on the encoder chosen.
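To give an idea of what the client-side decoding amounts to, here is a minimal sketch of parsing the JSON list with Qt 5's JSON classes. The `FileEntry` struct, its field set, and the function name are assumptions made for this example, not the actual types from the benchmark code:

```cpp
#include <QByteArray>
#include <QJsonArray>
#include <QJsonDocument>
#include <QJsonObject>
#include <QList>
#include <QString>

// Hypothetical record type; the real benchmark code may use different fields.
struct FileEntry {
    QString name;
    QString mtime;
};

QList<FileEntry> parseJson(const QByteArray &body)
{
    QList<FileEntry> result;
    // The whole payload is parsed in one go; the top-level value is an array.
    const QJsonDocument doc = QJsonDocument::fromJson(body);
    const QJsonArray files = doc.array();
    for (const QJsonValue value : files) {
        const QJsonObject obj = value.toObject();
        FileEntry entry;
        entry.name  = obj.value(QStringLiteral("name")).toString();
        entry.mtime = obj.value(QStringLiteral("mtime")).toString();
        result.append(entry);
    }
    return result;
}
```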
For each request, the server would be told, via GET parameters, how many files to put in the list and which encoding to use. The server script would return the encoded data along with a measurement of how long the data took to generate. The encoding duration was sent back as an HTTP response header.
Upon receiving the data, the client would record the size of the response, the total time the download took, the encoding time reported by the server, and the time it took to decode the data back into a QList of objects.
This approach would give me four things to measure:
- The time it took to encode the data.
- The size of the encoded data.
- The duration of the HTTP request.
- The time it took to decode it.
Although I expected the duration of the HTTP request to follow directly from the size of the encoded data and the choice of decoder, I would still measure it.
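As a rough sketch of how these measurements can be taken on the client (requires `QT += network`): the endpoint, the GET parameter names and the timing header name below are assumptions, not the actual names used; the real ones are in the code on GitHub.

```cpp
#include <QCoreApplication>
#include <QDebug>
#include <QElapsedTimer>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>
#include <QUrlQuery>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    // Hypothetical endpoint and parameter names.
    QUrl url(QStringLiteral("http://localhost/list.php"));
    QUrlQuery query;
    query.addQueryItem(QStringLiteral("count"), QStringLiteral("1000"));
    query.addQueryItem(QStringLiteral("encoding"), QStringLiteral("json"));
    url.setQuery(query);

    QNetworkAccessManager manager;
    QElapsedTimer timer;
    timer.start(); // wall-clock time for the whole request, i.e. T

    QNetworkReply *reply = manager.get(QNetworkRequest(url));
    QObject::connect(reply, &QNetworkReply::finished, [&]() {
        const qint64 transferUs = timer.nsecsElapsed() / 1000;         // T
        const QByteArray body = reply->readAll();                      // S = body.size()
        const QByteArray encodeUs =
            reply->rawHeader("X-Encode-Time");                         // E (assumed header name)
        qDebug() << "T(us):" << transferUs
                 << "S(bytes):" << body.size()
                 << "E(us):" << encodeUs;
        reply->deleteLater();
        app.quit();
    });
    return app.exec();
}
```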
On the server side, I chose PHP's built-in JSON encoder and PHP's SimpleXML generator for encoding, while on the client side I would use Qt 5's JSON library, a third-party JSON library, and Qt 5's XML libraries for decoding. For decoding XML I would benchmark the QXmlSimpleReader classes and friends (SAX), the now deprecated DOM classes, and the stream readers that are supposed to be the new black in XML parsing with Qt. The code for both client and server side can be found on GitHub.
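As an example of the stream-reader approach, here is a minimal sketch that pulls records out of the attribute-based XML. The `<file>` element and attribute names follow the illustrative snippet above and are assumptions, as is the `FileEntry` struct:

```cpp
#include <QByteArray>
#include <QList>
#include <QString>
#include <QXmlStreamReader>

// Same hypothetical record type as in the JSON sketch above.
struct FileEntry {
    QString name;
    QString mtime;
};

QList<FileEntry> parseAttributeXml(const QByteArray &body)
{
    QList<FileEntry> result;
    QXmlStreamReader xml(body);
    while (!xml.atEnd()) {
        xml.readNext();
        // Each <file name="..." mtime="..."/> element carries one record
        // in its attributes, so no further descent is needed.
        if (xml.isStartElement() && xml.name() == QLatin1String("file")) {
            FileEntry entry;
            entry.name  = xml.attributes().value(QStringLiteral("name")).toString();
            entry.mtime = xml.attributes().value(QStringLiteral("mtime")).toString();
            result.append(entry);
        }
    }
    return result;
}
```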
The complete test matrix therefore looked like this:
| Client decoder | JSON | XML w. attributes | XML w. child nodes |
|----------------|------|-------------------|--------------------|
| Qt's JSON      | 1    |                   |                    |
| Flavio's JSON  | 2    |                   |                    |
| Qt's SAX       |      | 3                 | 4                  |
| Qt's DOM       |      | 5                 | 6                  |
| Qt's Stream    |      | 7                 | 8                  |
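For contrast, a sketch of the DOM approach on the child-node encoding (requires `QT += xml`; element names are again assumptions matching the snippet above). Unlike the stream reader, `QDomDocument` builds the entire tree in memory before anything can be read, which is consistent with the DOM rows showing the highest decode times in table 2:

```cpp
#include <QByteArray>
#include <QDomDocument>
#include <QDomElement>
#include <QList>
#include <QString>

// Same hypothetical record type as in the sketches above.
struct FileEntry {
    QString name;
    QString mtime;
};

QList<FileEntry> parseChildNodeXml(const QByteArray &body)
{
    QList<FileEntry> result;
    QDomDocument doc;
    if (!doc.setContent(body))          // parses and builds the whole tree
        return result;
    // Walk every <file> element under the root and read its child nodes.
    QDomElement file = doc.documentElement().firstChildElement(QStringLiteral("file"));
    for (; !file.isNull(); file = file.nextSiblingElement(QStringLiteral("file"))) {
        FileEntry entry;
        entry.name  = file.firstChildElement(QStringLiteral("name")).text();
        entry.mtime = file.firstChildElement(QStringLiteral("mtime")).text();
        result.append(entry);
    }
    return result;
}
```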
Metrics
As mentioned earlier, there would be four data points for each test: encoding time $E$, transfer time $T$, data size $S$, and decoding time $D$. I would normalize the data against test run 1 and weight the data points with the following weights:
| Metric | Weight | Rationale |
|--------|--------|-----------|
| $E$ | 40% | Server computational power is expensive to me as a server owner. |
| $T$ | 20% | Data transfer in my case would be done asynchronously in the background, and a delay in user experience is acceptable. This is not a low-latency application. |
| $S$ | 35% | Outgoing data is paid for by the byte, and even if I were to compress it on the fly, uncompressed size matters. |
| $D$ | 5% | Computational power on the client side is cheap for me, and since this is not a low-latency application, a delay is perfectly fine. |
By doing so, I would end up with one number per test, the index, to rank the technologies by, where lower means better:

$$\mathrm{index} = 0.40\,N(E) + 0.20\,N(T) + 0.35\,N(S) + 0.05\,N(D)$$

where $N(X)$ is the value of $X$ normalized against test run 1.
Result
I ran the experiments using the code found on GitHub and got the data in table 2 and figure 1:
| Test | $E$ (μs) | $\sigma$ | $T$ (μs) | $\sigma$ | $S$ (bytes) | $\sigma$ | $D$ (μs) | $\sigma$ |
|------|---------:|---------:|---------:|---------:|------------:|---------:|---------:|---------:|
| qt5-json | 3064 | 1031 | 11120 | 0 | 225691 | 0 | 12290 | 3345 |
| Flavio's json | 3130 | 1154 | 11300 | 0 | 225691 | 0 | 43070 | 10788 |
| qt5-sax-attributes | 24178 | 6110 | 35920 | 0 | 223624 | 0 | 26360 | 7694 |
| qt5-dom-attributes | 24066 | 5916 | 36696 | 0 | 223624 | 0 | 39274 | 8378 |
| qt5-stream-attributes | 23559 | 5964 | 35110 | 0 | 223624 | 0 | 21598 | 6301 |
| qt5-sax-childnodes | 32678 | 5342 | 46800 | 0 | 278224 | 0 | 37650 | 7199 |
| qt5-dom-childnodes | 32359 | 5142 | 45950 | 0 | 278224 | 0 | 49650 | 6358 |
| qt5-stream-childnodes | 32616 | 6041 | 46200 | 0 | 278224 | 0 | 39600 | 7038 |
Normalizing the data (and omitting $\sigma$) gives table 3 and figure 2:
| Test | $N(E)$ | $N(T)$ | $N(S)$ | $N(D)$ |
|------|-------:|-------:|-------:|-------:|
| qt5-json | 1.00 | 1.00 | 1.00 | 1.00 |
| Flavio's json | 1.02 | 1.02 | 1.00 | 3.50 |
| qt5-sax-attributes | 7.89 | 3.23 | 0.99 | 2.14 |
| qt5-dom-attributes | 7.86 | 3.30 | 0.99 | 3.20 |
| qt5-stream-attributes | 7.69 | 3.16 | 0.99 | 1.76 |
| qt5-sax-childnodes | 10.67 | 4.21 | 1.23 | 3.06 |
| qt5-dom-childnodes | 10.56 | 4.13 | 1.23 | 4.04 |
| qt5-stream-childnodes | 10.65 | 4.15 | 1.23 | 3.22 |
Applying the chosen weights gives the following result:
| Test | $N(E)$ | $W$ | $N(T)$ | $W$ | $N(S)$ | $W$ | $N(D)$ | $W$ | Index |
|------|-------:|----:|-------:|----:|-------:|----:|-------:|----:|------:|
| qt5-json | 1.00 | 0.40 | 1.00 | 0.20 | 1.00 | 0.35 | 1.00 | 0.05 | 1.00 |
| Flavio's json | 1.02 | 0.40 | 1.02 | 0.20 | 1.00 | 0.35 | 3.50 | 0.05 | 1.14 |
| qt5-sax-attributes | 7.89 | 0.40 | 3.23 | 0.20 | 0.99 | 0.35 | 2.14 | 0.05 | 4.26 |
| qt5-stream-attributes | 7.69 | 0.40 | 3.16 | 0.20 | 0.99 | 0.35 | 1.76 | 0.05 | 4.14 |
| qt5-dom-attributes | 7.86 | 0.40 | 3.30 | 0.20 | 0.99 | 0.35 | 3.20 | 0.05 | 4.31 |
| qt5-dom-childnodes | 10.56 | 0.40 | 4.13 | 0.20 | 1.23 | 0.35 | 4.04 | 0.05 | 5.68 |
| qt5-stream-childnodes | 10.65 | 0.40 | 4.15 | 0.20 | 1.23 | 0.35 | 3.22 | 0.05 | 5.68 |
| qt5-sax-childnodes | 10.67 | 0.40 | 4.21 | 0.20 | 1.23 | 0.35 | 3.06 | 0.05 | 5.69 |
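As a quick sanity check of the arithmetic, plugging the Flavio's json row into the index formula reproduces the value in the table:

$$0.40 \cdot 1.02 + 0.20 \cdot 1.02 + 0.35 \cdot 1.00 + 0.05 \cdot 3.50 = 0.408 + 0.204 + 0.350 + 0.175 \approx 1.14$$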
Discussion
While the data are quite clear, there are several issues that may have impacted the results. Some of those are:
- Error Checking
- No error checking was implemented when parsing the XML on the client side. Implementing error checking might change the numbers, but I believe it would only emphasize the result, making XML slower.
- Sanity Checking
- The decoded data were not checked for sanity. This was on purpose, as I believed such a sanity check would apply to all technologies and merely offset the results. However, I may be wrong about that.
- Isolated System
- The tests were performed on my everyday laptop while I used it for my normal activities: YouTube, xkcd.com, and casual coding. This is most likely why there are small variations in the encoding times, which should have been identical across test runs sharing the same server-side encoding. Running the tests on an isolated, dedicated system might give more uniform results; most likely the standard deviations would decrease as well.
These issues aside, I still believe the data are solid and represent an objective evaluation of what I was looking for.
Conclusion
By coincidence, it turned out that the overall best combination of encoder and decoder was my initial choice: PHP's built-in JSON encoder and Qt 5's built-in JSON decoder. Not only was it best on the three most important metrics, it was also best on the least important one (OK, let's call it a draw on $S$). That was a nice bonus.
So my first instinct was right, but now I *know* it is right; therefore, JSON it is.
All code, data, gnuplot commands, and graphs can be found on GitHub.