
JSON or XML
Written by Morten Winkler Jørgensen
2016/10/03

Recently I needed to implement a client application that would fetch data from a web service. The client had to be written in C++ using the Qt 5.4 libraries, and the server was already running Apache2 with PHP 5.something, but the encoding of the actual data was free for me to choose.

“That’s easy!”, I thought. “I will transport it as JSON and…” But then I started to think about it. Why would JSON be the obvious answer? Perhaps XML would be a better choice? Perhaps not. Since I had my doubts, I decided to take a scientific approach: I would decide on a metric, conduct some controlled experiments, and select the technology that scored best. To spare you the suspense: it did in fact turn out to be JSON that won in this particular case. The rest of this post describes the metrics and methodology, presents the data, draws a conclusion, and offers the entire codebase for download.

Methodology

The data in question would be a long list of flat objects without any relations, quite similar to a directory listing. I therefore wrote a server script that would return a list of “files” over HTTP, encoded in three different ways: as JSON, as flat attribute-based XML, and as XML with the data in nested child nodes.

The JSON would therefore look like this:

    [
      {"name": "File_1.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... },
      {"name": "File_2.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... },
      ...
    ]

The flat XML like this:

    <files>
      <file name="File_1.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" .../>
      <file name="File_2.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" .../>
      ...
    </files>

While the nested XML would look like this:

    <files>
      <file>
        <name>File_1.extension</name>
        <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
        ...
      </file>
      <file>
        <name>File_2.extension</name>
        <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
        ...
      </file>
      ...
    </files>

The actual appearance of the data would of course depend on the encoder chosen.
For each request, the server would be told, via GET parameters, how many files to put in the list and which encoding to use. The server script would return the encoded data together with a measurement of how long the data took to generate; the encoding duration was sent back as an HTTP response header.

Upon receiving the data, the client would register the size of the response, how long it took in total to download the data, how long it had taken the server to encode the data (read from the response header), and how long it took to decode the data back into a QList of objects.

This approach would give me four things to measure:

  1. The time it took to encode the data.
  2. The size of the encoded data.
  3. The duration of the HTTP request.
  4. The time it took to decode it.

Although I expected the duration of the HTTP request to follow directly from the size of the encoded data and the choice of encoder, I would still measure it.
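To make this concrete, here is a minimal sketch of how the client-side measurements could be captured with Qt 5. The endpoint, the GET parameter names and the timing header name are assumptions of mine for illustration; the actual names live in the github repository.

    #include <QCoreApplication>
    #include <QDebug>
    #include <QElapsedTimer>
    #include <QNetworkAccessManager>
    #include <QNetworkReply>
    #include <QNetworkRequest>
    #include <QUrl>

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);

        // Assumed endpoint and parameter names, for illustration only.
        QNetworkAccessManager manager;
        QUrl url("http://localhost/list.php?count=2100&encoding=json");

        QElapsedTimer timer;
        timer.start();
        QNetworkReply *reply = manager.get(QNetworkRequest(url));

        QObject::connect(reply, &QNetworkReply::finished, [&]() {
            const qint64 t = timer.nsecsElapsed() / 1000;  // T: request duration, μs
            const QByteArray payload = reply->readAll();   // S: payload.size(), bytes
            // E: the server-side encoding duration, reported in a response
            // header ("X-Encoding-Time" is an assumed name).
            const QByteArray e = reply->rawHeader("X-Encoding-Time");
            qDebug() << "T =" << t << "S =" << payload.size() << "E =" << e;
            reply->deleteLater();
            app.quit();  // D would be measured around the decode call, see below
        });

        return app.exec();
    }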

On the server side, I chose PHP’s built-in JSON encoding and PHP’s SimpleXML generator as encoders, while on the client side I would use Qt 5’s JSON library, a 3rd-party JSON library (“Flavio’s json” in the tables below) and Qt 5’s XML libraries as decoders. For decoding XML I would benchmark the QXmlSimpleReader and friends, the now deprecated DOM classes, and the stream readers that are supposed to be the new black in XML parsing with Qt. The code for both client and server side can be found at github.
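To give a flavour of the decoding side, here is a simplified sketch of two of the decode paths: Qt 5’s JSON classes and the XML stream reader working on the attribute-based listing. The struct and function names are mine, not the benchmark’s, and error handling is omitted just as it was in the tests.

    #include <QByteArray>
    #include <QJsonArray>
    #include <QJsonDocument>
    #include <QJsonObject>
    #include <QList>
    #include <QString>
    #include <QXmlStreamReader>

    // A minimal record matching the listings shown earlier.
    struct FileEntry {
        QString name;
        QString mtime;
    };

    // Decode the JSON listing with Qt 5's built-in JSON library.
    QList<FileEntry> decodeJson(const QByteArray &data)
    {
        QList<FileEntry> files;
        const QJsonArray array = QJsonDocument::fromJson(data).array();
        for (const QJsonValue &value : array) {
            const QJsonObject obj = value.toObject();
            files.append({obj.value("name").toString(),
                          obj.value("mtime").toString()});
        }
        return files;
    }

    // Decode the attribute-based XML listing with the stream reader.
    QList<FileEntry> decodeXmlAttributes(const QByteArray &data)
    {
        QList<FileEntry> files;
        QXmlStreamReader xml(data);
        while (!xml.atEnd()) {
            if (xml.readNext() == QXmlStreamReader::StartElement
                    && xml.name() == QLatin1String("file")) {
                const QXmlStreamAttributes attrs = xml.attributes();
                files.append({attrs.value("name").toString(),
                              attrs.value("mtime").toString()});
            }
        }
        return files;
    }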

The complete test matrix therefore looked like this:

Table 1: The eight test runs listed as a match between client-side decoders and server-side encodings.

    Decoder (client)   JSON   XML w. attributes   XML w. child nodes
    Qt’s JSON            1            -                    -
    Flavio’s JSON        2            -                    -
    Qt’s SAX             -            3                    4
    Qt’s DOM             -            5                    6
    Qt’s Stream          -            7                    8

Metrics

As mentioned earlier, there would be four data points for each test: encoding time E, transfer time T, data size S and decoding time D. I would normalize the data against test run 1 and weigh the data points with the following weights:

    E   40%   Because server computational power is expensive to me as a server owner.
    T   20%   Because data transfer in my case would be done asynchronously in the background, and a delay in the user experience is acceptable. This is not a low-latency application.
    S   35%   Because outgoing data is paid for by the byte, and even if I were to compress it on the fly, the uncompressed size matters.
    D    5%   Because computational power on the client side is cheap to me, and since this is not a low-latency application, a delay is perfectly fine.

By doing so, I would end up with one number per test, the index, to rank the technologies by, where lower means better:

    index = 0.4·N(E) + 0.2·N(T) + 0.35·N(S) + 0.05·N(D)
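In code, with the normalized measurements as inputs, the index is just a weighted sum:

    // The index formula above as a plain function; nE..nD are the E, T, S
    // and D measurements normalized against test run 1.
    double index(double nE, double nT, double nS, double nD)
    {
        return 0.4 * nE + 0.2 * nT + 0.35 * nS + 0.05 * nD;
    }

A run that scores 1.0 on every normalized metric gets index 1.0, the baseline.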

Result

I ran the experiments using the code found at github, and got the data shown in table 2 and figure 1:

Table 2: Data averaged over 100 requests of 2100 files each.

    Test                     E (μs)      σ   T (μs)   σ   S (bytes)   σ   D (μs)      σ
    qt5-json                   3064   1031    11120   0      225691   0    12290   3345
    Flavio’s json              3130   1154    11300   0      225691   0    43070  10788
    qt5-sax-attributes        24178   6110    35920   0      223624   0    26360   7694
    qt5-dom-attributes        24066   5916    36696   0      223624   0    39274   8378
    qt5-stream-attributes     23559   5964    35110   0      223624   0    21598   6301
    qt5-sax-childnodes        32678   5342    46800   0      278224   0    37650   7199
    qt5-dom-childnodes        32359   5142    45950   0      278224   0    49650   6358
    qt5-stream-childnodes     32616   6041    46200   0      278224   0    39600   7038

(Figure 1: the raw measurements from table 2, plotted.)

Normalizing the data (and omitting σ) gives table 3 and figure 2:

Table 3: Normalized test results.

    Test                     N(E)    N(T)    N(S)    N(D)
    qt5-json                 1.00    1.00    1.00    1.00
    Flavio’s json            1.02    1.02    1.00    3.50
    qt5-sax-attributes       7.89    3.23    0.99    2.14
    qt5-dom-attributes       7.86    3.30    0.99    3.20
    qt5-stream-attributes    7.69    3.16    0.99    1.76
    qt5-sax-childnodes      10.67    4.21    1.23    3.06
    qt5-dom-childnodes      10.56    4.13    1.23    4.04
    qt5-stream-childnodes   10.65    4.15    1.23    3.22

(Figure 2: the normalized results from table 3, plotted.)

Applying the chosen weights gives the following result:

Table 4: Sorted index calculated based on the weights.

    Test                     N(E)   W      N(T)   W      N(S)   W      N(D)   W      Index
    qt5-json                  1.00  0.40   1.00   0.20   1.00   0.35   1.00   0.05    1.00
    Flavio’s json             1.02  0.40   1.02   0.20   1.00   0.35   3.50   0.05    1.14
    qt5-stream-attributes     7.69  0.40   3.16   0.20   0.99   0.35   1.76   0.05    4.14
    qt5-sax-attributes        7.89  0.40   3.23   0.20   0.99   0.35   2.14   0.05    4.26
    qt5-dom-attributes        7.86  0.40   3.30   0.20   0.99   0.35   3.20   0.05    4.31
    qt5-dom-childnodes       10.56  0.40   4.13   0.20   1.23   0.35   4.04   0.05    5.68
    qt5-stream-childnodes    10.65  0.40   4.15   0.20   1.23   0.35   3.22   0.05    5.68
    qt5-sax-childnodes       10.67  0.40   4.21   0.20   1.23   0.35   3.06   0.05    5.69

Discussion

While the data are quite clear, there are several issues that may have impacted the results and the performance. Some of them are:

Error Checking
No error checking was implemented when parsing the XML on the client side. Implementing error checking may change the result, but I believe it would only emphasize it, making XML slower. (The omitted check amounts to something like the sketch after this list.)
Sanity Checking
The decoded data were not checked for sanity. This was on purpose, as I believed such a sanity check would apply equally to all technologies and just offset the results. However, I may be wrong about that.
Isolated System
The tests were performed on my normal laptop while I used it for my normal activities: YouTube, xkcd.org and casual coding. This is most likely why there are small variations in the encoding time, which was supposed to be the same for a given encoding across test runs. Running the tests on an isolated and dedicated system might provide more uniform results; most likely the standard deviation would decrease as well.
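For reference, the kind of error checking that was left out of the XML paths amounts to only a few lines with, for example, the stream reader; the SAX and DOM classes have their own error reporting. A sketch:

    #include <QByteArray>
    #include <QDebug>
    #include <QXmlStreamReader>

    // Parse and report malformed XML instead of silently accepting it.
    bool parseChecked(const QByteArray &data)
    {
        QXmlStreamReader xml(data);
        while (!xml.atEnd())
            xml.readNext();
        if (xml.hasError()) {
            qWarning() << "XML error at line" << xml.lineNumber()
                       << ":" << xml.errorString();
            return false;
        }
        return true;
    }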

These issues aside, I still believe the data are solid and represent an objective evaluation of what I was looking for.

Conclusion

By coincidence, it turned out that the overall best combination of encoder and decoder was my initial choice: PHP’s built-in JSON encoder and Qt 5’s built-in JSON decoder. Not only was it best on the three most important metrics (OK, let’s call it a draw on S), but it was also best on the least important one. That was a nice bonus.

So my first instinct was right, but now I KNOW it IS right. Therefore, JSON it is.

All code, data, gnuplot commands, and graphs can be found at github.
