
JSON or XML
Written by Morten Winkler Jørgensen
2016/10/03

Recently I needed to implement a client application that would fetch data from a web service. The client had to be written in C++ using the Qt 5.4 libraries, and the server was already running Apache2 with PHP 5.something, but the encoding of the actual data was free for me to choose.

“That’s easy!”, I thought. “I will transport it as JSON and…” But then I started to think about it. Why would JSON be the obvious answer? Perhaps XML would be a better choice? Perhaps not. Since I had my doubts, I decided to take a scientific approach: I would decide on a metric, conduct some controlled experiments, and select the technology that scored best. To spare you the suspense: it did in fact turn out to be JSON that won in this particular case. The rest of this post describes the metrics and methodology, presents the data, draws a conclusion, and offers the entire codebase for download.

Methodology

The data in question would be a long list of flat objects without any relations, quite similar to a directory listing. I therefore wrote a server script that would return a list of “files” over HTTP, encoded in three different ways: as JSON, as flat attribute-based XML, and as XML with the data in nested child nodes.

The JSON would therefore look like this:

    [
      {"name": "File_1.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... },
      {"name": "File_2.extension", "mtime": "YYYY-MM-DDTHH:MM:SS TZ", ... },
      ...
    ]

The flat XML like this:

    <files>
      <file name="File_1.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" .../>
      <file name="File_2.extension" mtime="YYYY-MM-DDTHH:MM:SS TZ" .../>
      ...
    </files>

While the nested XML would look like this:

    <files>
      <file>
        <name>File_1.extension</name>
        <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
        ...
      </file>
      <file>
        <name>File_2.extension</name>
        <mtime>YYYY-MM-DDTHH:MM:SS TZ</mtime>
        ...
      </file>
      ...
    </files>

The actual appearance of the data would of course depend on the encoder chosen.
For each request, the server would be told, via GET parameters, how many files to put in the list and which encoding to use. The server script would return the encoded data together with a measurement of how long the data took to generate; the encoding duration was sent back as an HTTP response header.

Upon receiving the data, the client would register the size of the response, how long it took in total to download the data, how long it had taken the server to encode the data (read from the response header), and how long it took to decode the data back into a QList of objects.

This approach would give me four things to measure:

  1. The time it took to encode the data.
  2. The size of the encoded data.
  3. The duration of the HTTP request.
  4. The time it took to decode it.

Although I expected the duration of the HTTP request to follow directly from the size of the encoded data and the choice of encoder, I would still measure it.
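To make this concrete, here is a minimal sketch of how the client-side measurements could be captured with Qt 5. The endpoint, the GET parameter names and the timing header name are assumptions of mine for illustration; the actual names live in the github repository.

    #include <QCoreApplication>
    #include <QDebug>
    #include <QElapsedTimer>
    #include <QNetworkAccessManager>
    #include <QNetworkReply>
    #include <QNetworkRequest>
    #include <QUrl>

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);

        // Assumed endpoint and parameter names, for illustration only.
        QNetworkAccessManager manager;
        QUrl url("http://localhost/list.php?count=2100&encoding=json");

        QElapsedTimer timer;
        timer.start();
        QNetworkReply *reply = manager.get(QNetworkRequest(url));

        QObject::connect(reply, &QNetworkReply::finished, [&]() {
            const qint64 t = timer.nsecsElapsed() / 1000;  // T: request duration, μs
            const QByteArray payload = reply->readAll();   // S: payload.size(), bytes
            // E: the server-side encoding duration, reported in a response
            // header ("X-Encoding-Time" is an assumed name).
            const QByteArray e = reply->rawHeader("X-Encoding-Time");
            qDebug() << "T =" << t << "S =" << payload.size() << "E =" << e;
            reply->deleteLater();
            app.quit();  // D would be measured around the decode call, see below
        });

        return app.exec();
    }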

On the server side, I chose PHP’s built-in JSON encoding and PHP’s SimpleXML generator as encoders, while on the client side I would use Qt 5’s JSON library, a 3rd-party JSON library (“Flavio’s json” in the tables below) and Qt 5’s XML libraries as decoders. For decoding XML I would benchmark the QXmlSimpleReader and friends, the now deprecated DOM classes, and the stream readers that are supposed to be the new black in XML parsing with Qt. The code for both client and server side can be found at github.
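To give a flavour of the decoding side, here is a simplified sketch of two of the decode paths: Qt 5’s JSON classes and the XML stream reader working on the attribute-based listing. The struct and function names are mine, not the benchmark’s, and error handling is omitted just as it was in the tests.

    #include <QByteArray>
    #include <QJsonArray>
    #include <QJsonDocument>
    #include <QJsonObject>
    #include <QList>
    #include <QString>
    #include <QXmlStreamReader>

    // A minimal record matching the listings shown earlier.
    struct FileEntry {
        QString name;
        QString mtime;
    };

    // Decode the JSON listing with Qt 5's built-in JSON library.
    QList<FileEntry> decodeJson(const QByteArray &data)
    {
        QList<FileEntry> files;
        const QJsonArray array = QJsonDocument::fromJson(data).array();
        for (const QJsonValue &value : array) {
            const QJsonObject obj = value.toObject();
            files.append({obj.value("name").toString(),
                          obj.value("mtime").toString()});
        }
        return files;
    }

    // Decode the attribute-based XML listing with the stream reader.
    QList<FileEntry> decodeXmlAttributes(const QByteArray &data)
    {
        QList<FileEntry> files;
        QXmlStreamReader xml(data);
        while (!xml.atEnd()) {
            if (xml.readNext() == QXmlStreamReader::StartElement
                    && xml.name() == QLatin1String("file")) {
                const QXmlStreamAttributes attrs = xml.attributes();
                files.append({attrs.value("name").toString(),
                              attrs.value("mtime").toString()});
            }
        }
        return files;
    }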

The complete test matrix therefore looked like this:

Table 1: The eight test runs listed as a match between client-side decoders and server-side encodings.

    Decoder (client)   JSON   XML w. attributes   XML w. child nodes
    Qt’s JSON            1            -                    -
    Flavio’s JSON        2            -                    -
    Qt’s SAX             -            3                    4
    Qt’s DOM             -            5                    6
    Qt’s Stream          -            7                    8

Metrics

As mentioned earlier, there would be four data points for each test: encoding time E, transfer time T, data size S and decoding time D. I would normalize the data against test run 1 and weigh the data points with the following weights:

    E   40%   Because server computational power is expensive to me as a server owner.
    T   20%   Because data transfer in my case would be done asynchronously in the background, and a delay in the user experience is acceptable. This is not a low-latency application.
    S   35%   Because outgoing data is paid for by the byte, and even if I were to compress it on the fly, the uncompressed size matters.
    D    5%   Because computational power on the client side is cheap to me, and since this is not a low-latency application, a delay is perfectly fine.

By doing so, I would end up with one number per test, the index, to rank the technologies by, where lower means better:

    index = 0.4·N(E) + 0.2·N(T) + 0.35·N(S) + 0.05·N(D)
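In code, with the normalized measurements as inputs, the index is just a weighted sum:

    // The index formula above as a plain function; nE..nD are the E, T, S
    // and D measurements normalized against test run 1.
    double index(double nE, double nT, double nS, double nD)
    {
        return 0.4 * nE + 0.2 * nT + 0.35 * nS + 0.05 * nD;
    }

A run that scores 1.0 on every normalized metric gets index 1.0, the baseline.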

Result

I ran the experiments using the code found at github, and got the data shown in table 2 and figure 1:

Table 2: Data averaged over 100 requests of 2100 files each.

    Test                     E (μs)      σ   T (μs)   σ   S (bytes)   σ   D (μs)      σ
    qt5-json                   3064   1031    11120   0      225691   0    12290   3345
    Flavio’s json              3130   1154    11300   0      225691   0    43070  10788
    qt5-sax-attributes        24178   6110    35920   0      223624   0    26360   7694
    qt5-dom-attributes        24066   5916    36696   0      223624   0    39274   8378
    qt5-stream-attributes     23559   5964    35110   0      223624   0    21598   6301
    qt5-sax-childnodes        32678   5342    46800   0      278224   0    37650   7199
    qt5-dom-childnodes        32359   5142    45950   0      278224   0    49650   6358
    qt5-stream-childnodes     32616   6041    46200   0      278224   0    39600   7038

(Figure 1: the raw measurements from table 2, plotted.)

Normalizing the data (and omitting σ) gives table 3 and figure 2:

Table 3: Normalized test results.

    Test                     N(E)    N(T)    N(S)    N(D)
    qt5-json                 1.00    1.00    1.00    1.00
    Flavio’s json            1.02    1.02    1.00    3.50
    qt5-sax-attributes       7.89    3.23    0.99    2.14
    qt5-dom-attributes       7.86    3.30    0.99    3.20
    qt5-stream-attributes    7.69    3.16    0.99    1.76
    qt5-sax-childnodes      10.67    4.21    1.23    3.06
    qt5-dom-childnodes      10.56    4.13    1.23    4.04
    qt5-stream-childnodes   10.65    4.15    1.23    3.22

(Figure 2: the normalized results from table 3, plotted.)

Applying the chosen weights gives the following result:

Table 4: Sorted index calculated based on the weights.

    Test                     N(E)   W      N(T)   W      N(S)   W      N(D)   W      Index
    qt5-json                  1.00  0.40   1.00   0.20   1.00   0.35   1.00   0.05    1.00
    Flavio’s json             1.02  0.40   1.02   0.20   1.00   0.35   3.50   0.05    1.14
    qt5-stream-attributes     7.69  0.40   3.16   0.20   0.99   0.35   1.76   0.05    4.14
    qt5-sax-attributes        7.89  0.40   3.23   0.20   0.99   0.35   2.14   0.05    4.26
    qt5-dom-attributes        7.86  0.40   3.30   0.20   0.99   0.35   3.20   0.05    4.31
    qt5-dom-childnodes       10.56  0.40   4.13   0.20   1.23   0.35   4.04   0.05    5.68
    qt5-stream-childnodes    10.65  0.40   4.15   0.20   1.23   0.35   3.22   0.05    5.68
    qt5-sax-childnodes       10.67  0.40   4.21   0.20   1.23   0.35   3.06   0.05    5.69

Discussion

While the data are quite clear, there are several issues that may have impacted the results and the performance. Some of them are:

Error Checking
No error checking was implemented when parsing the XML on the client side. Implementing error checking may change the result, but I believe it would only emphasize it, making XML slower. (The omitted check amounts to something like the sketch after this list.)
Sanity Checking
The decoded data were not checked for sanity. This was on purpose, as I believed such a sanity check would apply equally to all technologies and just offset the results. However, I may be wrong about that.
Isolated System
The tests were performed on my normal laptop while I used it for my normal activities: YouTube, xkcd.org and casual coding. This is most likely why there are small variations in the encoding time, which was supposed to be the same for a given encoding across test runs. Running the tests on an isolated and dedicated system might provide more uniform results; most likely the standard deviation would decrease as well.
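For reference, the kind of error checking that was left out of the XML paths amounts to only a few lines with, for example, the stream reader; the SAX and DOM classes have their own error reporting. A sketch:

    #include <QByteArray>
    #include <QDebug>
    #include <QXmlStreamReader>

    // Parse and report malformed XML instead of silently accepting it.
    bool parseChecked(const QByteArray &data)
    {
        QXmlStreamReader xml(data);
        while (!xml.atEnd())
            xml.readNext();
        if (xml.hasError()) {
            qWarning() << "XML error at line" << xml.lineNumber()
                       << ":" << xml.errorString();
            return false;
        }
        return true;
    }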

These issues aside, I still believe the data are solid and represent an objective evaluation of what I was looking for.

Conclusion

By coincidence, it turned out that the overall best combination of encoder and decoder was my initial choice: PHP’s built-in JSON encoder and Qt 5’s built-in JSON decoder. Not only was it best on the three most important metrics (OK, let’s call it a draw on S), but it was also best on the least important one. That was a nice bonus.

So my first instinct was right, but now I KNOW it IS right. Therefore, JSON it is.

All code, data, gnuplot commands, and graphs can be found at github.
