Python JSON or C++ JSON Parsing
At Gnip, we parse about half a billion JSON activities from our firehoses of social media every day. Until recently, I believed that the time I would save parsing social activities with C++ command line tool would more than justify additional time it takes to develop in C++. This turns out to be wrong.
Comparing the native JSON parser in Python2.7 and the UltraJSON parser to a C++ implementation linked to jsoncpp indicates that UltraJSON is by far the best choice, achieveing about twice the parsing rate of C++ for Gnip’s normalized JSON Activity Stream format. UltraJSON parsed Twitter activities at near 20MB/second.
Plot of elapsed time to parse increasingly large JSON files. (Lower numbers are better.)
Additional details, scripts, data and code is available on github.