Python JSON or C++ JSON Parsing

October 27, 2012

At Gnip, we parse about half a billion JSON activities from our firehoses of social media every day. Until recently, I believed that the time I would save parsing social activities with a C++ command-line tool would more than justify the additional time it takes to develop in C++. This turned out to be wrong.

Comparing the native JSON parser in Python 2.7 and the UltraJSON parser to a C++ implementation linked against jsoncpp indicates that UltraJSON is by far the best choice, achieving about twice the parsing rate of C++ for Gnip’s normalized JSON Activity Stream format. UltraJSON parsed Twitter activities at nearly 20 MB/second.
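
For readers who want to try a comparison of this kind themselves, here is a minimal sketch of how a parsing rate in MB/second might be measured. It is not the benchmark script used for the results above: the records are synthetic stand-ins for real activities, the helper names `make_activities` and `parse_rate` are made up for illustration, and it is written for Python 3 rather than 2.7. UltraJSON is imported as `ujson` and is skipped if it is not installed.

```python
import json
import time

try:
    import ujson  # third-party UltraJSON package; optional
except ImportError:
    ujson = None


def make_activities(n):
    """Build n JSON strings (synthetic records, not real Gnip activities)."""
    record = {
        "verb": "post",
        "actor": {"displayName": "someone", "followersCount": 42},
        "body": "hello world " * 5,
    }
    return [json.dumps(dict(record, id="tag:example:%d" % i)) for i in range(n)]


def parse_rate(loads, lines):
    """Parse every line with `loads`; return (records parsed, MB/second)."""
    total_bytes = sum(len(line.encode("utf-8")) for line in lines)
    start = time.perf_counter()
    count = 0
    for line in lines:
        loads(line)
        count += 1
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count, total_bytes / elapsed / 1e6


lines = make_activities(10000)

# Always time the standard-library parser; add UltraJSON when available.
parsers = {"json": json.loads}
if ujson is not None:
    parsers["ujson"] = ujson.loads

results = {name: parse_rate(loads, lines) for name, loads in parsers.items()}
for name, (count, rate) in results.items():
    print("%s: %d records, %.1f MB/s" % (name, count, rate))
```

The absolute numbers depend heavily on the machine and on the shape of the records, so a run like this is only useful for comparing parsers against each other on the same input.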

 

Plot of elapsed time to parse increasingly large JSON files.  (Lower numbers are better.)

Additional details, scripts, data, and code are available on GitHub.
