R, e.g.: New Year's Eve Tweets
May 5, 2012
First, simple plotting with ggplot.
New Year's Eve is an interesting time to look at Twitter volumes because the event moves from time zone to time zone over a 24-hour period. Below are hourly counts of New Year-related tweets, plotted by UTC hour.
The three major humps in volume correspond to the Americas, Europe and Asia.
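To see why the celebration spreads across the UTC axis, here is a small sketch that converts local midnight on January 1 into UTC for one representative zone per region. The three zone names are my own illustrative picks, not something taken from the data set.

```r
# Local midnight on 2012-01-01 in three illustrative zones (assumed picks),
# expressed as the UTC time at which it occurs.
zones <- c("Asia/Tokyo", "Europe/London", "America/New_York")

midnight_utc <- sapply(zones, function(tz) {
  local_midnight <- as.POSIXct("2012-01-01 00:00:00", tz = tz)
  format(local_midnight, "%Y-%m-%d %H:%M", tz = "UTC")
})

print(midnight_utc)
```

Tokyo reaches midnight at 15:00 UTC on December 31, London at 00:00 UTC and New York at 05:00 UTC on January 1, so the regional peaks are spread over most of a day.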
Not too surprisingly, comparing 2010 and 2011 shows Twitter usage growing in most time zones. In the 2010 vs. 2011 scatter plot, three points stand out as outliers for growth.
(To do: exclude the outliers from the linear regression in some R-ish way rather than by hand, then use the fit on the remaining points to estimate the expected volume for each outlier and output the difference.)
#!/usr/bin/env Rscript
#
# Plot time series volume data
#
# install.packages("ggplot2", dependencies = TRUE )
# install.packages("gridExtra", dependencies = TRUE )
library(ggplot2)
library(gridExtra)

X <-read.delim("./twitter.timeline.csv.byhour.csv", sep=",", header=TRUE)
# parse datetime strings
X$date <- as.POSIXct(X$time)
summary(X)

# ts is unix style timestamp
sub0 <- X[ (X$ts > 1293814803 & X$ts < 1293901203) , ]
sub1 <- X[ (X$ts > 1325350803 & X$ts < 1325437203) , ]
summary(sub0)
summary(sub1)

p0 <- qplot(sub0$date, sub0$count, geom="bar", stat="identity", xlab="2010: 12/31 - 1/1 GMT", ylab="tweets/hr") +
    scale_y_continuous(limits = c(0, 1100000))
p1 <- qplot(sub1$date, sub1$count, geom="bar", stat="identity", xlab="2011: 12/31 - 1/1 GMT", ylab="tweets/hr") +
    scale_y_continuous(limits = c(0, 1100000))

png(filename = "./NYTweets-World-2011.png", width = 600, height = 600, units = 'px')
print( p1 )
dev.off()

png(filename = "./NYTweets-World-2011v2010.png", width = 600, height = 600, units = 'px')
print( grid.arrange(p0, p1, ncol = 1, main="Compare 2010 vs 2011") )
dev.off()

png(filename = "./NYTweets-Growth-2011v2010.png", width = 600, height = 600, units = 'px')
print(
    qplot(sub0$count, sub1$count, xlab="2010 tweets/hr", ylab="2011 tweets/hr") +
    geom_smooth(method=lm,se=FALSE) +
    scale_y_continuous(limits = c(0, 1100000)) +
    scale_x_continuous(limits = c(0, 1100000))
)
dev.off()
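One possible way to handle the outlier exclusion mentioned earlier, using base R only: fit once, flag points whose residuals sit far from the median residual (measured in units of the median absolute deviation), refit without them, and report observed minus predicted for the flagged points. This is a rough sketch, not the post's method. The data frame, its column names, the cutoff of 3 and the numbers themselves are all made up for illustration; with the real data, `sub0$count` and `sub1$count` would take the place of the two columns.

```r
# Made-up hourly counts standing in for sub0$count (2010) and sub1$count (2011).
hours <- 1:24
df <- data.frame(count2010 = 100000 + 20000 * (hours - 1))
df$count2011 <- 1.5 * df$count2010 + 10000 * sin(hours)   # deterministic wiggle

# Plant three artificial growth outliers (hypothetical positions).
planted <- c(5, 12, 20)
df$count2011[planted] <- df$count2011[planted] + 300000

# Step 1: first-pass fit on all points.
fit_all <- lm(count2011 ~ count2010, data = df)
r <- resid(fit_all)

# Step 2: flag residuals more than 3 MADs from the median residual.
# The cutoff of 3 is an assumed, conventional choice.
df$outlier <- as.vector(abs(r - median(r)) > 3 * mad(r))

# Step 3: refit using only the points that were not flagged.
fit_clean <- lm(count2011 ~ count2010, data = df[!df$outlier, ])

# Step 4: estimate the flagged points from the clean fit and output the difference.
flagged <- df[df$outlier, ]
flagged$expected <- predict(fit_clean, newdata = flagged)
flagged$excess <- flagged$count2011 - flagged$expected
print(flagged[, c("count2010", "count2011", "expected", "excess")])
```

Centering on the median and scaling by the MAD keeps the outliers themselves from inflating the cutoff, which a rule based on the standard deviation of the residuals would be more prone to. A robust fit such as `MASS::rlm` would be another option.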