Twitter stats using Ruby
Posted by Alpha Wed, 02 Jan 2008 17:40:00 GMT
I saw Damon Cortesi’s Twitter Stats script last night, and decided to make a Ruby version. This was before he released his code, so it’s reverse-engineered rather than ported. I’ll take a look later tonight to see how much the logic differs.
Edit: This code is rather inelegant, and I’ve since replaced the clunky CSV files with an SQLite3 database. You can find the new and improved scripts here. The following should still work, and I’m leaving it here for posterity’s sake.
tweet.rb
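First up, I wrote a quick Tweet class that scrapes my Twitter page and fetches all of my tweets, one page at a time.
require 'date'
require 'hpricot'
require 'open-uri'

# Scrapes a user's public Twitter page, newest tweet first.
class Tweet
  def initialize(user)
    @user_url = "http://twitter.com/#{user}"
    @doc = Hpricot(open(@user_url))
    @page = 1
    # The latest tweet is marked up separately from the rest of the page.
    @tweets = [current_tweet]
    @tweets += page_to_tweets
  end

  # Returns [text, time] for the latest tweet on the loaded page.
  def current_tweet
    tweet, time = @doc/'div.desc'/'p'
    tweet = tweet.inner_html
    time = DateTime.parse(time.at('abbr')['title'])
    [tweet, time]
  end

  # Returns [text, time] pairs for the remaining tweets on the loaded page.
  def page_to_tweets
    (@doc/'div.tab'/'tr.hentry').map do |tweet|
      tweet, time = tweet/'span'
      tweet = tweet.inner_html.strip
      time = DateTime.parse(time.at('abbr')['title'])
      [tweet, time]
    end
  end

  # True if the loaded page links to an older page.
  def older?
    link = (@doc/'div.tab'/'div.pagination'/'a').last
    !link.nil? && link.inner_text =~ /Older/
  end

  # Returns the next [text, time] pair, loading the next page when the
  # current one is used up, or nil once there are no tweets left.
  def succ
    if @tweets.empty?
      return nil unless older?
      @page += 1
      @doc = Hpricot(open("#{@user_url}?page=#{@page}"))
      @tweets = page_to_tweets
    end
    @tweets.shift
  end
end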
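The paging logic in succ is easier to see without the scraping. Here is a minimal sketch of the same idea, assuming the pages are plain arrays handed in up front instead of HTML fetched over the network. PagedFeed is a made-up name for illustration and is not part of the scripts above.

```ruby
# Hypothetical stand-in for the Tweet class: pages are plain arrays,
# so the buffering logic can be run without hitting the network.
class PagedFeed
  def initialize(pages)
    @pages = pages                      # array of pages, each an array of items
    @page = 0
    @buffer = @pages.fetch(0, []).dup   # items of the current page not yet handed out
  end

  # True if there is another page after the current one.
  def older?
    @page + 1 < @pages.length
  end

  # Hands out the next item, moving on to the next page when the
  # current one is used up. Returns nil once everything is consumed.
  def succ
    while @buffer.empty?
      return nil unless older?
      @page += 1
      @buffer = @pages[@page].dup
    end
    @buffer.shift
  end
end

feed = PagedFeed.new([%w[c b], %w[a]])
items = []
while (item = feed.succ)
  items << item
end
```

The caller only ever sees succ returning items until it returns nil, so it does not need to know where the page boundaries are.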
download_to_csv.rb
Next, a quick script to download the tweets into a CSV file. This is actually a bit over-engineered, as it’ll only download tweets that have not been downloaded previously. Note that this takes the username as a command-line argument.
#!/usr/bin/env ruby
require 'fastercsv'
require 'tweet'

base_path = File.dirname(__FILE__)

# Each CSV file is named after the timestamp of the newest tweet it holds,
# so the most recent file name gives the time of the last update.
csv_files = Dir["#{base_path}/*.csv"].sort_by do |filename|
  DateTime.parse(File.basename(filename, '.csv'))
end
last_update = DateTime.parse(File.basename(csv_files.last, '.csv')) unless csv_files.empty?

tweets = Tweet.new(ARGV.shift)
current_update_time = tweets.current_tweet.last

# Only write a new file if there is something newer than the last update.
if last_update.nil? or current_update_time > last_update
  FasterCSV.open(File.join(base_path, "#{current_update_time.to_s}.csv"), 'w') do |csv|
    while t = tweets.succ
      tweet, time = t
      # Tweets arrive newest first, so stop at the first one already saved.
      break if last_update and time <= last_update
      csv << [tweet, time.to_s]
    end
  end
end
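The "only download what’s new" check boils down to cutting the stream off at the first tweet that is not newer than the last update. A small sketch of just that step, using made-up sample data and a hypothetical new_tweets helper that is not part of the script above:

```ruby
require 'date'

# Hypothetical helper: keeps only the tweets newer than the last update.
# Expects [text, time] pairs ordered newest first, the same order the
# Tweet class hands them out in.
def new_tweets(tweets, last_update)
  return tweets if last_update.nil?   # nothing saved yet, keep everything
  tweets.take_while { |_text, time| time > last_update }
end

# Made-up sample data, newest first.
sample = [
  ['third',  DateTime.parse('2008-01-02T12:00:00+00:00')],
  ['second', DateTime.parse('2008-01-01T12:00:00+00:00')],
  ['first',  DateTime.parse('2007-12-31T12:00:00+00:00')]
]

cutoff = DateTime.parse('2008-01-01T12:00:00+00:00')
fresh = new_tweets(sample, cutoff)
```

This relies on the newest-first ordering. If the tweets came in any other order, take_while would have to become a plain select.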
generate_graphs.rb
And lastly, a script that creates graphs of the statistics from the CSV files.
#!/usr/bin/env ruby
require 'fastercsv'
require 'gchart'
require 'tweet'

base_path = File.dirname(__FILE__)
year = 2007

month_data = Array.new(12, 0)   # tweets per month, January first
hour_data = Array.new(24, 0)    # tweets per hour of the day
reply_data = Hash.new(0)        # replies per username

Dir["#{base_path}/*.csv"].each do |filename|
  FasterCSV.foreach(filename) do |row|
    tweet = row.first
    time = DateTime.parse(row.last)
    month_data[time.month - 1] += 1 if time.year == year
    # Shift the hour back by 8 (presumably UTC to Pacific time).
    hour_data[(time.hour-8)%24] += 1 if time.year == year
    # A reply links to the user it mentions: @<a href="/name">name</a>
    reply_data[$1] += 1 if tweet =~ /@<a href="\/([^"]+)">\1<\/a>/ and time.year == year
  end
end

puts GChart.line(
  :title => 'Tweets per Hour',
  :data => hour_data,
  :width => 400,
  :height => 300,
  :extras => {
    'chxt' => 'x,y',
    'chxl' => "0:|#{(0..23).to_a.join('|')}|1:|#{hour_data.min}|#{hour_data.max}"
  }
).to_url

puts GChart.bar(
  :title => 'Tweets per Month',
  :data => month_data,
  :width => 400,
  :height => 300,
  :extras => {
    'chxt' => 'x,y',
    'chxl' => "0:|#{Date::ABBR_MONTHNAMES.compact.join('|')}|1:|#{month_data.min}|#{month_data.max}"
  },
  :orientation => :vertical
).to_url
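The counting is the only part that doesn’t depend on gchart, so it can be tried on its own. A rough sketch with made-up timestamps follows. The tally method and the UTC_OFFSET_HOURS constant are names invented for this example, and the -8 offset assumes the stored times are UTC and the graphs should be in Pacific Standard Time.

```ruby
require 'date'

# Assumption: timestamps are UTC and the graph should show Pacific
# Standard Time, which matches the (hour - 8) shift in the script.
UTC_OFFSET_HOURS = -8

# Counts tweets per month and per local hour for a single year.
def tally(times, year)
  months = Array.new(12, 0)
  hours = Array.new(24, 0)
  times.each do |time|
    next unless time.year == year
    months[time.month - 1] += 1
    # The modulo wraps early-morning UTC hours back into the previous evening.
    hours[(time.hour + UTC_OFFSET_HOURS) % 24] += 1
  end
  [months, hours]
end

# Made-up timestamps: two from March 2007, one from 2008.
times = [
  DateTime.parse('2007-03-10T02:15:00+00:00'),
  DateTime.parse('2007-03-11T18:30:00+00:00'),
  DateTime.parse('2008-01-01T00:00:00+00:00')
]

months, hours = tally(times, 2007)
```

Note that, like the script, this filters on the year of the unshifted time, so a tweet from the first hours of January 1 (UTC) is counted in the new year even though the shifted hour falls on December 31.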
Sweet! I can’t wait to check this out.
(Now I’m curious if my logic was accurate – any idea if the graphs match?)
Damon: I don’t know, but I’ll compare the graphs after I convert my scripts to use an SQLite3 database instead of CSV files tonight.
WOW!! you made my day… Thank you so much.