Saving a Twitter Timeline to Pandas for Analysis
My application for a Twitter developer account was approved, and so I wrote my first program using the Twitter API today. It uses the twython library to retrieve a particular user's timeline and saves the timestamps, text, and like/retweet counts to a Pandas dataframe.
A few notes:
- I hate that Twitter doesn't use ISO 8601 timestamps, unlike the Steem API. ISO 8601 timestamps look like YYYY-MM-DDTHH:MM:SS, so you can just sort and compare them as strings. Nor does Twitter use any of the other perfectly good standards. Its timestamps look like "Thu Apr 06 15:28:43 +0000 2017", so the entire first page of Google results for "Twitter API date format" is "how the heck do I parse this in my favorite programming language." The result I got from StackOverflow uses the email date parser.
- I also hate that it seems standard these days to make it impossible to avoid overlap in REST APIs. You can query for a start point or an end point, but both are inclusive. Is there a good design reason I'm missing here?
- The Twitter API docs are very clear that you're getting retweets whether or not you wanted them, so you'd better include
include_rts=1
so your code doesn't break at a future point when some hapless intern fixes the bug.
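On the timestamp complaint in the first note: assuming the format shown there stays fixed, the standard library's `datetime.strptime` can read it directly, as an alternative to the email parser used in the script. This is only a sketch. The name `parse_twitter_timestamp` is made up for this example, and the `%a`/`%b` directives assume an English locale.

```python
from datetime import datetime, timezone

# Layout of Twitter's created_at field, e.g. "Thu Apr 06 15:28:43 +0000 2017".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def parse_twitter_timestamp(ts):
    """Return a timezone-aware datetime in UTC (hypothetical helper)."""
    return datetime.strptime(ts.strip(), TWITTER_FORMAT).astimezone(timezone.utc)

dt = parse_twitter_timestamp("Thu Apr 06 15:28:43 +0000 2017")
# Once everything is in one timezone, the ISO 8601 strings sort like the datetimes.
print(dt.isoformat())
```

Unlike the version in the script, this returns a timezone-aware value, so it should be compared against `datetime.now(timezone.utc)` rather than a naive `datetime.now()`.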
#!/usr/bin/python3
from twython import Twython
import json
import pprint
import pandas
from datetime import datetime, timedelta
from email.utils import parsedate_tz
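# secret.json holds the app key and secret, plus an optional saved access token.
with open( "secret.json", "r" ) as f:
    secret = json.load( f )

if "access" in secret:
    twitter = Twython( secret['key'], access_token=secret['access'] )
else:
    twitter = Twython( secret['key'], secret['secret'], oauth_version=2 )
    access_token = twitter.obtain_access_token()
    print( "access_token", access_token )
    # Re-create the client with the new token so the requests below are authenticated.
    twitter = Twython( secret['key'], access_token=access_token )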
# Source: https://stackoverflow.com/questions/7703865/going-from-twitter-date-to-python-datetime-date
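def timestamp_to_datetime( ts ):
    # parsedate_tz returns a 10-tuple whose last element is the UTC offset in seconds.
    time_tuple = parsedate_tz( ts.strip() )
    dt = datetime( *time_tuple[:6] )
    # Subtract the offset to get a naive datetime in UTC.
    return dt - timedelta( seconds=time_tuple[-1] )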
tweets = {}
# Naive UTC, to match what timestamp_to_datetime returns.
lastTime = datetime.utcnow()
endTime = lastTime - timedelta( days = 365 )
lastId = None
screen_name = "NextRoguelike"
keys = [ 'id', 'created_at', 'text', 'retweet_count', 'favorite_count' ]
while endTime < lastTime:
    # The API returns tweets in reverse chronological order, starting with max_id,
    # so that tweet is returned twice. Keying the dict by id absorbs the duplicate.
    if lastId is None:
        timeline = twitter.get_user_timeline( screen_name=screen_name, count=100,
                                              include_rts=1 )
    else:
        timeline = twitter.get_user_timeline( screen_name=screen_name, count=100,
                                              include_rts=1, max_id = lastId )
    print( len( timeline ), "responses" )
    # FIXME: won't work for some account that only tweeted once :)
    if len( timeline ) <= 1:
        break
    for t in timeline:
        lastId = t['id']
        lastTime = timestamp_to_datetime( t['created_at'] )
        tweets[ lastId ] = [ t[k] for k in keys ]
df = pandas.DataFrame.from_dict( tweets, orient = 'index', columns = keys )
df.to_pickle( screen_name + "-tweets.pkl" )
https://gist.github.com/mgritter/9ece2b8f1d7b3cdebe385b9737958a94
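On the overlap complaint: since `max_id` is inclusive and tweet IDs are integers, one workaround is to request `max_id = lowest_id_seen - 1` on each subsequent call, which is what Twitter's own guide to working with timelines suggests. Here is a minimal sketch of that idea. `fetch_page` is a hypothetical stand-in for a call like `twitter.get_user_timeline(...)`, and the fake timeline exists only to make the example runnable without credentials.

```python
def fetch_all(fetch_page):
    """Collect every item from a newest-first, max_id-paginated source.

    fetch_page(max_id) is a hypothetical callable returning a list of dicts
    with an integer 'id', newest first; max_id=None means "start at the top".
    """
    items = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        items.extend(page)
        # max_id is inclusive, so step just below the oldest id seen so far.
        max_id = min(item['id'] for item in page) - 1
    return items

# Fake timeline standing in for the API: ids 10 down to 1, three per page.
FAKE = [{'id': i} for i in range(10, 0, -1)]

def fake_page(max_id, count=3):
    matching = [t for t in FAKE if max_id is None or t['id'] <= max_id]
    return matching[:count]

print([t['id'] for t in fetch_all(fake_page)])
```

With this approach an empty page is the stopping condition, so the single-tweet edge case flagged in the FIXME would not need special handling.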
Hi Mark. Interesting account you have here!
I have a question on this:
Did you think they might not approve it for some reason? I thought about applying but never did, and the idea of some sort of screening makes me anxious! Do I have to justify why I want the developer account and what I'm going to do with it?
Yes, Twitter has an application form to fill out that asks for a description of what you plan to do with the API. It took about 20 days for them to review and approve it.