Saving a Twitter Timeline to Pandas for Analysis

in #programming, 6 years ago

My application for a Twitter developer account was approved, and so I wrote my first program using the Twitter API today. It uses the twython library to retrieve a particular user's timeline and saves the timestamps, text, and like/retweet counts to a Pandas dataframe.

A few notes:

  • I hate that Twitter doesn't use ISO 8601 timestamps, unlike the Steem API. ISO 8601 timestamps look like YYYY-MM-DDTHH:MM:SS, so you can just sort and compare them as strings. Nor does Twitter use any of the other perfectly good standards. Its timestamps look like "Thu Apr 06 15:28:43 +0000 2017", so the entire first page of Google results for "Twitter API date format" is "how the heck do I parse this in my favorite programming language." The result I got from Stack Overflow uses the email date parser.
  • I also hate that it seems standard these days to make it impossible to avoid overlap in REST APIs. You can query for a start point or an end point, but both are inclusive. Is there a good design reason I'm missing here?
  • The Twitter API docs are very clear that you're getting retweets whether or not you wanted them, so you'd better include include_rts=1 so your code doesn't break at a future point when some hapless intern fixes the bug.
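
To make the timestamp complaint concrete, here is a minimal sketch using only the standard library. It assumes the created_at format quoted above and an English (C) locale for the day and month names. The helper name twitter_ts_to_iso and the sample timestamps other than the first are made up for illustration, and the strptime pattern is an alternative to the email parser used in the script, not what the script does.

```python
from datetime import datetime, timezone

# Hypothetical helper: convert a Twitter-style created_at string
# to an ISO 8601 string in UTC.
def twitter_ts_to_iso(ts):
    # %z parses the "+0000" offset, so the parsed datetime is timezone-aware.
    dt = datetime.strptime(ts.strip(), "%a %b %d %H:%M:%S %z %Y")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")

# Sample values; only the first one comes from the text above.
raw = [
    "Thu Apr 06 15:28:43 +0000 2017",
    "Mon Jan 02 09:00:00 +0000 2017",
    "Sat Dec 31 23:59:59 +0000 2016",
]

# Sorting the raw strings orders them by weekday name, not by time.
print(sorted(raw))

# Sorting the ISO 8601 strings orders them chronologically.
iso = sorted(twitter_ts_to_iso(ts) for ts in raw)
print(iso)
```
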
#!/usr/bin/python3

from twython import Twython
import json
import pprint
import pandas
from datetime import datetime, timedelta
from email.utils import parsedate_tz

with open( "secret.json", "r" ) as f:
    secret = json.load( f )

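if "access" in secret:
    twitter = Twython( secret['key'], access_token=secret['access'] )
else:
    # No cached token: exchange the app key/secret for an OAuth 2 bearer token,
    # then rebuild the client with it so the calls below are authenticated.
    twitter = Twython( secret['key'], secret['secret'], oauth_version=2 )
    access_token = twitter.obtain_access_token()
    print( "access_token", access_token )
    twitter = Twython( secret['key'], access_token=access_token )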

# Source: https://stackoverflow.com/questions/7703865/going-from-twitter-date-to-python-datetime-date
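def timestamp_to_datetime( ts ):
    # parsedate_tz returns a 10-tuple; the last element is the UTC offset in seconds.
    time_tuple = parsedate_tz( ts.strip() )
    dt = datetime( *time_tuple[:6] )
    # Subtract the offset to get a naive datetime in UTC.
    return dt - timedelta( seconds=time_tuple[-1] )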
    
tweets = {}
# Tweet timestamps are converted to naive UTC, so start from UTC rather than local time.
lastTime = datetime.utcnow()
endTime = lastTime - timedelta( days = 365 )
lastId = None
screen_name = "NextRoguelike"
keys = [ 'id', 'created_at', 'text', 'retweet_count', 'favorite_count' ]

while endTime < lastTime:
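    # The API returns tweets newest first, and max_id is inclusive, so the
    # tweet with id == lastId comes back again at the top of the next page.
    # Keying the tweets dict by id means the duplicate just overwrites itself.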
    if lastId is None:
        timeline = twitter.get_user_timeline( screen_name=screen_name, count=100,
                                              include_rts=1 )
    else:
        timeline = twitter.get_user_timeline( screen_name=screen_name, count=100,
                                              include_rts=1, max_id = lastId )

    print( len( timeline ), "responses" )
    
    # A page holding only the duplicated max_id tweet means we've reached the end.
    # FIXME: won't work for some account that only tweeted once :)
    if len( timeline ) <= 1:
        break

    for t in timeline:
        lastId = t['id']
        lastTime = timestamp_to_datetime( t['created_at'] )
        tweets[ lastId ] = [ t[k] for k in keys ]

df = pandas.DataFrame.from_dict( tweets, orient = 'index', columns = keys )
df.to_pickle( screen_name + "-tweets.pkl" )
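
On the overlap question: since tweet IDs are integers, one way to avoid refetching the boundary tweet is to ask for a max_id one below the lowest ID seen so far. The sketch below runs that loop against a stand-in for the API. The names fake_timeline, fetch_all, and the sample IDs are hypothetical and only imitate the newest-first, inclusive max_id behaviour described above, so treat this as an illustration rather than a tested replacement for the loop in the script.

```python
# Hypothetical stand-in for get_user_timeline: returns up to `count` tweets,
# newest first, with id <= max_id (inclusive, as described above).
ALL_IDS = [110, 108, 105, 103, 101, 100]

def fake_timeline(count, max_id=None):
    ids = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:count]]

def fetch_all(page_size=2):
    seen = []
    max_id = None
    while True:
        page = fake_timeline(count=page_size, max_id=max_id)
        if not page:
            # An empty page means the timeline is exhausted,
            # even for an account with a single tweet.
            break
        seen.extend(t["id"] for t in page)
        # Ask for strictly older tweets next time, so nothing comes back twice.
        max_id = min(t["id"] for t in page) - 1
    return seen

ids = fetch_all()
print(ids)
```

With this approach the stop condition becomes "the page is empty", which also sidesteps the single-tweet FIXME.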

https://gist.github.com/mgritter/9ece2b8f1d7b3cdebe385b9737958a94


Hi Mark. Interesting account you have here!

I have a question on this:

"My application for a Twitter developer account was approved"

Did you feel like they might not approve it for some reason? I thought about applying but never did, and if they do some sort of screening it makes me anxious! Do I have to justify why I want the developer account and what I'm going to do with it?

Yes, Twitter has an application form to fill out that asks for a description of what you plan to do with the API. It took about 20 days for them to review and approve it.
