Acquiring Twitter Data

R. Alexander Miłowski

milowski@ischool.berkeley.edu

School of Information, UC Berkeley

An Exemplar for Data Acquisition

We will use Twitter as an example to:

Anatomy of a Long-Lived Data Acquisition Process

Often has the following properties:

Twitter API

A Web API that provides:

  1. REST API — metadata/information about users, resources, etc.
  2. REST API — post tweets, etc.
  3. REST API — historical search over tweet data
  4. Streaming API — real-time retrieval of tweet data

Items 3 and 4 (historical search and streaming) are the most interesting for data acquisition.

Application Setup

You need a Twitter account. Then:

  1. Visit https://apps.twitter.com
  2. Click on the Create New App button.
  3. Fill in the Name, Description, and Website fields.
  4. Agree to the terms and click on the Create your Twitter application button.
  5. Click on the API keys tab.
  6. Click on the Create my access token button.
  7. Wait, reload, wait, reload ... until the access token shows up.

You are now dangerous!
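
The setup above yields an API key and secret plus an access token and secret. One way to keep those four values out of source code is to read them from environment variables. The following is a minimal sketch, not part of the original slides; the variable names and the `load_credentials` helper are hypothetical.

```python
import os

# Hypothetical environment variable names; any consistent naming would work.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return (consumer_key, consumer_secret, access_token, access_token_secret).

    Raises RuntimeError that names every variable that is unset or empty.
    """
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)
```

The returned tuple can be unpacked into the four variables that the authentication code expects, so the secrets never need to appear in a shared script.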

Tweepy - Twitter in Python

Note: See the online API documentation.

import tweepy

consumer_key = "..."         # Your API Key
consumer_secret = "..."      # Your API Secret

access_token = "..."         # Your Access Token
access_token_secret = "..."  # Your Access Token Secret

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

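# Print the text of each tweet on the first page of search results.
# (Tweepy 4.x renames api.search to api.search_tweets.)
for tweet in api.search(q="minecraft"):
    print(tweet.text)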

Live demo
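
The search call above returns a single page of results. To reach further back, the search endpoint accepts a `max_id` parameter, so each request can ask for tweets older than the oldest one already seen. The following is a minimal sketch of that loop, not code from the original slides. `collect_pages`, `fetch_page`, `FakeTweet`, and `fake_fetch` are hypothetical names, and the fake archive stands in for the network so the example runs offline.

```python
from collections import namedtuple


def collect_pages(fetch_page, query, max_pages=5):
    """Walk backwards through search results, one page per request.

    fetch_page(query, max_id) is a hypothetical callable that returns a list
    of tweet-like objects with an integer `id`, newest first. Passing
    max_id=None means "start from the most recent tweets".
    """
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(query, max_id)
        if not page:
            break  # no older results are available
        collected.extend(page)
        # max_id is inclusive, so step just below the oldest tweet seen so far.
        max_id = min(tweet.id for tweet in page) - 1
    return collected


# Stand-in data source so the sketch runs without network access.
FakeTweet = namedtuple("FakeTweet", ["id", "text"])
ARCHIVE = [FakeTweet(i, "tweet %d" % i) for i in range(10, 0, -1)]  # newest first


def fake_fetch(query, max_id, page_size=4):
    """Return up to page_size archived tweets whose id is <= max_id."""
    older = [t for t in ARCHIVE if max_id is None or t.id <= max_id]
    return older[:page_size]


tweets = collect_pages(fake_fetch, "minecraft")
print(len(tweets))
```

With a real client, `fetch_page` might wrap the search call, for example `lambda q, max_id: api.search(q=q, max_id=max_id)`; this assumes the client ignores a `max_id` of `None`. Tweepy also provides `tweepy.Cursor`, which handles this kind of paging for you.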

Scaling?

Time for some live demos of a more complete solution...