ACA Public Sentiment Project
Proposal
Christopher G. Healey
We will collect and analyze recent social network discussions about the Affordable Care Act (a.k.a. ObamaCare). Specifically, we will collect tweets from Twitter, a social network that allows users to post short text messages of up to 140 characters. We will apply topic clustering and sentiment analysis to the tweets, then interpret the results to provide a summary of the current major topics related to the ACA and their associated sentiment.
We will use Twitter's real-time streaming API to collect tweets from Twitter that contain the keywords:
We will use the TweetCapture program provided to us to connect to Twitter's real-time stream (the firehose) and collect tweets by keyword. Based on a check of Twitter's recent tweet activity, we anticipate being able to collect approximately 24,000 tweets per day (see the Data Source Justification section below for more details), or about 150,000 tweets over a 1-week period. Again based on recent tweet activity, we can observe topics like:
We will perform topic clustering on the tweets to identify major topics of discussion. We will then perform sentiment estimation on each major topic to determine a general sentiment (specifically, positive, neutral, or negative pleasure) for the topic's tweets.
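The two analysis steps can be illustrated with a small, self-contained sketch. It is not the method the project will necessarily use: the clustering here is a greedy single-pass grouping on Jaccard word overlap, chosen only for brevity, and the sentiment step averages scores from a tiny word list. The stopword list, the lexicon, its scores, the thresholds, and all function names are assumptions made up for this illustration.

```python
import re
from statistics import mean

# Hypothetical word lists, for illustration only; a real analysis would
# use a much larger stopword list and an established sentiment dictionary.
STOPWORDS = {"i", "the", "a", "is", "my", "to", "of", "and", "for", "on", "this"}
LEXICON = {"love": 1.0, "great": 1.0, "saved": 0.5,
           "hate": -1.0, "terrible": -1.0, "broken": -0.5}


def tokenize(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_tweets(tweets, threshold=0.2):
    """Greedy single-pass clustering.

    Each tweet joins the existing cluster whose vocabulary it overlaps
    most, provided the overlap reaches the threshold; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster: {"vocab": set of words, "tweets": list of str}
    for tweet in tweets:
        tokens = tokenize(tweet)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(tokens, cluster["vocab"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["tweets"].append(tweet)
            best["vocab"] |= tokens
        else:
            clusters.append({"vocab": set(tokens), "tweets": [tweet]})
    return clusters


def tweet_score(text):
    """Mean lexicon score of the words in text; 0.0 if none are in the lexicon."""
    hits = [LEXICON[w] for w in re.findall(r"[a-z]+", text.lower()) if w in LEXICON]
    return mean(hits) if hits else 0.0


def cluster_sentiment(cluster, cutoff=0.1):
    """Label a cluster positive, neutral, or negative from its mean tweet score."""
    score = mean(tweet_score(t) for t in cluster["tweets"])
    if score > cutoff:
        return "positive", score
    if score < -cutoff:
        return "negative", score
    return "neutral", score


if __name__ == "__main__":
    # Made-up example tweets, not collected data.
    sample = [
        "I love the ACA insurance exchange",
        "The ACA insurance exchange saved my family",
        "The healthcare website is broken",
        "This website is terrible and broken",
    ]
    for cluster in cluster_tweets(sample):
        label, score = cluster_sentiment(cluster)
        print(label, round(score, 3), cluster["tweets"])
```

Averaging per-tweet scores within a cluster is only one way to summarize a topic's sentiment; a single strongly worded tweet can shift the label for a small cluster, so cluster size is worth reporting alongside the label.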
We anticipate a number of challenges we will need to overcome as part of our project.
RT @mr_prez What r u talkin 'bout, ur ACA sounds bo-gus! :'( >:O http://bit.ly/1eYmVWG
RT @fuma ur ACA idea is teh sh*te!!!! #urbandictionary
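Tweets like the examples above mix retweet markers, user mentions, emoticons, shortened URLs, hashtags, and repeated punctuation with the actual message. One possible preprocessing step, sketched here under our own assumptions, is to strip these elements out of the text while recording them as separate features. The function name, the feature names, and the deliberately narrow emoticon pattern are all hypothetical; slang and misspellings such as "ur" or "teh" are left untouched.

```python
import re

URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"^\s*RT\b")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Narrow, illustrative emoticon pattern: optional '>', eyes, optional tear
# and nose, then a mouth, standing alone between whitespace.
EMOTICON_RE = re.compile(r"(?<!\S)>?[:;]'?-?[()DPOo](?!\S)")
REPEAT_PUNCT_RE = re.compile(r"([!?.])\1+")


def normalize_tweet(text):
    """Return (cleaned_text, features) for a raw tweet.

    URLs, the leading retweet marker, mentions, and emoticons are removed
    from the text and recorded in features; hashtags keep their word but
    lose the '#'; runs of the same !, ? or . collapse to one character.
    """
    features = {}

    features["urls"] = URL_RE.findall(text)
    text = URL_RE.sub(" ", text)

    features["is_retweet"] = RETWEET_RE.match(text) is not None
    text = RETWEET_RE.sub(" ", text)

    features["mentions"] = MENTION_RE.findall(text)
    text = MENTION_RE.sub(" ", text)

    features["hashtags"] = HASHTAG_RE.findall(text)
    text = HASHTAG_RE.sub(r"\1", text)

    features["emoticons"] = EMOTICON_RE.findall(text)
    text = EMOTICON_RE.sub(" ", text)

    text = REPEAT_PUNCT_RE.sub(r"\1", text)

    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split()), features


if __name__ == "__main__":
    raw = "RT @fuma ur ACA idea is teh sh*te!!!! #urbandictionary"
    cleaned, feats = normalize_tweet(raw)
    print(cleaned)
    print(feats)
```

Keeping emoticons and repeated punctuation as features rather than discarding them outright may be useful, since they can carry sentiment that the remaining words do not.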
Although the ACA was passed in March 2010, public sentiment continues to polarize around the Act and its provisions. The upcoming midterm elections in November 2014 have given both supporters and opponents an opportunity to re-energize arguments for and against the Act (1, 2). In addition, a number of legal challenges to the Act are working their way through the lower courts (1, 2, 3, 4), with an expectation that the conflicting decisions will be referred to the Supreme Court in the near future.
Given current interest in the ACA and the differing opinions on the pros and cons of the Act, we believe a sufficient number of tweets, with appropriate sentiment and topic variability, will be available through Twitter. Preliminary investigation indicates an available rate of approximately 1,000 tweets/hour, with a wide range of comments and opinions embedded in the tweets we previewed. Based on these findings, we are confident we can collect the raw data needed to support our goals and analysis plan for this project.
We will provide the following deliverables at the end of the project.