Data Collection and Exploratory Data Analysis
Introduction
This project replicates a professional data analysis on a topic of current interest (influenza) and extends it with a second public data source (Twitter). I replicated the CDC's graphs and heatmap from real flu data. I then collected 20,000 tweets matching the keywords "flu", "influenza" and "fever" using the Twitter REST API and filtered out the retweets. The remaining tweets serve as the second public data source: I grouped them by geolocation and plotted them as a heatmap to represent the outbreak of influenza. I also built a responsive web app using Shiny in R to publish the results. All flu data used in this project is for the United States of America.
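The tweet-processing step described above (drop retweets, keep keyword matches, count by location) can be sketched roughly as follows. This is only an illustration in Python, not the project's R code: the tweet fields `text`, `is_retweet` and `state`, the "RT @" prefix check, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Keywords named in the introduction.
KEYWORDS = ("flu", "influenza", "fever")


def is_retweet(tweet):
    # Assumption: a retweet carries an explicit flag or starts with "RT @".
    return bool(tweet.get("is_retweet")) or tweet["text"].startswith("RT @")


def mentions_keyword(tweet):
    # Case-insensitive substring match against the keyword list.
    text = tweet["text"].lower()
    return any(keyword in text for keyword in KEYWORDS)


def tweets_per_state(tweets):
    """Count original (non-retweet) keyword tweets for each state."""
    counts = Counter()
    for tweet in tweets:
        if is_retweet(tweet) or not mentions_keyword(tweet):
            continue
        state = tweet.get("state")  # hypothetical field holding the resolved state
        if state:  # skip tweets without usable geolocation
            counts[state] += 1
    return counts


# Made-up sample records, only to show the shape of the input.
sample = [
    {"text": "Down with the flu all week", "state": "New York"},
    {"text": "RT @someone: flu season is here", "state": "New York"},
    {"text": "High fever since last night", "state": "Texas"},
    {"text": "Influenza shots available downtown", "state": "New York"},
    {"text": "Got my flu shot today", "state": None},
]

counts = tweets_per_state(sample)
print(dict(counts))
```

The resulting per-state counts are the kind of table a heatmap could be drawn from; how locations are actually resolved to states in the project is not shown here.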
Shiny App
Data Analysis using the data from CDC
Data Analysis on Twitter Data
- Keyword - Flu
- Keyword - Fever
- Comparison with CDC heatmap