Select one of the problems below that you will enjoy working on. Ideally, perform your analysis in an IPython notebook. Post the notebook on GitHub and submit your results.
Trajectories of Taxis
This dataset contains the trajectories of thousands of taxis operating in China. Your task is to read through the following paper and reproduce its first graphs (the distribution of distances and the distribution of sampling time intervals).
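As a starting point for the two distributions, here is a minimal sketch that computes the time gap and the great-circle distance between consecutive GPS fixes of one taxi. It assumes each fix is a (timestamp, latitude, longitude) tuple; that layout and the function names are illustrative assumptions, not the dataset's actual format.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def consecutive_gaps(points):
    """Return a list of (seconds, km) between consecutive fixes of one taxi.

    `points` is an iterable of (timestamp, lat, lon) tuples (assumed layout).
    """
    pts = sorted(points, key=lambda p: p[0])  # order fixes by time
    gaps = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(pts, pts[1:]):
        seconds = (t1 - t0).total_seconds()
        gaps.append((seconds, haversine_km(lat0, lon0, lat1, lon1)))
    return gaps


# Made-up fixes, for illustration only.
sample = [
    (datetime(2020, 1, 1, 8, 0, 0), 39.900, 116.400),
    (datetime(2020, 1, 1, 8, 5, 0), 39.905, 116.410),
    (datetime(2020, 1, 1, 8, 10, 0), 39.910, 116.425),
]
for seconds, km in consecutive_gaps(sample):
    print(f"{seconds:.0f} s, {km:.3f} km")
```

Collecting the seconds and km values over all taxis gives the two samples to histogram with the plotting library of your choice.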
Next, pick the trajectory of one particular trip and compute a smoothed version of it (for example, using a Kalman filter or splines).
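One possible way to approach the smoothing step is sketched below: a scalar Kalman filter with a random-walk state model, applied to latitude and longitude independently. This is the simplest variant, not a recommended final design; the noise variances are placeholder assumptions that would need tuning, and a constant-velocity model or a spline fit may suit real trips better.

```python
def kalman_1d(measurements, process_var=1e-5, meas_var=1e-3):
    """Scalar Kalman filter with a random-walk state model.

    Returns the filtered estimate at every step. The default variances
    are placeholder assumptions, not values tuned for any dataset.
    """
    x = measurements[0]  # initial state estimate: the first measurement
    p = meas_var         # initial variance of that estimate
    estimates = [x]
    for z in measurements[1:]:
        p = p + process_var     # predict: state unchanged, uncertainty grows
        k = p / (p + meas_var)  # Kalman gain, always between 0 and 1
        x = x + k * (z - x)     # update: move part of the way toward z
        p = (1 - k) * p         # variance after the update
        estimates.append(x)
    return estimates


def smooth_track(lats, lons, **kwargs):
    """Filter latitude and longitude sequences independently."""
    return kalman_1d(lats, **kwargs), kalman_1d(lons, **kwargs)


# Made-up noisy coordinates, for illustration only.
lats = [39.900, 39.903, 39.901, 39.906, 39.904, 39.909]
lons = [116.400, 116.404, 116.402, 116.409, 116.407, 116.412]
smooth_lats, smooth_lons = smooth_track(lats, lons)
```

A larger `meas_var` relative to `process_var` trusts the GPS fixes less and yields a smoother but more lagged track.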
Airline On-Time Arrivals
Use the US Dept. of Transportation on-time arrival data for non-stop domestic flights by major air carriers to predict arrival delays.
Build a binary classification model for predicting arrival delays or a regression model that predicts the extent of the delay. Do not use departure delay as an input feature.
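For the classification variant, a minimal sketch of turning one flight record into a (features, label) pair is shown below. The column names `ARR_DELAY` and `DEP_DELAY`, the dictionary row layout, and the 15-minute cut-off for "delayed" are all assumptions made for illustration; check them against the actual data files.

```python
DELAY_THRESHOLD_MIN = 15  # assumed cut-off, in minutes, for a "delayed" arrival

# Columns that must not reach the model: the target itself and the
# departure delay that the task forbids. Names are assumptions.
EXCLUDED_COLUMNS = {"ARR_DELAY", "DEP_DELAY"}


def make_example(row):
    """Split one flight record (a dict) into (features, binary label)."""
    label = 1 if float(row["ARR_DELAY"]) >= DELAY_THRESHOLD_MIN else 0
    features = {k: v for k, v in row.items() if k not in EXCLUDED_COLUMNS}
    return features, label


# Made-up record, for illustration only.
row = {"CARRIER": "AA", "ORIGIN": "JFK", "DEST": "LAX",
       "DEP_DELAY": "30", "ARR_DELAY": "20"}
features, label = make_example(row)
print(features, label)
```

Any other column that is only known after departure would leak information in the same way and could be added to `EXCLUDED_COLUMNS`.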
Global Terrorist Attacks
The Global Terrorism Database (GTD) is an open-source database of information on terrorist events around the world from 1970 through 2014. Some of the attacks have not been attributed to a particular terrorist group.
Use attack type, weapons used, description of the attack, etc. to build a model that can predict what group may have been responsible for an incident.
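As one possible baseline for the text part of this task, the sketch below implements a small multinomial naive Bayes classifier with add-one smoothing over the words of an attack description. It is a swapped-in illustrative technique, not a prescribed method; the class name, the whitespace tokenizer, and the placeholder group labels are assumptions. Train only on incidents that have an attributed group, then predict for the unattributed ones.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayesText:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # incidents per group
        self.word_counts = defaultdict(Counter)   # word counts per group
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n / total)  # log prior of the group
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up descriptions and placeholder group names, for illustration only.
texts = ["bombing explosives vehicle", "bombing explosives market",
         "armed assault firearms", "armed assault checkpoint"]
groups = ["Group A", "Group A", "Group B", "Group B"]
model = NaiveBayesText().fit(texts, groups)
print(model.predict("bombing explosives"))
```

Categorical fields such as attack type and weapon type could be appended to the description as extra tokens so the same model uses them.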