We require fellows to work on a small challenge problem to assess problem-solving and coding capabilities. Select a problem from the list below. Ideally, perform your analysis in a Jupyter notebook. Post the notebook on GitHub and submit your results.
Some hints for hacking our challenge:
- Ask yourself: why would they have selected this problem for the challenge? What are some gotchas in this domain I should know about?
- What is the highest level of accuracy that others have achieved with this dataset or with similar problems and datasets?
- What types of visualizations will help me grasp the nature of the problem / data?
- What feature engineering might help improve the signal?
- Which modeling techniques are good at capturing the types of relationships I see in this data?
- Now that I have a model, how can I be sure that I didn't introduce a bug in the code? If results are too good to be true, they probably are!
- What are some of the weaknesses of the model, and how can the model be improved with additional work?
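One way to act on the "too good to be true" hint is to compare your model against a trivial baseline and to check that no event appears in both your training and test splits. The sketch below is only an illustration: the function names, labels, and event IDs are invented for this example and are not part of the challenge data.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return hits / len(test_labels)

def overlapping_ids(train_ids, test_ids):
    """IDs present in both splits, a common source of leakage."""
    return set(train_ids) & set(test_ids)

# Toy labels and IDs, invented for illustration only.
train_labels = ["A", "A", "A", "B", "C"]
test_labels = ["A", "B", "A", "C"]
baseline = majority_baseline_accuracy(train_labels, test_labels)

train_ids = [101, 102, 103]
test_ids = [103, 104]
leaked = overlapping_ids(train_ids, test_ids)
```

If your model barely beats `baseline`, it has learned little; if it scores near 100% and `leaked` is non-empty, suspect the split before celebrating.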
Select Your Challenge Problem
Global Terrorist Attacks
The Global Terrorism Database (GTD) is an open-source database of terrorist events around the world from 1970 through 2014. A portion of the attacks has not been attributed to a particular terrorist group.
Use attack type, weapons used, description of the attack, etc. to build a model that can predict what group may have been responsible for an incident.
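As a starting point, the task can be framed as multi-class classification over categorical and text features. The sketch below uses a small hand-rolled naive Bayes classifier purely as an illustration; it is not a recommended or required approach. The field names (`attack_type`, `weapon_type`, `summary`), the group names, and the incidents are hypothetical and are not taken from the GTD.

```python
import math
from collections import Counter, defaultdict

def tokenize(incident):
    """Turn one incident dict into a bag of feature tokens."""
    tokens = [f"attack={incident['attack_type']}",
              f"weapon={incident['weapon_type']}"]
    tokens += [f"word={w}" for w in incident["summary"].lower().split()]
    return tokens

def train_naive_bayes(incidents, groups):
    """Count group frequencies and per-group token frequencies."""
    group_counts = Counter(groups)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for incident, group in zip(incidents, groups):
        for token in tokenize(incident):
            token_counts[group][token] += 1
            vocabulary.add(token)
    return group_counts, token_counts, vocabulary

def predict_group(incident, model):
    """Return the group with the highest log-posterior (add-one smoothing)."""
    group_counts, token_counts, vocabulary = model
    total = sum(group_counts.values())
    scores = {}
    for group, count in group_counts.items():
        denom = sum(token_counts[group].values()) + len(vocabulary)
        score = math.log(count / total)
        for token in tokenize(incident):
            if token in vocabulary:  # skip tokens never seen in training
                score += math.log((token_counts[group][token] + 1) / denom)
        scores[group] = score
    return max(scores, key=scores.get)

# Toy incidents and group names, invented for illustration only.
incidents = [
    {"attack_type": "Bombing", "weapon_type": "Explosives",
     "summary": "car bomb detonated near market"},
    {"attack_type": "Bombing", "weapon_type": "Explosives",
     "summary": "roadside bomb hit convoy"},
    {"attack_type": "Armed Assault", "weapon_type": "Firearms",
     "summary": "gunmen attacked police station"},
    {"attack_type": "Kidnapping", "weapon_type": "Firearms",
     "summary": "gunmen abducted aid workers"},
]
groups = ["Group A", "Group A", "Group B", "Group B"]

model = train_naive_bayes(incidents, groups)
unattributed = {"attack_type": "Bombing", "weapon_type": "Explosives",
                "summary": "bomb exploded near convoy"}
prediction = predict_group(unattributed, model)
```

On real data you would likely swap this for a library classifier and evaluate on held-out attributed incidents before trusting predictions on unattributed ones.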