This is the final project for P8105 Data Science. Our topic is “Mass Shootings in America (1966-2016)”. Five group members collaborated on this project. The project repo can be found here. Our final project report is available as “p8105_Final_Report.Rmd” and “p8105_Final_Report.html”.
Our source data come from Stanford’s Mass Shootings in America Project. The data set we used can be found in the “data” folder of our repo.
In the following sections, we will discuss more details about the project.
“THE fifth and last auxiliary right of the subject, that I shall at present mention, is that of having arms for their defense, suitable to their condition and degree, and such as are allowed by law.” Sir William Blackstone’s thoughts about the laws of England later helped shape the famous Amendment II of the US Constitution. However, the founding fathers probably did not foresee that this free land would come to have one of the highest firearm homicide rates among high-income Western countries. On October 24 of this year, a man was shot on W 169th St between Broadway and Ft. Washington. This was the most recent shooting incident that posed a potential threat to the CUMC community. When such a disturbance occurs, it is instinctive to seek more information and look for better protection. This incident was the direct motivation for our project, and we admit that media coverage of the growing number of mass shootings across the States also intensified our interest in this topic.
The initial question we wanted to ask was when and why mass shootings usually take place. In the Exploratory Analysis, which is the main body of this report, we start by analyzing the descriptions of the incidents and extracting keywords to find possible reasons and common characteristics for those events. The detailed analysis is presented alongside the associated plots. As the analysis went deeper, we saw some interesting trends and surprising results. Because school-related incidents stood out in the analysis, we also tried to answer a second question: what is the main cause of school mass shootings? As students living on campus, we find this question particularly important.
As mentioned in the previous section, our dataset comes from Stanford’s Mass Shootings in America Project. After comparing it with other data sources, we chose this one because it comes from a reputable academic institution and is well maintained. It validates its data against various sources, which makes it credible. We wrote an email to ask for permission to use the data in our project and made clear that we are using it for class purposes. The data dictionary is attached to our repo and is well defined.
Although the data is well maintained, we still needed to perform the necessary ETL (Extract, Transform, Load) process for our use. After reading in the original .xlsx file, 1) corrections were made based on the official correction guidelines, 2) the data was cleaned and tidied for easier analysis, and 3) transformations were applied and new variables were created to support our analysis. The cleaned data set was written to a .csv file named “Stanford_MSA_Database_for_release_06142016_v2.csv”.
The following plots along with their comments are for exploratory analysis.
In this section, several figures show the most common words in the case descriptions. From these results, we can see that the outcome of such shooting incidents is usually grave (“killed” is among the most frequent words). However, when we look for possible reasons, many different factors appear to contribute to these tragedies. Furthermore, we analyzed several categories in detail, such as gun types, targeted victims, possible motivations, and the shooters’ history of mental illness.
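The keyword extraction described above can be illustrated with a small word-frequency count. This is a minimal Python sketch, not the code used in the report: the stop-word list, the function name `top_words`, and the two example descriptions are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "at", "was", "were", "he", "his"}


def top_words(descriptions, n=10):
    """Return the n most frequent non-stop words across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase the text and keep alphabetic tokens only.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up example descriptions, not taken from the Stanford data.
sample = [
    "The shooter killed two coworkers and wounded a third.",
    "A student shot and killed a teacher at the school.",
]
print(top_words(sample, 3))
```

In this toy input, "killed" appears in both descriptions and therefore ranks first, which mirrors the pattern noted above.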
After seeing the above results, we would like to further investigate the possible covariate relationships presented in the dataset.
Comment: From the plot, we can see that handguns are the most commonly used gun type in most states. Among the incidents where the gun type is known, shotguns were rarely used. The most notable states are California, Texas and Florida, which may be related to the gun laws in those states.
Comment: From the plot, military experience is unknown for most shooters. Among those known to have military experience, the most preferred gun type is the handgun, which may be due to its portability. The second most preferred choice is multiple guns, possibly because people with military experience tend to be more familiar with different types of guns and may choose several types to cause more casualties. Rifles are the third most preferred gun type in this group. This suggests that the government should pay more attention to guidance on gun use for soldiers.
First, we stratified the shooters’ ages into four levels: children, teenager, senior, and olds. More specifically: children (age < 18), teenager (18 \(\leq\) age < 30), senior (30 \(\leq\) age < 65), and olds (age \(\geq\) 65).
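The four age levels can be written as a small mapping function. This is a Python sketch of the cut-offs stated above, not the report's own code; the function name `age_group` and the handling of missing ages are assumptions.

```python
def age_group(age):
    """Map a shooter's age in years to one of the report's four levels."""
    if age is None:
        return None  # assumed handling: leave unknown ages unclassified
    if age < 18:
        return "children"
    if age < 30:
        return "teenager"
    if age < 65:
        return "senior"
    return "olds"


# Boundary ages fall into the upper group, matching the <= signs above.
for example_age in (17, 18, 30, 65):
    print(example_age, age_group(example_age))
```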
From the plot, the handgun is the most preferred gun type among children, teenagers and seniors. The second preference is multiple guns, and the third is the rifle. For teenagers, the shotgun is the fourth most preferred gun type. Since most mass shooting cases involve handguns, stricter regulation of handguns is likely to be the most effective way for the government to reduce the occurrence of mass shootings.
Comment: The most common possible motives are mental illness, social dispute and domestic dispute.
Comment: We noticed several situations that carry a high risk of a shooting.
The shooter is terminated from work, denied status, or reprimanded or punished for workplace or other behavior, and then shoots a colleague, workmate, or business acquaintance.
The shooting arises from a domestic dispute.
The shooter has a mental illness and shoots students, classmates, or teachers. This situation is particularly informative: could excessive pressure be a contributing factor?
From the exploratory plots, one relationship particularly attracted our attention. We suspected that students may be experiencing excessive stress and mental illness: a large share of the school-related mass shooting cases appeared to be linked to the mental health status of the shooters (usually students). Because school_related and history_of_mental are both categorical variables, we performed a \(\chi^2\) test of independence between them. The test results shown below give a p-value far below any conventional significance level, so we reject the hypothesis that the two variables are independent. In our sample, 35 of 66 shooters in school-related cases (53%) had a known history of mental illness, compared with 60 of 241 (25%) in the other cases. Our recommendation is that educators pay more attention to students’ mental health and stress management.
Below are the contingency table and the test statistics:
school_related / history_of_mental | No | Unknown | Yes |
---|---|---|---|
No | 72 | 109 | 60 |
Yes | 21 | 10 | 35 |
Test statistic | df | P value |
---|---|---|
25.41 | 2 | 3.038e-06 * * * |
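The reported statistic can be checked by hand from the contingency table. This is a standard-library Python sketch, not the report's original code, and the function name `chi_square_independence` is made up. It assumes the rows are school_related (No, Yes) and the columns are history_of_mental (No, Unknown, Yes). It uses the fact that for 2 degrees of freedom the \(\chi^2\) upper-tail probability has the closed form \(e^{-x/2}\).

```python
import math

# Observed counts copied from the contingency table above.
# Assumed layout: rows = school_related (No, Yes),
# columns = history_of_mental (No, Unknown, Yes).
observed = [
    [72, 109, 60],
    [21, 10, 35],
]


def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_independence(observed)
# Closed form of the upper-tail probability, valid only when df == 2.
p_value = math.exp(-stat / 2)
print(round(stat, 2), df, p_value)
```

Under these assumptions the sketch reproduces the statistic of 25.41 on 2 degrees of freedom and a p-value of roughly 3e-06, in line with the table above.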
From the above analysis, we can see that our results are striking. Mass shootings are trending upward, and shooters tend to be younger. If we do not take precautions, there may be more serious gun issues in the near future. School shootings remain persistent, which is very concerning. Many school-related shootings were linked to excessive stress or mental health problems, which should alert educators to provide more psychological interventions and to guide students on how to deal with pressure and failure. In addition, rising domestic violence has become a real problem, especially over the past decade. To prevent mass shootings, all parties in society should take action. People should learn how to resolve troubles and conflicts by other means and how to protect themselves in an active-shooter situation. The government and Congress should act to regulate firearms more strictly if a gun ban is not a feasible option.