Speaker 1 (00:03):
This is the Data Science Conversations podcast with Damien Deighan and Dr. Phillip Diesinger. We feature cutting-edge data science and AI research from the world’s leading academic minds and industry practitioners, so you can expand your knowledge and grow your career. This podcast is sponsored by Data Science Talent, the data science recruitment experts.
Speaker 1 (00:32):
Welcome to the Data Science Conversations podcast. My name is Damien Deighan and I’m here with my cohost, Dr. Phillip Diesinger. Joining us today is Professor Maurizio Porfiri to talk about his latest academic research, which uses data science to uncover why sales of guns in the USA actually increase after a mass shooting event. It’s extremely important work with some surprising conclusions. By way of background, Maurizio conducts and supervises research on complex systems with applications from mechanics to behavior, public health and robotics. He’s an Institute Professor at New York University’s Tandon School of Engineering, and additionally he has appointments at the Center for Urban Science and Progress and the Departments of Mechanical and Aerospace Engineering, Biomedical Engineering, and Civil and Urban Engineering. Maurizio received Master’s and PhD degrees in Engineering Mechanics from Virginia Tech, as well as a Laurea in Electrical Engineering with honors and a PhD in Theoretical and Applied Mechanics, in 2001 and 2005 respectively.
Speaker 1 (01:51):
He’s been on the Faculty of the Mechanical and Aerospace Engineering Department since 2006, when he founded the Dynamical Systems Laboratory. Maurizio is the author of more than 350 journal publications, including papers in Nature, Nature Human Behaviour, and Physical Review Letters. He has many, many significant career recognitions, which include being listed in Popular Science’s Brilliant 10 list in 2010, the National Science Foundation CAREER Award, and the Research Excellence Award from NYU Tandon School of Engineering. His research has been heavily featured in major media outlets, such as CNN, NPR, Scientific American and the Discovery Channel. We’re very excited to have you here today, Maurizio. Thank you so much for joining us.
Speaker 2 (02:53):
Thank you very much for the introduction.
Speaker 1 (02:53):
Fantastic, and just to get us going, could you tell us what it was initially that motivated you to dedicate your research to the field of complex systems?
Speaker 2 (03:04):
Yes. So complex systems, it’s a new field, but it’s an old field. It is something extremely fascinating because it combines many different disciplines and it requires the development of new mathematical techniques to uncover emergent behaviour that is the result of interactions between many, many, many agents. We have seen it with the Nobel Prize this year, for example, with Professor Parisi being one of the pioneers in this field. He did amazing work that touched on climate, quantum mechanics, collective behavior. There are many analogies between these fields, and complex systems theory helped Parisi link these areas, providing a somewhat unified framework to understand the inner workings of such different systems.
Speaker 1 (03:57):
Ok, great, and what then about that field – how did you go from there to doing research into guns and mass shootings?
Speaker 2 (04:05):
That’s more of a personal choice. When I started my career, I had primarily been focusing on understanding collective dynamics in animal groups, trying to understand interactions between living agents, let them be humans, let them be fish or other animals that display these beautiful emerging [inaudible]. More lately I discovered that similar relationships can arise in very different contexts, related to public health. For example, I’ve been looking at the diffusion of policies in the United States, understanding, for example, how laws related to alcohol have diffused throughout the country. And something that came about really as a personal interest was trying to understand the emergence of [inaudible] underlying the sales of guns. I graduated from Virginia Tech, which unfortunately came to fame about 14 years ago due to a tremendous mass shooting that actually happened in the building where I was studying. That was the year before I left the university. One of my committee members lost his life protecting his students, and that shook us very, very much. So over the years, the study of gun buying always attracted my interest. Over the years I started to think that perhaps I could do something, and 4 years ago I started this journey, trying to see what I could do to help the field and bring techniques from other areas in collective dynamics to understand the phenomenon of gun violence.
Speaker 1 (05:57):
Thank you Maurizio for sharing that very personal story, and for then going on to really make a contribution in this field.
Speaker 3 (06:06):
You have a very interesting publication in Nature Human Behaviour in 2019 on gun violence, analyzing the root causes as you had already said. Can you talk us through what are the kinds of questions that you set out to answer with the data?
Speaker 2 (06:21):
Initially, we started looking at the relationship between the occurrence of mass shootings and the purchase of guns. It has been widely documented in the literature that there is an uptick of sales right after a mass shooting. If you indeed plot the data, you will find quite a strong correlation. So you will find that there are more guns sold after a mass shooting than in times when mass shootings are less frequent. And therefore, on the basis of these correlations, there has been a general sense that people buy guns because they fear for themselves. So what we tried to do is apply some techniques from different fields to understand if this correlation would beget somehow a form of causation, or if it would be simply a correlation between two variables, which we know may not be as indicative of a causal mechanism. So the journey that we started back then was to understand if there was some mediating variable that would indeed underlie the correlation between sales and violence.
Speaker 3 (07:41):
So it was clear that there was correlation in the data, but it was not clear whether it was causality, right? Can you help us understand a little bit better what kind of data you used for this analysis? So if I recall correctly, you had three very key datasets that you based your analysis on.
Speaker 2 (08:00):
With respect to the mass shootings, believe it or not, we still don’t have a universal definition. So we need to specify criteria for identifying a mass shooting. So what we did is we utilized a dataset from Mother Jones, which is quite popular in the field and refers to what we call public mass shootings: events that happen in a public space without an underlying relationship of blood, meaning there is no family relationship between the shooters and the victims, and with more than four casualties in the event. And there is no relationship with criminality; it’s not a gang-related crime. It’s what we really define as those impossible-to-explain events, so to speak. We collected data from 1999 to 2017. We have data that are in principle at a daily resolution, because we know the date on which each mass shooting happened.
Speaker 2 (09:09):
We cleaned the data a little bit. So we took this Mother Jones data, we verified the events one by one, and ultimately we collected a clean dataset with the number of fatalities and the dates they happened, and we utilized this as the mass shootings time series. So this, I think, is pretty standard and well understood. What instead is more questionable, or at least more interesting, is the gun ownership dataset. We don’t have a gun ownership dataset, that’s the reality. We do not know where the guns are. We do not know when they are bought; we just have proxies. We can have a bunch of proxies that can be utilized to estimate sales. The proxy that we utilized is the number of background checks. The background check system has been active only since 1999. So we had about 20 years, a little shy of that.
Speaker 2 (10:14):
So when you go to buy a gun, you will have a background check from the background check system, so I can understand if there was a sale in a particular State; I don’t have more information than the State level. And what we had back then was a monthly time resolution. This is not a perfect proxy at all, because I can do a background check and then not get the gun. I may just say, I don’t want the gun. Or the background check may fail, or I can get a gun through other means. I can even get it illegally, but even if I want it legally, I can get it as a gift, or I can buy it at a fair. So there are many other ways that are not counted in the background checks. So we need to take it with a grain of salt; it’s a proxy.
Speaker 2 (11:02):
So we have these two datasets: gun ownership, through background checks at a monthly resolution, and mass shootings, from Mother Jones. We can run a simple correlation and we discover a strong relationship between the two variables. We find that there are more sales when there is a mass shooting. The mass shooting data, I mean, on a monthly resolution, are a bunch of zeros, and then sometimes there is an event, sometimes there is more than one event. There are several incidents on a yearly basis. If I’m not mistaken, the number of events that we looked at was less than 100, somewhere on the order of 70. So over 20 years, you can do the math. Instead, the background check time series is very rich and very interesting. On a monthly resolution, what you observe is a strong seasonality. We have two peaks during the year: one at the beginning of spring, March or April, and the other around November, December.
Speaker 2 (12:07):
Why are there these two peaks? We have a couple of theories about why people are buying more guns at these two specific times. Most likely there are more sales around Christmas because of gifts. And with respect to the March-April timeline, it is likely related to the tax rebate. So in the US, after completing the tax cycle, you may get some money back if you overpaid. And a typical investment is to buy a gun with that extra check that you get. Then the data are absolutely not stationary on average. We observe quite a strong upward trend. So we have more and more guns coming into the country. To give some sense of the trend, if I estimate from the background checks how many guns are coming into the country, I can tell you that a couple of years back the number of guns surpassed the number of people in the US. The number is going up and up and up – the trend is upward.
Speaker 3 (13:21):
And how did you process this data to deal with these effects that are in there?
Speaker 2 (13:25):
These techniques rely on stationarity of the time series. So while you can do a correlation for variables that are going up, if you want to pick up a causal link, then you want to be sure that you are not just relating trends. So what we needed to do is, first of all, clean the data and make sure it would be stationary. We have done this in two consecutive steps. First of all, we removed the trend. So we identified a trend in the data, and we simply eliminated it, and then obtained a dataset that didn’t have any trend, but still had a seasonality effect. And then we utilized a technique called TRAMO/SEATS, which is heavily used in econometrics, to filter out the seasonal effect. So at the end, we obtained a time series that didn’t have a trend and didn’t have a seasonality. And we verified with a simple study [inaudible] that at the end we had a stationary time series, and [inaudible] series we could understand if the local upticks, the local changes in sales, would indeed relate to mass shootings or to something else, which is indeed at the very heart of the questions we want to address.
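As an illustration of the two cleaning steps described in this answer, here is a minimal Python sketch. It is not the procedure from the paper: it swaps in a straight-line detrend and a subtraction of calendar-month averages in place of the econometric seasonal filter mentioned in the interview, and the series is synthetic, made up only for the example.

```python
import math

def detrend_linear(series):
    """Subtract the least-squares straight line from the series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

def remove_monthly_means(series, period=12):
    """Subtract the average of each calendar month (a crude seasonal filter)."""
    out = list(series)
    for m in range(period):
        vals = series[m::period]
        mean_m = sum(vals) / len(vals)
        for i in range(m, len(series), period):
            out[i] = series[i] - mean_m
    return out

# Synthetic monthly series (made-up numbers): upward trend plus a yearly cycle.
months = 18 * 12
raw = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(months)]

# Step 1 removes the trend, step 2 removes the seasonal pattern.
cleaned = remove_monthly_means(detrend_linear(raw))
```

After both steps, every calendar month averages to zero and only small residual fluctuations remain, which is the kind of series the later entropy estimates need.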
Speaker 3 (14:51):
Very good, yeah. So, in addition to the mass shootings event data and the background checks data, you had also a third dataset, which is the media output that was generated at the time, yeah. Can you talk us through how you measured that, how you quantified that?
Speaker 2 (15:04):
Now, this is the key part of the research. So we have these two variables, and we want to understand if there is something that underlies the relationship between the two of them. So as a first step, we tried to quantify the media coverage of the particular events. We went to a database called ProQuest, which is routinely used by journalists to look at different theses. And we can get information about articles published about specific topics. From ProQuest we can select the outlet in which we want the publication to appear, and we looked at the Washington Post and the New York Times. We looked for three specific topics. First, articles that discuss shootings, excluding any discussion of regulation. Second topic, shootings and regulation. And then third topic, unemployment. And what we tried to do was to capture three phenomena.
Speaker 2 (16:20):
One was how media cover the act itself. Then second, how media cover the discussion that typically follows a mass shooting, which entails the possibility of coming up with new regulation, a regulation that may curtail access to guns. And then the third one was a time series that instead would be some type of control, telling us what the level of unemployment in the country is. Unemployment is important because it tends to be related to social unrest, so somehow telling you that a gun can be helpful if you want to protect yourself if the economy goes down and somehow the situation gets really bad.
Speaker 3 (17:10):
That’s super interesting. So you basically had these three data sources: background checks, events, and also media output that you quantified like you just described. And you already mentioned that there are correlations in there, of course. But the question is, is there also causation in the data? And this is always a very hard question to answer. Can you talk us through the thinking process that you had when you approached this analysis and what kind of analytical tools you eventually chose for this task?
Speaker 2 (17:43):
We have to be very, very specific about what type of causality we want to look at, because this is something where there are very, very strong opinions. And indeed, it’s always a tough argument because we don’t have a manipulation of the system; it’s not that I can manipulate the system and pick up the specific effect of one variable or another. What I am doing is an observation. And then from the observation, I’m trying to figure out if one variable has some predictive effect on some other variable. So the specific notion of causality we look at is within the notion of Wiener-Granger causality. Specifically, what we try to do is the following. Very simply, let’s take two time series, and we want to argue that one variable has a causal effect on the other variable. How do we check it? What we do is we measure the entropy of the variables. The entropy is a measure of the degree of uncertainty. So loosely speaking, we are trying to understand if the uncertainty in the prediction of the future of the effect variable is reduced by adding additional knowledge about the causal variable.
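The entropy-based check described in this answer is commonly written as a transfer entropy: the uncertainty about the effect's next value given its own present, minus the same uncertainty once the cause's present is also known. The following Python sketch shows one simple way to estimate it from symbol counts. The function names and the toy sequences are illustrative assumptions, not the authors' code, and only one step of history is used.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (in bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def transfer_entropy(cause, effect):
    """H(effect_next | effect_now) - H(effect_next | effect_now, cause_now)."""
    nxt, now, src = effect[1:], effect[:-1], cause[:-1]
    # Conditional entropy via the chain rule: H(A | B) = H(A, B) - H(B).
    h_without = entropy(list(zip(nxt, now))) - entropy(list(now))
    h_with = entropy(list(zip(nxt, now, src))) - entropy(list(zip(now, src)))
    return h_without - h_with

# Toy symbols (made up): y simply repeats x one step later.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)  # knowing x removes the uncertainty about y's next value
te_yx = transfer_entropy(y, x)  # knowing y helps far less in predicting x
```

In this toy case the value from x to y is close to one bit, while the reverse direction is much smaller, which is the asymmetry the approach relies on to assign a direction to the relationship.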
Speaker 3 (19:18):
Yeah. So you’re basically, you’re dealing with time series data, yeah. And you’re looking, if you can predict a given time series by using the historic data from a different time series, right? And that’s kind of your criterion for causality in this sense, yeah?
Speaker 2 (19:34):
You are perfectly right. The prediction that we make is a prediction in a statistical sense, in which we are trying to minimize the uncertainty. So that’s what we try to do. So we work with this notion of entropy. In order for everything to work, we need to have stationary time series, because mathematically or practically, we will be computing a bunch of probability mass functions. So we must be sure that what happened in 2015 is reminiscent of what is happening in 2017. If instead everything keeps changing every month, then you are in a very tough spot, because you cannot utilize your past data or your present data to estimate the underlying probability mass functions. In this study, we have multiple time series. So it’s really important that we consider the existence of other time series, because you may end up with phenomena that are pretty bad, like a common driver. So hypothetically, you can have a causal variable which influences two effect variables, and then you look at the interaction between the two effects and they look like they are interacting with each other. But in reality, they are not, because they are under a common driver. So when we perform our analysis, we apply something a little different that we call conditional transfer entropy, so that when you do the computation, you always account for the remaining variables. So I have the cause, the effect and the rest of the world, or at least the part that is accessible to us. So [inaudible] this is by controlling for time variations of the other variables. So what I do now is compute a bunch of transfer entropies between all pairs of variables, conditioned on what is left. We have a statistical test that can help us understand if the interactions that I am determining are different from chance.
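The common-driver problem raised in this answer can be reproduced with a small simulation. The Python sketch below is only an illustration under stated assumptions: the variables `z`, `x` and `y`, the 10% noise level and the sample size are invented for the example, and the estimator is a plain count-based one with one step of history rather than the authors' implementation. The significance test mentioned in the interview is not included.

```python
import random
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (in bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def cond_transfer_entropy(cause, effect, others=None):
    """Transfer entropy from cause to effect, optionally conditioned on others."""
    nxt, now, src = effect[1:], effect[:-1], cause[:-1]
    # With no third variable, condition on a constant (changes nothing).
    rest = others[:-1] if others is not None else [0] * len(now)
    h_without = entropy(list(zip(nxt, now, rest))) - entropy(list(zip(now, rest)))
    h_with = entropy(list(zip(nxt, now, src, rest))) - entropy(list(zip(now, src, rest)))
    return h_without - h_with

def noisy_copy(bit, rng, flip=0.1):
    """Return the bit, flipped with probability `flip`."""
    return bit ^ 1 if rng.random() < flip else bit

rng = random.Random(0)
n = 5000
z = [rng.randint(0, 1) for _ in range(n)]        # common driver
x = [noisy_copy(b, rng) for b in z]              # x follows z immediately
y = [0] + [noisy_copy(b, rng) for b in z[:-1]]   # y follows z one step later

te_plain = cond_transfer_entropy(x, y)               # x appears to drive y
te_given_z = cond_transfer_entropy(x, y, others=z)   # the link vanishes once z is accounted for
```

Here x never influences y directly, yet the unconditioned estimate is clearly positive; conditioning on the driver brings it down to nearly zero, which is why the remaining variables are always accounted for.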
Speaker 2 (21:47):
Once I recognize those that are significant, I get an understanding of the causal relationships underlying my variables. We have very, very little data. So if you follow the story from the beginning, we have 18 years times 12 months, which is really not that much. So in principle, you want to do many, many, many things, but in reality, you are bound in the way you can perform the analysis. So there are a few technical details that you may be interested in learning through the paper, namely how we consolidate the information about the time series into a small number of variables that can be estimated through the process. So we cannot, for example, say what the exact value of the sales is. We will only be able to say sales are going up, sales are going down; media is going up, media is going down. So we simplified the series into a sequence of symbols that tell us about increase or decrease, and then performed the analysis on these binary series. And that helps quite a bit, because we can actually perform something that is statistically robust. We have a dataset that is not as resolved in time as one may wish.
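The up/down simplification described in this answer can be sketched in a few lines of Python. The numbers are made up for illustration, and treating an unchanged value as "not up" is an assumption of this sketch rather than a detail taken from the paper.

```python
def to_up_down(series):
    """Map a series to symbols: 1 if the value rose from the previous step, else 0."""
    return [1 if b > a else 0 for a, b in zip(series, series[1:])]

# Made-up monthly values, for illustration only.
background_checks = [210, 225, 219, 240, 238, 251]
media_articles = [12, 9, 15, 15, 30, 22]

checks_symbols = to_up_down(background_checks)  # [1, 0, 1, 0, 1]
media_symbols = to_up_down(media_articles)      # [0, 1, 0, 1, 0]
```

With only two symbols per variable, the probability mass functions behind the entropy estimates have very few entries, so a couple of hundred monthly samples can be enough to estimate them.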
Speaker 1 (23:12):
What are the outcomes of this? What did you learn from it?
Speaker 2 (23:14):
So the first thing that we learned, which is the beautiful one I think, is that if you perform the analysis by accounting for media coverage of firearm regulation, then we observe that there is a causal relationship from this particular media source to background checks, but we don’t discover any effect of media coverage of shootings, media coverage of unemployment, or mass shootings themselves on background checks. So the theory that people buy guns for self-protection may not be as strong as we were thinking at the very beginning. And the analysis that we conducted suggests a different story. People may be buying guns because they fear that regulations may come up and they will not be able to buy guns in the future. So it’s not the fear of being a victim, but rather the fear of not being able to buy in the near future. So it’s a very different mechanism that the data science analysis suggests with respect to what would have been our initial intuition.
Speaker 1 (24:37):
So what you’re saying is it’s really driven by consumer behaviour rather than, obviously, a threat to personal safety. What do you think are the implications of this finding for policymakers?
Speaker 2 (24:52):
I think it is more viable to conduct analysis at a State level, because policies are very, very different depending on the State. And the appetite for guns is also very different depending on the State. And the response of the States is also different depending on [inaudible] factors, let them be ideological or geographical, but every State is different. That’s another interesting finding of our paper that I’m happy to talk about.
Speaker 3 (25:23):
How did you quantify the legal restrictiveness in the individual States?
Speaker 2 (25:27):
What we went on to do next was try to understand if States behave like one, or if there are differences between States. What we thought was that if the hypothesis holds that you buy guns because you fear you cannot buy again, then we would expect that in States which are very, very, very strict the effect will be smaller, because if I already have all the laws that challenge my ability to buy, then you know what, if they come up with an extra law, let it be: I already cannot buy, so I won’t be able to buy later either. But instead if I am living in a State which has few regulations, then if they come up with new ones, I may not be able to buy. So what we expected is that the values of the transfer entropy in the entire analysis would be mediated by what we call the legal restrictiveness of the particular State.
Speaker 2 (26:33):
So what we did is we counted the proportion of laws related to firearm safety that were active in each of the 50 States, out of a total of 133. And based on that, we ranked the States from the strictest to the loosest. And there is quite a bit of variation. Interestingly, the variation is not dynamic, in the sense that a State that 10 years ago tended to be looser in terms of regulation is still loose, and States that were very strict remain strict. You have huge differences, for example, between a State like Massachusetts, really, really strict, and another State like Vermont with far fewer regulations. And what we were able to uncover is that there is a strong effect of the legal restrictiveness. So in the States in which you have more laws, where you have a more restrictive environment, you have a weaker effect of media coverage of regulations on sales, and vice versa: in States which are looser in terms of regulation, you have a much stronger effect. So from the policymaking point of view, that is an indication that different policies somehow affect how people respond to mass shootings and how they act in terms of buying guns.
Speaker 3 (28:09):
That’s super interesting. Yeah. How did you quantify the restrictiveness of the laws? Did you categorize it or how do you measure that?
Speaker 2 (28:16):
Yes. So we looked at an existing database from the State Firearms Law Project in which all these regulations are spelled out, and we identified those that pertain to safety. There is a group of them, and this has been done by other researchers. And then we went State by State and counted which of these laws were active in that particular State. So for each State we have basically a score from zero to one: one if they have the entire deck of laws, zero if they adopted none.
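The zero-to-one score described in this answer amounts to a simple fraction, which the Python sketch below illustrates. The total of 133 provisions is the figure quoted earlier in the interview; the State names and provision IDs are hypothetical placeholders, not entries from the real database.

```python
TOTAL_PROVISIONS = 133  # total number of safety-related laws quoted in the interview

def restrictiveness(active_provisions):
    """Fraction of the tracked provisions that a State has in force (0 to 1)."""
    return len(active_provisions) / TOTAL_PROVISIONS

# Hypothetical States and provision IDs, not taken from the real database.
active_laws = {
    "State A": set(range(100)),  # 100 of 133 provisions active
    "State B": set(range(20)),   # 20 of 133 provisions active
    "State C": set(),            # none adopted
}

scores = {state: restrictiveness(laws) for state, laws in active_laws.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # strictest first
```

A ranking built this way is what allows the strength of the media effect to be compared between stricter and looser States.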
Speaker 3 (28:54):
You’ve talked us through this analysis already with the State restrictiveness, legal restrictiveness. One question: would it be possible, or would it make sense, to also consider for the event dataset a weight factor and not just a binary variable, yeah? So as a weight factor you could think of the fatalities, for instance, or something like that, or people being wounded. Would that contribute to the analysis? Did you consider something like that?
Speaker 2 (29:23):
We didn’t consider it. Given that we have only a couple of hundred data points, we didn’t have the luxury of being able to resolve these details, but I do think it’s very important and this is something that you need. We are looking at it now with the new dataset that has been released by the FBI relating to the background checks, where now we are able to get daily resolution. So now that I get a much better time series of thousands of samples, I am starting to look into exactly what you are describing. Anecdotally, for example, the largest peak in the media output on coverage, if I’m not mistaken, is exactly in correspondence with the Sandy Hook massacre, which involved young children in Connecticut. Each event is different to some extent, not only in the severity of the event itself, but also in the echo it receives in the media and the discussion it triggers. So I do believe that it’s important to somehow account for more variables that entail, for example, the location of the event, what the drivers of the shooter were, and how many weapons the shooter carried. Did he want to kill many more people than he was able to? Accounting for many, many more variables to describe the shooting incident can actually help us understand more of the dynamics of the process. But for now, the analysis that we have is only zero or one: it happened or didn’t happen in a month.
Speaker 3 (31:02):
And that shows already a lot of insights, so that’s very successfully done, very well done. You already started giving a little bit of perspective on next steps, basically what you are still working on and so on. Any limitations of the analysis that you would still want to work on?
Speaker 2 (31:19):
There are so many, so many limitations. First of all, ownership. We are really not counting all the guns that are out there, so we need to do better than this. So we started working on other metrics of gun prevalence based on, for example, suicides with a firearm, which is another proxy of gun prevalence. Is it great? Maybe not. So what we have been trying to do is look at multiple proxies and construct a model that can help us predict the number of guns. With that we can have better insight and use it to better estimate ownership. So this is work in progress. Then the media: we considered only two venues in the study. Somebody can argue that the venues are also not covering the broad political range of opinions in the US. We have done some work in this direction in our following publication, where we have looked at different media sources like the Chicago Tribune and The Tribune in New Orleans; we have picked new outlets, so we have done something in that direction. Is it perfect? Certainly not. Because when you speak of media, media is what they give you, it is not what you process. So I believe that something important for us to do in the future is also looking at Twitter data. That can be quite important because it can really help us understand the sentiment of people who own guns. With media coverage we are receiving information, and many times we also receive information through other means, such as social media. So we really need to go in that direction. We want to understand it, and we need to work to get these additional datasets. Then the analysis: we were only able to work with binary variables at a monthly resolution. We should do much, much better. So we are getting data now at a daily resolution, which should help in utilizing more variables at the same time, exploring other confounding effects, and looking at what you described before, which is a more detailed representation of the mass shooting.
Also, we want to catalog the media depending on their audience. So we have a lot of work ahead.
Speaker 1 (33:49):
So Maurizio, what other final comments or thoughts would you like to leave with us regarding your work?
Speaker 2 (33:58):
Mass shootings are very rare, extreme events that cover a very, very small fraction of the lives that are lost to gun violence every year. Roughly we are talking about 0.1%. So we shouldn’t be looking only at mass shootings as an outcome of firearm violence. And indeed the work that we are trying to do now brings many, many more variables. We are looking at suicides. We are looking at homicides. We are focusing on domestic violence. We are looking at gangs as well. So we want to get a much, much bigger picture of the outcomes of firearm related violence. So that I think is really, really important. So mass shootings are events that trigger a profound discussion because they are gruesome, they are unexpected, very difficult to explain. But, they are the tip of the iceberg and there is much more, not more profound, but there is a baseline problem that is losing life to firearm violence and that is something that we would like to understand, we would like to discuss.
Speaker 2 (35:22):
So some of the work that we are doing now is indeed looking at this more complex firearm ecosystem with all its pieces – prevalence, violence, regulation. We have been fortunate to receive support from the [inaudible] Foundation, and we have put together a very strong team with expertise in human behaviour, policymaking, public health, applied mathematics and data science, and we are trying to look at three different scales. We look at the micro scale, which is how individuals behave and what the individual responses are to firearm violence, firearm prevalence, and any firearm-related stimuli. So we want to do the Twitter study. We want to do a serious psychological investigation of individual behaviour. Then we have the mesoscale, which is the State level: how States respond, what policies are enacted in the States, how States coordinate in their response to firearm violence. And last, the nationwide response, which is how, as a nation, we are dealing with firearm violence, and this one is mainly my piece of work, which is looking at the causal relationships. But we would like to patch all these pieces together and be able to offer this multi-scale understanding of firearm violence, from the drivers to ultimately the outcomes.
Speaker 1 (36:55):
Great, thanks. Maurizio, it’s absolutely inspiring and very, very important work that you’re doing because it is such a huge, huge problem, so I congratulate you on that.
Speaker 2 (37:07):
Thank you. I am very proud of this work because it uses data science without any agenda. So when we looked at the data, we didn’t have a desire to explore one relationship as opposed to another. We objectively looked at the dataset and applied techniques from our field to try to understand what the potential mechanisms explaining the data could be. I believe that this is extremely important because it provides policymakers, of whatever political orientation, with objective answers to questions: a scientifically grounded understanding of a problem, so that they can make their decisions with the best knowledge of what the consequences are. I think this is really, really important and it’s key to our work.
Speaker 3 (38:04):
And if somebody wants to take a look at the data, or maybe your analysis or the paper, where can they find the materials?
Speaker 2 (38:11):
So we have a GitHub repository from my lab, so they can just search for the GitHub of the Dynamical Systems Laboratory at NYU, and they can get the entire dataset. All the data that we analyzed are available there, and they are all clearly organized. We have files that can help in going through them. And then we have the code: whatever code was utilized to perform the [inaudible] analysis is available on the GitHub. We have a couple of papers which I believe would be a good read. One is a paper from 2019 in Nature Human Behaviour, entitled Media Coverage and Firearm Acquisition in the Aftermath of a Mass Shooting. And the second paper, which I also recommend reading, is Self-Protection Versus Fear of Stricter Firearm Regulations: Examining the Drivers of Firearm Acquisitions in the Aftermath of a Mass Shooting. The papers, I believe, complement each other with different techniques, looking at the national and the more granular State-level analysis, and overall answering the questions we have been talking about.
Speaker 1 (39:32):
You will be able to find the links to both of those papers, the datasets and the GitHub repository in the show notes and on our website at DataScienceConversations.com. And that sadly brings today’s episode to a close. A really, really fascinating conversation. If you enjoyed this episode, then I should just let you know that our next show will feature another one of NYU’s finest academics, where we will be talking about AI ethics and in particular, how companies are now using AI for hiring employees. But for today, Maurizio, thank you so much for joining us on the show. It was a really superb conversation.
Speaker 2 (40:18):
Thank you very much for having me.
Speaker 1 (40:20):
And thank you also to my cohost Phillip Diesinger. And of course, to you for listening, we look forward to having you with us on the next episode.