Speaker 1 (00:04):
This is the Data Science Conversations Podcast with Damien Deighan and Dr. Philipp Diesinger. We feature cutting-edge data science and AI research from the world’s leading academic minds and industry practitioners, so you can expand your knowledge and grow your career. This podcast is sponsored by Data Science Talent, the data science recruitment experts. Welcome to the Data Science Conversations podcast. My name is Damien Deighan, and as usual, I’m here with my co-host Philipp Diesinger. How’s it going, Philipp?
Speaker 2 (00:41):
It’s going great, Damien. Thanks. I’m looking forward to the great topic we have here today.
Speaker 1 (00:46):
Indeed. We’ve had a few months’ break, but now we’re back and raring to go, and we’re very much looking forward to presenting you with a really interesting piece of academic research. Today we’re talking about how science is communicated, and often miscommunicated, in online media. Our expert guest on this very important topic is Ágnes Horvát. Ágnes, we’re delighted to have you with us. How are you doing?
Speaker 3 (01:11):
I’m very well, thank you. Excited to be here.
Speaker 1 (01:14):
Great. So, by way of background, Ágnes is currently an assistant professor in communication and computer science at Northwestern University in Illinois, where she directs the Technology and Social Behavior PhD program. Prior to moving to the US, she obtained a PhD in physics from the University of Heidelberg in Germany. After that, she carried out her postdoctoral research at the Northwestern Institute on Complex Systems. Her research lies at the intersection of computational social science, social computing and communication, and uses interdisciplinary approaches from network and data science. Ágnes’s research and teaching have been recognized with National Science Foundation CAREER and CRII awards. Her research group, the Lab on Innovation, Networks, and Knowledge, investigates how networks can lead to bias in information sharing and processing on digital platforms. So Ágnes is uniquely positioned to tell us all about what happens when science gets communicated online. So Ágnes, perhaps we could start with how you made the journey from a physics PhD to your current area of research.
Speaker 3 (02:25):
During the PhD I did a lot of network modeling and inference of connections. Some of the people you’re interested in reaching might know this as link prediction, which was a hot topic maybe 10 years ago. Through that research, I got a feeling for how data science was used in the biological space, but also in the social space. The latter appealed far more to me, given my interests at the time, and that has not changed since. So I moved to the US to join someone who’s doing cutting-edge computational social science, and that has been a leading direction throughout my work. The broad area is quite interesting today because we have these massive data sets that all of us are seeing pop up, and we have better and better tools to understand what’s going on in those data sets. The hope is that some of those old questions from the social sciences and from communication can be answered with those new data sets and new tools.
Speaker 1 (03:31):
Great. And perhaps you can give us a brief overview of where science currently stands in the digital era we find ourselves in.
Speaker 3 (03:42):
Yeah. So as someone who studied physics and is very excited about the natural sciences and STEM in general, I found myself in a world where some of the scientific findings that scholars work hard on discovering are not reaching the public appropriately. Maybe not fast enough, maybe not accurately enough, maybe not packaged in the right context. And I think that’s a really big problem. Science’s role does not stop when the paper is published, when the discoveries have been made public to peers in the community. Ideally, it would reach taxpayers and the younger generation, who can then make better decisions for our future. So, in general, I think the state of science is not worse than it was 10, 20 or 30 years ago. But some of the questions are still the same, and it’s our job to finally do more.
Speaker 3 (04:43):
And so my research team and I have joined that work, trying to better understand how science is disseminated. We like to use some of the new platforms and digital media sources to understand what that looks like in real settings, and then to use that knowledge to improve some of the processes that we all know are problematic. One of them is related to misinformation, the other to biases in what type of science is communicated. And a third one that I’ll mention here is related to who is involved in sharing that science: what audiences are engaging with science?
Speaker 2 (05:30):
Thanks, Ágnes. You already talked about communicating scientific research. When you use that term, what kind of audiences do you have in mind? Is it communicating the outcomes of scientific research to the general population? Or is it communication between scientists, or a mix of all of those?
Speaker 3 (05:47):
Ideally, a mix of those two, plus communicating science to policy makers, governments and the institutions that can enact change. Think of problems around climate change. I think we are past the stage where we think that individual audiences or individual scientists could make big breakthroughs. So it’s important, as you are pointing out, to make sure that we communicate science properly within academic communities: scholars talking to scholars, and making sure that cutting-edge research reaches relevant colleagues. But then there’s also communicating with general audiences, the public, and the importance of this part cannot be overstated. I’m sure all of you are thinking about the pandemic and the many ways in which it showed us how important it was to reach people, regardless of demographic, background or location, and to make sure that they were aware of the most important results so that they could change their behavior or adjust their daily lives accordingly. And some of that research was in flux, as you might remember. So doing this communication in a timely and very effective way is essential. And then the third piece, as I mentioned: interested governments and policy makers should also have access to, not necessarily the nitty-gritty of the research, but the main takeaways, with an understanding, of course, of the assumptions that went into the models, so that they can present this to decision makers and make sure that the right policies are enacted.
Speaker 2 (07:35):
So between the scientist who conducts the research and the audience, there are obviously some communication channels, some networks, some medium in between, maybe news outlets. Could you talk a bit about that and what you focused on in your own research?
Speaker 3 (07:53):
I mentioned that science is not in worse shape than it was maybe 30 years ago. But one main change between then and today is that we have all these new digital media platforms that have fundamentally changed the way we communicate with each other and the way formal and informal information sharing happens. All of you are very well aware that social media has fundamentally changed the way we interact with our friends, our acquaintances and our larger audiences. News has changed quite a bit. Digital news has revolutionized access to news stories at a very quick pace. It has also left some local journalism in crisis, meaning that a lot of people who traditionally would have obtained news about science and technology from their local newspaper cannot do that now. So there are interesting changes in the news landscape.
Speaker 3 (09:01):
And then we have these new platforms that act as knowledge repositories, online encyclopedias that, for many people, especially from the younger generation, substitute for the information sources that traditionally would have been books. So we have all these very different channels. A lot of us are blogging and reaching our audiences through individualized forms of communication. So there’s this wealth of sources that people currently use to obtain information about science and technology, and they are really important. But we don’t currently understand how science communication works on those platforms, at least not perfectly.
Speaker 2 (09:45):
You already talked about the difference between how science is consumed or distributed today versus maybe 30, 40, 50 years ago. Is there also a difference in sheer volume? I would expect that there’s probably much more to be reported now than there was before, but I’m not sure. Is that something that you looked into?
Speaker 3 (10:06):
Absolutely. Scientific production has exploded. The number of articles being published has grown exponentially over the past years, and we have seen this trend continue. Also, the type of research being done is far more interdisciplinary and far more team-driven, meaning that the experts available to review this work are stretched thin. So quality control is becoming an increasing issue, with the growth in science, with the growth in complexity and interdisciplinarity of the science being produced, and with the diminished availability of people to do that quality control.
Speaker 2 (10:53):
So with this paradigm shift in scientific communication, we arrive at something you call post-normal science. Can you walk us a little through that concept?
Speaker 3 (11:03):
Thirty years ago, Funtowicz and Ravetz wrote an interesting piece that anticipated some of the issues we are seeing now. They talked quite a bit about how, already at that time, science was becoming more and more different from the very traditional idea of science, the normal science that was very much centered on puzzle solving, guided by curiosity, and that could be extremely productive. They described how climate risks and environmental changes created a need for a different approach to science, one that is more aware of the fact that values might be in dispute, that the stakes are high, that decisions need to be made very urgently, and that there’s some inherent uncertainty in these problems. When you think this through, you realize that the COVID pandemic has been very much like this in so many ways.
Speaker 3 (12:02):
That’s why all of us were going back to this fundamental piece, because it also highlighted the tension between having to make accurate, urgent decisions based on incomplete information, where the different stakeholders had very different opinions. This is what we call post-normal science, and this is what we are concerned with when it comes to research today. Not just research that happens in a silo, behind closed doors, and that fulfills the scientist’s curiosity and personal development, but research that is very aware of the context: aware of who the scientist is, what questions and positions they bring to the table, and what type of problem they are trying to solve, with broad impacts that go beyond individual curiosity or moving science forward.
Speaker 2 (12:57):
You already talked about different networks and the role they play in propagating science or scientific results to the audience. Could you highlight a little the role that social media plays within this concept of post-normal science?
Speaker 3 (13:11):
So, social media has some really important features that make it quite appealing these days. On the one hand, information can diffuse at a scale and a pace that is unprecedented. In many ways, sharing on social media no longer follows the broadcast model, where there’s one central message source and everyone else gets the same information. Instead, we are connected with our personal networks: our family, our friends, our colleagues. So the structure of these networks enables a very different kind of sharing. A lot of the work in this space has focused on the role of echo chambers and filter bubbles, which highlights how information can get trapped in certain communities, and how, through the sheer sharing and sheer volume of information on social media, one can end up with a skewed understanding of the state of affairs in science or on a certain topic. So social media speeds up information diffusion. It changes the networks and the sources we see sharing information. And finally, it has this familiarity, this intimacy, where a lot of us tend to trust some sources in our networks more than others. And those sources can, of course, spread misinformation in many cases.
Speaker 2 (14:55):
Do you have some sort of quantitative data for how much researchers themselves rely on social media?
Speaker 3 (15:02):
We have seen literature reporting increasing percentages of scholars who use social media. Somewhere around 75 to 85% of scholars have reported promoting their work on online platforms, whether on social media or on some of the knowledge repositories. This trend has been increasing quite a bit, and most broad research areas are now engaging in this type of behavior.
Speaker 1 (15:32):
So Ágnes, what occurs to me is that most people will probably be familiar with how very robust science gets published, and then, unfortunately, someone who may be well-meaning and very influential, but who doesn’t necessarily have a good grasp of the underlying science, can take it, put their own narrative on it on social media, and that almost becomes the standard message that gets communicated. But that’s not the full picture of scientific misinformation. So perhaps you could elaborate on that for us, please.
Speaker 3 (16:07):
One example that comes to mind, when you talk about this, is one that all of us have seen in the context of the COVID pandemic: the anti-vaxxer community once again being very strongly present online. They go back to this one piece from the nineties, a paper that falsely claimed a link between vaccination and autism in children. This research has been found to be flawed on many accounts and has been retracted, meaning that the research community said that this was a baseless claim, given the problems with the way the research was set up. Science’s self-correction then kicks in through this process of retraction, where the journals retract, or take the paper back, and signal very clearly on their platforms that this research is not valid. And this is a process that I feel we have not studied well enough or in enough detail.
Speaker 3 (17:08):
That introduces an interesting form of misinformation, where it’s typically not due to political motivations, but to false scientific claims that give us a wrong picture of a certain problem. Retractions can be quite pervasive. Just to give you a sense, overall maybe four out of every thousand papers are retracted. So the sheer number is not that impressive, but the reach through some of these new platforms, including social media platforms where content spreads very easily, can be problematic. So our research question, our main motivation, was to try to understand how audiences, which include scientists but also the public, interact with retracted papers on various platforms: on social media, in the news, on knowledge repositories and via blogs.
Speaker 2 (18:10):
You talked about harmful scientific publications. Does the harm come from misinformation that just keeps spreading and circulating after the retraction? Or does the problem occur when papers are shared widely before they’re even retracted, and that information then sticks?
Speaker 3 (18:32):
So one of the most surprising findings of this research was that work that gets retracted in the future is actually shared more often after publication on most of these platforms, including social media. We did extensive comparisons with similar papers that came out around the same time in the same venue, with the same number of authors and the same sort of prestige. What we found was that papers that get retracted in the future receive wider attention and are shared more broadly after publication, long before we know that the results are flawed. If you think about this finding, on the one hand it’s concerning, because it means that people potentially hear about problematic findings more than about correct findings. But it also says something about how the process is currently handled.
Speaker 3 (19:37):
If a piece of science isn’t highly visible, it may not get the scrutiny that is needed to retract it. Retraction itself is a complicated process in academia that scientists try to be very careful about. Partly because of the way research is seen in the public eye, it would be very problematic to retract every paper that makes a small mistake. For those types of errors, we have corrections. Retraction is quite a substantial intervention, and it is also a way to signal that this work is fundamentally wrong.
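The matched comparison Ágnes describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team’s actual pipeline: the `Paper` fields, the `matched_controls` and `attention_gap` helpers and the toy numbers are all hypothetical, and the matching here uses only venue, year and author count (the study also accounted for factors such as author prestige).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Paper:
    doi: str
    venue: str
    year: int
    n_authors: int
    mentions: int      # hypothetical count of online mentions after publication
    retracted: bool    # True if the paper was later retracted


def matched_controls(target: Paper, pool: list[Paper]) -> list[Paper]:
    """Non-retracted papers from the same venue and year with the same author count."""
    return [
        p for p in pool
        if not p.retracted
        and p.venue == target.venue
        and p.year == target.year
        and p.n_authors == target.n_authors
    ]


def attention_gap(papers: list[Paper]) -> dict[str, float]:
    """For each retracted paper with at least one match, return its mentions
    minus the median mentions of its matched controls (positive = more attention)."""
    gaps = {}
    for p in papers:
        if not p.retracted:
            continue
        controls = matched_controls(p, papers)
        if controls:
            gaps[p.doi] = p.mentions - median(c.mentions for c in controls)
    return gaps


# Invented toy data, purely for illustration.
toy_papers = [
    Paper("r1", "Journal A", 2015, 3, 120, True),
    Paper("c1", "Journal A", 2015, 3, 40, False),
    Paper("c2", "Journal A", 2015, 3, 60, False),
    Paper("c3", "Journal A", 2015, 3, 50, False),
    Paper("r2", "Journal B", 2018, 5, 10, True),
    Paper("c4", "Journal B", 2018, 5, 30, False),
    Paper("c5", "Journal B", 2019, 5, 5, False),  # different year: not a match
]

if __name__ == "__main__":
    print(attention_gap(toy_papers))  # {'r1': 70, 'r2': -20}
```

Comparing against the median of the matched controls keeps a single unusually popular control paper from dominating the comparison. A real analysis would need many papers and a proper statistical test rather than one difference per paper.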
Speaker 1 (20:18):
I was just going to ask, how long does a full retraction process typically take?
Speaker 3 (20:22):
There’s no typical length. I would say the range goes from a couple of months to years. Sometimes these false findings can survive for quite a long time. What’s interesting, and I’m hopeful it can become useful down the line, is that on social media, in particular on Twitter, we have seen signs of people being more skeptical about research that is going to be retracted, well before it gets retracted. One way to think about this is that there is an interesting signal in the community of users that talks about research on Twitter. Keep in mind this community is made up of scholars on the one hand, but also the lay public, science communicators and practitioners. We oftentimes see bots involved in this process, and yet it’s very interesting to see that there are some encouraging signals, some signals of quality that could guide us to where we should look more closely if we wanted to speed up the retraction process.
Speaker 2 (21:36):
And does the actual retraction event immediately reduce the attention that the publication gets? Is that measurable?
Speaker 3 (21:44):
It is measurable, and unfortunately the bad news is that it does not. If you look at this carefully across social media and news, you see that attention has been decreasing well before the retraction is issued. So the retraction itself is not effective at curbing the spread of this sort of misinformation. Scientific communities can go to the original publication site, where they can typically see that the paper has been retracted. But on social media, there’s typically no way to flag a tweet that mentions a retracted paper with something like “Hey, this research has been retracted.” I think this is something that social media platforms, Twitter being one example, could attempt: some sort of flagging of these messages. But currently that’s not happening.
Speaker 2 (22:43):
And historically, have you seen whether there are more retractions now than there were before? Does this have something to do with the peer review process? Maybe there’s also a higher volume.
Speaker 3 (22:54):
Retractions are on the rise. We have seen many retractions related to the COVID pandemic, which, again, is not surprising given how quickly researchers and publishers had to move to get some of the most important findings in front of decision makers. There was an important trade-off between speed and accuracy. So clearly this is an increasing problem. We want to be more aware, and we want to make sure that it’s not just academic communities that can identify retractions in a timely way, but also audiences that don’t have formal training in science but are interested in learning about science and tech, and do so via social media platforms. The challenges for peer review have increased, and we’ve been seeing that for a number of years due to the growth in papers and the publish-or-perish culture in academia. We are incentivized to publish more; that’s the most important survival mechanism for scholars in academia. And that puts additional burdens on peer review, which, as we know, is a process based on volunteer work in many ways. Scholars review others’ work as part of their service, almost always without getting major recognition, let alone pay. So it’s a process that relies on science’s ability to do this quality control, and it’s becoming harder and harder.
Speaker 1 (24:38):
Could it be argued that social media is actually speeding up the discovery of problematic research because it makes it into the public domain earlier? Is that something you’ve seen?
Speaker 3 (24:50):
Yes, absolutely. Social media is important in the sense that it speeds up the sharing of opinions, and it can involve audiences that typically don’t have any say in this process. However, there are very important caveats when it comes to using social media as a red-flag system, because we know that social media is far more prone to spreading false information than true information, in various other contexts beyond science. So I don’t think we’ll ever want social media to adjudicate the veracity or trustworthiness of a paper. However, there are voices on social media who can do some important signaling that we might want to consider in addition to other types of evidence, and in addition to taking a very careful look at each individual case. We know that these platforms were not created with some of the important functions in mind that we now see them fulfill. One of them is sharing science: social media platforms like Twitter, Facebook and Reddit were not created with this function in mind, and yet they are now taking on prominent roles in the dissemination of science. I think we are in a really interesting time, where we have to negotiate this relationship between the potential benefits and the many very apparent drawbacks of these platforms.
Speaker 2 (26:28):
Which kind of data sources do you use for your research and what kind of methods do you apply?
Speaker 3 (26:33):
I spent quite some time building collaborations with companies and NGOs that have data sources relevant to some of these questions. A big partner for this type of work is Altmetric, a company that has been collecting mentions of research articles on various digital platforms for the past 10 years. They have a really comprehensive data set that looks at how articles are mentioned on social media, in the news, on YouTube, in policy documents and in patents. So they are doing this important work that essentially lays the groundwork for some of the explorations, descriptions and modeling that we are doing. For this particular project, we also had a very valuable collaboration with the Center for Scientific Integrity, which maintains the Retraction Watch database. They shared it with us, and they continue sharing their updated data sets with us, which enables us to connect the retraction information with the online attention trends. So we can do these comparisons at scale, and we can break them down by the reason behind the retraction, which journals were affected, which scholars were affected, and so on.
Speaker 3 (27:59):
So there’s a really important wealth of data underlying this. We are also working with the Web of Science, another data source that lets us look into citations, topics and collaboration patterns over time. That’s an important piece to consider when we look at science as a team effort, shaped by the roles of the co-authors behind some of these projects.
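Connecting retraction records with online attention data, as Ágnes describes, can be illustrated with a small sketch. It assumes, hypothetically, that retraction records reduce to a mapping from DOI to retraction date and that mention data reduce to (DOI, mention date) pairs; the real Altmetric and Retraction Watch data have their own formats and fields, which are not reproduced here, and the DOIs below are made up.

```python
from datetime import date

# Hypothetical inputs: retraction dates keyed by DOI (as a retraction database
# might provide) and (doi, mention_date) pairs (as an attention tracker might export).
retraction_dates = {
    "10.1000/example.a": date(2020, 6, 1),
}
mentions = [
    ("10.1000/example.a", date(2020, 1, 15)),
    ("10.1000/example.a", date(2020, 3, 2)),
    ("10.1000/example.a", date(2020, 7, 9)),
    ("10.1000/example.b", date(2020, 2, 1)),  # never retracted, so ignored
]


def split_by_retraction(retraction_dates, mentions):
    """Count each retracted paper's mentions before, and on or after, its retraction date."""
    counts = {doi: {"before": 0, "after": 0} for doi in retraction_dates}
    for doi, when in mentions:
        if doi not in retraction_dates:
            continue  # join step: keep only mentions of retracted papers
        key = "before" if when < retraction_dates[doi] else "after"
        counts[doi][key] += 1
    return counts


if __name__ == "__main__":
    print(split_by_retraction(retraction_dates, mentions))
    # {'10.1000/example.a': {'before': 2, 'after': 1}}
```

Bucketing mentions by month instead of a simple before/after split would show the attention trend over time, which is the kind of view needed to check whether attention was already declining before a retraction was issued.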
Speaker 2 (28:25):
And if somebody would like to play with the data themselves, are there any places or sources where they can find it?
Speaker 3 (28:32):
For most of these data sets, you need to pay for access. For other data sources, you just sign a non-disclosure agreement and say how you are going to use the data, and then you get access to it. In terms of methods, because I think that’s another component we always think about a lot, we try to go for the tools and approaches that will give us the level of accuracy needed to make confident claims. Of course, we would typically like to do things like determining skepticism automatically, because that’s easier at this scale, but we oftentimes see that that’s not possible. So we tend to do a lot of data cleaning, a lot of manual coding, a lot of curation, a lot of checking between sources.
Speaker 3 (29:33):
So I think, as data scientists, that’s still top of mind. Despite some of the advances in automated tools for cleaning our data, there’s still a lot that needs to be done in this semi-manual fashion. We run a lot of machine learning models, and oftentimes regressions, just to make sure that we are speaking to different audiences and doing so with the tools they are most familiar with. The experience, as always, is that if you have quality data and a robust trend in the data, then regardless of the method, you will identify that trend. So typically we try to stay away from models that are overly black box, especially in the social sciences, where you want to identify mechanisms and know why you found a certain outcome or a certain score. That’s something we are very aware of.
Speaker 1 (30:35):
That’s fantastic. Ágnes, would you have any final comments? Perhaps some advice to both academic researchers and lay consumers of scientific research about how we can together tackle or reduce the problem of miscommunication?
Speaker 3 (30:55):
The million-dollar question! <laugh> So yes, I think there’s a shared responsibility here. All of us can do a little bit more. On the one hand, scientists have a responsibility to make sure that their scholarship is not misrepresented: that they look at how social media audiences are talking about their research online, and speak up whenever that’s not in alignment with the intended message and with the actual research. In terms of the public, I think it’s very important to be aware that science has its own problems. It’s not easy to know where those problems are if you are not within the ivory tower, in academic enterprises. But just having an open mind, and probably even a critical mindset when it comes to research, is helpful. Following best practices and educating ourselves about how digital spaces can inform and misinform us is essential.
Speaker 3 (32:09):
Just staying informed is probably the best we can do. And then I think there’s a big burden on the institutions and decision makers, who need to step up, rise to the occasion and do more in terms of sifting through the information sources out there that we know spread a lot of misinformation in the current information ecosystem. I think it’s up to them to solve the problem at a more systemic level. And the solutions are not easy. If they were, we would have figured this out already. But this is work in progress, and I think all of us, regardless of our role, have to do our share.
Speaker 1 (32:48):
I think that very nicely brings today’s episode to its conclusion. Ágnes, thank you so much for joining us today. It was an absolute pleasure talking to you, and we wish you the best with your research endeavors at Northwestern.
Speaker 3 (33:04):
Thank you for having me.
Speaker 1 (33:06):
And thank you also to my co-host, Philipp Diesinger, and of course to you for listening. Please do remember to check out the show notes and our other episodes at datascienceconversations.com. In the show notes for this episode, you will find the GitHub repository for the research we talked about, and also links to publications relating to some of the fantastic work that Ágnes and her team have done. We look forward to having you with us on the next show.