DataEthics4All Live: Bias in AI in Covid-19 with Toju Duke, Product Lead at Google
“It’s a well-known fact now that AI data sets are really rooted in bias.”
– Toju Duke
“If the systems are very inaccurate right now it’s probably best not to use them – we’re supposed to be saving lives, not allowing more people to die. The problem though is that we are dealing with huge numbers of people; humans alone cannot address those issues, so we do need some form of automation and AI system.”
– Toju Duke
About the Talk
With Coronavirus and the destructive path it has forged across countries, communities and economies, AI is being adopted by private and public entities globally to develop and deploy solutions that track and predict its spread, among other applications.
While AI could help reduce the overwhelming strain on health care systems, its algorithms are subject to biases that further deepen societal injustice and inequity across the world.
In this episode of the DataEthics4All Live Fireside Chat, Shilpi Agarwal, Founder of DataEthics4All, and Toju Duke, Product Lead at Google, discuss the current applications of AI systems in Covid-19 contact tracing, patient diagnosis and health outcomes across racial and other diverse groups, and the effects these applications have on communities of color, which have seen two to four times more Covid-19 deaths than expected.
They also discuss how authorities and businesses should approach bias in the datasets and algorithms AI systems are built on, to reduce the systemic injustices being confronted today.
Transcript
0:05 Shilpi Agarwal:
Hello everyone and welcome to DataEthics4All Live, where we talk with data ethics trailblazers about the future of data and AI. I’m your host Shilpi Agarwal, and today’s topic is particularly relevant right now: we are going to talk about bias in AI and in Covid healthcare solutions. It gives me great pleasure to welcome my guest, who is a Google Product Lead for Europe, the Middle East and Africa; she is also the project manager for Women in AI Ireland and founder of VIBE. Please welcome Toju Duke! Welcome to the show!
0:51 Toju Duke:
Thank you Shilpi, and to everyone in the DataEthics4All group, happy to be here!
0:56 Shilpi:
We are so excited to have you! This is a conversation that’s at the top of our minds these days; coronavirus has forced its destructive path across countries, across communities and across economies. This pandemic has taken lives, it has damaged health and produced a historic economic decline. But it seems like Covid-19 has a disproportionate impact on communities of color. Can you elaborate on that a little bit?
1:52 Toju:
Yeah, happy to. I was just thinking about this talk, and about how on 31st December 2019 everyone was singing happy new year; there were lots of recaps of the past decade – the triumphs and the trials, the people who had passed away, the companies that made the highest turnovers and all of that. Everyone was excited to go into the new decade, and come 2020 – we’re just in the second half of the year – I personally can’t wait for 2020 to be over! A lot of people echo the same sentiment; there has been a lot going on in 2020. And that goes back to the racial and systemic injustice that we’ve seen, which has resurfaced in 2020 with everything that has happened around Covid-19. With the recent Black Lives Matter protests over the killings of black people in the US, racism and systemic injustice have been brought up again. There have always been conversations around them, but I think now more people are beginning to understand what people of color have been talking about for a very long time. Something struck a chord across everyone because we all watched the video of George Floyd. And I found it really interesting to see how Covid-19 is also a racial problem; it’s almost like people of color, minority groups, disadvantaged groups always get the brunt of anything bad that happens in society, including Covid-19. When you look at the data right now
Toju: People of black descent or Latino descent in the US are three times more likely to get the virus, and they’re twice as likely to die.
4:23 Shilpi:
So is there a particular reason why it has impacted minority communities more? Is it education, lifestyle – what is affecting one community more than another?
4:41 Toju:
When you look at the data on incomes, people from minority groups are earning half the national average income, which means they cannot really apply social distancing measures in their communities – many are sharing apartments and houses.
Toju: Then when it comes to their jobs, their jobs are mostly frontline jobs; even when some of them are middle-class doctors and nurses, those are still jobs right there on the front line, and because they’re already in a disadvantaged community, they need to go to their jobs – most of their jobs can’t be done remotely. So we have doctors, nurses, bus drivers, shelf stackers, technicians, production-line jobs, service jobs – in all these jobs people need to be out there on the front line, on public transport, rubbing shoulders with everyone else and being exposed to the virus. I think along those lines as well, there’s been some form of unconscious bias in the treatment they receive – doctors showing some form of discrimination towards people of color, even in terms of prioritizing who should be treated. I’ve heard stories of people of color being sent back home even when their coronavirus cases were severe; those people were not treated, and they died. In a time of pressure, what’s inside you comes out the most, including all forms of bias and unconscious bias. Doctors are under so much pressure right now; they’re like our soldiers right there on the front lines, taking the bullets of this virus – in that situation it’s very hard to make calm, objective decisions about who to prioritize.
7:53 Shilpi:
I’m really appalled to hear that about the doctors – they are making the best judgment they can, but we have heard stories, not just of people of color being sent home, but of other people being sent home too; treating a younger person ahead of a senior citizen because they have to make a judgment. I don’t know how they are making these humanitarian judgments, playing God here – it must be very difficult for them as well; playing God is not easy. Deciding who gets treated versus who gets sent home, and for how long, and all of that must be hard for them too. But if they are consciously making any kind of biased decisions, then of course that’s not good.
Shilpi: Let’s talk about the apps that are being developed – there are just so many! App-based contact tracing is appealing in part because the coronavirus spreads so stealthily; infected people transmit the virus for days before they develop symptoms, and it can take several more days for public health investigators to learn about a case and confirm it with testing. These teams have precious little time for traditional contact tracing, where they interview an infected person and track down everyone who came in recent contact with them, as much as the person can recall – I mean, if somebody asked me what I did over the weekend, or even yesterday, I sometimes forget, right! We do so much, it’s hard to recall – and then getting those people to self-isolate before they pass on the virus is difficult. By the time you get the data, you have a couple of days to chase people down. So if smartphones could detect when two users are close enough to share the virus, and an app could alert one person as soon as the other gets sick, that looks like a promising solution. So how has AI been applied to Covid-19 contact tracing applications and healthcare in general?
10:22 Toju:
I think with contact tracing there’s still a bit of a controversy there, but I’ll just touch on it a little bit. It uses location tracking capabilities on people who have caught the virus, and it helps them determine where they have been and who they’ve been in contact with by looking at their location.
We have a few countries that have adopted it, like South Korea and Taiwan, and of course China integrated it into some of its main apps, like Alipay and WeChat, which are used by huge numbers of people in China.
We have countries in Asia all using these contact tracing apps, and we see Europe slowly adopting them as well – but the Norwegian government, for instance, has decided to drop its app, mainly due to privacy issues. These apps collect people’s data and amass it in a database, and AI technologies are meant to use that data to help provide solutions for Covid-19, but then there’s a big question around regulation of privacy and user data – if I had Covid-19, do I even know where my personally identifiable information is being sent? I think it would probably be sent to the Irish government, but I have no idea if people in China already have this information, because it could be sold off to third-party apps as well. That’s the major controversy: if we do not have any proper global regulation around privacy and people’s data, especially around Covid-19, how much do we allow all these countries and companies to use this data? Google and Apple decided to come together and develop APIs and technology, and said they’re going to work together to improve contact tracing apps and make sure they protect user information as much as possible. So the big tech companies are trying to provide some solutions to this problem, but it’s a major problem.

In terms of healthcare overall, there are lots of things that AI is doing, like tracking and surveillance around areas that have the most Covid-19 cases; chatbots that help answer questions; help with drug development as well. There’s a lot of work around data mining, helping to build the data sets we have and to build predictive computer models, and there are literature searches digging deep into all the literature to see if any predictions can be made. So a lot has been going on with AI in the background, which is all great news.
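To make the privacy-preserving approach concrete, here is a minimal sketch of the idea behind decentralized Bluetooth contact tracing: phones broadcast short-lived random identifiers and keep what they hear locally, so exposure matching never requires a central database of movements. This is an illustrative toy in Python, not the actual Google/Apple Exposure Notifications protocol (which derives identifiers cryptographically from daily keys); the Phone class and token scheme here are simplified assumptions.

```python
import os
import hashlib

class Phone:
    """Toy model of a decentralized contact-tracing client.
    Illustrative only: the real protocol derives rolling identifiers
    from daily keys; here random tokens stand in for them."""

    def __init__(self):
        self.my_tokens = []       # identifiers this phone has broadcast
        self.seen_tokens = set()  # identifiers heard from nearby phones

    def broadcast_token(self):
        # Rotate a fresh random identifier so the phone cannot be tracked.
        token = hashlib.sha256(os.urandom(16)).hexdigest()[:16]
        self.my_tokens.append(token)
        return token

    def hear(self, token):
        # Tokens from nearby phones stay on the device.
        self.seen_tokens.add(token)

    def check_exposure(self, published_tokens):
        # When someone tests positive, only their own tokens are published;
        # every phone checks locally whether it ever heard one of them.
        return bool(self.seen_tokens & set(published_tokens))

# Two phones near each other exchange identifiers over Bluetooth.
alice, bob = Phone(), Phone()
bob.hear(alice.broadcast_token())

# Alice later tests positive and publishes her tokens; Bob matches locally.
print(bob.check_exposure(alice.my_tokens))  # True: possible exposure
```

The design choice worth noticing is that matching happens on the device: no location is collected, and nothing leaves a phone unless its owner tests positive and consents to publish their own tokens.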
14:14 Shilpi:
Today 50 countries – including some of the countries that you mentioned: New Zealand, South Korea, Vietnam, parts of Africa and Europe, Russia, China, India, Australia, and some US states including North Dakota and Utah – have implemented contact tracing to some degree. Some have made it mandatory; some have opened their APIs to satisfy more privacy-conscious citizens. And as you mentioned, Google and Apple have come together to provide some guidelines around what a contact tracing app should or shouldn’t do and what data it should or shouldn’t collect. What is disturbing is that 50% of these apps don’t have any sunset policies or transparency around what data they are collecting and how long they will keep collecting it, even once the pandemic is over. Those are real questions of concern for privacy-conscious citizens, and it has led to a very big debate. And then there are also questions regarding the effectiveness of contact tracing measures – do you think it is a viable solution to help with the Covid-19 crisis?
15:42 Toju:
I think it’s a solution that will help reduce the spread, ideally. I’ll give an example from here in Ireland – the numbers were contained a little, and it got to the stage where we stopped recording any new cases, and the main reason for that is that a lot of contact tracing was being done.
We didn’t have the app at that time, but between them the healthcare sector and the police force were scrambling to find everyone the patients had been in contact with, to contain the virus.
So I do believe that contact tracing can help contain it to a certain extent. Of course it’s not going to provide a cure or drug development or produce any vaccines, but it should be able to help contain the virus. In terms of percentages, I can’t give any numbers right now because I haven’t really looked into that, and I’m not even sure that data is available yet, but it’s definitely a step in the right direction.
16:47 Shilpi:
True, true. According to Douglas Leith and Stephen Farrell, computer scientists at Trinity College Dublin, there are two technologies: GPS and Bluetooth beacons. We have identified that GPS is not as effective, but even Bluetooth beacon technology has challenges, because the signal strength can vary so much. When two people are sitting in a restaurant across the table from each other, the Bluetooth signal differs depending on whether their phones are in their pockets or on the table, so identifying who came in contact with whom is a challenge. Other challenges include metal reflection, which affects how the signals propagate, and the overall strength of the Bluetooth beacon signal. So do you know of any other current challenges in AI that could affect the predictions and the results in patient diagnosis and treatment?
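For a sense of why Bluetooth signal strength is such a shaky proxy for distance, here is a rough sketch using the log-distance path-loss model. The constants are illustrative assumptions – the expected signal at one metre varies by handset, and the path-loss exponent shifts with pockets, bags and metal surfaces – which is exactly the variability Leith and Farrell documented.

```python
def estimate_distance(rssi_dbm, tx_power_dbm=-59, path_loss_exponent=2.0):
    """Rough distance estimate in metres from a Bluetooth RSSI reading,
    via the log-distance path-loss model. tx_power_dbm is the assumed
    signal strength at 1 m; both constants are handset- and
    environment-dependent, so the output is only a crude guess."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

# The same pair of phones can produce very different readings:
print(round(estimate_distance(-65), 1))  # phones on the table: ~2.0 m
print(round(estimate_distance(-80), 1))  # phone in a pocket: reads ~11.2 m
```

Both readings could come from phones sitting the same two metres apart; attenuation alone moves the estimate from close contact to safely distant, which is why proximity thresholds built on signal strength are so error-prone.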
17:52 Toju:
What I’ll mention right now is the prevalent issue that’s been raised recently around bias in AI, because it goes back to the effects and the predictions that any AI system spews out in every industry, including healthcare and Covid-19.
It’s a well-known fact now that AI data sets are really rooted in bias. We’ve had recent issues with the MIT Tiny Images dataset for instance that was pulled a couple of weeks ago mainly because of the issues it had with bias.
Just to give some more context for people who don’t know the story: MIT had a data set, built in 2008, of millions of tiny, low-resolution images used to train image recognition systems, and its labels were drawn from another data set called WordNet. The problem is that when it assigns text to an image, the labels can be very wrong, derogatory words – for example, it would pair a black man’s picture with a monkey and use the n-word to identify it, or label a woman carrying a baby a whore. Research scientists, some of them in Dublin, tested the data set and unearthed these outputs; it was announced in the press, and MIT pulled the data set.
Toju: The problem with that though is a lot of companies have gone ahead to apply this dataset in their AI systems as well – so how much pulling back can we do?
We have Amazon and IBM, who have pulled back their facial recognition software because it was found to misidentify darker skin far more often than white skin.
Toju: We have something else around PULSE, a model built by university researchers, and it still couldn’t handle black skin – it’s meant to take a blurry, pixelated image and reconstruct the person’s face; they input Obama’s face and Obama came out as a white man, and they put in Muhammad Ali and he came out with blonde hair. It’s not just about the data sets; it’s also about the historical and societal inferences.
So if we have people making these data sets and they’re not exposed to any form of diversity in their thinking, in their societal upbringing, in their interactions with society, then it’s very likely that the data sets will be biased
Toju: In the sense that it’s not going to represent a diverse and inclusive range of people and ethnicities and cultures and genders. The problem with that is if we have any AI systems that spew out predictive solutions for Covid-19, once again we already know what that means – the black communities are being suppressed and oppressed by this. I think that’s part of what’s happening, because we do have AI systems that predict the severity of the illness; most AI systems that do this work with image processing and computer vision, and if they are using that, they are already rooted in bias, because most of them were trained on ImageNet, which is another data set with documented biases.
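One concrete way to surface the kind of disparity Toju describes is to disaggregate a model’s error rates by group instead of reporting a single accuracy figure. The sketch below is a hypothetical audit of a severity-prediction model with made-up records; error_rates_by_group is an illustrative helper, not a reference to any real Covid-19 system.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """False negative rate per group. For a triage model, a false
    negative is a severe case the model failed to flag - the most
    dangerous error for the patient."""
    counts = defaultdict(lambda: {"fn": 0, "pos": 0})
    for group, actually_severe, flagged_severe in records:
        if actually_severe:
            counts[group]["pos"] += 1
            if not flagged_severe:
                counts[group]["fn"] += 1
    return {g: c["fn"] / c["pos"] for g, c in counts.items() if c["pos"]}

# Hypothetical audit records: (group, actually severe, model flagged severe)
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, False), ("group_a", True, True),
    ("group_b", True, False), ("group_b", True, False),
    ("group_b", True, True), ("group_b", True, False),
]
print(error_rates_by_group(records))
# {'group_a': 0.25, 'group_b': 0.75} - a gap like this is the red flag
```

An audit like this is cheap to run, and it reveals precisely what a single headline accuracy number hides.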
21:33 Shilpi:
Very true. I’m going to pause here and take some live audience questions.
21:41 Terrance Johnson:
If I’m understanding the conversation properly, I don’t see how emerging technology is going to change long-standing and pre-existing human racism and discrimination. It seems like it would be used more as an oppressive tool and weapon, since it primarily appears to be coming from the same bad actors. I spoke to this in your last meeting; concerning the whole digital divide,
Terrance: the conversation should be about how to get more oppressed and vulnerable populations into technology, so they can positively affect their own lives.
23:10 Toju:
I totally agree; whatever is coming down the line is not going to address these issues – we have to take a step back to address them. One of the things I’m going to work on in the next few months is examining data sets: coming up with best practices that we can apply in the industry, looking at the amount of bias in the current data sets that people are using – that companies and governments are utilizing – and making sure that there’s some form of transparency and explainability in these AI systems. We have a few tools already out there across the tech industry; Google, for instance, has something called Model Cards, where you can document the different models you have for your AI systems, and if that’s documented, then if someone has a question about the platform or the AI, those model cards should hopefully give some more transparency into why that AI system arrived at the prediction it made. We also have something called the What-If Tool; if that’s implemented at the start of building an algorithm, ideally the system should be more transparent and explainable.
I think that’s where ethical AI comes into play: making sure that whatever new AI systems are released follow some form of data ethics, and, for the current AI systems, making sure that we examine them and reveal any biases and inaccuracies they have.
Toju: That’s what’s happening today, and that’s why we have Amazon and IBM pulling their facial recognition software; we have MIT withdrawing widely used data sets because of these inaccuracies; we have university researchers pulling back PULSE as well. So we’ll keep seeing more of this, but I think what’s important is for all of us to come together; when we identify these things, we should actually make some noise about it and raise it to the right people.
If more people come together to talk about it, hopefully globally we’ll have some form of regulations that are not too intrusive and not too dependent on the tech companies
Toju: but at the same time making sure there’s still some form of guidance and some form of law and regulation involved. Among these third-party AI companies there’s one that is actually being sued by the government now because it used data in the wrong way – with proper regulation, that kind of misuse can be avoided. On the flip side as well, like Terrance said:
We need more diverse people working in these industries, working in these teams as AI engineers, data scientists, because if they have more diverse people, the new data sets coming up should represent what they believe in and societal changes.
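As a rough illustration of the Model Cards idea Toju mentions above, a model card is structured documentation that travels with a model and makes its limits visible before deployment. The fields and numbers below are hypothetical, a sketch in the spirit of the framework rather than Google’s prescribed schema.

```python
# A minimal, illustrative model card. Every field and value here is
# hypothetical; real model cards follow the structure proposed in
# "Model Cards for Model Reporting" (Mitchell et al., 2019).
model_card = {
    "model_details": {
        "name": "covid_severity_risk_v1",       # hypothetical model
        "type": "gradient-boosted classifier",
        "date": "2020-07",
    },
    "intended_use": "Decision support for triage; never a sole authority.",
    "training_data": "Hospital records, 2020, with known "
                     "under-representation of black and Latino patients.",
    "evaluation": {
        # Disaggregated metrics are the point: report per group,
        # not just an overall average.
        "false_negative_rate": {"overall": 0.10,
                                "group_a": 0.06,
                                "group_b": 0.21},
    },
    "ethical_considerations": "Large FNR gap across groups; do not use "
                              "for prioritization until the gap closes.",
}

for section, detail in model_card.items():
    print(f"{section}: {detail}")
```

Documentation like this is what lets an outsider ask why the system made a particular call and get an answer grounded in how the model was built and tested.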
26:06 Shilpi:
We have just a few more minutes, would anyone else like to ask any more questions?
27:30 Susanna Raj:
I have one question, Toju. Just this morning there was a news release about a data set – the largest ever – and they found the same disparity we already know about, but this time better documented. It seems that South Asians and African Americans are disproportionately affected in the health care system. So even though the solutions lie in the future and we are working on them, what do you think we should do now so that people in those communities get access to care?
28:13 Toju:
Well, if the systems are very inaccurate right now it’s probably best not to use them – we’re supposed to be saving lives, not allowing more people to die. The problem though is that we are dealing with huge numbers of people; humans alone cannot address those issues, so we do need some form of automation and AI system.
Toju: I think for a short-term solution, if we feel that the data sets currently being applied are not good enough and are really rooted in bias, it’s best to understand what proportion of these communities – the black and Asian communities – are really being affected. And if it’s a large volume, then we have to ditch the AI model in the short term until it’s fixed, because we can’t provide solutions by killing or oppressing people who have already been oppressed for centuries.
32:14 Shilpi:
In the end, this is an ongoing discussion, and we don’t have a clear solution yet – but these are some of the questions we need to raise, so that we can come up with a better tomorrow. As Dr Maria van Kerkhove, technical lead at the WHO, says: “asking the question ‘are we doing enough’, regularly and repeatedly, is critical”. With that I want to thank Toju and everyone in the audience who joined us today for this important topic and interesting conversation. Thank you Toju!
33:06 Toju:
Thank you, it’s been great being here, thanks for having me!
Shilpi Agarwal – Founder & CEO, DataEthics4All
Toju Duke – Product Lead, Google