Panel Discussion: Bias in AI – How realistic are the mitigation strategies of today? AI DIET World
“I do think we are all born with our own biases. We have evolved with our own biases” – Susanna Raj
“Are we starting at the wrong place? When we look at only algorithms or you know, mitigation strategies?”
– Susanna Raj
“New Zealand, where I am, differs quite strongly from some others, say the United States, in the extent to which it enforces protections against violence against women or against LGBT people”
– Professor Michael Witbrock
In this panel discussion, Susanna Raj (DataEthics4All Leadership Council), Professor Michael Witbrock (University of Auckland), and Raluca Crisan (CTO & Co-Founder, Etiq AI) explore how realistic today’s strategies for mitigating bias in AI really are.
0:09 Shilpi Agarwal:
Next up, we have a panel discussion that should be a fiery one. Bias in AI is what everybody’s talking about, but how realistic are the mitigation strategies of today? We have a very diverse perspective from the panelists. We have Professor Michael Witbrock from the University of Auckland and Raluca Crisan who is the CTO and co-founder of Etiq AI. Moderated by our very own Leadership Council, Susanna Raj.
1:07 Susanna Raj:
Welcome everyone, thank you for joining me on this panel. Today we are going to discuss the bias mitigation strategies in use today and how effective they really are. I’m excited to have an academic perspective on this as well as a business perspective: Raluca, the co-founder and CTO of Etiq AI, and Michael from the University of Auckland. Because bias means different things to people from different backgrounds – it has a totally different meaning for me as a social science researcher – I want to start with a very basic question: what does bias mean to you?
1:59 Michael Witbrock:
I think it means at least two completely different things, and I think this makes the word difficult when it’s used. It can just mean a predisposition in data, in some systemic or particular way – the data suggests some sort of action or some categories that cause the system to behave in a particular way, and that’s built into the way the system is produced. Or it can mean the sort of invidious predispositions that some human beings have, which cause them to behave in ways we would think are undesirable or even evil. These two things are not exactly the same, but they are used interchangeably, and not to our benefit.
2:56 Raluca Crisan:
Within the context of my work and the ethics field, bias is one of those things that is easy to recognize when you see it but a little bit hard to define in a standardized way. It’s basically an outcome of an automated decision system which is somehow obviously unfair when you think about it. It could be that a system performs much worse, from an accuracy point of view, for a certain group – it just doesn’t seem to work for that group. That’s something that, when we see it, we say, ‘oh yeah, that’s a problem, that’s unfair’. Or it could be that we have two individuals who are very similar in all the aspects we consider important for getting a loan or getting a job, but who are different genders – something we, as humans, don’t consider important for that decision – and then the system comes up with a biased decision, something we recognize as unfair. It’s hard to define, but we can recognize it.
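The two kinds of unfairness Raluca describes – a model that performs much worse for one group, and two near-identical individuals who get different outcomes – can be checked mechanically. A minimal Python sketch follows; the model, data, feature names, and threshold are all hypothetical illustrations for this transcript, not anything from Etiq AI:

```python
# Two fairness checks mentioned in the discussion, sketched on toy data.

def accuracy(pairs):
    """Fraction of (prediction, label) pairs that match."""
    return sum(p == y for p, y in pairs) / len(pairs)

# Check 1: group fairness -- does the model perform much worse for one group?
results = {
    "group_a": [(1, 1), (0, 0), (1, 1), (0, 0)],   # (prediction, label)
    "group_b": [(1, 0), (0, 1), (1, 1), (0, 1)],
}
per_group = {g: accuracy(pairs) for g, pairs in results.items()}
gap = max(per_group.values()) - min(per_group.values())
print(per_group, "accuracy gap:", gap)  # flag if gap exceeds a chosen threshold

# Check 2: individual (counterfactual) fairness -- two applicants identical in
# everything we consider relevant, differing only in gender.
def toy_loan_model(applicant):
    # Hypothetical model that (wrongly) lets gender move the decision.
    score = applicant["income"] / 10_000
    if applicant["gender"] == "f":
        score -= 1  # the kind of effect this check is meant to surface
    return score >= 5

a = {"income": 55_000, "gender": "m"}
b = {"income": 55_000, "gender": "f"}
print("counterfactual flip:", toy_loan_model(a) != toy_loan_model(b))
```

In this toy data the model is perfect for one group and far worse for the other, and flipping only the gender field flips the loan decision – exactly the two symptoms described above.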
4:11 Susanna Raj:
And I do think we are all born with our own biases. We have evolved with our own biases, some of them have protected us, some of them I think we need to get rid of soon – but we are a work in progress, right? But the biases are embedded in our data as well.
4:27 Michael Witbrock:
One of the things which is so very interesting – the source of this debate – is that as we build systems with the data that human beings have produced, those biases, whether they’re good or bad, manifest. They allow us to experiment on those biases. I have a little bit of a problem with this use of the phrase ‘algorithmic bias’ – it’s not that the algorithms have biases. Maybe the people applying the algorithms have biases… alright, it’s at least logically possible that the algorithms have biases based on the way they were developed, but that is almost never what we’re talking about. The data that’s been selected is representative of a human group, and possibly of all humans over some time period. The decisions these systems make reflect those biases and make them clear to us – for example, that in our literature we are teaching ourselves biases that might well be expected to cause humans to discriminate against women. And by having computers do this, we can point the finger at them and say “bad computer!”, knowing full well, I hope, that it means we’re bad people too.
6:00 Susanna Raj:
I do agree. There’s this interesting debate going on about whether algorithmic bias starts in the algorithm, in the people, or in the data. I feel like that’s a chicken-and-egg discussion that we will have forever. So what do you think, Raluca, on that note?
6:24 Raluca Crisan:
For sure, there’s this approach or mindset that it is about the data. And as I said, data is definitely a big part of the way things get decided, and it reflects our biases. But bias can enter across a variety of things. As humans, we make some decisions that are biased based on certain characteristics; an algorithm, though, might just be picking up a proxy – something that as a human I would never recognize as being associated with a certain demographic group. Part of the algorithm’s use of that feature may be real explanatory power beyond the bias, and part of it is the bias, and which is which is a very tricky problem to assess. So yes, it’s the data, for sure, but it’s also the nature of the system that is discovering patterns in the data, and then how we interact with that system and how we try to shape and control it. If we don’t want it to do certain things, we can affect that.
7:55 Michael Witbrock:
I think the fundamental difference there – when you say that a human being would never consider doing this, you never want to speak too soon on these things; human beings can be quite awful! But to the extent that human beings wouldn’t manifest the negative biases in the data they’re learning from, it’s because we have access to a kind of data that these systems don’t have; and ironically, we have access to a source of inductive bias which these systems don’t have, and that source is knowledge. Why can we read the same data that GPT-3 reads about suitable jobs for women and men and not reliably produce the same predictions (across many languages) that the engineers will be male and the nurses will be female? The data causes that to happen if you build a naive language model. Why do we know better than to exhibit those biases? We know better because we know better: we have a piece of knowledge that tells us we should disregard that sort of information in many decisions because it will cause unjust actions. And as we move towards computers having access to that kind of knowledge, I think that quite soon we will be less likely to see decisions based on invidious biases from machines than from people. So a lot of the problems we’re having at the moment – not all of them, but a lot – are very temporary, and they stem from a weakness in our AI systems: namely, the inability to use not just data but knowledge.
10:24 Susanna Raj:
I think this follows very well into our next question. If bias is so complex and so embedded in the data itself, where do you think, as a solutions-finder, we should start in the development cycle – where in the AI life cycle – to solve this problem? Are we starting at the wrong place when we look only at algorithms or mitigation strategies? Should we actually be starting at the data collection or the idea conception stage? So where do you think we should start?
11:09 Raluca Crisan:
Definitely as early as possible – but I think it’s better to start somewhere! So that would be my first point: it’s better to start somewhere where we can at least make some sort of an impact. But yes, it’s definitely early in the life cycle, and there are a few things we’re noticing from what we’re doing. One: it’s not about the metrics; we’re finding metrics not super helpful. It doesn’t seem to be an optimization problem. A lot of the literature treats it as an optimization problem, which is natural because this is the world we all live in, but it doesn’t follow that the problem can necessarily be solved that way. What we’re seeing to be a little more effective – whether at the data or pre-data-collection stage or at the output stage – is that kind of interpretability layer: surfacing issues so that the human can interact with them and, in a sense, trace them back. But just to give you an example of data collection: let’s say you’re a company that targets certain demographic groups – say, white males between 35 and 65, in a certain income bracket, in a certain US state. It’s very hard for you to collect data beyond that; your whole product is optimized for this demographic. So when we talk about data collection, it can lead the company to start asking uncomfortable questions about what it is they’re actually doing as part of their business model, because if you’re marketing your product for this demographic, then you will collect a certain kind of data.
13:09 Michael Witbrock:
I think that one of the things we’re missing is any sort of clear agreement, from the point of view of people who are building decision support systems, on what these classes of invidious biases are. We live in a sort of hodgepodge of ideas from different countries. New Zealand, where I am, differs quite strongly from, say, the United States in the extent to which it enforces protections against violence against women or against LGBT people. And certainly, New Zealand differs very strongly compared to some other countries. Suppose we’re trying to build a system that identifies whether these proxies are being used. You could build a system that looks at your data and sees whether it has features that can be used as proxies for identifying classes, such as likely vaccination status or self-described gender. If you build a system that tries to detect such proxies, it would be very useful to know what sorts of things we should build in as required checks. It turns out, based on a study that will probably be found to be bogus, that your astrological sign predicts your vaccination status in the United States. There’s a news story about that today – I hope we find out it’s not true! It was 46% versus 70%. If it were true, your posterior probability for astrology would be higher, so one assumes it’s not. But suppose it were true, and you could predict birth month from vaccination status: should we build into systems attempts to detect that – attempts to predict birth month or gender – and only allow them to go forward if they can’t make such predictions using these sorts of proxies? Only by starting a discussion about what the industry standards for testing these things should be, irrespective of what particular countries think, will we be able to make progress with the techniques we have available to us at the moment.
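Michael’s proposed check – seeing whether an apparently innocuous feature (astrological sign, birth month) predicts a sensitive attribute better than the base rate – can be sketched in a few lines. The data, feature names, and numbers below are hypothetical, chosen only to echo the astrology example:

```python
# Minimal proxy check: can a candidate feature predict a sensitive
# attribute better than simply guessing the majority class?
from collections import defaultdict

def proxy_strength(records, feature, sensitive):
    """Accuracy of predicting `sensitive` from `feature` alone,
    using the majority class within each feature value."""
    by_value = defaultdict(list)
    for r in records:
        by_value[r[feature]].append(r[sensitive])
    correct = sum(max(vals.count(v) for v in set(vals))
                  for vals in by_value.values())
    return correct / len(records)

def base_rate(records, sensitive):
    """Accuracy of always guessing the overall majority class."""
    vals = [r[sensitive] for r in records]
    return max(vals.count(v) for v in set(vals)) / len(vals)

records = [
    {"sign": "leo",   "vaccinated": True},
    {"sign": "leo",   "vaccinated": True},
    {"sign": "virgo", "vaccinated": False},
    {"sign": "virgo", "vaccinated": False},
    {"sign": "aries", "vaccinated": True},
    {"sign": "aries", "vaccinated": False},
]
lift = proxy_strength(records, "sign", "vaccinated") - base_rate(records, "vaccinated")
print("proxy lift over base rate:", lift)  # a large lift flags a potential proxy
```

In practice one would train a proper classifier and validate on held-out data, but the idea is the same: if a feature recovers the sensitive attribute well above the base rate, it is a candidate proxy that merits the kind of required check Michael describes.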
16:54 Susanna Raj:
I think proxies, by their very existence, actually stand for societal biases. I mean, all our proxies have a bias built in, and that bias comes from prejudice; historical discrimination is built into those proxies. I believe that even when using a proxy for crime prevention, or anything else, we have to look at the knowledge behind it. How did we come to know this proxy is a good stand-in for that classifier? So to me that itself is a dangerous zone to go into. But what do you think, Raluca? You are actually in the industry, working on solving this problem, so I would like to hear your opinion on using proxies and on what Michael has said so far.
17:50 Raluca Crisan:
Obviously, it’s hard to generalize, but I would say that, whether intended or not, it does happen that people use proxies. And of course, a proxy is never just a proxy, so it’s a very gray area. The problem is when people are not even aware – when they’re not really looking at this topic at all. I’d say that’s the starting point of the problem. If they at least have some understanding that their models might impact actual people in negative ways, they can start looking at it and treating it with a certain degree of caution; I think that’s a step that needs to be taken even before anything else. I’m not sure if this answers the question, but it’s very hard to regulate, right? First of all, what’s the threshold; how would you define it? There are only a few ways to calculate even the basic metrics, so it’s very hard to regulate as such, because you wouldn’t be able to define it in a standardized way. But if you look at the data, look at the model, look at the outcomes, you can pick up if something is wrong. And you should be looking at this and picking it up, right? So this is, I think, the dubious area. If the people working on this start to look into it a little bit more, then at least within their applied use case they can try to make the decision less problematic. They will say: ‘actually, income is a proxy for financing; it’s a general decision that I’ll use this. But if someone shows up with a certain brand at a certain time of day, maybe I shouldn’t use that, because it may actually be a proxy for a demographic group’, right? The example is clearly contrived, but these are things that happen – and if you don’t even look for them, you’re never going to find them; you’ll think it’s all good.
20:29 Michael Witbrock:
Well, this case with income versus ethnicity or gender is an interesting one, because they could show up as proxies in both directions, and the reason they’re proxies is that they’re actually causally related. Right? People of certain ethnicities or certain genders have lower incomes because of discrimination. This isn’t an accident, so one could argue, for example, that by removing the knowledge of the effect of gender on income from your system, you’re likely to be preserving discrimination. This is actually related to the affirmative action question. You’re likely to be perpetuating discrimination by removing the system’s ability to notice that income is dependent on gender and therefore should not be allowed to affect the outcome for somebody. Of course, people have wildly different opinions about this, and their opinions vary depending on which groups they are or are not members of. Which discrimination should be mitigated, and for whom, also varies widely. So it’s much easier just to blame the algorithms.
22:23 Raluca Crisan:
I’m not sure this is at all helpful, but in some countries, for instance, you can’t use the demographic feature in the model itself – the regulator says ‘no’, for sure. So people end up using a proxy for it, and the problem is that they get counterfactuals. This is what happens: either you have no counterfactuals but your model is pretty bad for a group, or your model is actually great and you have loads of counterfactuals moving around and potentially creating issues. How someone would regulate that and solve for it still has some question marks.
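The trade-off Raluca describes can be made concrete: exclude the protected feature, let a proxy carry its signal instead, and count how many decisions flip when only the proxy is toggled. Everything in this sketch – the model, the “part_time” proxy feature, and the data – is a hypothetical illustration:

```python
# Sketch of the regulated-model trade-off: the protected feature is barred,
# a proxy leaks it back in, and counterfactual flips appear.

def model(applicant):
    # Gender is excluded by regulation, but "part_time" acts as a proxy for it.
    score = applicant["income"] / 10_000
    if applicant["part_time"]:
        score -= 2
    return score >= 5

applicants = [
    {"income": 60_000, "part_time": False},
    {"income": 60_000, "part_time": True},
    {"income": 90_000, "part_time": True},
]

# Count counterfactual flips: toggle only the proxy and see if the decision changes.
flips = 0
for applicant in applicants:
    twin = dict(applicant, part_time=not applicant["part_time"])
    if model(applicant) != model(twin):
        flips += 1
print("counterfactual flips:", flips, "of", len(applicants))
```

A model that leans heavily on the proxy will show many such flips even though the protected feature never appears in it – which is exactly why barring the feature alone does not settle the regulatory question.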
23:09 Shilpi Agarwal:
We do have some questions. Rakesh Ranjan says: “Bias in data (at least system of records, if not the system of engagement) may represent the state of bias and prejudice in our own society. So I’m interested in learning the panelists’ perspective on if mitigating bias in data would really solve the root cause?”
23:41 Raluca Crisan:
I believe you can – not completely remove it, but mitigate it. We started at the point at which humans are pretty biased; we put these statistics in place to notice that humans are pretty biased; and now we’re saying, ‘let’s fix this’. We’ve made the systems more complicated, and the bias is still there, but now we notice it. Reverting to ‘let’s just let the humans be biased’ is probably not the way to go. There’s potential for this to make a difference for the good overall.
24:13 Michael Witbrock:
Yeah, I think it can certainly mitigate the root cause, if nothing else, by allowing us to build systems that can point out the causes and the likely effects of human bias as well as bias in the system from human-derived data.