Building Responsible & Inclusive Products: AI DIET World 2021

“We actually decided instead that we would remove all gendered suggestions, and we would only suggest things that were more generic in their nature” - Tulsee Doshi

“We want every single user who visits a product, uses an experience, or engages in an environment to feel like the experience is built and made for them”

Tulsee Doshi

“One of the top 10 women in AI Ethics” - Tulsee Doshi

“Human bias can enter this at any stage of the process, there isn’t a one size fits all solution to fairness”

Tulsee Doshi

How do you build a culture of inclusive products? Tulsee Doshi, Head of Product for Responsible AI at Google, speaks to her journey in building up a cross-company product initiative, and key lessons learned when building large-scale, AI-driven products…for everyone. Tulsee Doshi is the Head of Product for Google’s Responsible AI & Human-Centered Technology organization. In this role, she leads the development of Google-wide improvements, resources, and best practices for developing more inclusive and ethical products. Tulsee has been recognized as one of the top women in AI Ethics and serves as an AI Ethics advisor to the growing insurtech Lemonade. She holds a BS in Symbolic Systems and an MS in Computer Science from Stanford University.

0:15
Hello everyone, and now on to our next guest, who is a young and dynamic speaker. She is the Head of Product for Google’s Responsible AI and Human-Centered Technology organization. In this role, she leads the development of Google-wide improvements, resources, and best practices for developing more inclusive and ethical products. She has been recognized as one of the top 10 women in AI Ethics and serves as an AI ethics adviser to a growing insurtech called Lemonade. She holds a BS in Symbolic Systems and an MS in Computer Science from Stanford University. Please welcome Tulsee Doshi.

Hi everyone. Nice to see all of you. I guess good evening to those of you who are in the United States, and maybe good morning or afternoon to those in other places. I’m super excited to be here and to be chatting with you, and it was great to hear Kurt; I think there were so many interesting points there. Hopefully some of them will overlap with some of the things I’ll be talking about today, and others will provide fodder for more discussion. So let me share my screen; let’s see if that works. All right, hopefully you can see my screen. I’m going to assume that’s a yes, but someone ping the comments or the chat if you cannot see it. Yeah, the stream is awesome, thank you.

All right. I know we’re a bit over time, so I’ll try and keep this somewhat brief, and if you have any questions or anything like that, feel free to put them in the comments or chat and we’ll try to touch on them as well. As Shilpi mentioned, I am the Head of Product for Responsible AI at Google, which means that I work across our product teams to think about how we build more inclusive, more responsible, more thoughtful products. And we’re talking about products that affect billions of users across the globe in many different ways. How do we think about users, creators, advertisers? How do we think about users from different communities and different backgrounds who might use products in different ways or have different use cases? So for me, a lot of what I think about is how we set up not just products that can build better experiences for users, but the processes such that we can continue to do this for every product that we build, every product that we launch, every time we make a change or improvement.

So today, what I’ll do is set a little bit of the stage and then talk about three high-level lessons that we’ve learned as we started to do this. Of course, we are still learning so much; there’s still so much to do in terms of how we build more inclusive, more responsible products. At a high level, I’m starting with this example because I always think it’s important to ground it in what we’re really trying to do. Building inclusive products is fundamentally about enabling every single user to feel seen and heard. We want every single user who visits a product, uses an experience, or engages in an environment to feel like the experience is built and made for them, and to be able to achieve their goals and their needs in that experience. And I really love this new line of Barbie dolls from last year as an example of that: it’s a multidimensional view of what it means to be fashionable, what it means to be beautiful. You can see that manifested in this physical product, and that’s the experience and the reaction we want manifesting in our own products.
And this is not just about technology or about new products being developed. I love this tweet from 2019, because it’s from a man who is wearing a band-aid for the first time that actually matches his skin color. And it’s amazing, because until I read this tweet, I didn’t even realize band-aids were supposed to match skin color, because for me that was never an experience I actually had. So how many of us experience technology in a certain way without even realizing that it could work, or should work, better for us? You develop hacks, you develop ways of engaging with technologies that work for you, because you have to, or you stop using them altogether because they don’t work. So the underlying mission behind all of this is: how do we build products that evoke this sentiment in every single one of our users? And this gets harder when you start getting into machine learning, as Kurt alluded to, and as others have throughout the day.

5:04
The challenge, of course, is that the process is complex. We’re talking about data collection, data labeling, then training. Then there’s some sort of filtering or aggregation or ranking that leads to some sort of product experience. Users see it, and then they have behavior that informs the next round of collection. So if you think about, for example, a recommender system, like YouTube recommendations or ads or something else, you might have a collection of data from how users engage with the product, or from whether or not something matches a search query that someone types in. That data then gets labeled based on that engagement. The model is then trained, then you have filtering or aggregation or ranking of that content, users see it, and the loop continues. At every step of this process, you can have bias enter the system: in the way you collect the data (which users did you collect it from, where did you choose to collect it, how did you collect it), in the way that you label that data, in the way that you train the model, in the way that users actually see and perceive the results, and in the ways that they then engage with it, which then feeds back into the way that you collect the data. And because human bias can enter at any stage of the process, there isn’t a one-size-fits-all solution to fairness, because there isn’t necessarily a single place in which concerns might arise, or a single place in which you might want to make a change. This is also true when you think about other responsibility concerns like safety, privacy, and security. These concerns can enter any part of the pipeline, and also multiple parts of the pipeline, which means that the way they actually manifest to users might be different, and the way we go about addressing them might be different.

We’ve seen that in our products as well. Here are three examples, but there are countless different ways you have to think about and address concerns. On the technical side, we’ve seen with a lot of our camera products, and especially ones that are focused on face matching and identifying users, like the Nest Hub Max, that we want to make sure these products work across diverse skin tones. We want to make sure that no matter who you are, you can easily access your device, but also that your device is secure for you and actually protects you. Here we found that we had to sample more data, we had to collect more training data, and we also had to evaluate our model and improve its performance across skin tone and across perceived gender, as well as the intersection of those two, to make sure that our product truly worked in a way that we felt proud of.

But that’s very different from other products, like Gmail Smart Reply. In the case of Gmail, we found that we had these smart reply suggestions: someone would send you an email, and we would suggest three quick responses that you could use to reply. Someone would get an email saying, “Hey, did your engineer take a vacation?” and the suggested reply would be something like “Yes, he did,” with the assumption that an engineer translates to being male. Now, the question is: is that a modeling bias? Yes. But is the solution to make the model better? Not necessarily. In this case, we actually found that we weren’t even comfortable with the idea, because gender is not binary.
And just because you might see something more commonly, or you might be able to build a more accurate model, doesn’t mean we actually want to perpetuate ideas of binary gender. We decided instead that we would remove all gendered suggestions and only suggest things that were more generic in nature, because we felt that was actually a more equitable product experience. So that’s an example of a policy change. And on the right, you see examples where we changed things from a UX or UI perspective. So you might be able to make the model better, or make things better from a data collection or technical standpoint; you may be able to change the policy; you may also want to change the end experience. Users very rarely interact directly with your models, right? There’s some wrapper around them, some user experience they engage with. In the case of Google Translate, we found that that user experience is really where we wanted to intervene. Google Translate has long had concerns around gender bias in translation, especially between languages that are gendered and non-gendered. So for example, if I say someone is my friend in English, the word “friend” is non-gendered; in Spanish, it has to turn into a gendered word.
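To make the intersectional evaluation described for the camera example above more concrete (checking performance across skin tone, perceived gender, and their intersection, rather than a single aggregate number), here is a minimal, hypothetical sketch in Python. The data, column names, and metric are invented for illustration and do not reflect Google’s internal tooling.

```python
# Hypothetical sliced evaluation: a single aggregate accuracy can hide
# large gaps between groups, so we report the metric per slice and per
# intersection of slices. All values below are invented.
import pandas as pd

def sliced_accuracy(df: pd.DataFrame, slice_cols: list) -> pd.DataFrame:
    """Mean of `correct` (1 = model was right) per slice, with counts."""
    return (df.groupby(slice_cols)["correct"]
              .agg(accuracy="mean", n="size")
              .reset_index())

# Toy evaluation table: one row per labeled example.
eval_df = pd.DataFrame({
    "skin_tone":        ["light", "light", "dark", "dark", "dark", "light"],
    "perceived_gender": ["f",     "m",     "f",    "m",    "f",    "m"],
    "correct":          [1,       1,       0,      1,      0,      1],
})

print(sliced_accuracy(eval_df, ["skin_tone"]))                      # one axis
print(sliced_accuracy(eval_df, ["skin_tone", "perceived_gender"]))  # intersection
```

The same pattern extends to any metric and any set of slices; the point is simply that the evaluation is reported per group and per intersection, not only in aggregate.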

10:01
But what if we don’t actually know the gender, such as when translating from English? We found that the model was often perpetuating gender biases and stereotypes: for professions, it would translate “doctor” as male and “nurse” as female. So when we think about how we make this more equitable for users, we could make the model better and more accurate, and that’s something we are actively working on. But we also, again, recognized that gender is non-binary and also very personal. The friend could be a woman; it could be someone with a non-binary gender identity; it could be someone who is male. In that case, how do you translate? So instead, we give users a choice. In the mock on the screen, you see that the word “friend” gets translated both to “amigo” (masculine) and “amiga” (feminine), allowing the user to decide which translation they want, using context they have that maybe the model doesn’t.

The reason I’m walking you through these three use cases is that I want to highlight just how different the responsible AI approaches are. Two of the cases don’t even directly affect the model that is being used; they affect the way the user engages with it. So when we think about how we build responsible AI, it’s about looking at the entire pipeline, from the very beginning all the way to how users engage. It is not one-size-fits-all, and it is diverse in the way that we approach our products.

So how do we actually build that in? How do we set up a culture in which we start building these changes into our products? There are three things I’ll lightly touch on: one is building a shared vocabulary, the second is lowering barriers to entry, and the third is accountability.

What do I mean by building a shared vocabulary? Well, three years ago, we released the AI Principles, which are seven principles for what we believe AI should be, and four applications that we at Google will not pursue. It’s interesting, because if you read these principles, they provide a valuable scaffold but are also very generic. You see things like: one, be socially beneficial; two, avoid creating or reinforcing unfair bias; three, be built and tested for safety; be accountable to people; incorporate privacy design principles. These global principles don’t necessarily prescribe exactly how these things should be done. But what we’ve found is that they create a shared vocabulary around the organization for things that we value in our products, things that we expect to see when products go to launch, things that we want to be measuring and evaluating. And while we don’t have all the answers for how to do all of these things for every single one of our products, it provides motivation for our product managers, our engineers, and our research scientists to ensure that these are baked into the product development process.

We’ve also found, though, that these challenges can be rough: it’s not always clear what type of change or improvement to make. So we’ve had to be incremental and iterative to truly understand individual product needs and context. Often we need to do foundational research. We need to work with communities to understand real user needs across different community groups; we need to understand how to think about concepts like skin tone, or gender. We also need to understand technical concepts.
How do we actually make a model more accurate for different slices of communities? So we take that foundational research and try to put it into practice in a product. And often we find that when we put the foundational research into practice in a product, it doesn’t work the first time. We have to iterate on it and make changes. Once we’re able to land a change in a product, then we can say: okay, how do we scale those insights? How do we take what we learned and turn it into something that truly lowers the barrier to entry for our organization, so that more people can build off of it?

Let me walk you through an example of how that works. We have a bunch of classifiers at Google, and across the industry at large, that aim to identify things like hate, harassment, abuse, and spam. You want to make sure that, for example, you filter out horrible content that might be offensive or extremely low quality. We have a classifier that we released externally called the toxicity classifier, through the Perspective API. What it does is take a sentence and classify its level of toxicity on a scale from zero to one. So if you have the sentence “What a sweet puppy, I want to hug her forever,” that gets a score of 0.07. If you say “You’re the worst example of a puppy I’ve ever seen,” a really mean thing to say to a puppy, that’s a score of 0.4. Seems reasonable if you want to be able to filter out hateful content.
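The toxicity classifier mentioned here is available publicly as the Perspective API, so the zero-to-one scoring can be tried directly. Below is a minimal sketch of calling it from Python; the request and response shapes follow the public documentation as I understand it, you would need your own API key, and the exact scores returned today will differ from the rounded numbers quoted in the talk.

```python
# Minimal sketch of scoring text with the public Perspective API
# (https://perspectiveapi.com). Illustrative only; requires your own key.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    """Return the TOXICITY summary score (0 to 1) for a piece of text."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=body, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

for sentence in ["What a sweet puppy, I want to hug her forever.",
                 "You're the worst example of a puppy I've ever seen."]:
    print(round(toxicity_score(sentence), 2), sentence)
```

The same loop makes the kind of counterfactual check described next easy to run: score pairs of sentences that differ only in an identity term and compare the results.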

14:58
But what happens when that goes wrong? Here you see an example that a user uncovered. The sentence “I am straight” got a score of 0.07; the sentence “I am gay” got a score of 0.24. So the question we started asking ourselves was: how do we proactively improve classifiers? How do we make sure that we evaluate our classifiers for these problems, and also proactively improve them so that these types of issues can’t happen, so that we catch them and prevent them in our models?

That led us over time to do this set of work, starting with foundational research, where we identified a set of model training techniques that could prevent a model from learning these types of associations with a particular group. Our research team worked on developing these techniques; we published research papers and developed metrics. Then we looked at our products and said: okay, can we actually use these techniques in our products? We tried adversarial training in a number of content classifiers across Google, and we found issues with stability. We realized that the approach doesn’t work in all cases, and that it actually affected our overall product’s ability to perform. So we had to iterate on the technique and develop a new one, the MinDiff loss function, which we found was much more effective at both improving our models and maintaining overall product performance. Then we said: okay, how do we scale this? How do we make sure that multiple classifiers across the company are leveraging these techniques and evaluating their models, especially where there are similar use cases and we can apply similar learnings? So we developed this into a TensorFlow library. When I made this slide, it was about to be released; the library has actually now been released, and we’re using it in various places across the company. And of course, this isn’t a one-size-fits-all solution either. There are some cases in which we don’t want to use this library, and so we’ve also had to work very thoughtfully on guidance around when we’re comfortable using it, when we want to be careful, and when we don’t believe it’s the right approach.

Another example is model cards. Model cards have taken on a larger role in the industry in broadly thinking about transparency: how do you document the limitations of your model, as well as its value, so that you can have an honest conversation with your team, with potential users, with academics and regulators, and share what works and what doesn’t, so that we’re being honest about our models? In 2018, the first paper was published on model cards for transparency, and how to operationalize these methods for ML fairness, transparency, and accountability. We then tried them in product, and we found that model cards were a lot harder to build. While they were very valuable as a concept, we needed to better understand how to support teams in actually getting all of those different pieces of documentation in place. As we learned how to do that and got more clarity, we built the Model Card Toolkit, which simplifies the creation of model cards. We released it externally and are using it internally as well, to make it easier and easier to create these types of artifacts.

Before I continue, I do want to touch on the comment in the chat around shared vocabulary, which I think is very true, right?
If you have terms that are too ambiguous, you may find yourself in a place where not everyone is actually sharing the same language and the same definitions, and people can interpret those terms differently in their use. So one of the things we are doing is trying to expand on what we mean by these terms with case studies. When we talk about fairness, what does that actually mean? We provide color through examples and scenarios, and we do training internally to make sure that we share not just the terms, but also what we intend for them to mean and what they should mean for product teams. We then work with individual product teams and organizations, like YouTube or Photos or Pixel, to say: okay, what does this mean for our product? How do we take these high-level definitions and vocabulary and turn them into something more tangible? That also means asking the right questions throughout the lifecycle. We want every team to be thinking these through, from initial product definition all the way to deployment and monitoring. What is the problem we’re actually trying to solve? Who is the intended user? How was the training data collected? How was the model trained? How was it tested and evaluated? How was it deployed and monitored? And what are the limitations of the model? Which goes back again to this concept of model cards and really documenting those things. Throughout that process, this hopefully helps continue to clarify what questions we should be asking and how those questions map to the overall standards.
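The TensorFlow library referred to above appears to be TensorFlow Model Remediation, which ships the MinDiff technique. A rough sketch of how it is wired into a Keras classifier follows; the model, features, and group datasets here are toy placeholders rather than any real Google classifier, and the call pattern follows the library’s public tutorial rather than internal tooling.

```python
# Sketch of MinDiff training with the public tensorflow-model-remediation
# library. MinDiff adds a penalty when the score distributions for two
# groups of (here, synthetic) non-toxic examples drift apart.
import tensorflow as tf
from tensorflow_model_remediation import min_diff

def toy_dataset(n, seed):
    # Stand-in for embedded comments with binary toxicity labels.
    tf.random.set_seed(seed)
    x = tf.random.normal((n, 8))
    y = tf.cast(tf.random.uniform((n, 1)) > 0.5, tf.float32)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

original_ds = toy_dataset(128, 0)      # ordinary training data
sensitive_ds = toy_dataset(128, 1)     # non-toxic examples mentioning the identity group
nonsensitive_ds = toy_dataset(128, 2)  # comparable non-toxic examples without it

# Pack the extra group datasets alongside the main training data.
train_ds = min_diff.keras.utils.pack_min_diff_data(
    original_dataset=original_ds,
    sensitive_group_dataset=sensitive_ds,
    nonsensitive_group_dataset=nonsensitive_ds)

base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Wrap the original model so the MinDiff loss is added during training.
model = min_diff.keras.MinDiffModel(base_model, min_diff.losses.MMDLoss())
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=1)
```

The separately released Model Card Toolkit mentioned above can then be used to document the resulting classifier’s intended use, evaluation slices, and limitations.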

19:58
The last thing I want to touch on is this idea of accountability. We want to make sure that we lower the barrier to entry, that we think about how to partner with our product teams, and that we better understand how to solve these problems. But we also need to create processes that allow us to support teams and hold ourselves accountable. So we think about how we triage requests for review and identify the relevant AI Principles. Then we look at precedents, and we talk to internal experts on privacy, security, and fairness. We then conduct evaluations: we evaluate metrics, and we may talk to external advisors as we make adjustments and mitigate. Then we might approve, or we might block, and based on that we may set new precedents and new approaches that we take moving forward. So this is a combination of being proactive, in setting the vocabulary and building the techniques and approaches, and being reactive, in saying: hey, we want to approve our launches, and we want to make sure they go through a due diligence process of being evaluated, where we can make thoughtful decisions about whether or not certain technologies should be launched. But most importantly, throughout all of this, it’s about how we can be intentional about what we’re building through every part of the product design and development process.

I did want to share this, because it’s an exciting launch that we had yesterday at Google: Real Tone on Pixel, which is an effort across the company to build a more inclusive, more equitable camera experience. What we found is that we had to do that in a number of different ways. We had to improve face detection performance across skin tones. We had to improve our auto white balance and auto exposure models, and reduce blurriness. Throughout the process, this was developed in partnership with the community, and I think what we were so excited about is that we pulled in a bunch of image experts who are celebrated for the kind of imagery they do of people of color, and those individuals gave such thoughtful, actionable, engaging feedback. We were able to take that feedback and leverage it to make sure we were making the right improvements, to truly hone in on where the camera was failing and how we could fix it. And the reason I give this example is not just because I think it’s an exciting push forward for Google and I’m really proud of the team that has done so much of this great work, but because I think it really shows that building inclusively is not just about dropping in a particular model change or a particular measurement. It’s about the way you think about the process: building in the right expertise and the right individuals, making sure that you’re doing the due diligence with communities, and being thoughtful about where in the process changes need to be made.

So with that, I’ll stop there. I know we’re way over time; I apologize, Shilpi. But we’ve made a lot of progress, I think, hopefully as a company and as an industry, and we’re just getting started. There’s so much work that needs to be done to improve the culture of our companies and our communities, to make sure that more voices are included in the room, so that we are making products that are thoughtfully and intentionally designed.
But I am excited that if we can set up some of these processes of sharing and building this knowledge together, we can get to a place where more and more products are building this into their process, and we’re seeing more and more products have an impact.

I love that. You just nailed it. Such a great speaker. I loved your process, and I loved the thought process behind it: making a change is not easy, and you have to be actively involved in the design and the thought process end to end, getting all the teams involved, not working in silos, just like you said, talking to the product teams and talking to the design and implementation teams. And yeah, the change has to start from the culture within, right? It seems, from what you are sharing, that Google is doing a great job at that. I mean, there’s always more that you can do, but you’ve got to start somewhere.

Right, so there’s definitely much more to do, but I think we’re trying to push forward.

Awesome. Let’s see, do we have any questions? I know you answered the shared vocabulary question, so thank you for that. And I do not want to bring up that controversial topic right now, so we’ll let that one go. There are more comments, and people are going to keep engaging and asking questions, so we’ll see. We’ll be around, and she’s on LinkedIn. I invite you to join our community; I know we talked about it, but we would love to have you.

25:00
I would love to. Please, if anyone has questions or wants to talk more, do reach out to me on LinkedIn. I’m always happy to connect and talk more, and I’d love to be more a part of the community. So I’m looking forward to hopefully talking to you all. Thank you. Take care. Bye-bye.

Tulsee Doshi, Head of Product, Responsible AI & Human-Centered Technology, Google

DataEthics4All hosted AI DIET World, a premier B2B event to celebrate Ethics-1st minded People, Companies and Products, on October 20-22, 2021, where DIET stands for Data and Diversity, Inclusion and Impact, Ethics and Equity, Teams and Technology.


AI DIET World was a 3 Day Celebration: Champions Day, Career Fair and Solutions Hack.

AI DIET World 2021 also featured Senior Leaders from Salesforce, Google, CannonDesign and Data Science Central among others.

For Media Inquiries, please email us at connect@dataethics4all.org

 

Come, Let’s Build a Better AI World Together!