
Sources of Truth in a Dangerous World: AI DIET World 2021

“We need remote work for attorneys and civil rights workers.” - Golda Velez

“We all would love to be a part of something like this and bring up a project and make that impact.”

Golda Velez


“I feel like organizations over time can be trustworthy”

Golda Velez


Up next, we have Golda Velez. Golda Velez is a senior software engineer in risk. Her title is Senior Software Engineer, Caltech graduate, and human rights advocate. And I know she’s so much more than what she has briefly mentioned here. I’ll invite her. Hi, how are you? I love your very brief introduction. I know you’re so experienced, you’ve done so much work, and you are a leader in your own right in technology. And you have a very interesting talk today that you are going to share with us. I’m really excited.

Thank you. So this is going to be more bottom-up. It’s not going to be rigorously scientific, because I didn’t have time to do a scientific experiment based on the proposals I have. But yeah, let’s jump into it. I will share my screen. Sure. Let’s see if that works. Okay. If I have to run to my second monitor, I will do that. And, okay, great. So I’m just going to go ahead and present this. Can you see all right? Yes, we can. All the best.

Thank you so much. All right, so I want those of you who are listening to be ready to take a slightly different mindset. I’m going to shift your framework just a little bit, because a lot of times when we’re talking about ethics in AI, we’re really thinking of ourselves, and most of us are high-end executives or software engineers in the US and European countries. We have a certain life that we live, certain pressures that are on us, and certain considerations we have, but I want you to think a little bit about being connected to this larger world. There are actually a very large number of people in the world for whom that is not their situation. They may have similar skills, but they live in a world where they’re subjected to arbitrary violence. And that’s true in many places in the Middle East. It’s true in places in China. It’s true in places in Russia, and it’s true in places in Africa and South America. And in some cases, in some of the countries that we think of as the developed world, there are actually some instances of this, but mainly I’m talking about places where everybody knows that you can get hurt if you are caught saying the wrong things, and there are a lot of those places in the world. We are all connected. So the other shift I’m asking you to make is away from: oh, I’m reading the news, oh boy, that happened over there, that has nothing to do with me. It does have to do with me. You know, I work at Uber, and one of our board members happens to be a Saudi who is connected to people, and he could have a very strong influence on some of the things happening in the Middle East if he wanted to. And I am taking that money. So I am connected to that. I think a lot of us are more connected than we like to think. So how does this relate to AI? This is why I said sources of truth. If you want to take AI and try to solve ethical problems, what are the ethical problems we want to solve? How do you make models? 
You know, we use sources of truth to train those models. As everyone who does AI knows, the source of truth is the thing your model is trying to produce. You have a training set. Where do we get that training set for this kind of serious problem? Let’s look at the bigger picture. What problem am I talking about? I’m talking about the problem of disinformation that defends violent acts. I’m going to state as a hypothesis that entities that commit violent acts, not through a justice system but arbitrarily, also tend to lie about them. They tend to smear the victims of those acts. They tend to create a story that defends their acts, which may be provably untrue. And then they promote that story on social media, and they have a lot of financial resources to get that story promoted. So what I’m talking about combating is entities with large financial resources using disinformation campaigns to defend their violent acts. I feel that AI can be used for that. Now, when you think of AI use in social networking, you might think of things like Facebook, where we flag hate speech. Sometimes it’s individual hate speech, but often it’s actually troll farms and coordinated campaigns. And of course, there’s a consideration for free speech. You can’t just tell people they can’t say mean things, because they can say mean things. But some of these events were, in some cases demonstrably, supported by heavily financed disinformation campaigns. You can measure coordinated inauthentic behavior; that’s something that can be analyzed. You have all these different sites. This is a particular one, which was in the Middle East, but it’s demonstrably a cluster of coordinated posts about a particular subject. 
So this can be measured, and those posts can be taken down. A lot of places, like Twitter or Facebook, do have algorithmic methods of finding coordinated inauthentic behavior, but because it’s just an algorithm, they tend to be careful about how they penalize it. Whenever you’re trying to set up a fraud response or a risk response, you have to be careful with false positives. So you have to prioritize what you’re going to penalize. You can’t just strongly penalize everything that looks like inauthentic behavior, because at some point you’re going to catch real users who are just telling their friends things.
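Coordinated inauthentic behavior can be measured in many ways. As a minimal sketch, assuming that coordination means many distinct accounts posting the same normalized text within a short time window, a detector might look like the following. The `Post` type, the `coordinated_clusters` function, and the `min_accounts` and `window_seconds` thresholds are hypothetical illustrations, not any platform’s actual method; production systems use much fuzzier similarity measures and many more signals.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    account: str
    text: str
    timestamp: float  # seconds since epoch


def normalize(text: str) -> str:
    """Lowercase and strip URLs and punctuation so trivially edited copies match."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def coordinated_clusters(posts, min_accounts=3, window_seconds=3600.0):
    """Return (normalized_text, accounts) pairs where at least `min_accounts`
    distinct accounts posted the same normalized text within one time window."""
    groups = defaultdict(list)
    for post in posts:
        groups[normalize(post.text)].append(post)

    flagged = []
    for text, group in groups.items():
        group.sort(key=lambda p: p.timestamp)
        largest = set()
        start = 0
        # Sliding window over time-sorted posts; keep the largest set of distinct accounts.
        for end in range(len(group)):
            while group[end].timestamp - group[start].timestamp > window_seconds:
                start += 1
            accounts = {p.account for p in group[start:end + 1]}
            if len(accounts) > len(largest):
                largest = accounts
        if len(largest) >= min_accounts:
            flagged.append((text, sorted(largest)))
    return flagged


posts = [
    Post("acct1", "J is a TERRORIST! https://example.com/1", 0),
    Post("acct2", "j is a terrorist", 60),
    Post("acct3", "J is a terrorist!!", 120),
    Post("acct4", "Lovely weather today", 30),
]
print(coordinated_clusters(posts))  # [('j is a terrorist', ['acct1', 'acct2', 'acct3'])]
```

Raising `min_accounts` or shrinking `window_seconds` trades recall for fewer false positives, which is the tension described above: real users passing the same news to their friends can also produce identical text.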

So you have to be able to prioritize those, and that’s what we’re going to get into. I also want to emphasize that I’ve spoken to people who work in these areas at Twitter and at Facebook, and they have told me that without the human operators finding those sources of truth, they would fall down in a couple of days and just be overrun by spam. So we’re still really dependent on human judgment for the source of truth for many things. The coordinated campaigns can be detected algorithmically, but we’re really dependent on human judgment to create the source of truth for the models that ban disinformation. Now, those human operators are low-paid people, who may be in the Philippines, scanning through all these violent images using some heuristics. And, you know, that’s what we have. That’s our defense, but there are other things. So that’s what I was also going to talk about here. We have all these human operators quickly scanning through content, which works pretty well against spam, because the incentives to spam are financial, and so you can combat them either in a blockchain kind of way or with something pretty simple. But if the incentive for the disinformation is outside the network, and it’s not just financial, if it’s a dictator who’s in control of a lot of money, who doesn’t want something he did to be exposed, and who is going to use all his resources to prevent it from being exposed, then there are overwhelming financial resources on the disinformation side of the equation. And then we have, you know, these people trying to find it based on some criteria. They’re scanning through all this violent and explicit stuff, and they have to be clicking on it all the time. So that’s one method that’s being used right now. There’s something else that’s happening that I think is not being tapped into. 
And that is that there are already organizations doing what I call the high lift, and they depend on their credibility. The value of these organizations relies on their credibility, so they are very careful before they accept cases. I know this because I’ve spoken to family members of people who were kidnapped, and I asked the Committee to Protect Journalists, will you add him to your cases? And they said, well, we can’t add him unless you translate everything he ever wrote, and we make sure he’s not a terrorist. So they really do a very, very high lift before they will assert that this guy was kidnapped extrajudicially, that it was an injustice, that he was kidnapped by his government. They really have a high bar before they take on that case. So there’s this high effort being done. You know, Bellingcat has volunteers who really spend more time than the human operators hired by Twitter and Facebook, carefully and logically analyzing photos. At places like the New York Times and the Washington Post, people spend a tremendous amount of effort. And we are not capturing that effort in the sources of truth, except sometimes through the human operators who might look it up on these sites. But isn’t there an automated way that we could capture this stream of high-lift, high-effort, anchored assertions, which we could then use to prioritize our models and the actions our models take? So that’s really my question. If we set a clear goal, say we want to reduce disinformation that leads to, or is associated with, violence, we have to have a way of measuring that. And can we use these highly credible sources to maybe increase the penalties? So we’re not just saying, oh, here’s some coordinated action that happened, but it was all just to sell potato chips. Instead we say, here’s a coordinated action that happened to smear the wife of a murdered person, so let’s prioritize that one. 
And on that one, let’s apply a higher penalty to those devices and those accounts, because they not only undertook a coordinated action, they undertook a harmful coordinated action. So you’re willing to set a lower bar for the penalty, or a higher penalty, because they were so harmful. We’re willing to risk more false positives to eliminate this network that was extremely harmful, as opposed to the one that was trying to sell potato chips, which you might let sneak through.
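One way to make a lower bar for the more harmful network precise is an expected-cost rule. Assume a detector outputs a probability `p` that a network is coordinated, that wrongly penalizing real users has a relative cost, and that letting the campaign run has a relative harm score. Penalizing is then the cheaper choice when `p * harm > (1 - p) * false_positive_cost`, which rearranges to `p > false_positive_cost / (false_positive_cost + harm)`. This is a minimal sketch: the function names and harm numbers are hypothetical, and in the proposal above the harm score is where assertions from highly credible organizations could feed in.

```python
def penalty_threshold(harm: float, false_positive_cost: float = 1.0) -> float:
    """Detector score above which penalizing has lower expected cost than not acting.

    Penalize when p * harm > (1 - p) * false_positive_cost,
    i.e. when p > false_positive_cost / (false_positive_cost + harm).
    """
    if harm < 0 or false_positive_cost <= 0:
        raise ValueError("harm must be >= 0 and false_positive_cost must be > 0")
    return false_positive_cost / (false_positive_cost + harm)


def should_penalize(p_coordinated: float, harm: float, false_positive_cost: float = 1.0) -> bool:
    return p_coordinated > penalty_threshold(harm, false_positive_cost)


# Hypothetical harm scores, in units of "one wrongly penalized real user".
potato_chip_campaign = 1.0  # coordinated, but commercially motivated
smear_campaign = 9.0        # smearing the family of someone who was killed

print(penalty_threshold(potato_chip_campaign))  # 0.5
print(penalty_threshold(smear_campaign))        # 0.1
print(should_penalize(0.4, potato_chip_campaign), should_penalize(0.4, smear_campaign))  # False True
```

With the same detector score of 0.4, the smear network crosses its bar while the potato-chip network does not. Only the ratio of the two costs sets the threshold, not their absolute size.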

So how do we do it? I’m an engineer, but I’m not saying this is a finalized thing. I just wanted to put something out there so it can be iterated on. You have to have some kind of machine-parsable format, and you have to have signing and credibility staking, because you can be guaranteed that bad actors will attempt to put assertions into your chain of assertions; they will try to influence it. So you have to have credibility staking by whoever is allowed to make reputation assertions, and you have to either have trust that we assign for good reasons, or have a way for people to lose credibility, because you really have to keep the schema clean. And it could propagate from trusted organizations like CPJ and Bellingcat that are already working at the highest level. I’m going to say I trust them, and I trust what they say. And we even pay them for it. And we have to be able to talk about entities. So there are a couple of challenges here, but I think we could do some of it quite simply. So here’s a simple format: you have some subject that we’re talking about, some assertion with some qualifiers, and who said it. In the real world, we recognize the difference between me saying I saw this with my own eyes (I was there, I saw a policeman grab her by her hair and pull her down) and me saying I spoke to the sister of someone who was kidnapped, and she says he’s gone and she can’t reach him. So I spoke to her. I might speak to someone whose name I cannot reveal because they would be endangered, but I can assert, I can stake my credibility, that I spoke to them. Sometimes there’s been a thorough investigation. Sometimes you read an article and you’re simply sourcing from the article. Or is it just your opinion? 
And I think we need to distinguish between those things, and create some kind of UX that allows people to easily make these rich assertions, not just click thumbs up or thumbs down. It’s kind of like the reporting interface that we get. But these things are so valuable in the real world, and yet we have no UX to capture these richer assertions. It strikes me that there must be some low-hanging fruit left on the table here, something that we’re not using in our models, because this is what happens in the real world: we investigate things. So here are some sample assertions about a person, with CPJ as the source. You might just be using a string; it might be better if it were an entity, but a string is somewhat useful, because strings are what you’re going to find on social media. You’re not always going to find social media referring to the entity; it’s just going to use the string. So here’s the verb, here’s how we know about it, and here’s who. And then here’s one where there are two of them. So CPJ did an investigation and says he was kidnapped. I also talked to someone who said he was kidnapped, someone who knows him or who talks to people who do, so I have second-hand information. And you can then roll these up and give them different levels of credibility. Now, how does this relate to the automated bot networks? One thing that happens quite frequently is that you can use negative sentiment analysis to see when an automated bot network is creating negative sentiment toward someone who was actually harmed. So there was a smear campaign against Jamal Khashoggi saying he was a terrorist. And even if he was a terrorist, you should bring him to trial, not kill him in some, you know, hidden way. 
So if there’s a smear campaign against someone who was also harmed in an extrajudicial manner, I think it is legitimate to backpropagate and say: whoever is coordinating a smear campaign against someone who was imprisoned without a trial is a problem, and we should be able to eliminate that without being so afraid for freedom of speech, because it’s associated with harm. It’s not just that someone is saying dumb things; it’s that they are saying bad things about a person who was also harmed, and that’s a different thing. So you could use something like bot detection to find coordinated disinformation, use sentiment analysis, and then associate the results with those assertions. It’s a little bit of a heavy lift, but I think it’s quite doable, and I think it should be done. There will also be other uses of reputation feeds. This is a proposal I made in the Twitter Bluesky project. It’s actually in a different order, so don’t get confused. Here the user IDs come first, and these are more granular assertions. Maybe here’s a journalist who says, I took this photo, and here’s some other guy who also says he took that photo. How do we know which is which? This guy says that I did. But maybe we trust Bellingcat, and Bellingcat says that this photo was doctored.
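The subject / assertion / qualifier / asserter format described above could be represented roughly as follows. This is a minimal sketch, assuming a fixed set of qualifiers for how the asserter knows something and a noisy-OR roll-up that treats assertions as independent evidence. The `Basis` values, their weights, the credibility numbers, and the `support` function are all hypothetical, and signing is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """How the asserter knows what they are asserting."""
    EYEWITNESS = "saw it myself"
    INVESTIGATION = "thorough investigation"
    SPOKE_TO_WITNESS = "spoke to someone close to it"
    CONFIDENTIAL_SOURCE = "source cannot be named"
    ARTICLE = "read it in an article"
    OPINION = "opinion"


# Hypothetical weights; choosing and testing these is the open modeling question.
BASIS_WEIGHT = {
    Basis.EYEWITNESS: 1.0,
    Basis.INVESTIGATION: 0.9,
    Basis.SPOKE_TO_WITNESS: 0.7,
    Basis.CONFIDENTIAL_SOURCE: 0.6,
    Basis.ARTICLE: 0.3,
    Basis.OPINION: 0.1,
}


@dataclass(frozen=True)
class Assertion:
    subject: str   # a plain string, as it would appear in social media posts
    verb: str      # e.g. "was_kidnapped"
    basis: Basis   # the qualifier: how the asserter knows
    asserter: str  # who is staking credibility on it
    # A real schema would also carry a cryptographic signature over these fields.


def support(assertions, subject, verb, credibility):
    """Roll up matching assertions with a noisy-OR, treating them as independent:
    each one holds with probability credibility[asserter] * basis weight."""
    doubt = 1.0
    for a in assertions:
        if a.subject == subject and a.verb == verb:
            doubt *= 1.0 - credibility.get(a.asserter, 0.0) * BASIS_WEIGHT[a.basis]
    return 1.0 - doubt


claims = [
    Assertion("Person P", "was_kidnapped", Basis.INVESTIGATION, "CPJ"),
    Assertion("Person P", "was_kidnapped", Basis.SPOKE_TO_WITNESS, "reporter_r"),
]
credibility = {"CPJ": 0.95, "reporter_r": 0.6}
print(round(support(claims, "Person P", "was_kidnapped", credibility), 4))  # 0.9159
```

Matching on plain strings keeps the format easy to connect to social media text. The independence assumption in the noisy-OR is the weakest part, since two assertions that trace back to the same source should not count twice.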

I’m sorry, he said he took a different photo. If we trust Bellingcat, we could propagate that trust and then assign low credibility to this guy, who’s associated with a photo that Bellingcat says was doctored. So these things would be a higher lift. I know they may be harder to scale, but I think that if we allow a UX with somewhat rich assertions, we get a lot of data, because there are people who have a stake in asserting those things. For example, the Bellingcat volunteers make a lot of granular assertions all the time, but it just comes out as human-readable text. If we could get that into machine-readable assertions, I think that would be a really valuable data source that is currently going unused. So really, what I want to say is that we may not have the full solution, but we have to be working on the right problem. And I know privacy is important. I know other things are important, but, you know, she’s dead. He is up for 20 years. He was on Twitter. He had a blog on Twitter with a lot of followers, and they hired someone at Twitter to find his IP address and his location, and they dragged him away. This is [name inaudible]. He spoke up for his friend who had been kidnapped, and the next day he was kidnapped. That was a little over a year ago, and he’s still gone. He’s in Iraq. And this one, I’ve forgotten her name, but she’s in Russia. She is a journalist who was also killed. So we’re not really making a systematic effort to stop this. There are news articles, and everybody goes on with their day. But we’re not applying a system of experimentation and outcome measurement to stop this from happening, and I think we should. So thank you very much for listening. I would love to take some questions. I have a lot of anecdotes I could share with you.

Just to chat a little: Golda, great talk. I know we have some great comments here. Let me show this. So, what data do you input in your first training model?

Right. So I think we first have to allow these granular assertions to be made. And then I think you have to decide on an initial class of trusted entities. That’s a really good question. I see someone asking how you measure trust. Initially, you have to have people talk about it. I think having the initial conversation about who’s trusted is an important conversation to have. And I think there should be a consensus about organizations that are trusted by our government, by the United States government, and by the United Nations: the Committee to Protect Journalists, Human Rights Watch. There’s no real disagreement on those, though there’s always a political consideration. But I think that certainly, with Snopes and others, you can name maybe eight or 10 organizations that are high trust and assign them certain initial credibility. That’s fine. And then you can include things like [inaudible], which has a human jury deciding whether an account is human. And if you allow a lot of granular assertions, you are going to get things that are provably wrong: a person will assert that they are human when they’re not, or that they’re in the United States when they’re not. Some of those things could be tested in a granular way, and you can backpropagate the errors. So I don’t think I have the entire model completely sketched out, but I think we need a rich data source. And we need to recognize that there’s credibility staking, that we can have initial credibility, that there has to be a way to adjust it, and that we need to start building those models.
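The staking idea (seed a few trusted organizations, let everyone else start low, and adjust credibility when granular claims such as being human or being in the United States can be checked) can be sketched as a small ledger. This is a minimal sketch under stated assumptions: the `CredibilityLedger` class, the starting scores, the stake fraction, and the asymmetric reward are all hypothetical choices, and it settles only the asserter’s own stake rather than propagating errors back through a chain of sources.

```python
from dataclasses import dataclass, field


@dataclass
class CredibilityLedger:
    default: float = 0.1  # hypothetical starting credibility for unknown asserters
    scores: dict = field(default_factory=dict)
    open_stakes: dict = field(default_factory=dict)  # claim_id -> (asserter, amount)

    def seed(self, trusted_orgs, score=0.9):
        """Initial human judgment: a short list of high-trust organizations."""
        for org in trusted_orgs:
            self.scores[org] = score

    def credibility(self, asserter):
        return self.scores.get(asserter, self.default)

    def stake(self, claim_id, asserter, fraction=0.2):
        """The asserter puts a fraction of their current credibility behind a claim."""
        amount = self.credibility(asserter) * fraction
        self.open_stakes[claim_id] = (asserter, amount)
        return amount

    def settle(self, claim_id, was_correct, reward_ratio=0.25):
        """When a granular claim is checked, return part of the stake or burn all of it.
        Being wrong costs the whole stake; being right earns only a fraction of it,
        so repeated false claims drain credibility faster than true ones build it."""
        asserter, amount = self.open_stakes.pop(claim_id)
        current = self.credibility(asserter)
        if was_correct:
            updated = min(1.0, current + reward_ratio * amount)
        else:
            updated = max(0.0, current - amount)
        self.scores[asserter] = updated
        return updated


ledger = CredibilityLedger()
ledger.seed(["CPJ", "Human Rights Watch"])
ledger.stake("claim-1", "CPJ")                    # CPJ stakes 0.18
ledger.stake("claim-2", "acct_77", fraction=0.5)  # unknown account stakes 0.05 on "I am human"
print(round(ledger.settle("claim-1", was_correct=True), 3))   # 0.945
print(round(ledger.settle("claim-2", was_correct=False), 3))  # 0.05
```

The asymmetry between reward and loss is one way to make lying expensive; whether it is the right trade-off is exactly the kind of parameter the talk suggests testing against outcomes.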

And I also like your thought on credibility staking based on the number of degrees apart, right? Whether it’s the first person, who has direct experience, versus a third person, three degrees apart, who just read it in the news, versus a second person, like you got the news from the sister of the person who was kidnapped. I mean, it makes sense, and I can see some challenges in how, if we created a public network like that, we would put weights and parameters on those weights.

The assertion is what I say: that I heard it second-hand, or that someone says there was an investigation. So that’s the data. Then you can have different models that put different weights on it, and you can start testing them against outcomes, but you have to have the rich data for them to model.

All right. And that brings us to the question of whether people are honest when they say this is what they heard from someone. Nowadays, people share information as if they have first-hand knowledge of it, as if they are the authority on it. So it’s very hard to tell whether something is hearsay or whether it’s coming first-hand, and I want to think...

We can actually address that. We could have a view that depends on my point of view. I could say I trust CPJ, and I want anything that they say to be propagated strongly to me. So I could say that anybody they trust, I also trust, and you could have a web of trust. Then the people who are kind of trolls are not really going to be kept in that set. They’re going to be in their own bubble, with those who trust each other. So I think you can address it.
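The point-of-view web of trust described above can be sketched as trust propagation from a single viewer. This is a minimal sketch, assuming trust is a weight between 0 and 1 on each "A trusts B" edge, that trust along a path multiplies, and that every hop beyond the viewer’s own direct choices is discounted by a hypothetical `hop_decay` factor. The graph, the names, and the `personal_trust` function are illustrative, not part of any existing system.

```python
import heapq


def personal_trust(edges, viewer, hop_decay=0.5, floor=0.01):
    """Trust scores as seen from one viewer's point of view.

    edges: {truster: {trustee: weight in [0, 1]}}; hop_decay must be in (0, 1].
    A node's score is the best product of edge weights along any path from the viewer,
    with every hop after the first also multiplied by hop_decay. Because all factors
    are at most 1, a max-product variant of Dijkstra's algorithm finds the best path.
    """
    best = {viewer: 1.0}
    heap = [(-1.0, viewer)]
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(node, 0.0):
            continue  # stale heap entry
        for trustee, weight in edges.get(node, {}).items():
            factor = weight if node == viewer else weight * hop_decay
            candidate = score * factor
            if candidate >= floor and candidate > best.get(trustee, 0.0):
                best[trustee] = candidate
                heapq.heappush(heap, (-candidate, trustee))
    del best[viewer]
    return best


edges = {
    "me": {"CPJ": 1.0},
    "CPJ": {"Bellingcat": 0.9},
    "Bellingcat": {"volunteer_v": 0.8},
    # A troll cluster that only vouches for itself.
    "troll_1": {"troll_2": 1.0},
    "troll_2": {"troll_1": 1.0},
}
scores = personal_trust(edges, "me")
print({name: round(s, 2) for name, s in scores.items()})
# {'CPJ': 1.0, 'Bellingcat': 0.45, 'volunteer_v': 0.18}
```

A troll cluster that only vouches for itself never receives trust from the viewer’s side of the graph, so it stays in its own bubble, although from inside that bubble its members still rate each other highly.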

It might be easier for organizations than for individuals. I feel like organizations over time can be trustworthy, and we can give them extra weight where they are trustworthy. For individuals, it might be a little harder to implement, but it’s certainly something to consider. And I think there are just so many questions. I think someone has another question: when we talk about long-term democracy infrastructure, facts, scientific evidence, and their dissemination must be funded. It cannot be a ragtag group of volunteers. If we aspire toward democracy, we must make long-term infrastructure investments.

Yeah. I would definitely agree with that. I think we’re getting somewhere with the DAO model of voting on the use of funds. I think that’s positive. I think there needs to be a little more delegation and a little more conversation in the DAO models, but I think they’re a good start toward having democratic control over funds that could maybe be directed toward that. I would definitely like to see governmental support of those. You have to be careful with organizations, because they can outlast people, but they can also be taken over. You have to make sure the organizations themselves have an infrastructure that can’t easily be taken over simply by somebody purchasing the organization or buying a new board seat.

Lakisha says: always show your sources.

Absolutely. I mean, sometimes it has to stop with a second-hand report, because I can’t say who told me, but I’m willing to stake my credibility on it and risk lowering my credibility.

And he says: however, if the information is coming from reliable sources and is then corrected, based on what we knew then versus what we know now, do they lose credibility?

If I’m asserting that somebody was murdered and then they turn out to be alive, I think I should lose credibility, because I was wrong. You know, it depends on what you’re asserting. You should be careful to assert only things that you actually know or that you’ve investigated.

Darren says: I would suggest a historical model to predict the probability of trusted information. The data is not perfect, but it is based on prior experience with sources.

Yeah, and I think that if you use them over time, you build up more of that historical data, which you could use in a machine-parsable way. Initially, it has to be human experience, because we don’t have that data yet.
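A historical model that starts from human judgment and lets data take over maps naturally onto a Beta-Bernoulli estimate of a source’s reliability. This is a minimal sketch, assuming each checked claim resolves cleanly to right or wrong: the prior pseudo-counts play the role of initial human judgment, and resolved claims gradually outweigh them. `TrackRecord` and its fields are hypothetical names, and the example numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrackRecord:
    """Beta-Bernoulli track record for a source. The prior pseudo-counts encode
    initial human judgment; resolved claims gradually outweigh it."""
    prior_correct: float = 1.0  # Beta(1, 1): no opinion yet
    prior_wrong: float = 1.0
    correct: int = 0
    wrong: int = 0

    def record(self, was_correct: bool) -> None:
        if was_correct:
            self.correct += 1
        else:
            self.wrong += 1

    def reliability(self) -> float:
        """Posterior mean probability that this source's next claim holds up."""
        hits = self.prior_correct + self.correct
        total = hits + self.prior_wrong + self.wrong
        return hits / total


new_source = TrackRecord()
for outcome in [True] * 8 + [False] * 2:
    new_source.record(outcome)

trusted_org = TrackRecord(prior_correct=9.0, prior_wrong=1.0)  # human-assigned head start
trusted_org.record(False)

print(round(new_source.reliability(), 3))   # 0.75
print(round(trusted_org.reliability(), 3))  # 0.818
```

The size of the prior pseudo-counts controls how many resolved claims it takes for the track record to override the initial human judgment.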

So, Golda, I know you said that this is not a fleshed-out model. It’s something that you are thinking about and working on, and I would love to get you more involved. We have an active community, and we know you work with Bluesky and with a lot of other organizations as well. I would love to continue the conversation and see how we can actually build something.

I’d love to. I’d be happy to advise people, make suggestions, mentor, and bring them developers. And I want to mention really quickly, since I have one minute left: making direct contact with people is so valuable. There are developers in Myanmar. There are developers in Afghanistan; we’ve actually been hiring women in Afghanistan. I have the CVs of about 20 or 30 women in Afghanistan right now who need remote work, attorneys and civil rights workers, who could maybe do some of this work. They don’t need a lot of money, but they need jobs quite badly. So with funding of $1,000, you can actually hire 10 people and transform the lives of 10 women in Afghanistan right now. And I’m in touch with them through a Slack group. There’s just so much that you can do when you get in direct touch with people, and sure, I would love to participate.

Yes, absolutely. I’ll be in touch. We at DataEthics4All would love to be a part of something like this and bring up a project that helps and makes an impact globally. So I’ll see you later. Thank you. Take care. Bye!


Golda Velez, Director, Raise the Voices; CEO What’s Cookin’


DataEthics4All hosted AI DIET World, a Premier B2B Event to Celebrate Ethics-First-Minded People, Companies and Products, on October 20-22, 2021, where DIET stands for Data and Diversity, Inclusion and Impact, Ethics and Equity, Teams and Technology.

AI DIET World was a 3 Day Celebration: Champions Day, Career Fair and Solutions Hack.

AI DIET World 2021 also featured Senior Leaders from Salesforce, Google, CannonDesign and Data Science Central among others.

For media inquiries, please email us at connect@dataethics4all.org


Come, Let’s Build a Better AI World Together!