Keynote: Empowering Individuals to be Active Participants with Marie Wallace.

❝Patients can actually disclose Data without disclosing everything❞.

❝Small companies have a huge amount of access and control over Data across every single industry.❞~ Marie Wallace



❝In telehealth... visits in March 2020, compared to the corresponding period the year before.❞ ~



❝Florida-based AdventHealth plans to deploy Epic’s electronic health record system across 37 of its hospitals❞ ~

❝It takes 2 billion a year in admin costs just to keep provider directories accurate❞ ~ Marie Wallace.

In this keynote, Marie Wallace of IBM talks about her concerns around the way AI is being used to monitor and track users. An advocate for data privacy, data portability and end-to-end encryption, Marie leads a thoughtful and engaging discussion of the potential solutions that will enable individuals to become active participants in their data future and move towards more transparent, safe and ethical AI.



00:00 Hello everyone, I am Susanna Raj, and I am beyond thrilled and quite honored today to introduce you to our next guest speaker, actually our keynote speaker, Marie Wallace, who is our very own data ethics advisory council member. She is also a data strategist, analyst and scientist at IBM Watson Health, a lover of technology with a fascination for all things analytics, with a particular interest in the role of analytics in transforming how we live and work, and with a long-standing love for the humanities.

00:50 I think she’s kind, and she’s the kind of scientist the world desperately needs right now. Marie is also on the advisory board for the Digital Repository of Ireland at the Royal Irish Academy and the Digital Arts and Humanities Ph.D. program at Trinity College Dublin. She has been working in data privacy for AI since long before it was even cool to do so. Her decades-long advocacy for decentralized identity started long before we even realized the dangers of personally identifiable information in COVID-19 contact tracing apps.

01:35 So it is indeed our privilege today to hear directly from such a pioneer in data privacy, and she is going to tell us how we can be empowered to be active participants in the global data ecosystem. Let’s give a rousing welcome to Marie Wallace. Marie, the stage is all yours now.

01:56 Thank you. So let me share my screen; I’ve got a few slides to share with you. OK. So, the global data ecosystem. Of all the topics I could have chosen to talk about, why do I specifically want to talk about the data ecosystem? Well, I think it’s not going to be a surprise to anybody for me to admit that one of the reasons is that it’s currently broken.

02:36 If we think about AI, we’re only as good as the data that feeds us. So the reality is, for AI to work at any level you need a vibrant, diverse data ecosystem, and this is the one thing that we don’t have today. In this presentation, I want to talk a little bit about why I think we have a challenge today with our data ecosystem, I want to discuss what I think needs to change, and I’ll talk about how it needs to change and then some of the challenges and opportunities if we do make that transition.

03:06 So the current system we have today, as I indicated, really isn’t as vibrant as it needs to be, and there are some key reasons for this. One is that it’s actually controlled by a small number of companies. Now, when I say this, people are probably going to think about the big social media companies that have a lock on the data that goes around social networks, but I think this is happening across the board. If you look at the financial services sector and the companies that do the credit scoring, it’s a small number of companies that basically control everybody’s credit record. If you look in the healthcare space it’s the same thing: you’ve got a small number of companies that control the majority of the electronic medical record systems.

03:44 So this notion of a small number of companies having a huge amount of access and control over data is across every single industry. The second thing that I believe is an issue is that today the generation of data, the collection of data and the exchange of data is very opaque. People don’t really know exactly what is being collected. The other thing is that most, if not all, of the data exchange that happens today circumvents the individual. If you think about any company that’s collecting data about you: if they know something about you, you absolutely, in my opinion, have the right to know what they know.

04:24 And I think there are a few significant benefits to leveling the playing field and allowing individuals to be equal participants. One part of it is this transparency, and it has a number of knock-on effects. Not only does it mean that we would get access to the data that people have about us, but also the very act of us being able to see the data puts pressure on the people generating the data to make sure they’re doing things that are credible, that are ethical, that are reasonable. There’s an old quote from, oh god, his name escapes me now, a federal judge in the US, which was that sunlight is the best disinfectant.

05:05 And if we want people to be ethical about data, then let’s open it up. We’re not saying open up the algorithms necessarily, but at least let every individual see the data that a company has about them. So that’s another aspect that I think really needs to be changed. What we have today, this data ecosystem that’s predominantly controlled by a small number of companies, where the individual is excluded from the actual flow of data, is fundamentally not sustainable in the long term, and it needs to change.

05:36 The other reason why it needs to change, before I move on, is that today, because there’s this imbalance in the network, people are nervous about data. All the conversations about data privacy are about: how can I stop data being generated about me? How can I stop data being collected? And I would argue that we actually want more data. Data isn’t the enemy; the enemy is what is done with that data.

06:03 So if we could create a more equitable ecosystem, where the individual actually had control and access and a say and involvement in the flow of data, I believe that in the long term this would generate a much more vibrant ecosystem with much more diverse data passing around it.

06:22 So what exactly needs to change in order for us to realize that? Well, first we need to stop this tug of war between organizations on the one side, which want to keep control of the data, and individuals on the other, who are nervous or scared or worried about what a company has on them. So we need to start by making the individual an active and equal participant. The second thing we need to do is let them have a say in what data is generated about them and have access to that data. And the third thing is they need the ability to choose what data they’re going to share and with whom, and that needs to be their choice. If we do this, I fundamentally believe that it will start to build trust between all participants in the ecosystem, particularly if this exchange is managed through a transparent, privacy-preserving, auditable and verifiable data exchange.

07:26 And while this all sounds very interesting and exciting, most people think of it as really challenging, because how in hell are we going to actually make individuals equal participants when today they’re so far excluded from the data exchange? The best thing they might be able to do is give consent, but once they’ve given consent, you’ve got these companies that are just passing the data around the place. So how are we going to change this?

07:50 Well, the good news is that this isn’t a new problem. This is a problem that technology innovators around the world have been studying, probably for longer than five years, but at least for the last five years there’s been a vibrant ecosystem around something called self-sovereign identity, which has been exploring this idea of allowing the individual, the holder, to actually be in control of their own data.

08:17 And the difference in a decentralized or self-sovereign identity ecosystem is that, as I indicated, instead of you having two companies, an issuer who has data about an individual and a verifier who wants access to that data for a particular business process, passing data back and forth without our involvement, it allows me as an individual to be issued data by an organization. So I can see what data they’re generating about me, I have access to that data, and then if a third party like a verifier wants access to my data, they don’t go to the issuer to ask for the data, they come to me. It’s my data; they come to me if they want access to it.

08:58 Now, this isn’t necessarily a novel idea. There are a number of systems out there today where you have digital wallets and the idea of people having access to their own data, but one of the fundamental challenges is this concept of trust. If I’m a verifier, if I’m an organization, and I need to know something about the holder, I might need to know whether they are over 18 because I’m going to serve them a drink, or I might want to know if they’ve got a driving license because I’m going to rent them a car.

09:26 How do I trust that the data that they’re sharing with me is indeed valid: that it is data that was issued to them, that they are who they say they are, that the party who issued the data is indeed the organization that I believe they are (so if it’s a driving license, it is indeed the DMV that issued that driving license) and, more importantly, that the data hasn’t been changed? For example, the DMV issued me a driver’s license a year ago and I subsequently lost my license, but I either don’t have an updated digital driver’s license or I’ve modified the license I have. If you could actually fix this problem and allow the verifier to trust the data they’re receiving from the holder, so they don’t need to go back to the issuer, that does a few really important things.

10:12 One is the obvious thing, which is that it now allows the individual to be an active participant. But there’s actually a much bigger benefit, I believe, from an AI standpoint and from the standpoint of the entire data ecosystem. What it means is that many, many small companies, be it a mom-and-dad restaurant or whatever, can basically ask somebody to prove something. They don’t have to be integrated with a large organization that has the data. They can literally connect directly, peer-to-peer, with the holder, the person that’s coming into the restaurant or the person that’s looking to rent a car. They can exchange data, and then they can trust that data.
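The checks a verifier needs, as described in the talk (the credential came from the issuer it claims, it has not been modified, and it has not since been withdrawn), can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not how any production wallet works: every name here (`TRUSTED_ISSUER_KEYS`, `REVOKED_IDS`, `issue`, `verify`, `dmv.example`) is hypothetical, and an HMAC with a shared demo key stands in for the issuer's public-key signature purely so the sketch runs on the Python standard library. A real system would use asymmetric signatures and look up the issuer's public key and the revocation status, for example on a ledger.

```python
import hashlib
import hmac
import json

# Hypothetical registry of issuers the verifier trusts. A real system would
# hold public keys here; the shared secret is only a stand-in for this sketch.
TRUSTED_ISSUER_KEYS = {"dmv.example": b"demo-key-not-for-production"}
REVOKED_IDS = set()  # hypothetical revocation list published by the issuer


def _canonical(body):
    """Serialize deterministically so issuer and verifier hash the same bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def issue(issuer, credential_id, claims):
    """Issuer side: bind the claims to the issuer with a tag (signature stand-in)."""
    body = {"id": credential_id, "issuer": issuer, "claims": claims}
    key = TRUSTED_ISSUER_KEYS[issuer]
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "proof": tag}


def verify(credential):
    """Verifier side: return (ok, reason) without asking the issuer for the data."""
    key = TRUSTED_ISSUER_KEYS.get(credential.get("issuer"))
    if key is None:
        return False, "unknown issuer"
    body = {k: credential.get(k) for k in ("id", "issuer", "claims")}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, str(credential.get("proof", ""))):
        return False, "tampered or forged"
    if credential["id"] in REVOKED_IDS:
        return False, "revoked"
    return True, "ok"
```

The point of the sketch is the order of the checks: the verifier never contacts the issuer for the holder's data, only for what it needs to trust the proof.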

10:50 So what it actually does is a few things. First of all, it allows data to flow much more easily, because now the holder can flexibly choose who they want to share data with, no big integrations have to happen, and you don’t have things like data blocking between large organizations. So a lot more people can have access to the data, and the individual controls it. The other really nice aspect of self-sovereign identity is the cryptography. One of the key aspects of this type of infrastructure is that all data is encrypted, and there’s heavy cryptography in terms of an individual’s secret/public key pair. So the data that I have, I can encrypt, I can sign, my signatures can be verified, I can mask what data I want to show, and I can choose what data I want to show down to the field level.
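One way to picture "choosing what data to show down to the field level" is salted hashing: the issuer signs a hash of each field rather than the fields themselves, and the holder later reveals only the fields (and salts) they choose. The sketch below is an illustration of that general idea, not the scheme any particular product uses. All names (`issue`, `present`, `verify`, `ISSUER_KEY`) and the sample values are hypothetical, and an HMAC again stands in for a real public-key signature.

```python
import hashlib
import hmac
import json
import secrets

ISSUER_KEY = b"demo-key-not-for-production"  # stand-in for an issuer signing key


def _digest(name, value, salt):
    """Hash one field together with its salt so hidden values cannot be guessed."""
    return hashlib.sha256(json.dumps([salt, name, value]).encode()).hexdigest()


def _sign(digests):
    """Signature stand-in over the list of field hashes."""
    return hmac.new(ISSUER_KEY, "".join(digests).encode(), hashlib.sha256).hexdigest()


def issue(fields):
    """Issuer: salt and hash every field, then sign only the sorted hashes."""
    salts = {name: secrets.token_hex(16) for name in fields}
    digests = sorted(_digest(n, v, salts[n]) for n, v in fields.items())
    signed = {"digests": digests, "proof": _sign(digests)}
    return signed, salts  # the holder keeps fields and salts in their wallet


def present(fields, salts, signed, reveal):
    """Holder: disclose only the chosen fields, each with its salt."""
    disclosed = {n: (fields[n], salts[n]) for n in reveal}
    return {"signed": signed, "disclosed": disclosed}


def verify(presentation):
    """Verifier: check the proof, then that each disclosed field is covered by it."""
    signed = presentation["signed"]
    if not hmac.compare_digest(_sign(signed["digests"]), signed["proof"]):
        return False
    return all(
        _digest(name, value, salt) in signed["digests"]
        for name, (value, salt) in presentation["disclosed"].items()
    )
```

For example, a holder with `name`, `license_state` and `license_no` fields could call `present(fields, salts, signed, ["license_state"])`, and the verifier would see the state and a set of opaque hashes, nothing else.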

11:44 And also decentralization. Specifically, when you look at the data being on the edges, peer-to-peer, this gets rid of the centralization of personal data, so it removes the central target for cyber-attacks. For companies, it also reduces a lot of technical and regulatory expense, because they’re not managing large quantities of personal data. The data stays on the edge with the holder, and then they choose when to share it.

12:04 So just in terms of who is buying into this concept of decentralized identity, well, there are a lot of different organizations working in this space. If I just look at the standards bodies to start with, we have the W3C, which, as I indicated, started working in this space about five years ago, and I think maybe about a year or so ago they released the first two standards in this area.

12:38 So there is a DID specification for how decentralized identifiers should be constructed, and there’s another standard around verifiable credentials. This is the structure through which people can exchange these verifiable credentials, these data artifacts that people can exchange and verify. There’s also another relatively new open standards initiative, the Trust over IP Foundation, which is under the Linux Foundation, and they’re looking at how you can build out an architecture that distributes trust across the machine layer, which is where the cryptographic aspects are, and the human trust layer, which is where you have the business, legal and social layers.
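To make the two W3C artifacts mentioned here a little more concrete, the sketch below shows the general shape of a decentralized identifier and of a verifiable credential as plain Python data. It is illustrative only: the regular expression is a simplification of the DID syntax (`did:<method-name>:<method-specific-id>`) that ignores percent-encoding, the credential type `MedicalLicenseCredential`, the claim name `licenseState` and both identifiers are made up, the cryptographic `proof` is omitted, and the check that the issuer is a DID is an assumption of this sketch rather than a requirement of the standard.

```python
import re

# Simplified DID syntax: "did:", a lowercase method name, then a
# method-specific id. Percent-encoded characters are not handled here.
DID_PATTERN = re.compile(r"^did:[a-z0-9]+:(?:[A-Za-z0-9._-]*:)*[A-Za-z0-9._-]+$")


def is_did(value):
    """Return True if the value looks like a DID under the simplified pattern."""
    return isinstance(value, str) and DID_PATTERN.match(value) is not None


# Illustrative credential; identifiers and claim names are made up and the
# proof section a real verifiable credential carries is left out.
credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential", "MedicalLicenseCredential"],
    "issuer": "did:example:state-medical-board",
    "issuanceDate": "2020-01-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:holder-123",
        "licenseState": "California",
    },
}

REQUIRED = ("@context", "type", "issuer", "issuanceDate", "credentialSubject")


def shape_problems(cred):
    """Return a list of structural problems; an empty list means the shape looks right."""
    found = [f"missing {key}" for key in REQUIRED if key not in cred]
    if "VerifiableCredential" not in cred.get("type", []):
        found.append("type must include VerifiableCredential")
    if "issuer" in cred and not is_did(cred["issuer"]):
        found.append("issuer is not a DID")  # assumption of this sketch only
    return found
```

A structural check like this says nothing about trust; that comes from verifying the proof against the issuer's key, which the standards layer on top of this shape.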

13:19 So these kinds of frameworks are really putting a lot of structure around people working in this space. We then have numerous open-source projects, and this is only a tiny, tiny fraction. There’s the Decentralized Identity Foundation, which is building some of the key foundational components of the DID ecosystem, things like the resolvers for DIDs. And then, as I’m working in blockchain, there’s a blockchain implementation for this: Hyperledger has a lot of projects that are focused on this specific area, Hyperledger Indy being for the actual ledger itself, Aries for the exchange of credentials, and Ursa for the cryptographic libraries associated with it.

13:59 Then finally, this is a small sampling of some of the companies that are actually building out these decentralized identity solutions, and you’ve got such diversity. You’ve got some of the big players like Microsoft and IBM and Cisco, you’ve got the likes of Sovrin and Evernym, and some of these smaller companies, MemberPass. So this is just a large variety of companies. I think this is a technology that has been bubbling away for years, for the last four or five years anyway, but I think really since COVID, I hate to say this, it has focused everybody’s attention on data, so the interest in this has increased tenfold, I’d say, probably in the last six months.

14:48 So, challenges and opportunities. There are some big challenges, but there are endless opportunities. Just to quickly touch on some of the challenges: there are obviously technical challenges, and I’m not going to downplay the fact that there are some real technical challenges in making this happen. But I’ll be honest with you, my opinion is that the biggest challenges aren’t actually technical. They’re organizational, they’re legal. There are some really big companies that have a real stake in the data ecosystem.

15:20 And they do not want things to change, and there are governments who aren’t sure about this new way of doing things. So I find I spend a lot of time trying to help people not to feel scared about this. This is a change, but actually it’s a change for the better. So I think a lot of our challenges are really related to organizational, legal and business components. The technology, I honestly believe, has made leaps and bounds over the last several years, and it’s really ready to do the job if we can actually get everybody bought into the idea.

15:55 So those are the challenges. Now let’s look at some of the opportunities. What I wanted to do was talk about a few different use cases. As was indicated earlier, I work in IBM, and I work specifically in healthcare. I’ve spent a long time working in different industries, but in the last several years I’ve been working in healthcare, so the use cases that I’m going to highlight are healthcare use cases. But I think it’s fair to say that when you look at them you’ll see these are fairly generic problems, and they could be applied to any industry. I should also say these use cases have a COVID dynamic as well. So let’s look at the first use case, provider credentialing, and let me start by talking about a concrete problem that we’ve had in the COVID pandemic.

16:46 One of the big issues we saw in the earlier days was the situation in New York, where we had really, really high growth in the numbers of COVID cases, and they had a real shortage of hospital beds and a real shortage of doctors to treat patients. There were a lot of doctors around the world who wanted to come to New York and help the doctors there, which is a wonderful thing to want to do. The challenge today, though, is that doctors aren’t actually that portable, because in order to go to another hospital and practice there you have to be credentialed, which is completely reasonable. But credentialing is a hugely complex, really time-consuming process. So this is one of the areas where something like decentralized identity and verifiable credentials could make a massive impact. Let’s just for a second think about the problem.

17:37 Every clinician has to credential with every single organization with which they’re going to onboard. For example, if I’m a doctor and I want to work in one hospital and have secondary rights at another hospital, I’m going to have to be credentialed in both of them. If I want to be covered under one insurance plan, I have to be credentialed with the payer; if I want to be covered by another plan, I have to be credentialed by another payer. And every time I have to be credentialed it’s a massive amount, a stack of papers, a stack of data. A lot of it is paper, some of it is digital.

18:11 And that data has to be verified every single time, for regulatory reasons of course, and the verification process is very involved. We also have the data coming from multiple different sources. While the individual is exchanging the data, so the provider or their admin is providing a stack of data about themselves, the source of the data could be all over the place. You might have a state license board; you could have medical associations, universities, and governments, for example for your NPI number; you have insurers, to look at your insurance coverage; and you might have law enforcement, to see if you’ve been struck off for any reason, or you’ve got any DUIs, or there is any other reason why you shouldn’t be hired.

18:52 So there’s lots and lots of data coming from lots and lots of sources. Provider credentialing can take upwards of six months. I mean, in the best-case scenario you’re looking at three months, but it can take six months and even a year in some cases for providers to be credentialed, and in the meantime they’re sitting twiddling their thumbs. They can’t actually practice while they’re waiting.

19:12 So that’s obviously not a good situation to be in. The other aspect of it is provider directories. In the US, the government has put in some regulations around the need for provider directories to be accurate, which makes sense: when you’re a patient and you want to see what doctors are there that you can sign up with, you want to make sure that the information is accurate. Unfortunately, keeping provider directories updated with all this plethora of data can be extremely difficult.

19:37 And it’s estimated that in the US they spend about 2 billion, yes, billion with a b, 2 billion a year in admin costs just to keep provider directories accurate. On top of that, CMS has estimated that in Medicare directories alone nearly half of all provider entries have errors in them, and those errors can actually make people liable for fines. So this is a huge problem. So why do I believe that something like decentralized identity could actually help us? Well, just imagine this scenario. Today I’m a doctor, I have a medical license from California, and I have to share that medical license whenever I’m being credentialed. If that medical license data was a verifiable credential that I had on my phone, one that could be verified so that they didn’t have to verify it themselves, nobody had to verify it, what that would mean is that at any point in time I could share from my phone the credentials needed for me to be credentialed, and I could actually be credentialed in real time. The verification process, the cryptography, the reference on the blockchain ledger, everything is there that is needed in order for the receiver, the insurance company as an example, to trust that yes, Marie is who she says she is, and yes, this is a medical license and it was issued by the state of California. They could trust that, and they wouldn’t need to do any verification.

21:04 So all of a sudden you could start to take two billion dollars of administration costs out of the system, two billion that you could actually spend treating patients. That’s obviously significant. The additional aspect is that if you could make the credentialing real-time, you can now start to increase the portability, the flexibility, the mobility of providers. It means that a doctor would be much more able to move between plans and between employers. And in the example of COVID, if you had a doctor in Texas who wanted to go to New York, they would at least have a very easy way of getting credentialed in real time with the hospital in New York so they can start to treat patients.

21:45 So that’s obviously the portability question. The other thing is transparency. If we think about it from the patient standpoint, if a doctor had verifiable credentials on their phone, or on their laptop or wherever, that they could share with me at any point in time, this all of a sudden allows me to get much more information about a doctor’s credentials, and it’s information I could trust. It’s not checking on social media to see what posts they had, or checking 50,000 different places. You could absolutely reliably get data about an individual and know that it’s verified and trusted.

22:23 So this, I think, would also build trust between the patients and the providers. The other thing it would potentially do is avoid billing surprises. One issue that can frequently happen is that a provider is in the credentialing process and it’s not completed yet, or they think it’s completed but it’s not, and they treat a patient. When the patient then goes to claim, to get reimbursed for that treatment, or the doctor is looking to get paid for treating that patient, they would actually not be paid.

22:56 So there are cases where either you get doctors thinking that they’re going to be paid for treating a patient and they won’t be, or you get patients who think that a provider is in their health plan and it turns out that they’re not, so they get a billing surprise at the end of the day. So again, back to the data ecosystem: if we now have a doctor able to share verifiable data with the patient, all of a sudden data is flowing between these two individuals and everybody’s benefiting. So again we’re increasing the vibrancy of our data ecosystem.

23:25 Let’s look at another use case, and this is also somewhat relevant to COVID. Let’s look at patients. For example, there’s been a lot of talk about the idea of immunization passports or things like that, and there’s been a lot of fear, and I can completely understand why. But just think about it a little bit differently. Think about the idea that I had control over my health data, that I could choose what data I want to share and when, and that I could also share it anonymously. For example, you might have a restaurant that wants to ensure that they don’t have anybody who has tested COVID positive coming into the restaurant. It’s like somebody proving that they’re over 18 to be able to get a drink: you don’t need to see a driving license, you don’t need to see their age, you don’t need to see their name, you don’t need to see anything about them. You may just want to know: are they COVID positive or not?

24:22 So something like decentralized identity allows real data minimization. It allows the bare minimum of data to be exchanged, which is a yes or no, without exchanging any other information about that individual. That, I know, is maybe a bit of a trivial example, but if you’re flying between countries it might be more relevant. So how does it work if patients have control over their own data?

24:42 Today, patient records are spread across multiple institutions. There used to be the practice of data blocking, or information blocking, where you’d have EMR vendors being very slow to release data to hospitals that weren’t part of their network. That’s been outlawed since the Cures Act came in in 2016, I think 2016/2017, but there can still be significant latency in the movement of data between EMR systems, which can impact continuity of care and, most importantly, could significantly impact patients during an emergency. If you have to wait a long time to get data about a patient and they’re in the emergency room, this obviously is not a good situation to be in.

25:23 There’s also a lack of transparency, because I as a patient don’t actually know all the information that is out there about me across these multiple EMR systems. Another thing is that EMR systems today are designed to exchange data with other EMR systems. But if I am an individual and I want to share health data for some other purpose (it might be for health insurance, it might be the COVID example of going to a restaurant, it might be for some other reason), EMR systems aren’t inherently designed to allow me to share health-related data in a highly privacy-preserving way, for example with anonymization and things like that. What decentralized identity does is put the patient in control of their clinical data. It allows them to have it in their hand so that they have it whenever they need it, so if, for example, they were in an emergency situation, they would be able to exchange data with the hospital without any latency.

26:17 And because it’s trusted and verifiable, the clinical setting would be able to trust that the data was accurate. Today, if you just get information, the doctor can’t be sure that it’s correct, whereas with verifiable credentials there’s the trust layer, so the doctor could trust that the data being shared with them is indeed valid and could use that data in treating the patient.

26:36 The other thing that’s really nice about decentralized identity, as I mentioned a second ago, is that patients can now actually disclose data without disclosing everything. For example, you could disclose data anonymously: you could disclose that you’re COVID positive or COVID negative without disclosing any identifiable data, and it can still be verified, which is a really cool thing.
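The data minimization idea, answering "over 18?" or "COVID negative?" with a bare yes or no, can be sketched as a wallet that only ever returns the answer to a question and never the underlying fields. This is a toy illustration of the principle only: the names (`WALLET`, `PREDICATES`, `answer`) and the sample values are hypothetical, and the sketch leaves out the part that makes the answer verifiable. In practice that would need an issuer-backed proof, such as a zero-knowledge proof or a signed derived claim.

```python
from datetime import date

# Hypothetical wallet contents held by the individual; sample values only.
WALLET = {
    "name": "Alex Example",
    "date_of_birth": date(1990, 5, 17),
    "covid_test": {"result": "negative", "taken_on": date(2020, 9, 1)},
}


def is_over(age, today, born):
    """True if someone born on `born` has reached `age` by `today`."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    years = today.year - born.year - (0 if had_birthday else 1)
    return years >= age


# Questions the wallet is willing to answer, each reduced to a yes/no.
PREDICATES = {
    "over_18": lambda w, today: is_over(18, today, w["date_of_birth"]),
    "covid_negative": lambda w, today: w["covid_test"]["result"] == "negative",
}


def answer(wallet, question, today):
    """Holder side: return only the yes/no answer, never the underlying fields."""
    if question not in PREDICATES:
        raise KeyError(f"unsupported question: {question}")
    return {"question": question, "answer": PREDICATES[question](wallet, today)}
```

The restaurant in the example would receive `{"question": "covid_negative", "answer": True}` and nothing about the person's name, birth date or test date.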

26:59 So I think it fundamentally puts the patient in control of their own data. I’m just conscious of the time, so I’m going to quickly move on to the final slide, because I wanted to leave a few seconds in case there’s any Q&A. So let me just talk about how to get started. What I would suggest for anybody who’s interested in this topic: there’s some background technical information that you might be interested in. I would also strongly recommend taking a look at the W3C. There’s some really good information about the specifications that they’ve released, and there are some good use cases described as well, which will give you a sense of the variety of ways that verifiable claims can be used. There’s a link to the Trust over IP Foundation and a link to the Decentralized Identity Foundation, which is also a really useful resource. I’ve also included my own blog; I write extensively on this topic, so if anybody’s interested in hearing my ramblings further, I’m always more than happy to share my opinions. So I just want to see if there are any questions, Shilpi.

28:06 Yes, Marie, there is a question, thank you. Sudha is asking: I heard about privacy-by-design architecture being set up in large companies to limit sharing of data within the companies, tied to end users’ data consent. Are you seeing this working in balancing between consumers and companies?

28:31 So in IBM, for example, everything we do is privacy by design. It’s the very first step that you do. What I would say is that there can be a challenge in certain cases, because privacy by design can to some extent be perceived as constraining a project, particularly from a data science standpoint.

28:53 In the world we have today, where data is flying all over the place and people can pick up data from anywhere, the idea of being more controlled, being more structured, having to track consent, can sometimes be perceived as a roadblock. The thing I will say is that in the projects I’ve worked on, whenever you start with privacy by design at the center, a couple of things happen. First of all, you actually build a solution that is going to be compliant from a regulatory standpoint, because what sometimes happens is that you build a solution, then you look to release it, and legally you’re never going to be able to ship it because it’s non-compliant.

29:31 So first of all, it gets rid of these surprises at the end of the day, so that you have a solution you can actually sell. But what I think is more important is that I’ve found consistently, every single time I put the individual’s needs ahead of the product needs, some new value has been generated that I hadn’t even thought about. So there’s always been a greater return on that investment. Sometimes it might make things a little bit tricky from the product development standpoint, but there’s always been an unforeseen benefit that we’ve started to realize as we’ve gone through that process.

30:07 So I think it is an investment that you have to make up front in a project, but I’ve always got a significant return on that investment. That’s at least my experience.

I see, yeah, that makes sense. Any other questions for Marie? Thank you, Marie, this has been an awesome presentation and talk. I really enjoyed everything that you discussed, and I love that you have these resources at the end of your slides where people can learn how to get started. I think that’s really valuable. I want to remind everyone that Marie is not only an advisor, she’s also on our DataEthics4All community platform, so we can continue the conversation there, and feel free to reach out to her on LinkedIn and Twitter and other places.

For sure, I’m always happy to talk about this.

Thank you, Marie, thank you so much.

