Ethics as a Preprocessing Task: A Teaching Experience.
❝Data ethics should be part of the data pipeline and not an afterthought❞.
❝As I am a Data Scientist I believe that there are technical solutions to bias and racism which are exhibited by machine learning models❞ ~ Brett Drury
The DataEthics4all hackathon had committed itself to providing training as a way of leveling up some of the less technical participants. Because hackathon enrollees were distributed around the world, delivering the training had its own challenges. My main concern was to make the training interesting not only for the students but also for myself. I have taught at a university, and the lectures I gave were often one-sided: I talked, and the students listened. I wanted to avoid this because I wanted the course participants to be engaged and to leave the course more knowledgeable than when they entered.
The first challenge was what to teach. As I am a Data Scientist, I believe that there are technical solutions to bias and racism which are exhibited by machine learning models. The premise of the course was that bias in models is a result of badly processed or unprocessed data, and that correctly processed data can mitigate the bias in the underlying data. The course material was designed to draw students' attention to areas where bias can creep in, such as labeling and unbalanced data. I also wanted to make sure that the students were aware of the practical implications of biased data, which can include very large fines levied on their employers.
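To make the idea of handling unbalanced data at the preprocessing stage concrete, here is a minimal sketch of one common technique, inverse-frequency reweighting. It is not taken from the course notebooks; the function name `balance_weights` and the toy group labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def balance_weights(groups):
    """Return one weight per record so that every group carries equal total weight.

    Hypothetical helper for illustration; `groups` is a list of group labels,
    one per record in the training data.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # Inverse-frequency weighting: records from rare groups get larger weights.
    return [total / (n_groups * counts[g]) for g in groups]


# Toy data (an assumption): group "a" is over-represented three to one.
groups = ["a", "a", "a", "b"]
weights = balance_weights(groups)

# Sum the weights per group to confirm that both groups now count equally.
per_group = {}
for g, w in zip(groups, weights):
    per_group[g] = per_group.get(g, 0.0) + w
print(per_group)
```

Weights like these could then be passed to a model that accepts per-sample weights during training. Reweighting is only one option: resampling or collecting more data for the under-represented group may suit a given problem better.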
The second challenge was how to present the information. Ethics in AI touches on very sensitive areas, and it would be easy to lapse into tired stereotypes when highlighting issues of bias. As the audience was likely to contain members of groups who are affected by bias, it would have been easy to alienate a large part of the potential audience.
The final challenge was how to make the course engaging through a passive medium such as video conferencing. This was the most difficult challenge, and I hoped to address it by providing breaks in the lecture where students could ask questions, and by providing the slides and Jupyter Notebooks with code ahead of the training course. The Jupyter Notebooks held code for practical problems that the course participants could solve at their leisure before, during, or after the training course.
The day of the training course arrived, and I hoped that my pre-course activity and publicity would attract enough students to make the effort of creating the training course worthwhile. I was not disappointed: more people attended the course than I had expected.
The course passed without incident. I hope that I succeeded in getting across the message that data ethics should be part of the data pipeline and not an afterthought. The policy of leaving gaps in the course for students to ask questions worked reasonably well. Although the quality of the questions was high, the number of questions was lower than I had hoped for.
I suspect that the students did not feel comfortable asking questions, and that was probably my failure. The Jupyter Notebooks were a partial success: they gave me time to demonstrate practical concepts, but it was not possible to get course participants to attempt the exercises during the course. If I were to do this again, I would split the training course into three parts: a formal lecture, a practical session, and an interactive tutorial. The framing of each session would shape the expectations of the students, and hopefully they would react accordingly.
From my perspective, the course was a qualified success. Drawing on the experience of teaching on the day, I hope to improve the course for the second iteration of the hackathon in 2021.