
The Case of the Disappearing Data Scientist: AI DIET World 2021

“Going back to Spider-Man, with great, great power comes great responsibility” - Kurt Cagle

“We are now becoming increasingly responsible for the impacts that we are creating, for society, education, and various disadvantaged groups”

Kurt Cagle

“The reality is that AI generally has only been successful in those cases where there is a significant amount of human intervention that is also involved”


“Marketing channels, statisticians, and the whole field of sports have become much more data-driven just within the last 15 years or so” - Kurt Cagle


Kurt Cagle


Ten years ago, there were no data scientists. Ten years from now, there will be no data scientists. This is not to say that the field of data science is dying; far from it. Rather, within the next ten years, the data lifecycle will become so pervasive within organizations that there will likely be few, if any, roles that do not in some way depend upon a high degree of data literacy and competency. For those people going into the field today, understanding this will be key to survival in a digitally transformed world.

Kurt Cagle is a Managing Editor at Data Science Central. Kurt has been working in the data science and data information space for nearly forty years and has written more than twenty books on data formats, transformations, and encoding. He has also worked as a consultant with a number of Fortune 500 companies and US, Canadian and European agencies. He lives in Issaquah, WA with his family and two very curious cats.


Welcome! My name is Kurt Cagle. I’m the managing editor for Data Science Central, part of the TechTarget network. I talked with the show about a month ago about what was going on here, specifically looking at the role of ethics in AI and data science and how those are affecting the way that we think about information and about careers in information. One of the things I proposed was a topic that I’ve had in mind for a little while: the disappearance of the data scientist. I believe it gives a very useful roadmap of where I think the technology is going, and where I think ethics plays a part in that process. So can you see my screen?

Yes, just bear with me. Yes, we can see your screen now.

Okay. Alright, a small bit of background. I have worked as a developer and information architect, and most recently I have been working to help develop offerings in data science, machine learning, and the overarching rubric of what we call artificial intelligence. And unfortunately, I think that when you look at what exactly we think of as data science and data scientists, the question really comes down to: what the heck are they, and why are they important? And a more important question: if you are looking at getting into the field, what does that mean? What options do you have, and how does that affect what you should think about in terms of your career development? That leads to the larger issues about how these impact not just technical fields but even non-technical ones.

So one of the things I want to point out is that data science has been around for a while. The notion that data science is a new field reflects, I think, a certain degree of amnesia within the programming community about its own history.
There are some wonderful pieces talking about the nature of information and programming that extend all the way back to the mid-1960s, when a lot of the capabilities for doing these kinds of processes were barely even in their infancy. Yet the need to manipulate data, to find patterns, and to create models has been around for at least that long, if not longer. So in many cases, data scientists have actually been around in a number of guises. Actuaries were people responsible for looking at demographic data and determining from it not only insurance information but also policy, based on the data they were seeing from surveys, census data, and other similar sources. Marketing channels, statisticians, and the whole field of sports, for example, have become much more data-driven just within the last 15 years or so. But even before then, the skill necessary to predict which sports teams were going to win, or what was likely to be the future of elections and other things along those lines, was essentially the statistician’s domain.

An analyst is someone who analyzes, which is no big surprise. But an analyst is also someone who looks at the information available to them. Analysts have been slowly moving into the computer age, where they are trying to determine not only what patterns have been in place but whether those patterns can be used to determine the future. Business analysts, technical analysts, stock analysts, people working in research: most of them need some background in analytics, statistics, and the ability to work with stochastics to identify how particular populations of people, animals, or businesses behave. So the notion of a data scientist has actually been around for a long time.

More to the point, even the tools that we have for data science go back decades. A lot of people have largely forgotten about languages like Fortran. Fortran was really the first data science language. It was developed to handle not only basic computation but also the additional work of processing, analyzing, and creating models, work that sat apart from the realm of business processes. If you look at the evolution of computers and computer programming, you find that Fortran largely led into areas like SAS and other statistical packages. Eventually you have programs like MATLAB and Mathematica, both of which were designed to let you take mathematical and statistical formulas and use them to produce models, and to deduce from those models an indication of how accurate the information was and how readily it could be used to determine what future actions should take place.

So the term data scientist itself is not all that old. In fact, the first usage of data scientist as a term only goes back to about 2000, and it was a different usage, of data science as a concept. The notion of someone focused purely on data and data processing is something that only emerged around 2012. For a while it became a kind of badge of honor. It was driven largely by the rise of the R language. R was a kind of open-source evolution of packages like SPSS and SAS, providing command-line utilities to generate data and perform model generation. Python originally started as a somewhat generalized language, but by the mid-2010s it had become, through the use of specific packages, increasingly the language for statistical manipulation. To a significant extent, that is still true. And there is evidence that R has been declining in response to the rise of Python as a language.

Programming languages rise and fall all the time, and if you look at where certain languages are, they tend to cluster strongly in specific areas. Python, for instance, has become the language of choice for data analytics and, by dint of the fact that it lets you do certain additional mathematical processing, has become critical in machine learning as well. So in most of these cases, these particular tools identified the data scientist as someone who was an expert in these tools.

Now, there is a distinction worth knowing, because it determines the nature of the evolution of these fields: the distinction between a data scientist and a programmer. A data scientist is someone who creates models to manipulate or analyze data, and that analysis process differentiates them from programmers, who build the tools that make this happen. So the tool builders are essentially programmers; the tool users are largely data scientists. That is also changing somewhat as we move increasingly to models where machine learning becomes prevalent. But for a lot of purposes, you can readily say that a data scientist is someone who uses statistical tools to perform analysis. The notion that a data scientist is just a technical expert in these tools is something that is changing as well.
One of the key things that has evolved within the last three to four years, although its foundations have been around since the early 1960s, has been the rise of neural networks. Neural networks and machine learning are two areas in which we are seeing a distinction arise between those people who are developing toolsets or using statistical analysis to determine behavior, and those people who are attempting to create models and use the models themselves. The latter handle areas such as clustering for identification purposes, identifying local minima to find the most optimal solution for a particular job, or sequential operations such as natural language processing and text-to-speech, which is a key part of any kind of natural language work. So that shift from the statistical to the neural is changing things.

The distinction between the two is that one is old-school data science and the other is new-school data science, and they do different things. I don’t think you can say that one or the other is less important in terms of overall impact. It’s just that in one case you are looking at it from more of a statistical basis, whereas in the other case you are using higher mathematics, in some cases even nonlinear mathematics, to build the models that determine behavior, and from that evolve the next generation of tools. And these are very exciting. When you start looking at this, you can identify things like image recognition, visual recognition, or video recognition, which is a key capability for things like driving autonomous vehicles. Categorization of any type is increasingly done through natural language processing, and things like chatbots are also being wedded to that. So in some respects, the field itself has already fractured into distinct areas.

Another phenomenon is that we went from individual people being hired as data scientists, usually at pretty decent salaries, to the development of the data science team. That is a very interesting phenomenon, because if you are looking for a future career in data science, you need to understand that the specializations are becoming more important than the overall term. It’s like programming: I can call myself a programmer, but in point of fact I happen to specialize in areas like user interface design or back-end graph data systems. Those specializations are what is determining the next generation of programmers, and they are also determining the next generation of data scientists.
Those include data engineers, who are the people responsible for gathering the information, processing it, cleaning it, and putting it into forms that can then be utilized more effectively. These are not statisticians. They are people much more interested in data quality assurance, data provenance, and governance, and they are focused primarily on the pipeline of information systems.

Then you have the data analysts. These are the people we tend to think of as the modelers: the ones generating the models that affect not only how the raw information is processed but also, once those models are in place, how they can be used to create new information and predict behaviors, so that the result is not just a number but a program that in turn drives other processes. Those analysts are becoming increasingly important as people who build components in overall data organizations.

You have visualizers. Visualizers are people who take data, which are essentially just constructs, and make those constructs meaningful to an audience. Visualizers are very important simply because we are reaching the stage where it is very difficult to look at the data constructs that we have and make them meaningful to the average person. If you have someone who can turn that information into a presentation, whether that’s dashboards or other things like that, that is becoming an increasingly useful skill. Additionally, you have areas like storyboard artists or storytellers, who are responsible for generating interpretations of that data that, in turn, will drive future business or organizational activity.

You have the programmers who are building the toolsets, but they are focused primarily on this area. So anyone working on areas like GPT-3, which is OpenAI’s next-generation, quote-unquote, AI package for language processing, is building out the tools, but they are working very closely with data teams to determine what tools to work with.

Finally, you are increasingly seeing the business strategist or the AI strategist, who is responsible for taking the information and making sure that it works well to further the overall corporate aims and goals that the data itself supports. This is important because if you don’t have someone acting as an orchestrator, able to say this is the kind of data we need, this is how we need to get it, and this is why we need to get it, then all of the other processes are essentially just ... So you need someone who looks at this from a higher level and asks: why are we doing this? And these are additional areas where we have been getting into data governance strategists or data ethicists.
The notion that we actually have an ethicist as a position is one that I find fascinating. It means, among other things, that there is now actual work for anyone who happens to be a philosophy major. But the notion that you have to look at the meaning of data, why you are working with it, what you are going to use it for, and how you can ensure there are as few biases as possible is going to become a very key area for organizations moving forward. There has been a lot of work done on this. When you go out and ask, can I use this information to affect social institutions or political institutions? Can I use it to change the way people vote, or even to suppress the way that people vote? Those are areas where you need someone who acts as the conscience of an organization but who also understands the meaning, purpose, and value of that data and why you have to be very careful with it. It is different from being someone who manages privacy, although there is a lot of overlap. The data ethicist is concerned about the information that is available and about its being available to different stakeholders, but is also concerned about whether this is something you should be doing. This has become an increasingly important aspect of the sciences.

So finally, I think data science is, as a consequence, dispersing to the specialists, the subject matter experts, and the governance champions, and those roles are becoming very critical. Increasingly, there are very few roles within an organization that are not in some way, shape, or form impacted by the use of data. We are all becoming data scientists.

To wrap this up, I think about the future. We talk a lot about AI, and AI becomes the buzzword, but the reality is that AI generally has only been successful in those cases where there is a significant amount of human intervention that is also involved. So increasingly, I am seeing thinkers, influencers, and those working with the future of this technology thinking about AI not as artificial intelligence but as augmented intelligence. Augmented intelligence means that a lot of the capabilities this current wave of technology provides give us significantly more ability, and because we have more ability, we also have more responsibility. Going back to Spider-Man, with great, great power comes great responsibility. We are becoming more and more powerful with regard to gathering data, to the extent that we need to understand that every action will have an impact on not just ourselves or our organizations. And so, when we get into this whole notion about data, about the role of the data scientist, and about augmented intelligence, we have to understand that we are now becoming increasingly responsible for the impacts that we are creating, for society, education, and various disadvantaged groups. All of these come into play, as does the environment. These are all issues. So I think that may be the end of the presentation.

A great presentation, Kurt. I think you brought up some great points on how the role of the data scientist is changing and how data scientists are moving from being generalists to working in different areas of specialty. So what is your advice for people getting into data science: should they first start as a generalist and then get into a specialization, or start with a specialization all at once?

It doesn’t hurt to gain enough understanding about the process of data science and other aspects, and I wouldn’t even necessarily call it data science; I think we’re in the intelligence realm. But anyone going into these fields probably needs to have at least an understanding of the various components: standards, graph theory, machine learning, neural networks, and so forth. That way they can make a more effective decision as they enter their career about what they want, because you can’t do it all.

Absolutely.

Thank you, Kurt. And I want to let the audience know that if you have more questions for Kurt, he is on LinkedIn, and we have this live stream there, so tag him and he’ll be able to answer any more questions that you have for him. Great points, so take advantage of his experience and wisdom. He’s the managing editor of Data Science Central and he’s written many books, so obviously he is the big deal here. So thank you, Kurt, for your time.

Take care. Bye.


DataEthics4All hosted AI DIET World, a Premiere B2B Event to Celebrate Ethics 1st minded People, Companies and Products on October 20-22, 2021 where DIET stands for Data and Diversity, Inclusion and Impact, Ethics and Equity, Teams and Technology.

AI DIET World was a 3 Day Celebration: Champions Day, Career Fair and Solutions Hack.

AI DIET World 2021 also featured Senior Leaders from Salesforce, Google, CannonDesign and Data Science Central among others.

For media inquiries, please email us at connect@dataethics4all.org


Come, Let’s Build a Better AI World Together!