Learn Data Science Tutorial — Mathematics
Now, let’s say a few things about the math. You’re going to need a little bit of probability, some algebra, and, of course, regression (a very common statistical procedure).
Those things are important. The reason you need the math is that it is going to help you choose the appropriate procedures to answer your question with the data you have. And, probably even more importantly, it is going to help you diagnose problems when things don’t go as expected. Given that you are trying to do new things with new data in new ways, you are probably going to come across problems.
So the ability to understand the mechanics of what is going on is going to give you a big advantage.

The third element of the data science Venn diagram is some sort of domain expertise. Think of it as expertise in the field you’re working in. Business settings are a common example.
You need to know the goals of that field, the methods used in it, and the constraints people run into. This matters because, whatever your results are, you need to be able to implement them well.
Data science is very practical; it is designed to accomplish something. Your familiarity with a particular field of practice is going to make implementing the results of your analysis that much easier and more impactful.
Now, let’s go back to our Venn diagram here for a moment.
Because this is a Venn diagram, we also have the intersections of two circles at a time. At the top is machine learning. At the bottom right is traditional research. And at the bottom left is what Drew Conway called “the danger zone.” Let me talk about each of these, starting with machine learning, or ML.
When you think about machine learning here, the idea is that it represents coding (statistical programming) and mathematics without any real domain expertise. These approaches are sometimes referred to as “black box” models.
You throw data in, and you don’t necessarily have to know what it means or even what language it is in; the model crunches through it all and gives you some regularities. That can be very helpful, but machine learning is considered slightly different from data science because it doesn’t involve applications in a specific domain.
Next, there’s traditional research. This is where you have math or statistics and domain knowledge, often very intensive domain knowledge, but without the coding or programming. You can get away with that because the data used in traditional research is highly structured.
It comes in rows and columns, is typically complete, and is typically ready for analysis. That doesn’t mean your life is easy, because you have to expend an enormous amount of effort on the methods, the design of the project, and the interpretation of the data.
So it is still very heavy intellectual work, but it comes from a different place.

And then finally, there is what Conway called “the danger zone.” That’s the intersection of coding and domain knowledge, but without math or statistics. He says it is unlikely to happen, and that is probably true. On the other hand, I can think of some common examples. One is what are called “word counts,” where you take a large document or a series of documents and count how many times a word appears in them.
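To make the word-count idea concrete, here is a minimal Python sketch. It assumes plain-text documents, lowercases everything, and treats any run of ASCII letters or apostrophes as a word; the function name `word_counts` and the two sample sentences are hypothetical, chosen only for illustration. Notice that it uses nothing beyond counting, which is why this kind of work can sit in the danger zone.

```python
import re
from collections import Counter


def word_counts(documents):
    """Hypothetical helper: count how often each word appears across a list of documents."""
    counts = Counter()
    for doc in documents:
        # Simplifying assumption: lowercase the text and keep only runs of
        # ASCII letters and apostrophes as "words" (punctuation is dropped).
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(words)
    return counts


docs = [
    "The data tell a story.",
    "The story is in the data, and the data are messy.",
]
counts = word_counts(docs)
print(counts.most_common(3))  # [('the', 4), ('data', 3), ('story', 2)]
```

A real analysis would usually also drop very common words such as “the” before looking at the top of the list.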
Simple counts like these can actually tell you some very important things. Another example is drawing maps that show how things change across places, and maybe even across time. You don’t necessarily need the math for these, but they can be very insightful and helpful.

So, let’s think about a few backgrounds that people come into data science from.
First is coding. You can have people who are coders and who can also do math, stats, and business, so you get all three things. This is probably the most common path: most people come from a programming background.
Second, there is stats, or statistics.
You can get statisticians who can code and who can also do business. That’s less common, but it does happen. And finally, there are people who come into data science from a particular domain: for instance, business people who can code and do numbers. They are the least common.
But all of these are important to data science. So, in sum, here is what we can take away. First, several fields make up data science. Second, diverse skills and backgrounds are important and needed in data science.
And third, there are many roles involved because a lot of different things need to happen. We’ll say more about that in our next movie.

The next step in our data science introduction and our definition of data science is to talk about the Data Science Pathway. I like to think of it this way: when you are working on a major project, you have to do one step at a time to get it from here to there.