1) Data Analytics/Data Science/AI (Career or Interest) Prep Certification
2) Data Analytics Overview
3) The Self-Directed Learner’s Guide to Success
4) Introduction to Machine Learning
5) Data Empowerment

1. Pre Survey


2. Introduction and MindShift Review

Course Overview

It seems that every company is rushing to hire Data Scientists, Data Engineers, and Data Architects. Who are these data people, and what about the other loyal 99% of the company, the people who are the heartbeat of the organization? This course was developed for you: yes, the other 99%.

In this course, we will:

  • Demystify what Data Science and Artificial Intelligence (AI) are all about.
  • Explain the different jobs that reflect this new Data-Driven Age.
  • Give guidance on how to participate in this Data-Driven Transformation.
  • Discuss the ethics and potential biases of this new technology.
  • Delve into the world of programming to get a glimpse of how Data Science and AI work.

Why take this course?
To fully participate in this new Era of Technology, you need to have a working knowledge of Data Science and AI. This technology is growing at an alarming rate and a lot of people feel lost. This course will help you become familiar with Data Science and AI so that you can:

  • Enhance your career: Data Science and AI are here to stay. The more you know about this new technology the more options you have for advancing your career. How many VHS developers do you still know? Better yet, do you remember Blockbuster stores? Technology evolves rapidly.
  • Help identify opportunities for business improvements: By understanding the importance of data and how data can be turned into valuable information, coupled with your professional knowledge, you can be a corporate ambassador in demonstrating ways this powerful new technology can be applied. More importantly, you become more valuable to your company.
  • Gain innovative problem-solving skills: We can all use additional techniques to solve problems. As we examine the mindset of a data scientist, you will gain innovative skills in how to approach problems and subsequently create business solutions efficiently.
  • Understand the influence of technology integration: This new Data-Driven technology is powerful and has shifted the direction of entire countries, including their interactions with each other. For example, the Russian influence during the recent US presidential elections has been well documented. Understanding this and other influences on data gives you additional knowledge that empowers you to be in control.

As you take this course, remember that the idea is to give you a set of practical tools and to develop an understanding of the world of Data Science and Artificial Intelligence (AI) so that you can enhance what you already know professionally and continue your career growth.

Mindshift Minute

Snippets of thoughts about learning in this new Digital Age.

  • Building Mental Models – Build your own mental models of this particular domain space. Be guided by experts and revamp your models as you discover more information. As you learn more, think about why the new knowledge is important and what situations it applies to.
  • Progressive Problem Solving – You don’t always solve a problem on the first try. Sometimes it is about gathering more information and getting closer to a solution. You keep chipping away at subproblems until the initial problem is solved.

Your Why

Video Study Notes
Defining Your Why

Defining your why gives you more direction.
Once you have your first why, go a level deeper: ask yourself why that is your why, and come up with a second why.

Video Study Notes




Internal Feelings and Conversations

  • Emotions – Choice
  • Positive Self Talk
  • Not hard or easy but unknown to known

Learning To Learn How To Learn

  • Build Your Own World Model of What You Are Learning (why am I learning this, and how do I apply it)
  • Learn Functions, Not Just Skills
  • Problem Solving (identify what I know and don’t know, then take what I don’t know and turn it into what I know)


  • How can what I am learning here apply somewhere else
  • How can what I know somewhere else apply here


Have FUN and CHOOSE Powerfully

3. Rating Introduction


4. Intro to Data Analytics

This lesson gives insight into Data Science and Artificial Intelligence (AI): defining the terms, showing some examples of how they can be used, and explaining the progression of data analysis usage.

Data science is an interdisciplinary field that combines scientific methods, processes, business understanding, and systems to gain knowledge or insights from data. A data scientist analyzes data using multiple methodologies to help companies improve their business through insights, predictions, and automation. These data-driven processes, of course, start with data and apply various knowledge and techniques to bring about insights.

AI is the field of developing systems that can perform tasks that require human-like intelligence, and this is what Data Scientists strive to accomplish with the new technology.

Below are some real-world examples of what Data Science and AI can do.


Deep learning can be used for a range of applications from teaching a system to playing a video game to driving a car. It has very powerful potential.


This video is important because it shows a breakthrough in AI: deep learning is used to learn how to play and win at a video game without being preprogrammed with the rules of the game. This is a step toward AI being able to learn, grow, and discover paths to a solution that even the developer of the game may not know exist. The important part of the video is the first 37, where you see a solution, developed over the course of a few days by a system that a single person programmed using free software and an affordable computer, solve a complex problem without knowing anything about the environment. To understand in more detail how this works, feel free to watch the video all the way to the end.


Here is another example of how AI, along with other advances in technology, is being used to solve real-world complex problems. There are many components that make a self-driving car work. One of the core technologies is AI, which takes in many kinds of data (sensor data, vision data, driving rules, etc.) and decides what to do. The AI is exposed to many human examples of driving, and the more exposure, the better the system gets. A total understanding of how a self-driving car works is beyond the scope of this course, but the video below gives you a flavor of it. Look at the video as an example of how changing technology is creating new opportunities in the world we live in.


Data can be used at many levels, progressing in power along with the sophistication of the usage. Below is a chart that captures the different uses of data, showing how you can go from simply getting access to data (an Excel spreadsheet) all the way to having data drive decisions automatically (self-driving cars).

Data usage spans the spectrum from gaining visibility into the data to using the data to automate the process. Companies, individual departments, and even individual people will use data in a spectrum of ways depending on the project, need, and capabilities, as defined below.

Data Access

  • Spreadsheets help process data, but they are limited in that it is still up to the user to interpret the data.
  • Spreadsheets are a good tool when looking for answers to already-formulated questions.

Data Insights

  • By progressing to dashboards, data becomes easier to work with. Dashboards provide visual representations of data, which help the user interpret it more easily.
  • Dashboards allow for a more interactive way to analyze data.
  • Using spreadsheets and dashboards as a means to analyze data is user-driven.


Machine Learning

  • Machine Learning allows a system to identify trends and insights from processed data without significant influence from the user.
  • With a goal in mind, the user can receive unsolicited insights from the system.
  • This contrasts with spreadsheets and dashboards, which require the user to interpret the data and develop insights without help from the system.


Deep Learning

  • Deep learning allows the user to provide high-level goals. In addition to helping the user develop insights on the data, the system is now able to make decisions based on the data input in order to achieve the high-level goal.
  • For example, a good application of deep learning would be a self-driving car. The car is given a destination (the high-level goal) and is able to follow the path from start to finish based on the data it receives from its sensors.
  • Deep learning develops insights and provides unexpected solutions to high-level goals.


Automation

  • Finally, the technology and data are so rich that the system can make decisions better and faster than humans, so we automate the entire process.


Have FUN and CHOOSE Powerfully

5. Data Science Technique Overview

Data science is an interdisciplinary field that combines scientific methods, processes, business understanding, and systems to gain knowledge or insights from data. A data scientist analyzes data using multiple methodologies to help companies improve their business through insights, predictions, and automation. These data-driven processes, of course, start with data and apply various techniques to bring about insights. For this section we have broken these techniques down into 3 categories:

  • Visualization – being able to present data in a way that a user can navigate through the data and discover new insights
  • Machine Learning – Using data to predict the future based on new data or automatically find data connections
  • Deep Learning – Using data to make sophisticated decisions, similar to human capabilities, in a limited domain


Visualization

Data visualization makes data more visible and continues the move toward data-driven decisions. Being able to view the data can yield more insight into how the company is running. Visualization can help with:

  • Improved Response Times – Putting the data directly into the hands of the people who can act on it.
  • Clarity of Vision – Users can get an overall understanding of a situation first then drill down for more details to validate or disprove their hypothesis.
  • Pattern Detection – Users can absorb more information quickly and see patterns more readily. It allows decision-makers to view data using graphical representations including charts, fever charts, and heat maps.
  • Easier Collaboration – Through visualization, teams can quickly see the situation, adjust, and react to new information.
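As a quick sketch of the pattern-detection idea above, here is a minimal bar chart in Python with matplotlib (the monthly sales numbers are made up for illustration):

```python
# Minimal visualization sketch: fictional monthly sales data.
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 160, 152, 171]  # units sold (made-up numbers)

fig, ax = plt.subplots()
ax.bar(months, sales)                     # overall view first ...
ax.set_title("Monthly Sales (fictional)")
ax.set_ylabel("Units sold")
fig.savefig("monthly_sales.png")          # ... then drill down for details
```

Even this small chart makes an upward trend easier to spot than the raw list of numbers would.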


Machine Learning

Supervised Machine Learning – Teaching a system with data that has a known answer.

  • Classification – Grouping data into categories (e.g. predicting the type of flower based on measured factors; new data is put into a group). The data is put into discrete classes.

  • Regression – Predicting a continuous quantity (e.g. the amount of a loan based on data factors, the number of people attending a football game, etc.). The answer is a value on a continuous scale.

Unsupervised Machine Learning – The data set has no known answers, so the system discovers trends or features on its own.

  • Clustering – This allows similar items to be grouped together. It uses techniques to compute similarities between different attributes and uses that information to group like items. Techniques like this are sometimes also called data mining.
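To make these three techniques concrete, here is a hedged sketch in Python using scikit-learn and its bundled iris flower dataset (the dataset and model choices here are illustrative assumptions, not the only options):

```python
# Classification, regression, and clustering on scikit-learn's iris dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)  # 150 flowers, 4 measurements each

# Classification (supervised): predict the flower species, a discrete class.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
species = clf.predict(X[:1])

# Regression (supervised): predict petal width, a continuous value,
# from the other three measurements.
reg = LinearRegression().fit(X[:, :3], X[:, 3])
petal_width = reg.predict(X[:1, :3])

# Clustering (unsupervised): group similar flowers without using the labels.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Classification and regression learn from known answers, while KMeans never sees the labels; it groups the flowers purely by similarity.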



Deep Learning

Deep learning allows for deep pattern recognition and currently requires a lot of data for training. There are techniques being developed to lessen the data load, but they are not fully commercialized. Some examples where deep learning is making an impact are in the fields of language, vision, and automation, through text understanding, image recognition, image creation, text creation, self-driving cars, autonomous robots, and games (e.g. Chess, Go, Mario Brothers, etc.).

For more about AI watch the video below:
What AI is? (2:31)
This talks about what AI can do as well as the social impact AI is having.

The Basics of AI and Business by Philipp Gerbert (12.5 min)
This talks about the advantage of AI and why it is good to understand AI.


Have FUN and CHOOSE Powerfully

6. Rating Data Analytics


7. Anatomy of Data Science Projects

Data Science requires data, tools and a desire to solve a problem. To be effective you need:

  1. a central environment that allows the collection, cleaning, and storage of data,
  2. tools to allow the analysis of that data,
  3. a way to deploy the insights that you develop.

This requires a team of individuals with expertise in different areas. This section describes the makeup of a good analytical team to support a company's Data Science projects. Understanding how Data Science projects are developed will help you describe potential data science projects and interact with the team.

Overview of a Data Science Team

The diagram below shows this flow as well as indicates types of people that go into making the entire system work.

Data Science Functional Team

Now that you understand how data science can improve company performance, you might wonder how these projects get developed. This section describes the inner workings of a data science project at a high level. As the name suggests, it all starts with Data. There are essentially 4 functions that make up a data science project:

1. Data Gathering – getting the data from its source into a usable form for analysis. This may mean getting data out of the Point of Sale (POS), Enterprise Resource Planning (ERP), or Customer Relationship Management (CRM) system and into a Data Lake (a central repository for storing multiple sources of data in one location) where analysis can be done. As the data moves from the source to the Data Lake, a process called data cleaning is performed. This means the data is made accurate: filling in missing data, converting it to the right format, and getting it ready for usage. People who do this are usually called Data Engineers.

2. Data Analysis – once the data is gathered and cleaned in a centralized location, it can be used to create dashboards, make predictions with machine learning, or power a deep learning initiative. This is where the data science techniques we discussed earlier are used. It can be done by a Data Analyst, Data Scientist, or AI Expert; each role goes progressively deeper into the Data Science tools at hand.

3. Results Deployment – once the results are known and the models are proven to reach the planned goal, the users have to get access to the information in a way that allows them to make better choices. So if you have a model that predicts sales, then the person who makes decisions based on predicted sales needs access to the model. This access needs to be up 24/7: it is now a live production application. This moves the project from discovering what works to operating a working system that must stay live. This is usually handled by a team in IT or DevOps.

4. Data Architecture – finally, all of the above functions build on and explore data in a system that must be designed for their needs. This covers everything from whether we store all this data in a commercial cloud (AWS, Azure, Google, etc.) or on our own machines. There are many such questions this role has to answer. The title for this position is usually Data Architect.
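As a hedged sketch of the data-cleaning work in function 1 (the table, column names, and values below are made up for illustration), here is what a Data Engineer's cleanup might look like in Python with pandas:

```python
# Minimal data-cleaning sketch on a tiny made-up orders table.
import pandas as pd

raw = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "amount":   ["250", "310", None, "125"],   # strings plus a missing value
    "region":   ["east", "West", "west", "EAST"],  # inconsistent labels
})

clean = raw.copy()
clean["amount"] = pd.to_numeric(clean["amount"])                  # right format
clean["amount"] = clean["amount"].fillna(clean["amount"].mean())  # fill missing
clean["region"] = clean["region"].str.lower()                     # consistent categories
```

The cleaned table now has numeric amounts, no missing values, and consistent region labels, ready for the Data Analysis step.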


Have FUN and CHOOSE Powerfully

8. Recognizing Data Analytics Project

Now that you understand a lot about data science and how it works (remember what we said about negative talk), we will start to identify Data Science opportunities that are all around you. These opportunities are places where you see a problem or situation at your job that could be improved by having the right information at the right time. This section provides steps for identifying these areas and outlining what information can help solve the problem.

An Example:

ACME International always seems to run short of inventory for orders at random times during the year. This has a negative impact on customer service and creates a loss in revenue. You know that if the company had a good forecast of orders 6 weeks out, that would solve, or at least lessen, the problem. At this point, you are only identifying the problem and the information that could be useful to solve it. Later we will discuss how to go to the next level of putting together an informal or formal proposal for a project to be considered. Note that you have all the order data in the sales system, along with all the items that have been back-ordered. That can be a good start for the type of data you need to begin looking at the problem.

You, like everyone else in the company, have a unique perspective and vantage point. This means that you sometimes see things that others don’t. The idea behind this section is to have you recognize that gift, look at situations with a more critical eye toward identifying problems that may be obvious only to you, and empower you to describe them in terms that others can also see, making a positive impact on the company’s bottom line.

Here is a step-by-step technique for identifying and beginning to describe a problem that may be a candidate for a Data Science solution.

1. Observe and take note of activities where there are work output gaps or inefficiencies

One of the best methods of understanding why a business succeeds or fails is to take a close look at the available business processes. Doing this will give you an insight into why things happen the way they do.

The answers to questions like these can be deduced with the right data science model and approach. Taking a closer look at the company’s data could provide the best path to understanding why a company or business keeps losing profits.

As an employee, you tend to have a closer connection with the customers you serve, and that gives you a better vantage point for addressing your employer’s challenges. So first make a list of all the problems you see around you that would make a difference if something were done about them.

2. Find out how much your company or department is losing due to the problems mentioned above

Nothing happens by accident; every action attracts an equal but opposite reaction. Taking note of the numbers can reveal quite a lot of information. Let’s say, for example, we know that every backorder costs the company 5% of the sale, and we had $1M in back orders last year; that cost the company $50K in lost revenue. If we can cut that cost in half, then we have saved the company $25K/year.

This was a back-of-the-envelope estimate, but identifying problem areas and applying simple analytical thinking gets the ball rolling.
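The estimate above can be written out as a few lines of Python (using the example's numbers):

```python
# Back-of-the-envelope backorder cost estimate from the example above.
backorders = 1_000_000       # $1M in back orders last year
cost_rate = 0.05             # each backorder costs 5% of the sale

annual_cost = backorders * cost_rate   # $50,000 in lost revenue
savings_if_halved = annual_cost / 2    # $25,000 saved per year
```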

3. Define What A Potential Process Could Be With The Right Data Science Attached to it.

Now that you have identified a problem, what questions, if answered, could help you develop a better process? This step allows you to start to understand what information is missing from making a better decision. Let’s start to identify data sources where you can find those answers. Ask the question: does our company have data that can help me understand my ordering pattern, run analysis on the data, and predict future sales?

a. Identify the Data associated with the project

All companies tend to have a record of daily activities, ranging from customer ordering processes to customer demographics and buying history. Start to list the systems that are potential data sources. They can be formal, like a CRM system, or more informal, like a spreadsheet on someone’s desktop. Each of these sources is something to consider when proposing a project for data analysis.

b. Think through whether you have enough information to solve the problem or will need more data

The more data available to you, the easier it becomes to offer the right solutions to address your company’s needs. Remember that your employers may only see the numbers, while you have the best information to explain why. This exercise will give you insight into other possible data sources that you did not consider at first. Don’t let a lack of data prevent you from proposing the project; there may be other data sources you are not aware of that fit the needs of the project. Open discussion is very valuable.

c. Inform your stakeholders about the need for discussions on the type of data analysis that could solve the problem.

As stated earlier, employees have a direct relationship with the customers, and this can serve as an advantage in getting to know what they want, why they behave the way they do and the possible ways to remedy the situation. Once you are sure of your findings, you can pitch it to your manager by formally writing to him or her or discussing the matter in a meeting. The goal is to point out the problem and make sure your manager understands why this is a step in the right direction. The next chapter gives you a form to help you organize your thoughts and conversation.

Bottom line

You don’t have to be a data scientist to be effective in using this new data analytics. Your part can be to take the knowledge you have gained doing your job and combine it with the ability to identify company areas of improvement. Then apply a data analytics perspective, suggesting opportunities for the company to make a difference to its overall bottom line.

  1. Take note of activities where things are broken or inefficient
  2. Find out how much your company or department is losing due to the activities mentioned above
  3. Define What A Potential Good Process Could Be With The Right Data

Data science is an invaluable addition to a company’s values, objectives, and mission, as long as the company can identify when it is needed. Data science saves you time while giving you value for your money.


Have FUN and CHOOSE Powerfully

9. Describing Data Science Projects to Management

OK, now that you have identified a possible data science project let’s think through a few more items to consider. At the end of this lesson is a project form that will guide you to collect the relevant facts about a potential project. You don’t have to fill the entire document in for the project to be worth pursuing. The form is to be used as a guide and not as the final word.

So let’s see what information you want to have to describe the problem. The two most important pieces of information are what the problem is and how much savings or revenue could be obtained if the problem is solved, even partially. Once that is determined, you should start looking at who the stakeholders are.

One Page Data Science Checklist

Use this checklist to help define and summarize the start of a data science project. This checklist will help you to cover all aspects of your project to get you started up front in the right direction.


Have FUN and CHOOSE Powerfully

10. Ratings Data Analytics Projects


11. The Benefits of a Data Science Mindset

Since data scientists, by the nature of what they do, are good problem solvers, some of the mindsets they hold can help everyone gain more problem-solving capability. For you, this could mean that by understanding some of those traits and using them in your life, you can be more productive. Here are the top 5 Data Scientist traits that can help with general problem solving:

  1. Constant Learners – techniques and technology are changing at an ever-increasing rate so being a constant learner is a necessity. Learning how to learn and learn quickly is a skill to continue to build.
  2. Uses Multiple Tools and Techniques – no one tool does it all. Being versed in many tools as well as (see #1) learning new tools quickly will make you that much more efficient.
  3. Writes Code – Being able to write code to handle the things that off-the-shelf products don’t can help make a more well-rounded data scientist. Even if you don’t write code for a living, the way you need to think to write code gives you tools for dividing a problem into smaller chunks and solving those, while also thinking about activities in a systematic, organized fashion.
  4. Understands Business Issues – Data science in a vacuum is good if you are doing core research but in a company setting you must balance the technical work with understanding and applying the work to the business. The business exists to make money by servicing customers so understanding how you fit into that equation is an important skill anyone can benefit from.
  5. Problem Solving Truth Seekers – Always be curious. This investigative mindset allows the discovery of insights and valuable methodologies.


Have FUN and CHOOSE Powerfully

12. Bias and Ethics in this Data-Driven Age

Data Science has the ability to influence people, learn new things, automate tasks, and literally change the world. With this power comes a responsibility to ensure ethical design and use.

The integrity of Artificial Intelligence (AI) technologies is dependent on acquiring data, the integrity of the data and how the technology is eventually used. More specifically, the ability of AI technologies to perform ethically, and without bias, is dependent on quality data combined with well-designed algorithms that make honest, unbiased decisions.

Therefore, AI technologies can be far-reaching and still properly solve real problems in the world. This is a growing discussion in the AI technology space. Identifying and mitigating bias in AI is important to building trust between the population and the companies that develop and use the new tech.
This lesson discusses the main issues regarding data science, ethics, and bias as they apply to artificial intelligence technologies. The ethics conversation has three broad areas:

  • Data Acquisition
  • Data Integrity
  • Use of Technology

Data Acquisition
Data acquisition is the gathering of the data used by AI technologies. How that data is acquired has raised ethical discussions; the implications for artificial intelligence include:

  • privacy,
  • data ownership, and
  • acquisition.

In regards to privacy, some people might not want to share certain private or sensitive data about themselves. Do they know what data is being shared, and do they have a choice to opt out? And if individuals opt out of having certain personal data acquired, does that skew the data? For example, some people might not want to share their age, race, or sexual preference. Does that create a data pool skewed towards the more conventional population? This can subsequently create inaccurate and incorrect results when the data is used in AI technologies. Further, certain groups of people, such as minorities, might be more prone than others to withhold data about themselves for fear, as shown by history, of it being used against their population.

This overlaps and contrasts with the next major issues in data acquisition: who owns the data once it’s acquired, and how the data is acquired. Oftentimes an individual does not even own the data about themselves, given today’s legislation. Individuals should likely have more control over choosing what data about them is saved and who owns it. Further, individuals should be able to delete data that is collected about them, as new laws in Europe require (see the GDPR).

All of these extra controls afforded to the individual increase the ethical conduct of data acquisition and ownership. However, they increase the challenges of creating high integrity, accurate AI technologies.
Here are some real-world data acquisition dilemmas. Can a phone using Siri always monitor and log your conversations? Can Alexa always record what is being said around it? Should your Fitbit always record where you are and how you are doing? Once the data is acquired does the individual have the right to know and see that data? Do you have a right to delete your data? Can I use all the cameras in the city and watch your every move? Does an individual lose complete control of their personal data that is acquired by the technologies that they use and rely on in their day-to-day lives? All of these issues are being discussed now because the technology exists to make this happen.

Data Integrity
All modern AI techniques use data as a foundation for building models and algorithms that make decisions. The ability of AI technologies to make accurate, unbiased, and ethical decisions depends on the integrity of the data these technologies use. The foundational data is what defines whether the model will work. With that said, certain biases can be reflected depending on how the data is acquired. For example, the subjects used in many medical studies in the past tended to be white.

Therefore, models built around this data have built-in racial biases. This can be true of any data.
As another example, businesses that acquire and own individual data often tend to sell it. There are markets for this data, and some data might be priced higher than others. Prices might vary based on demographics and socioeconomic biases, making some data more or less accessible to builders of AI technologies. Even worse, large groups of the population might be completely missing from data that is available for sale or that is directly acquired by AI companies. To elaborate, certain groups might not even use, or have access to, the technologies or programs that acquire data. As a result, AI technologies that use this data will inherently have biases, and potentially unethical decision-making abilities, embedded in them. This creates an extra area of caution for those developing AI technologies in regards to preventing bias and maintaining integrity and ethics. Understanding data bias and then working to compensate for it is an ongoing discussion.

Use of Technology
Technology is often used to influence the behavior of people or groups. This raises an ethical dilemma about the limits of how technology should be used. For example, Russia’s use of technology via Facebook to influence the election created a lot of controversy. Target’s identification and targeting of pregnant women with coupons created privacy discussions. Technology is also being used to predict crimes; however, this could encode biases from the judicial system.

Essentially, technology is being used in many cases to influence behavior or make decisions that affect people. Today, after someone searches for something on the internet and then opens Facebook or even AliExpress, an ad is shown offering to sell them something related to their previous searches. This should make us ask whether or not this is ethically correct. Should this data be used to drive corporate bottom lines, or should it be used to ethically drive intelligence that solves problems and benefits individuals in the world? There is a saying: perception is reality. When technology is used to drive consumption or influence voting behavior, these behavioral influences, for better or for worse, often become reality. These are just some of the ethical discussions going on about the use of powerful new AI technologies.

We need to be cognizant of both obvious and indirect ethically questionable biases in our data and in the intelligence technologies that use it. Further, managers of AI technologies should understand these ethical usage issues and make well-informed decisions. Maintaining integrity and honesty in your data, and in how it is used, is valuable for earning the general public’s trust in your company. Used well, artificial intelligence can create value for people and make your company more competitive.


Have FUN and CHOOSE Powerfully

13. Ratings Aspects of Data Analytics


14. Small Project


We are now at the point of doing some programming. This is the time you may start to feel those emotions creeping in. If so, then remember the first lesson about choice.

This section will take you through developing a simple categorization model, one step at a time, with explanations for each step you take. Remember, you are not trying to learn everything about programming. This is to give you an overview of how these models are programmed, through the coding of a very simple model.

Mindshift Minute

Snippets of thoughts about learning in this new Digital Age.

  • Comfortable in the Unknown – Work to get comfortable in the unknown. It is OK to understand partial information and continue to fill in the blanks of knowledge. Learning is a journey, not just a destination. You sometimes have to take it piece by piece, uncovering new nuggets of knowledge along the way. Doing the exercises below will help you grow in being OK with the unknown while still taking away some aspects of what you are learning. Let’s have fun and give it a whirl.


Programming Environment

You will code in Google Colaboratory (Colab), which is free to use with a free Google account. Click here to get to Google Colaboratory. Either log in with your existing Google account or sign up for a free account.

Programming Introduction
Coding is made up of Syntax (being very precise about the format of the commands you give the computer) and Logic (the sequence of commands you give the computer to accomplish your goal). The system will point out syntax errors because it knows how every command must be structured. Code that is in a grey box can be copied and pasted into the coding environment.

# When you have # on a line then any words afterward are ignored by the system

print("Hello World") # is a command that will print to the screen
prnt("Hello World") # is a syntax error. prnt has no meaning

Here are some resources to help you with programming. The idea in this section is not to turn you into a Python programmer, but to expose you to Python programming and machine learning.

Python Tutorial – on the w3schools website.
Programmers Mindset for Beginner Programmers
Analytic Thought: The Importance of Programming
7 Concepts to Help Programming In Any Language

Model Creation Steps

Every data science project should go through some variation of the steps below. We have added structure to the process that seems to hold well for creating Machine Learning projects (classification, categorization, or prediction).

    1. Defining Problem
    2. Identifying Data Set
    3. Loading Data Into Environment
    4. Analyze Data
    5. Cleaning Data
    6. Visualizing Data
    7. Preparing Data For Machine Learning
    8. Trying a Machine Learning Algorithm
    9. Refining the Machine Learning Algorithm
    10. Deploying

You will take the code in each section and copy it into a cell in the Jupyter notebook. Execute that cell and see what the results of the code are. Again, you are just getting exposure to the code, not trying to learn Python. In the Introduction to Machine Learning course you will get more in-depth exposure to what all of this means.

1. Defining Problem

This step is about being very clear on what your goals are, the type of data you will use, and the possible techniques you will apply. The project identification worksheet works well for this step.

We have filled out the Worksheet below.

(Embedded worksheet document — download available, 52.19 KB.)

We want to classify the type of iris of a sample, based on the sepal and petal information obtained. We will use gathered data to train a model to do the prediction.

The three classes are Iris setosa, Iris versicolor, and Iris virginica.
2. Identifying Data Set


https://en.wikipedia.org/wiki/Iris_flower_data_set – more information


3. Loading Data Into Environment

######################## READ IN DATA ############################
# pandas provides the read_csv function used below
import pandas as pd

# Load dataset
url = "https://teammindshift.com/data/iris.csv"

df = pd.read_csv(url)


This reads the data from the file iris.csv into the environment and stores it in a variable named df. Again, this is only an introduction to give you exposure.

4. Analyze Data

This step analyzes the data in each column and gives you some insight into it. Notice, for example, that the values in the sepal_length column average about 5.8.
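A minimal sketch of this step uses pandas’ describe method. Since you may not have the course CSV at hand, this sketch rebuilds the same iris table from scikit-learn’s bundled copy of the dataset, renaming the columns to match the course file (the rebuild is an assumption, not the original loading code):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the df loaded in step 3: rebuild the iris table from
# scikit-learn's bundled copy, with column names matching the course CSV.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length",
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
    "target": "species",
})

# describe() reports count, mean, std, min, quartiles, and max
# for every numeric column.
print(df.describe())
```

In Colab, running `df.describe()` in a cell gives the same table; the sepal_length mean is where the “about 5.8” figure comes from.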

5. Cleaning Data
from sklearn.preprocessing import LabelEncoder

LE = LabelEncoder()
df['species'] = LE.fit_transform(df['species'])


# 0 - Setosa
# 1 - Versicolor
# 2 - Virginica

In machine learning, all data has to be converted from text to numbers. The code above changes the names of the different types of irises into numbers. Compare the last column of the df variable at step 3 with this step 5.
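To see concretely what the encoder does, here is a small self-contained sketch (the species list is a made-up sample, not the course data):

```python
from sklearn.preprocessing import LabelEncoder

# A small hypothetical sample of species names
species = ["setosa", "versicolor", "virginica", "setosa"]

LE = LabelEncoder()
encoded = LE.fit_transform(species)

print(list(encoded))      # [0, 1, 2, 0]
print(list(LE.classes_))  # the text value behind each number
```

LabelEncoder sorts the unique text values alphabetically and numbers them in that order, which is why setosa is 0, versicolor is 1, and virginica is 2.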

6. Visualizing Data

See the box plot of the different columns. There are other types of plots you can make to help you understand the data. If you want to understand box plots, feel free to do a Google search and investigate. This step just demonstrates the ability to plot the data and gain an overview, without getting into the details.

7. Preparing Data For Machine Learning
# import sklearn model_selection code
from sklearn import model_selection

# Split-out validation dataset
array = df.values
X = array[:,0:4]
Y = array[:,4]

# Get Training and Validation sets
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=0.2, random_state=7)

You won’t see any results, but this section splits the data into two parts: X holds the attribute columns and Y holds the results column. It then splits each into a training set (80%) and a validation set (20%). See the diagram below, which shows the breakdown of the different data parts.

Attributes columns: sepal_length, sepal_width, petal_length, and petal_width

Results column: species
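To see the 80/20 split concretely, you can print the shapes of the four pieces. This sketch rebuilds X and Y from scikit-learn’s bundled iris data rather than the course CSV (an assumption made so the code runs on its own):

```python
from sklearn.datasets import load_iris
from sklearn import model_selection

# Rebuild X (attribute columns) and Y (results column) from
# scikit-learn's copy of the iris data
iris = load_iris()
X, Y = iris.data, iris.target

# Same split as the step above: 80% training, 20% validation
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=0.2, random_state=7)

print(X_train.shape)       # 120 rows of 4 attributes for training
print(X_validation.shape)  # 30 rows held back for validation
```

Of the 150 iris samples, 120 end up in the training set and 30 in the validation set.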

8. Trying a Machine Learning Algorithm
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("1. Accuracy: {}".format(accuracy_score(Y_validation, predictions)))
print("2. Confusion Matrix:\n{}".format(pd.crosstab(Y_validation, predictions, rownames=['True'], colnames=['Predicted'], margins=True)))

This section runs a Logistic Regression algorithm to predict the outcome. Notice that the accuracy on the validation data is 86%.

9. Refining the Machine Learning Algorithm
from sklearn.metrics import accuracy_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

model = LinearDiscriminantAnalysis()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)

print("1. Accuracy: {}".format(accuracy_score(Y_validation, predictions)))
print("2. Confusion Matrix:\n{}".format(pd.crosstab(Y_validation, predictions, rownames=['True'], colnames=['Predicted'])))

This section runs a Linear Discriminant Analysis algorithm to predict the outcome. Notice that the accuracy on the validation data is 96%, an improvement over the Logistic Regression model.
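Step 10 (Deploying) is not walked through above, but at its simplest it means feeding new measurements to the trained model. Here is a sketch using the same Linear Discriminant Analysis setup; the data is rebuilt from scikit-learn’s bundled iris copy and the new flower’s measurements are made up:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Rebuild the data and refit the model as in step 9
iris = load_iris()
X_train, X_validation, Y_train, Y_validation = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=7)
model = LinearDiscriminantAnalysis()
model.fit(X_train, Y_train)

# A new, made-up flower: sepal_length, sepal_width, petal_length, petal_width
new_flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(new_flower)
print(prediction)  # 0 means Setosa, 1 Versicolor, 2 Virginica
```

The trained model can score any new row of the four attribute measurements, which is what a deployed version of this model would do for incoming data.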


In the Introduction to Machine Learning course, you will learn more about what all of the above means. Right now we just wanted you to have the opportunity to see code, and to copy and paste that code into an environment.


Have FUN and CHOOSE Powerfully

15. Rating Small Project


16. Post Survey


17. Net Promoter Score
