In this section, you will create a predictive model. The goal is to give you a basic hands-on understanding of machine learning, not to achieve a highly accurate prediction. The process shown here is a simple structure that can be extended with additional tools and techniques to increase the accuracy of the model, but our focus is on basic understanding. We will do the work in Python. If you are new to programming, then here is some additional reading before you get started.
The system we will be using is described below. It lets you run your code in a web environment, so you do not have to install anything on your machine. Watch the 3-minute video for an introduction to Colab by Google.
- Click on the Python environment, log in with or create a Gmail account (it’s free), then start up a Jupyter notebook
- Copy the code into a cell of the notebook then run it
- The code will:
  - Load the libraries needed for the task at hand: the pandas library for data storage and manipulation, and the sklearn library for machine learning
  - Read the data into the environment. The data is read into a dataframe
  - Call a function named describe to get an understanding of the data. It prints summary statistics for the data and shows which fields exist. Look at the printout and note the range of values in each field.
  - Prepare the data for machine learning. We will create a training set and a separate validation set used to check the accuracy of the model
  - Create a classifier using a tree-based model from sklearn, such as RandomForestClassifier. There are many different classifiers you can use and parameters you can set for them, and each classifier has its strengths and weaknesses. Once the classifier is selected, we train the model on the training set.
  - Check how the model did on the validation set. We never train on the validation set; we use it only as a check, and only at the end. A model that scores well on the training set but poorly on the validation set has overfit to the training data, which is what we want to avoid.
  - To get further insight into our accuracy, create a confusion matrix. For each correct answer, it shows how many times we guessed each of the possible answers.
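The steps above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own solution. It assumes the iris dataset bundled with sklearn as stand-in data, RandomForestClassifier as the tree-based classifier, and an 80/20 split. The variable names and parameter values are chosen here for illustration only.

```python
# Load the needed libraries: pandas for data handling, sklearn for machine learning
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Read data into a dataframe
# (assumption: the bundled iris dataset stands in for the tutorial's data)
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Print summary statistics to see the fields and their ranges
print(df.describe())

# Separate the input fields from the answer we want to predict
X = df.drop(columns="target")
y = df["target"]

# Hold back 20% of the rows as a validation set (the split size is an assumption)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create the classifier and train it on the training set only
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Check the model against the validation set, which it has never seen
predictions = model.predict(X_val)
accuracy = accuracy_score(y_val, predictions)
print(f"Validation accuracy: {accuracy:.2f}")

# Rows are the correct answers, columns are the answers the model guessed
cm = confusion_matrix(y_val, predictions)
print(cm)
```

In the confusion matrix, the numbers on the diagonal count correct predictions; any number off the diagonal counts a case where the model guessed a different answer than the correct one.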
Things to note:
- The # character starts a comment; the rest of that line is not executed
- Each comment describes the code directly beneath it
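As a small illustration of how comments behave in Python (the variable name here is made up for the example):

```python
# This whole line is a comment, so Python skips it.
# The comment below describes the line of code that follows it.

# Add two numbers and store the result
total = 2 + 3  # anything after the # on this line is also ignored

print(total)
```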