How Does Machine Learning Work?
From detecting skin cancer, to sorting cucumbers, to detecting escalators in need of repair, machine learning has granted computer systems entirely new abilities. But how does it really work under the hood? Let's walk through a basic example and use it as an excuse to talk about the process of getting answers from your data using machine learning. Welcome to Steemit. My name is Addy Bhatti. In this article we'll explore the art, science, and tools of machine learning. Let's pretend that we've been asked to create a system that answers the question of whether a drink is wine or beer. This question-answering system that we build is called a model, and this model is created via a process called training.
The goal of training is to create an accurate model that answers our questions correctly most of the time. But in order to train a model, we need to collect data to train on. This is where we will begin. Our data will be collected from glasses of wine and beer.
There are many aspects of drinks that we could collect data on, everything from the amount of foam to the shape of the glass. But for our purposes we'll just pick two simple ones: the color, as a wavelength of light, and the alcohol content, as a percentage. The hope is that we can split our two types of drinks along these two factors alone. We'll call these our features from now on: color and alcohol. The first step in our process will be to run out to the local grocery store, buy up a bunch of different drinks, and get some equipment to do our measurements: a spectrometer for measuring the color and a hydrometer to measure the alcohol content. It appears that our grocery store has an electronics hardware section as well. Once we have our equipment and booze all set up, it's time for our first step of machine learning.
- Gathering data.
This step is very important because the quality and quantity of data that you gather will directly determine how good your predictive model can be. In this case, the data we collect will be the color and alcohol content of each drink. This will yield a table of color, alcohol content, and whether it's beer or wine.
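As a minimal sketch, the table described above could be held in Python like this. The `Drink` type, its field names, and every value in the rows are hypothetical placeholders made up for illustration; they are not real measurements.

```python
from typing import NamedTuple


class Drink(NamedTuple):
    """One row of the table (hypothetical field names)."""
    color_nm: float     # color as a wavelength of light, in nanometres
    alcohol_pct: float  # alcohol content as a percentage
    label: str          # "beer" or "wine"


# Placeholder rows invented for this sketch, not real measurements.
training_table = [
    Drink(color_nm=590.0, alcohol_pct=5.0, label="beer"),
    Drink(color_nm=600.0, alcohol_pct=4.5, label="beer"),
    Drink(color_nm=650.0, alcohol_pct=12.5, label="wine"),
    Drink(color_nm=640.0, alcohol_pct=13.0, label="wine"),
]

# Separate each row into the features (inputs) and the label (the answer).
features = [(d.color_nm, d.alcohol_pct) for d in training_table]
labels = [d.label for d in training_table]

for row in training_table:
    print(f"{row.color_nm:>6.1f} nm  {row.alcohol_pct:>5.1f} %  {row.label}")
```

Keeping the two features apart from the label mirrors how the model will later use them: the features are what it sees, and the label is what it should answer.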
This will be our training data. So, a few hours of measurements later, we've gathered our training data (and had a few drinks, perhaps), and now it's time for our next step of machine learning.
- Data preparation.
This is where we load our data into a suitable place and prepare it for use in our machine learning training. We'll first put all our data together, then randomize the ordering. We wouldn't want the order of our data to affect what we learn, since that's not part of determining whether a drink is beer or wine. In other words, we want to make a determination of what a drink is independent of what drink came before or after it in the sequence. This is also a good time to do any pertinent visualizations of your data, helping you see if there are any relevant relationships between different variables, as well as showing you if there are any data imbalances. For instance, if we collected way more data points about beer than wine, the model we train will be heavily biased toward guessing that virtually everything it sees is beer, since it would be right most of the time. However, in the real world the model may see beer and wine in equal amounts, which would mean that its guess of beer would be wrong half the time. We also need to split the data into two parts. The first part, used in training our model, will be the majority of our dataset.
The second part will be used for evaluating our trained model's performance. We don't want to use the same data that the model was trained on for evaluation, since then it would just be able to memorize the questions, just as you wouldn't want to use the questions from your math homework on the math exam. Sometimes the data we collect needs other forms of adjusting and manipulation: things like deduplication, normalization, error correction, and others. These would all happen at the data preparation step. In our case we don't have any further data preparation needs, so let's move on to the next step.
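The shuffling, imbalance check, and split described in this step can be sketched roughly as follows. The function names, the 0.8 training fraction, the fixed seed, and the rows themselves are assumptions chosen for illustration, not part of the original walkthrough.

```python
import random
from collections import Counter


def shuffle_and_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows, then cut it into (train_rows, eval_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the sketch is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def label_counts(rows):
    """Count rows per label to spot an imbalance such as far more beer than wine."""
    return Counter(label for _color, _alcohol, label in rows)


# Placeholder rows (color in nm, alcohol in %, label), invented for this sketch.
rows = [(590.0 + i, 5.0, "beer") for i in range(5)]
rows += [(640.0 + i, 12.5, "wine") for i in range(5)]

train_rows, eval_rows = shuffle_and_split(rows)
print("label counts:", dict(label_counts(rows)))
print("train size:", len(train_rows), "eval size:", len(eval_rows))
```

Shuffling before cutting matters here: the placeholder rows start out grouped by drink, so an unshuffled split would put only wine in the evaluation part.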
- Choosing a model.
There are many models that researchers and data scientists have created over the years. Some are very well suited for image data, others for sequences such as text or music, some for numerical data, and others for text-based data. In our case, since we have just two features, color and alcohol percentage, we can use a small linear model, which is a fairly simple one that should get the job done. Now we move on to what is often considered the bulk of machine learning.
- Training.
In this step we'll use our data to incrementally improve our model's ability to predict whether a given drink is wine or beer. In some ways this is similar to someone first learning to drive. At first they don't know how any of the pedals, knobs, and switches work, or when they should be pressed or used.
However, after lots of practice and correcting for their mistakes, a licensed driver emerges. Moreover, after a year of driving, they've become quite adept. The act of driving and reacting to real-world data has adapted their driving abilities, honing their skills. We will do this on a much smaller scale with our drinks. In particular, the formula for a straight line is y = m * x + b, where x is the input, m is the slope of the line, b is the y-intercept, and y is the value of the line at position x. The values we have available to us to adjust, or train, are just m and b. There's no other way to affect the position of the line, since the only other variables are x, our input, and y, our output. In machine learning there are many m's, since there may be many features. The collection of these values is usually formed into a matrix that is denoted W, for the weights matrix. Similarly for b, we arrange them together and that's called the biases. The training process involves initializing some random values for W and b and attempting to predict the outputs with those values. As you might imagine, it does pretty poorly at first, but we can compare our model's predictions with the output that it should have produced, and adjust the values in W and b so that we will have more accurate predictions the next time around.
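Here is a minimal sketch of one such adjustment of W and b. The article does not say how the values are adjusted, so this assumes a common choice: the linear output is squashed to a 0..1 "probability of wine" (logistic regression) and updated by gradient descent. The function names, the learning rate, and the examples (features already rescaled to the 0..1 range) are all hypothetical.

```python
import math
import random


def predict(weights, bias, features):
    """Linear model w . x + b, squashed to a 0..1 'probability of wine'."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def average_loss(weights, bias, examples):
    """Mean log loss: how far the predictions are from the true labels."""
    total = 0.0
    for features, target in examples:
        p = predict(weights, bias, features)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(examples)


def training_step(weights, bias, examples, learning_rate=0.5):
    """One update of W and b by gradient descent (an assumed update rule)."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, target in examples:
        error = predict(weights, bias, features) - target
        for i, x in enumerate(features):
            grad_w[i] += error * x
        grad_b += error
    n = len(examples)
    new_weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b / n
    return new_weights, new_bias


# Placeholder examples: (color, alcohol) rescaled to 0..1; target 1 = wine, 0 = beer.
examples = [((0.2, 0.1), 0), ((0.3, 0.2), 0), ((0.8, 0.9), 1), ((0.7, 0.8), 1)]

rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(2)]  # random initial W
bias = rng.uniform(-1.0, 1.0)                         # random initial b

loss_before = average_loss(weights, bias, examples)
weights, bias = training_step(weights, bias, examples)
loss_after = average_loss(weights, bias, examples)
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

The loss is the comparison between the predictions and the outputs the model should have produced; a single step should nudge it down a little.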
This process then repeats. Each iteration or cycle of updating the weights and biases is called one training step. So let's look at what that means more concretely for our data. When we first start the training, it's like we drew a random line through the data. Then, as each step of the training progresses, the line moves step by step closer to the ideal separation of the wine and beer. Once the training is complete, it's time to see if the model is any good.
- Evaluation.
This is where that dataset that we set aside earlier comes into play. Evaluation allows us to test our model against data that has never been used for training. This metric allows us to see how the model might perform against data that it has not yet seen. This is meant to be representative of how the model might perform in the real world. A good rule of thumb I use for a training-evaluation split is somewhere on the order of 80/20 or 70/30. Much of this depends on the size of the original source dataset. If you have a lot of data, perhaps you don't need as big of a fraction for the evaluation dataset.
- Hyperparameter tuning.
Once you've done evaluation, it's possible that you want to see if you can further improve your training in any way. We can do this by tuning some of our parameters. There were a few that we implicitly assumed when we did our training, and now is a good time to go back and test those assumptions and try other values.
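To make the train-then-evaluate cycle concrete, here is a rough sketch with the implicitly assumed values written out as named constants. It assumes a logistic-regression style linear model trained by gradient descent from all-zero starting values, on made-up data already rescaled to the 0..1 range; none of the names or numbers come from the original walkthrough.

```python
import math
import random

TRAINING_STEPS = 2000  # assumed: how many training steps to run
LEARNING_RATE = 0.5    # assumed: how far the line shifts per step
TRAIN_FRACTION = 0.8   # the 80/20 rule of thumb


def predict(weights, bias, features):
    """Linear model w . x + b, squashed to a 0..1 'probability of wine'."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, steps, learning_rate):
    """Repeat the training step, starting from all-zero weights and bias."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, target in examples:
            error = predict(weights, bias, features) - target
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples where the model's guess matches the label."""
    correct = sum(
        (predict(weights, bias, features) >= 0.5) == (target == 1)
        for features, target in examples
    )
    return correct / len(examples)


# Placeholder data: (color, alcohol) rescaled to 0..1; target 1 = wine, 0 = beer.
beer = [((0.10 + 0.02 * i, 0.15 + 0.01 * i), 0) for i in range(10)]
wine = [((0.70 + 0.02 * i, 0.75 + 0.01 * i), 1) for i in range(10)]
examples = beer + wine
random.Random(0).shuffle(examples)

cut = int(len(examples) * TRAIN_FRACTION)
train_set, eval_set = examples[:cut], examples[cut:]

weights, bias = train(train_set, TRAINING_STEPS, LEARNING_RATE)
eval_accuracy = accuracy(weights, bias, eval_set)
print(f"accuracy on held-out data: {eval_accuracy:.2f}")
```

The accuracy is computed only on `eval_set`, which `train` never sees, so it stands in for how the model might do on new drinks.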
One example of a parameter we can tune is how many times we run through the training set during training. We can actually show the data multiple times, and doing so can potentially lead to higher accuracy. Another parameter is the learning rate. This defines how far we shift the line during each step, based on the information from the previous training step. These values all play a role in how accurate our model can become and how long the training takes. For more complex models, initial conditions can play a significant role as well in determining the outcome of training. Differences can be seen depending on whether a model starts off training with values initialized to zeros versus some distribution of values, and what that distribution is. As you can see, there are many considerations at this phase of training, and it's important that you define what makes a model good enough for you. Otherwise we might find ourselves tweaking parameters for a very long time. Now, these parameters are typically referred to as hyperparameters. The adjustment or tuning of these hyperparameters remains a bit more of an art than a science, and it's an experimental process that heavily depends on the specifics of your dataset, model, and training process. Once you're happy with your training and hyperparameters, guided by the evaluation step, it's finally time to use your model to do something useful.
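One simple way to experiment with these hyperparameters is to try a small grid of values and compare the evaluation accuracy of each combination. The sketch below assumes a logistic-regression style linear model with full-batch gradient descent, so one pass over the training set is one training step. The grid values, function names, and data are hypothetical.

```python
import math
import random


def predict(weights, bias, features):
    """Linear model w . x + b, squashed to a 0..1 'probability of wine'."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs, learning_rate):
    """Run `epochs` passes over the training set, starting from zeros."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, target in examples:
            error = predict(weights, bias, features) - target
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples where the model's guess matches the label."""
    hits = sum(
        (predict(weights, bias, features) >= 0.5) == (target == 1)
        for features, target in examples
    )
    return hits / len(examples)


# Placeholder data: (color, alcohol) rescaled to 0..1; target 1 = wine, 0 = beer.
examples = [((0.10 + 0.02 * i, 0.15 + 0.01 * i), 0) for i in range(10)]
examples += [((0.70 + 0.02 * i, 0.75 + 0.01 * i), 1) for i in range(10)]
random.Random(0).shuffle(examples)
train_set, eval_set = examples[:16], examples[16:]

# Hypothetical grid: every combination of learning rate and epoch count.
results = {}
for learning_rate in (0.01, 0.1, 1.0):
    for epochs in (10, 100, 1000):
        weights, bias = train(train_set, epochs, learning_rate)
        results[(learning_rate, epochs)] = accuracy(weights, bias, eval_set)

best = max(results, key=results.get)
for (learning_rate, epochs), score in sorted(results.items()):
    print(f"learning_rate={learning_rate:<5} epochs={epochs:<5} accuracy={score:.2f}")
print("best (learning_rate, epochs):", best)
```

On a toy dataset like this one, several combinations may tie, which is a reminder to decide up front what counts as good enough rather than searching forever.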
- Prediction.
Machine learning is using data to answer questions, so prediction, or inference, is the step where we finally get to answer some questions. This is the point of all of this work, where the value of machine learning is realized. We can finally use our model to predict whether a given drink is wine or beer, given its color and alcohol percentage. The power of machine learning is that we were able to determine how to differentiate between wine and beer using our model, rather than using human judgment and manual rules. You can extrapolate the ideas presented today to other problem domains as well, where the same principles apply: gathering data, preparing that data, choosing a model, training it and evaluating it, doing your hyperparameter tuning, and finally prediction. If you're looking for more ways to play with training and parameters, check out the TensorFlow Playground. It's a completely browser-based machine learning sandbox where you can try different parameters and run training against mock datasets. And don't worry, you can't break the site. Hope you enjoyed this article. Thank you!