Understanding Evaluate Model in Microsoft Azure Machine Learning

I think one of the coolest features of Azure Machine Learning is the ability to evaluate different algorithms and choose the right one with just a few mouse clicks. The Evaluate Model module makes this possible.

The official documentation page for Evaluate Model can be found here.

Anyone can make sense of its output and decide on the right model, provided they have a basic understanding of the following:

Regression

When you pass in a scored model for a regression algorithm, Evaluate Model generates the following metrics:

Mean Absolute Error

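Mean Absolute Error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| f_i - y_i \right|$$

Where $f_i$ is the prediction, $y_i$ the true value, and $n$ the number of samples.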

It has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units.
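To make the definition concrete, here is a minimal Python sketch that computes MAE by hand. The function name and the house prices (in thousands) are made up for illustration; this is not what Evaluate Model runs internally.

```python
def mean_absolute_error(predictions, actuals):
    """Average of |prediction - true value| over all samples."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(f - y) for f, y in zip(predictions, actuals))
    return total / len(actuals)


# Hypothetical house prices, in thousands
predicted = [200.0, 310.0, 150.0, 420.0]
actual = [210.0, 300.0, 160.0, 400.0]

# Absolute errors are 10, 10, 10 and 20, so the mean is 50 / 4
mae = mean_absolute_error(predicted, actual)
print(mae)  # 12.5, in the same unit as the data (thousands)
```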

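Root Mean Squared Error

Root Mean Squared Error (RMSE) is the square root of the average squared difference between the predictions and the true values.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( f_i - y_i \right)^2}$$

Where $f_i$ is the prediction, $y_i$ the true value, and $n$ the number of samples.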

It can only be compared between models whose errors are measured in the same units.
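As a rough illustration, the sketch below computes RMSE in plain Python on made-up house prices (in thousands). The function name and data are assumptions for the example only. Because the errors are squared before averaging, the single larger error (20) pulls RMSE above the plain average of the absolute errors (12.5).

```python
import math


def root_mean_squared_error(predictions, actuals):
    """Square root of the mean squared difference between predictions and true values."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((f - y) ** 2 for f, y in zip(predictions, actuals))
    return math.sqrt(squared / len(actuals))


# Hypothetical house prices, in thousands
predicted = [200.0, 310.0, 150.0, 420.0]
actual = [210.0, 300.0, 160.0, 400.0]

# Squared errors are 100, 100, 100 and 400, so RMSE = sqrt(700 / 4) = sqrt(175)
rmse = root_mean_squared_error(predicted, actual)
print(round(rmse, 3))
```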

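Relative Absolute Error

Relative Absolute Error (RAE) is the total absolute error of the model divided by the total absolute error of a baseline that always predicts the mean.

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n} \left| f_i - y_i \right|}{\sum_{i=1}^{n} \left| \bar{y} - y_i \right|}$$

Where $f_i$ is the prediction, $y_i$ the true value, and $\bar{y}$ is the mean of the $y_i$.

Because it is a ratio, it can be compared between models whose errors are measured in different units.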

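Relative Squared Error

Relative Squared Error (RSE) is the total squared error of the model divided by the total squared error of a baseline that always predicts the mean.

$$\mathrm{RSE} = \frac{\sum_{i=1}^{n} \left( f_i - y_i \right)^2}{\sum_{i=1}^{n} \left( \bar{y} - y_i \right)^2}$$

Where $f_i$ is the prediction, $y_i$ the true value, and $\bar{y}$ is the mean of the $y_i$.

Because it is a ratio, it can be compared between models whose errors are measured in different units.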
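Here is a small Python sketch of both relative metrics, assuming the usual definitions (model error divided by the error of a baseline that always predicts the mean). The function name and the data are hypothetical. Running it on the same prices expressed in thousands and in dollars gives the same values, which is why these two metrics can be compared across units.

```python
def relative_errors(predictions, actuals):
    """Return (RAE, RSE) for paired predictions and true values."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(actuals) / len(actuals)
    abs_model = sum(abs(f - y) for f, y in zip(predictions, actuals))
    abs_base = sum(abs(mean_y - y) for y in actuals)
    sq_model = sum((f - y) ** 2 for f, y in zip(predictions, actuals))
    sq_base = sum((mean_y - y) ** 2 for y in actuals)
    if abs_base == 0:
        raise ValueError("true values are all identical; the baseline error is zero")
    return abs_model / abs_base, sq_model / sq_base


# Hypothetical house prices, in thousands
predicted_k = [200.0, 310.0, 150.0, 420.0]
actual_k = [210.0, 300.0, 160.0, 400.0]
rae_k, rse_k = relative_errors(predicted_k, actual_k)

# The same prices in dollars: the ratios do not change with the unit
predicted_usd = [v * 1000 for v in predicted_k]
actual_usd = [v * 1000 for v in actual_k]
rae_usd, rse_usd = relative_errors(predicted_usd, actual_usd)

print(round(rae_k, 4), round(rse_k, 4))
print(round(rae_usd, 4), round(rse_usd, 4))
```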

Coefficient of Determination

The coefficient of determination (R²) summarizes the explanatory power of the regression model. If the regression model is perfect, R² is 1. If the model explains none of the variation in the data, that is, it does no better than always predicting the mean, R² is 0.

So the closer R² is to 1, the better your model is.
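The sketch below uses the common textbook definition, R² = 1 - (sum of squared errors) / (sum of squared deviations from the mean); I am assuming this matches what Evaluate Model reports. The function name and data are made up. It shows the two reference points: a perfect model scores 1 and a model that always predicts the mean scores 0. Under this formula, a model that does worse than the mean gets a negative value.

```python
def r_squared(predictions, actuals):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(actuals) / len(actuals)
    ss_res = sum((y - f) ** 2 for f, y in zip(predictions, actuals))
    ss_tot = sum((y - mean_y) ** 2 for y in actuals)
    if ss_tot == 0:
        raise ValueError("true values are all identical; R2 is undefined")
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(actual, actual)                # every prediction is exact
baseline = r_squared([2.5, 2.5, 2.5, 2.5], actual)  # always predicts the mean
reversed_fit = r_squared([4.0, 3.0, 2.0, 1.0], actual)  # worse than the mean

print(perfect, baseline, reversed_fit)
```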

Below is a sample experiment that compares Linear Regression and Decision Forest Regression using Evaluate Model.

[Figures: evaluation model]

Classification

When you pass in a scored model for a two-class classification algorithm, Evaluate Model generates the following metrics:

True Positive

True Positive (TP): correctly identified, e.g. sick people correctly diagnosed as sick.

False Positive

False Positive (FP): incorrectly identified, e.g. healthy people incorrectly identified as sick.

True Negative

True Negative (TN): correctly rejected, e.g. healthy people correctly identified as healthy.

False Negative

False Negative (FN): incorrectly rejected, e.g. sick people incorrectly identified as healthy.
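The four counts can be tallied from paired true and predicted labels. Here is a minimal Python sketch using the sick/healthy example; the function name, the label strings, and the six patients are all hypothetical.

```python
def confusion_counts(actual, predicted, positive="sick"):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # sick, diagnosed as sick
        elif p == positive:
            fp += 1   # healthy, diagnosed as sick
        elif a == positive:
            fn += 1   # sick, diagnosed as healthy
        else:
            tn += 1   # healthy, diagnosed as healthy
    return tp, fp, tn, fn


# Six hypothetical patients
actual = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]
predicted = ["sick", "healthy", "healthy", "sick", "sick", "healthy"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)  # 2 1 2 1
```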

Accuracy

The proportion of all predictions that are correct.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision

Positive Predictive Value, or Precision, is the proportion of cases predicted as positive that are actually positive.

Precision = TP / (TP + FP)

Recall

Sensitivity, or Recall, is the proportion of actual positive cases that are correctly identified.

Recall = TP / (TP + FN)

F1 Score

F1 Score is the harmonic mean of Precision and Recall.

F1 = 2 × Precision × Recall / (Precision + Recall) = 2TP / (2TP + FP + FN)
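To see how Accuracy, Precision, Recall and F1 fit together, here is a short Python sketch that computes all four from the four counts. The function name and the counts (100 hypothetical patients) are invented for illustration. It also checks that the count-based F1 formula agrees with the harmonic mean of Precision and Recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1


# Hypothetical counts for 100 patients
accuracy, precision, recall, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)

# The same F1, written as the harmonic mean of precision and recall
f1_harmonic = 2 * precision * recall / (precision + recall)

print(accuracy, precision, round(recall, 4), round(f1, 4))
```

A sketch like this divides by zero if the model never predicts the positive class (TP + FP = 0) or if there are no actual positives (TP + FN = 0), so real code would need to decide how to handle those cases.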

Threshold

The threshold is the score at or above which a case is assigned to the first class; all other cases are assigned to the second class. E.g. if the threshold is 0.5, any patient with a score greater than or equal to 0.5 is identified as sick, and otherwise as healthy.
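Here is a minimal Python sketch of the thresholding step, with made-up scores and a hypothetical function name. Raising the threshold from 0.5 to 0.8 can only keep or reduce the number of patients labelled sick, which is why the metrics above change as you move the threshold.

```python
def classify(scores, threshold=0.5):
    """Label a score as 'sick' when it is at or above the threshold, else 'healthy'."""
    return ["sick" if s >= threshold else "healthy" for s in scores]


# Hypothetical scored probabilities for five patients
scores = [0.92, 0.50, 0.49, 0.10, 0.75]

at_half = classify(scores, threshold=0.5)
at_high = classify(scores, threshold=0.8)

print(at_half)  # ['sick', 'sick', 'healthy', 'healthy', 'sick']
print(at_high)  # ['sick', 'healthy', 'healthy', 'healthy', 'healthy']
```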

Below is a sample experiment that compares Two-Class Logistic Regression and Two-Class Decision Forest using Evaluate Model.

[Figures: evaluation model]

Sumit Mund

Sumit Mund is an Artificial Intelligence Consultant with more than 12 years of experience. He has an MSc by Research degree and B.Tech degree in Information Technology. He is also a part-time PhD scholar at University of Huddersfield where his research area includes applications of Deep Reinforcement Learning and uses Google Tensorflow extensively.

This Post Has 2 Comments

  1. Very useful post! It saves a hell of a lot of time looking up wikipedias of each of the functions just to remind myself what the definition of each is. Cheers

  2. A very good example on how to get the result generated by two different algorithms. For those who are doing research work it might be helpful.
