I think one of the coolest features of Azure Machine Learning is the ability to evaluate different algorithms and choose the right one with just a few mouse clicks. The Evaluate Model module makes it happen.
The official documentation page for the Evaluate Model module can be found here.
Anyone can make sense of its output and decide on the right model, provided one has a basic understanding of the following:
Regression
When you pass in a scored model for a regression algorithm, the Evaluate Model module generates the following metrics:
Mean Absolute Error
Mean Absolute Error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.
It has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units.
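As a minimal sketch, MAE can be computed by hand on a toy set of actual vs. predicted values (the numbers below are made up for illustration):

```python
# Mean Absolute Error: average of the absolute differences
# between actual and predicted values.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(mae)  # 0.75
```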
Root Mean Squared Error
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.
It can only be compared between models whose errors are measured in the same units.
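The same toy data (made-up numbers) can illustrate RMSE; note how squaring penalizes the larger errors more heavily than MAE does:

```python
import math

# Root Mean Squared Error: square root of the mean squared difference
# between actual and predicted values.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
print(round(rmse, 4))  # 0.9354
```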
Relative Absolute Error
$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\sum_{i=1}^{n}\left|y_i - \bar{y}\right|}$$

Where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values.
Because it is a ratio, it can be compared between models whose errors are measured in different units.
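A minimal sketch with made-up numbers: RAE divides the model's total absolute error by the total absolute error of the naive predictor that always guesses the mean.

```python
# Relative Absolute Error: model's total absolute error relative
# to the error of simply predicting the mean of the actuals.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mean_actual = sum(actual) / len(actual)
rae = (sum(abs(a - p) for a, p in zip(actual, predicted))
       / sum(abs(a - mean_actual) for a in actual))
print(round(rae, 4))  # 0.4615
```

A value below 1 means the model beats the mean-only baseline.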
Relative Squared Error
$$\mathrm{RSE} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

Where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values.
Because it is a ratio, it can be compared between models whose errors are measured in different units.
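RSE follows the same pattern as RAE, but with squared errors (again with made-up numbers):

```python
# Relative Squared Error: model's total squared error relative
# to the squared error of predicting the mean of the actuals.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mean_actual = sum(actual) / len(actual)
rse = (sum((a - p) ** 2 for a, p in zip(actual, predicted))
       / sum((a - mean_actual) ** 2 for a in actual))
print(round(rse, 4))  # 0.2759
```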
Coefficient of Determination
The coefficient of determination (R²) summarizes the explanatory power of the regression model. If the regression model is perfect, R² is 1; if the regression model is a total failure, R² is zero.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
So the closer your model's R² is to 1, the better it is.
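Using the definition above, R² is one minus the Relative Squared Error; a quick sketch with made-up numbers:

```python
# Coefficient of determination: 1 minus the ratio of the model's
# squared error to the squared error of a mean-only predictor.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.7241
```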
Below is a sample experiment comparing Linear Regression and Decision Forest Regression using the Evaluate Model module.
Classification
When you pass in a scored model for a two-class classification algorithm, the Evaluate Model module generates the following metrics:
True Positive
True Positive (TP): correctly identified, e.g. sick people correctly diagnosed as sick.
False Positive
False Positive (FP): incorrectly identified, e.g. healthy people incorrectly identified as sick.
True Negative
True Negative (TN): correctly rejected, e.g. healthy people correctly identified as healthy.
False Negative
False Negative (FN): incorrectly rejected, e.g. sick people incorrectly identified as healthy.
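The four counts can be tallied directly from labels; here is a minimal sketch with hypothetical labels, using 1 for sick and 0 for healthy:

```python
# Count TP/FP/TN/FN from actual and predicted class labels
# (1 = sick, 0 = healthy; data is made up for illustration).
actual =    [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # sick, predicted sick
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # healthy, predicted sick
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # healthy, predicted healthy
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # sick, predicted healthy
print(tp, fp, tn, fn)  # 3 1 3 1
```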
Accuracy
The proportion of the total number of predictions that are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision
Positive Predictive Value or Precision is the proportion of positive cases that were correctly identified.
Precision = TP / (TP + FP)
Recall
Sensitivity or Recall is the proportion of actual positive cases which are correctly identified.
Recall = TP / (TP + FN)
F1 Score
F1 Score is the harmonic mean of Precision and Recall.
F1 = 2TP / (2TP + FP + FN)
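All four metrics follow directly from the confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
# Accuracy, Precision, Recall, and F1 from confusion-matrix counts
# (the counts below are made up for illustration).
tp, fp, tn, fn = 4, 2, 3, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)  # equals the harmonic mean of precision and recall

print(accuracy)             # 0.7
print(round(precision, 4))  # 0.6667
print(recall)               # 0.8
print(round(f1, 4))         # 0.7273
```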
Threshold
The threshold is the score at or above which a case is assigned to the first class; all other cases go to the second class. E.g. if the threshold is 0.5, then any patient scored at 0.5 or above is identified as sick, else healthy.
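Applying the threshold is a simple comparison; a sketch with hypothetical model scores:

```python
# Convert raw model scores into class labels using a 0.5 threshold
# (1 = sick, 0 = healthy; scores are made up for illustration).
scores = [0.9, 0.3, 0.5, 0.72, 0.1]
threshold = 0.5

labels = [1 if s >= threshold else 0 for s in scores]
print(labels)  # [1, 0, 1, 1, 0]
```

Moving the threshold trades precision against recall: raising it flags fewer patients as sick, lowering it flags more.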
Below is a sample experiment comparing Two-Class Logistic Regression and Two-Class Decision Forest using the Evaluate Model module.