I think one of the coolest features of Azure Machine Learning is the ability to evaluate different algorithms and choose the right one with just a few mouse clicks. The *Evaluate Model* module makes it happen.

The official documentation page for Evaluate Model can be found here.

Anyone can make sense of its output and decide on the right model, provided one has a basic understanding of the following:

## Regression

When you pass a scored model for a regression algorithm to Evaluate Model, it generates the following metrics:

#### Mean Absolute Error

*Mean Absolute Error* (**MAE**) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

Where $\hat{y}_i$ is the prediction and $y_i$ the true value.

It has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units.
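As a rough illustration of the definition above, here is a minimal sketch in plain Python. The function name and the house-price numbers are made up for the example; this is not what Azure ML runs internally.

```python
def mean_absolute_error(predictions, actuals):
    """Average of the absolute differences between predictions and true values."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical house prices, in thousands of dollars.
predicted = [200.0, 310.0, 150.0, 420.0]
actual = [210.0, 300.0, 155.0, 400.0]

mae = mean_absolute_error(predicted, actual)
print(mae)  # 11.25, i.e. off by 11.25 thousand dollars on average
```

Because the result is in thousands of dollars, it only makes sense to compare it with the MAE of another model predicting the same quantity in the same unit.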

#### Root Mean Squared Error

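$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$

Where $\hat{y}_i$ is the prediction and $y_i$ the true value.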

It can only be compared between models whose errors are measured in the same units.
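To make the RMSE calculation concrete, here is a small sketch in plain Python. The function name and the sample values are mine, chosen only for illustration.

```python
import math


def root_mean_squared_error(predictions, actuals):
    """Square root of the average squared difference between predictions and true values."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: three predictions are close, one is off by 3.
predicted = [2.0, 4.0, 6.0, 8.0]
actual = [1.0, 4.0, 6.0, 11.0]

rmse = root_mean_squared_error(predicted, actual)
print(round(rmse, 3))  # sqrt(10 / 4), roughly 1.581
```

The average absolute error of the same data is 1.0, so the single large miss pulls RMSE up noticeably; squaring the errors weights big mistakes more heavily.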

#### Relative Absolute Error

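$$\mathrm{RAE} = \frac{\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|}{\sum_{i=1}^{n} \left| \bar{y} - y_i \right|}$$

Where $\hat{y}_i$ is the prediction, $y_i$ the true value, and $\bar{y}$ the mean of $y$.

Because it is a ratio, it can be compared between models whose errors are measured in different units.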
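The unit-independence claim is easy to check with a small sketch. The function below assumes the usual definition of RAE (total absolute error of the model divided by the total absolute error of always predicting the mean); the names and data are hypothetical.

```python
def relative_absolute_error(predictions, actuals):
    """Model's total absolute error relative to a predict-the-mean baseline."""
    mean_actual = sum(actuals) / len(actuals)
    model_error = sum(abs(p - a) for p, a in zip(predictions, actuals))
    baseline_error = sum(abs(mean_actual - a) for a in actuals)
    if baseline_error == 0:
        raise ValueError("all true values are identical; RAE is undefined")
    return model_error / baseline_error


# The same made-up measurements expressed in metres and in centimetres.
predicted_m = [1.0, 2.0, 4.0, 5.0]
actual_m = [1.0, 3.0, 3.0, 5.0]
predicted_cm = [p * 100 for p in predicted_m]
actual_cm = [a * 100 for a in actual_m]

rae_m = relative_absolute_error(predicted_m, actual_m)
rae_cm = relative_absolute_error(predicted_cm, actual_cm)
print(rae_m, rae_cm)  # 0.5 0.5
```

Changing the unit scales the numerator and the denominator by the same factor, so the ratio stays put.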

#### Relative Squared Error

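$$\mathrm{RSE} = \frac{\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}{\sum_{i=1}^{n} \left( \bar{y} - y_i \right)^2}$$

Where $\hat{y}_i$ is the prediction, $y_i$ the true value, and $\bar{y}$ the mean of $y$.

Because it is a ratio, it can be compared between models whose errors are measured in different units.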

#### Coefficient of Determination

The coefficient of determination ($R^2$) summarizes the explanatory power of the regression model. If the regression model is perfect, $R^2$ is 1. If the regression model is a total failure, doing no better than always predicting the mean, $R^2$ is zero.

*So the closer $R^2$ is to 1, the better your model.*
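The two extremes can be reproduced with a short sketch. It assumes the common definition $R^2 = 1 - \mathrm{RSE}$, where RSE is the model's sum of squared errors divided by that of a predict-the-mean baseline; the function names and numbers are illustrative only.

```python
def relative_squared_error(predictions, actuals):
    """Model's sum of squared errors relative to a predict-the-mean baseline."""
    mean_actual = sum(actuals) / len(actuals)
    model_error = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    baseline_error = sum((mean_actual - a) ** 2 for a in actuals)
    if baseline_error == 0:
        raise ValueError("all true values are identical; RSE is undefined")
    return model_error / baseline_error


def coefficient_of_determination(predictions, actuals):
    """R^2, assuming the definition R^2 = 1 - RSE."""
    return 1.0 - relative_squared_error(predictions, actuals)


actual = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = list(actual)   # a model that gets every value right
mean_only = [3.0] * 5    # a "model" that always predicts the mean

r2_perfect = coefficient_of_determination(perfect, actual)
r2_mean = coefficient_of_determination(mean_only, actual)
print(r2_perfect, r2_mean)  # 1.0 0.0
```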

Below is a sample experiment that compares Linear Regression and Decision Forest Regression using *Evaluate Model*.

## Classification

When you pass a scored model for a two-class classification algorithm to Evaluate Model, it generates the following metrics:

### True Positive

True Positive (TP): correctly identified, e.g. sick people correctly diagnosed as sick.

### False Positive

False Positive (FP): incorrectly identified, e.g. healthy people incorrectly identified as sick.

### True Negative

True Negative (TN): correctly rejected, e.g. healthy people correctly identified as healthy.

### False Negative

False Negative (FN): incorrectly rejected, e.g. sick people incorrectly identified as healthy.
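The four counts above can be tallied with a few lines of Python. This is only a sketch: the function name, the `"sick"`/`"healthy"` labels, and the sample data are invented for the example.

```python
def confusion_counts(actual, predicted, positive="sick"):
    """Return (TP, FP, TN, FN) for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # sick, diagnosed as sick
        elif p == positive:
            fp += 1  # healthy, diagnosed as sick
        elif a == positive:
            fn += 1  # sick, diagnosed as healthy
        else:
            tn += 1  # healthy, diagnosed as healthy
    return tp, fp, tn, fn


actual = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]
predicted = ["sick", "healthy", "healthy", "sick", "sick", "healthy"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)  # 2 1 2 1
```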

### Accuracy

The proportion of the total number of predictions that is correct.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

### Precision

Positive Predictive Value or Precision is the proportion of predicted positive cases that are actually positive.

Precision = TP / (TP + FP)

### Recall

Sensitivity or Recall is the proportion of actual positive cases which are correctly identified.

Recall = TP / (TP + FN)

### F1 Score

F1 Score is the harmonic mean of Precision and Recall.

F1 = 2TP / (2TP + FP + FN)
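The formulas for Accuracy, Precision, Recall, and F1 above translate directly into code. Here is a minimal sketch with made-up counts; it assumes every denominator is non-zero, which a real implementation would need to guard against.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the four metrics from confusion-matrix counts.

    Assumes tp + fp, tp + fn and the total are all non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for 100 patients.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.85 and precision is 0.8, while recall (40/45) and F1 (80/95) come out a little higher. The F1 value equals `2 * precision * recall / (precision + recall)`, which is the harmonic-mean form of the same formula.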

### Threshold

Threshold is the cutoff applied to the model's score: an instance scored at or above the threshold is assigned to the first class, and every other instance to the second class. E.g. if the threshold is 0.5, then any patient scored 0.5 or higher is identified as sick, otherwise as healthy.
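A small sketch shows how moving the threshold changes the predicted classes. The function name, labels, and scores are hypothetical; they simply mirror the sick/healthy example.

```python
def classify(scores, threshold=0.5):
    """Label each score 'sick' if it is at or above the threshold, else 'healthy'."""
    return ["sick" if s >= threshold else "healthy" for s in scores]


# Made-up scored probabilities for four patients.
scores = [0.92, 0.50, 0.49, 0.10]

default_labels = classify(scores)        # threshold 0.5
strict_labels = classify(scores, 0.9)    # a stricter threshold

print(default_labels)  # ['sick', 'sick', 'healthy', 'healthy']
print(strict_labels)   # ['sick', 'healthy', 'healthy', 'healthy']
```

Raising the threshold flags fewer patients as sick, which generally trades Recall for Precision.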

Below is a sample experiment that compares Two-Class Logistic Regression and Two-Class Decision Forest using *Evaluate Model*.

Very useful post! It saves a hell of a lot of time looking up wikipedias of each of the functions just to remind myself what the definition of each is. Cheers

A very good example on how to get the result generated by two different algorithms. For those who are doing research work it might be helpful.