Error Metrics

Error Metrics specify what type of error to measure when comparing and optimizing solutions. For example, you may wish to minimize "Squared Error" if your data has normally distributed noise, or "Logarithmic Error" if it contains many outliers.
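
To make the difference concrete, the following is a minimal sketch in plain NumPy (it is not Formulize code, and the residual values are made up for illustration). It compares how strongly a single outlier dominates the per-point squared error versus the per-point logarithmic error log(1 + abs(error)).

```python
import numpy as np

# Hypothetical residuals y - f(x): four small errors and one outlier.
residuals = np.array([0.1, -0.2, 0.1, -0.1, 10.0])

squared = residuals ** 2                    # per-point squared error
logarithmic = np.log1p(np.abs(residuals))   # per-point log(1 + |error|)

# How many times larger the outlier's contribution is than the
# largest contribution among the remaining points.
ratio_squared = squared[-1] / squared[:-1].max()
ratio_log = logarithmic[-1] / logarithmic[:-1].max()

print(f"squared error ratio:     {ratio_squared:.1f}")
print(f"logarithmic error ratio: {ratio_log:.1f}")
```

Under the squared error the outlier outweighs every other point by several orders of magnitude, while under the logarithmic error its influence is far smaller, which is why the latter tolerates outliers better.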

The table below describes some of the error metrics available in Formulize. All error metrics are normalized based on the target values in the data set.

| Error Metric | Calculation | Description and Comments |
| --- | --- | --- |
| Mean Absolute Error | $\frac{1}{N} \sum_{i=1}^{N} \left\lvert y_i - f(x_i) \right\rvert$ | Minimizes the mean of the absolute value of the residual errors, mean(abs(error)). Assumes noise follows a double exponential (Laplace) distribution. |
| Mean Squared Error | $\frac{1}{N} \sum_{i=1}^{N} {\left( y_i - f(x_i) \right)}^2$ | Minimizes the mean of the squared residual errors. Assumes noise follows a normal distribution. |
| $R^2$ Goodness of Fit | | Maximizes the $R^2$ explained variance. Similar to the squared error, but normalized by the scale of the output values. |
| Correlation Coefficient | | Maximizes the correlation coefficient (the normalized covariance). Scale and offset invariant; models the "shape" of the data. |
| Maximum Error | | Minimizes the single largest residual error. Use it to minimize the worst-case error or to force the algorithm to model a small residual feature. |
| Logarithmic Error | | Minimizes the squashed error log(1 + abs(error)). |
| Median Error | | Minimizes the median error value. |
| Interquartile Absolute Error | | Similar to the median error. Minimizes the mean absolute error of the middle 50% of the error values. |
| Signed Difference | | Minimizes the left-hand side minus the right-hand side, including the sign, toward negative infinity. |
| Hybrid Correlation/Error | | Special combination of correlation and absolute error (experimental). |
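
The metrics above can be sketched in NumPy as follows. This is an illustrative sketch, not Formulize's implementation: the function name `error_metrics` is hypothetical, the normalization by target values is left out, and several details are assumptions, namely that the logarithmic and signed metrics are averaged over the data, that the median is taken over absolute errors, and that the "middle 50%" means the central half of the sorted absolute errors. The hybrid metric is omitted because its definition is not given here.

```python
import numpy as np

def error_metrics(y, f):
    """Return unnormalized versions of the metrics described above.

    y: observed target values; f: model predictions f(x).
    """
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    err = y - f
    abs_err = np.abs(err)

    # Assumption: "middle 50%" = central half of the sorted absolute errors.
    sorted_abs = np.sort(abs_err)
    n = len(sorted_abs)
    middle = sorted_abs[n // 4 : n - n // 4]

    return {
        "mean_absolute_error": abs_err.mean(),
        "mean_squared_error": (err ** 2).mean(),
        # Standard coefficient of determination: 1 - SS_res / SS_tot.
        "r2": 1.0 - (err ** 2).sum() / ((y - y.mean()) ** 2).sum(),
        # Pearson correlation between targets and predictions.
        "correlation": np.corrcoef(y, f)[0, 1],
        "maximum_error": abs_err.max(),
        # Assumption: log(1 + |error|) is averaged over the data points.
        "logarithmic_error": np.log1p(abs_err).mean(),
        # Assumption: median of the absolute errors.
        "median_error": np.median(abs_err),
        "interquartile_absolute_error": middle.mean(),
        # Assumption: mean of (left-hand side - right-hand side), sign kept.
        "signed_difference": err.mean(),
    }

# Example with made-up data.
metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the $R^2$ and correlation entries are to be maximized, whereas the remaining entries are to be minimized.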