Regression is easiest to understand through the simple algebraic equation y = mx + b, where the algorithm chooses the slope m and the intercept b that produce the smallest error, on average, across the observed x and y values. The same concept applies when there is more than one input variable. If the problem is classification rather than predicting a numeric value, the regression model outputs the probability that the sample belongs to the positive class; in classification, y is a probability and not an actual value. Regressions are very good at finding linear trends in data, but sometimes the relationship isn’t linear. For a regression to fit a non-linear pattern well, the data must be transformed before it is passed to the model. The benefit of modeling linear relationships so directly, however, is that linear models generally extrapolate well, provided the trend continues beyond the observed range. For example, suppose an ice cream truck has sales data for temperatures ranging from 40 to 100 degrees Fahrenheit, and the owner wants to know what sales would be at 110 degrees Fahrenheit, a temperature for which there is no data. A well-fit regression would do a good job of extrapolating to estimate sales for the ice cream truck on a 110-degree day.
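The ice cream truck example can be sketched in a few lines of Python. This is only an illustration, not how Qlik AutoML fits its models: the temperature and sales figures are invented and deliberately lie on a perfect line, and NumPy's `polyfit` is used here as one common way to find a least-squares m and b.

```python
import numpy as np

# Hypothetical data, made up for illustration: temperature (°F) and daily sales.
# The values lie exactly on the line sales = 5 * temp - 100.
temps = np.array([40.0, 55.0, 70.0, 85.0, 100.0])
sales = np.array([100.0, 175.0, 250.0, 325.0, 400.0])

# Least-squares fit of y = m*x + b (degree-1 polynomial).
# polyfit returns coefficients from highest degree to lowest: [m, b].
m, b = np.polyfit(temps, sales, deg=1)


def predict(temp):
    """Estimate sales for a temperature using the fitted line."""
    return m * temp + b


# 110 °F is outside the observed 40-100 °F range, so this is an extrapolation.
predicted_110 = predict(110.0)
print(f"m = {m:.2f}, b = {b:.2f}, estimated sales at 110 °F = {predicted_110:.1f}")
```

The estimate at 110 degrees is only as trustworthy as the assumption that the same linear trend continues past the hottest day in the data.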
Regression models used in Qlik AutoML:
Model Type | Model | Problem Type
Regression | Linear Regression | Regression
Regression | Logistic Regression | Classification
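To make the classification case concrete, the sketch below shows how a logistic regression turns the same kind of linear score, mx + b, into a probability for the positive class. The coefficients, the input values, and the 0.5 decision threshold are all hypothetical choices for illustration; a real model would learn m and b from training data.

```python
import math


def predict_probability(x, m, b):
    """Pass the linear score m*x + b through the logistic (sigmoid) function."""
    score = m * x + b
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients, chosen so that the score is exactly 0 at x = 70.
m, b = 0.25, -17.5

p_low = predict_probability(50.0, m, b)   # negative score, probability below 0.5
p_mid = predict_probability(70.0, m, b)   # score of 0, probability of 0.5
p_high = predict_probability(90.0, m, b)  # positive score, probability above 0.5

# Assumed decision rule: predict the positive class when probability >= 0.5.
labels = [p >= 0.5 for p in (p_low, p_mid, p_high)]
print(p_low, p_mid, p_high, labels)
```

The output is always between 0 and 1, which is why y can be read as a probability rather than an actual value.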