stochasticLinearRegression
This function implements stochastic linear regression. It supports custom parameters for the learning rate, L2 regularization coefficient, and mini-batch size, and it offers several methods for updating weights: Adam (used by default), simple SGD, Momentum, and Nesterov.
Parameters
There are 4 customizable parameters. They are passed to the function sequentially, but there is no need to pass all four: default values are used for any that are omitted. A good model, however, usually requires some parameter tuning.
1. learning rate is the coefficient on the step length when a gradient descent step is performed. A learning rate that is too large may cause the model weights to become infinite. Default is 0.00001.
2. l2 regularization coefficient, which may help to prevent overfitting. Default is 0.1.
3. mini-batch size sets the number of elements for which gradients are computed and summed to perform one step of gradient descent. Pure stochastic descent uses one element; however, small batches (about 10 elements) make gradient steps more stable. Default is 15.
4. method for updating weights: Adam (used by default), SGD, Momentum, or Nesterov. Momentum and Nesterov require slightly more computation and memory, but they can improve the convergence speed and stability of stochastic gradient methods.
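The sequential passing of parameters can be illustrated with a short sketch. It assumes a train_data table with numeric columns target, param1, and param2; these names are used only for illustration.

```sql
-- No parameters: all defaults apply (0.00001, 0.1, 15, 'Adam').
SELECT stochasticLinearRegression(target, param1, param2) FROM train_data;

-- Only the learning rate is overridden; the other three keep their defaults.
SELECT stochasticLinearRegression(0.01)(target, param1, param2) FROM train_data;

-- All four parameters, in order: learning rate, L2 coefficient, mini-batch size, method.
SELECT stochasticLinearRegression(0.01, 0.1, 15, 'Momentum')(target, param1, param2) FROM train_data;
```

Because the parameters are positional, overriding a later one (for example the method) means spelling out the earlier ones as well.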
Usage
stochasticLinearRegression is used in two steps: fitting the model and predicting on new data. In order to fit the model and save its state for later usage, we use the -State combinator, which saves the state (e.g. model weights).
To predict, we use the function evalMLMethod, which takes a state as an argument as well as features to predict on.
1. Fitting
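A query that applies stochasticLinearRegressionState to the training data and saves the resulting state into a table may be used.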
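A minimal sketch of such a query, assuming a train_data table with two Float64 feature columns (param1, param2) and a Float64 target column, and a hypothetical table your_model that holds the state:

```sql
-- Assumed schema for the training data.
CREATE TABLE IF NOT EXISTS train_data
(
    param1 Float64,
    param2 Float64,
    target Float64
) ENGINE = Memory;

-- Fit the model and store its state; your_model is a hypothetical table name.
CREATE TABLE your_model ENGINE = Memory AS
SELECT stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, param1, param2) AS state
FROM train_data;
```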
Here, we also need to insert data into the train_data table. The number of parameters is not fixed; it depends only on the number of arguments passed into stochasticLinearRegressionState. They must all be numeric values.
Note that the column with the target value (the one we would like to learn to predict) is passed as the first argument.
2. Predicting
After saving a state into the table, we may use it multiple times for prediction or even merge with other states and create new, even better models.
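A minimal sketch of a prediction query, assuming the state was saved in a column named state of a hypothetical table your_model and that test_data has feature columns param1 and param2:

```sql
-- Read the saved state once, then evaluate it on every row of test_data.
WITH (SELECT state FROM your_model) AS model
SELECT evalMLMethod(model, param1, param2) FROM test_data;
```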
A query that passes the saved state to evalMLMethod returns a column of predicted values. Note that the first argument of evalMLMethod is an AggregateFunctionState object, followed by the feature columns.
test_data is a table like train_data, but it need not contain the target value.
Notes
- To merge two models, a query like the following may be used:

```sql
SELECT state1 + state2 FROM your_models
```

where the your_models table contains both models. This query returns a new AggregateFunctionState object.

- The weights of the created model can be fetched for your own purposes without saving the model if no -State combinator is used:

```sql
SELECT stochasticLinearRegression(0.01)(target, param1, param2) FROM train_data
```

Such a query fits the model and returns its weights: the first values are the weights that correspond to the parameters of the model, and the last one is the bias. So in the example above, the query returns a column with 3 values.
Introduced in: v20.1
Syntax
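Based on the arguments listed under Arguments, the general form of the call can be sketched as follows. The square brackets mark the optional parameters and are not part of the syntax.

```sql
stochasticLinearRegression([learning_rate, l2_regularization_coef, mini_batch_size, method])(target, x1, x2, ...)
```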
Arguments
- learning_rate: The coefficient on the step length when a gradient descent step is performed. A learning rate that is too large may cause the model weights to become infinite. Default is 0.00001. Type: Float64
- l2_regularization_coef: L2 regularization coefficient, which may help to prevent overfitting. Default is 0.1. Type: Float64
- mini_batch_size: Sets the number of elements for which gradients are computed and summed to perform one step of gradient descent. Pure stochastic descent uses one element; however, small batches (about 10 elements) make gradient steps more stable. Default is 15. Type: UInt64
- method: Method for updating weights: Adam (used by default), SGD, Momentum, or Nesterov. Momentum and Nesterov require slightly more computation and memory, but they can improve the convergence speed and stability of stochastic gradient methods. Type: const String
- target: Target value (dependent variable) to learn to predict. Must be numeric. Type: Float*
- x1, x2, ...: Feature values (independent variables). All must be numeric. Type: Float*
Returned value
Returns the trained linear regression model weights. The first values correspond to the parameters of the model, and the last one is the bias. Use evalMLMethod for predictions. Type: Array(Float64)
Examples
Training a model
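As a worked sketch, the statements below fill train_data with a few made-up rows and fit a model on them. The table names, column names, rows, and hyperparameter values are all illustrative assumptions.

```sql
CREATE TABLE IF NOT EXISTS train_data
(
    param1 Float64,
    param2 Float64,
    target Float64
) ENGINE = Memory;

-- Made-up rows generated from target = 2 * param1 + 3 * param2 + 1.
INSERT INTO train_data VALUES (1, 1, 6), (2, 1, 8), (1, 2, 9), (3, 2, 13), (2, 3, 14), (0, 1, 4);

-- Fit with simple SGD and store the state in a hypothetical table your_model.
CREATE TABLE your_model ENGINE = Memory AS
SELECT stochasticLinearRegressionState(0.01, 0.0, 3, 'SGD')(target, param1, param2) AS state
FROM train_data;
```

The rows only illustrate the shape of the data; a usable model needs far more rows and some parameter tuning.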
Making predictions
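A sketch of the prediction step, assuming a hypothetical table your_model with a state column produced by stochasticLinearRegressionState on two features. The test_data table and its rows are made up for illustration.

```sql
-- test_data mirrors train_data but has no target column.
CREATE TABLE IF NOT EXISTS test_data
(
    param1 Float64,
    param2 Float64
) ENGINE = Memory;

INSERT INTO test_data VALUES (4, 1), (1, 4);

-- Pass the saved state first, then the feature columns in the same order as in training.
WITH (SELECT state FROM your_model) AS model
SELECT param1, param2, evalMLMethod(model, param1, param2) AS prediction
FROM test_data;
```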
Getting model weights
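A sketch of reading the weights directly, assuming a train_data table with numeric columns target, param1, and param2. The hyperparameter values are illustrative.

```sql
-- Without the -State combinator, the query returns the fitted weights themselves.
SELECT stochasticLinearRegression(0.01, 0.0, 3, 'SGD')(target, param1, param2) AS weights
FROM train_data;
```

With two features, the result holds three values: one weight per feature, followed by the bias.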