package sklearn

You can search for identifiers within the package.

in-package search v0.2.0

sklearn
- Sklearn

Legend:
Library
Module
Module type
Parameter
Class
Class type

type t

val of_pyobject : Py.Object.t -> t

val to_pyobject : t -> Py.Object.t

val create : 
  ?cs:[ `I of int | `FloatList of float list ] ->
  ?fit_intercept:bool ->
  ?cv:[ `I of int | `CrossValGenerator of Py.Object.t ] ->
  ?dual:bool ->
  ?penalty:[ `L1 | `L2 | `Elasticnet ] ->
  ?scoring:[ `S of string | `Callable of Py.Object.t ] ->
  ?solver:[ `Newton_cg | `Lbfgs | `Liblinear | `Sag | `Saga ] ->
  ?tol:float ->
  ?max_iter:int ->
  ?class_weight:[ `DictIntToFloat of (int * float) list | `Balanced ] ->
  ?n_jobs:int ->
  ?verbose:int ->
  ?refit:bool ->
  ?intercept_scaling:float ->
  ?multi_class:[ `T_auto of Py.Object.t | `Ovr | `Multinomial ] ->
  ?random_state:int ->
  ?l1_ratios:Py.Object.t ->
  unit ->
  t

Logistic Regression CV (aka logit, MaxEnt) classifier.

See glossary entry for :term:`cross-validation estimator`.

This class implements logistic regression using liblinear, newton-cg, sag of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regularization with primal formulation. The liblinear solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. Elastic-Net penalty is only supported by the saga solver.

For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter is selected by the cross-validator :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).

Read more in the :ref:`User Guide <logistic_regression>`.

Parameters ---------- Cs : int or list of floats, default=10 Each of the values in Cs describes the inverse of regularization strength. If Cs is as an int, then a grid of Cs values are chosen in a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.

fit_intercept : bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

cv : int or cross-validation generator, default=None The default cross-validation generator used is Stratified K-Folds. If an integer is provided, then it is the number of folds used. See the module :mod:`sklearn.model_selection` module for the list of possible cross-validation objects.

.. versionchanged:: 0.22 ``cv`` default value if None changed from 3-fold to 5-fold.

dual : bool, default=False Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

penalty : 'l1', 'l2', 'elasticnet', default='l2' Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver.

scoring : str or callable, default=None A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``. For a list of scoring functions that can be used, look at :mod:`sklearn.metrics`. The default scoring option used is 'accuracy'.

solver : 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga', default='lbfgs'

Algorithm to use in the optimization problem.

For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas 'liblinear' and 'saga' handle L1 penalty.
'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting.

Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

.. versionadded:: 0.17 Stochastic Average Gradient descent solver. .. versionadded:: 0.19 SAGA solver.

tol : float, default=1e-4 Tolerance for stopping criteria.

max_iter : int, default=100 Maximum number of iterations of the optimization algorithm.

class_weight : dict or 'balanced', default=None Weights associated with classes in the form ``class_label: weight``. If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

.. versionadded:: 0.17 class_weight == 'balanced'

n_jobs : int, default=None Number of CPU cores used during the cross-validation loop. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

verbose : int, default=0 For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.

refit : bool, default=True If set to True, the scores are averaged across all folds, and the coefs and the C that corresponds to the best score is taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.

intercept_scaling : float, default=1 Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes x, self.intercept_scaling, i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.

multi_class : 'auto, 'ovr', 'multinomial', default='auto' If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.

.. versionadded:: 0.18 Stochastic Average Gradient descent solver for 'multinomial' case. .. versionchanged:: 0.22 Default changed from 'ovr' to 'auto' in 0.22.

random_state : int, RandomState instance, default=None If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by `np.random`. Used when `solver='sag'` or `solver='liblinear'`. Note that this only applies to the solver and not the cross-validation generator.

l1_ratios : list of float, default=None The list of Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if ``penalty='elasticnet'``. A value of 0 is equivalent to using ``penalty='l2'``, while 1 is equivalent to using ``penalty='l1'``. For ``0 < l1_ratio <1``, the penalty is a combination of L1 and L2.

Attributes ---------- classes_ : ndarray of shape (n_classes, ) A list of class labels known to the classifier.

coef_ : ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

`coef_` is of shape (1, n_features) when the given problem is binary.

intercept_ : ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.

If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape(1,) when the problem is binary.

Cs_ : ndarray of shape (n_cs) Array of C i.e. inverse of regularization parameter values used for cross-validation.

l1_ratios_ : ndarray of shape (n_l1_ratios) Array of l1_ratios used for cross-validation. If no l1_ratio is used (i.e. penalty is not 'elasticnet'), this is set to ``None``

coefs_paths_ : ndarray of shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1) dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold and then across each Cs after doing an OvR for the corresponding class as values. If the 'multi_class' option is set to 'multinomial', then the coefs_paths are the coefficients corresponding to each class. Each dict value has shape ``(n_folds, n_cs, n_features)`` or ``(n_folds, n_cs, n_features + 1)`` depending on whether the intercept is fit or not. If ``penalty='elasticnet'``, the shape is ``(n_folds, n_cs, n_l1_ratios_, n_features)`` or ``(n_folds, n_cs, n_l1_ratios_, n_features + 1)``.

scores_ : dict dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold, after doing an OvR for the corresponding class. If the 'multi_class' option given is 'multinomial' then the same scores are repeated across all classes, since this is the multinomial class. Each dict value has shape ``(n_folds, n_cs`` or ``(n_folds, n_cs, n_l1_ratios)`` if ``penalty='elasticnet'``.

C_ : ndarray of shape (n_classes,) or (n_classes - 1,) Array of C that maps to the best scores across every class. If refit is set to False, then for each class, the best C is the average of the C's that correspond to the best scores for each fold. `C_` is of shape(n_classes,) when the problem is binary.

l1_ratio_ : ndarray of shape (n_classes,) or (n_classes - 1,) Array of l1_ratio that maps to the best scores across every class. If refit is set to False, then for each class, the best l1_ratio is the average of the l1_ratio's that correspond to the best scores for each fold. `l1_ratio_` is of shape(n_classes,) when the problem is binary.

n_iter_ : ndarray of shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs) Actual number of iterations for all classes, folds and Cs. In the binary or multinomial cases, the first dimension is equal to 1. If ``penalty='elasticnet'``, the shape is ``(n_classes, n_folds, n_cs, n_l1_ratios)`` or ``(1, n_folds, n_cs, n_l1_ratios)``.

Examples -------- >>> from sklearn.datasets import load_iris >>> from sklearn.linear_model import LogisticRegressionCV >>> X, y = load_iris(return_X_y=True) >>> clf = LogisticRegressionCV(cv=5, random_state=0).fit(X, y) >>> clf.predict(X:2, :) array(0, 0) >>> clf.predict_proba(X:2, :).shape (2, 3) >>> clf.score(X, y) 0.98...