package sklearn

type t
val of_pyobject : Py.Object.t -> t
val to_pyobject : t -> Py.Object.t
val create :
  ?step:[ `Int of int | `Float of float ] ->
  ?min_features_to_select:int ->
  ?cv:[ `Int of int | `CrossValGenerator of Py.Object.t | `Ndarray of Ndarray.t ] ->
  ?scoring:[ `String of string | `Callable of Py.Object.t | `None ] ->
  ?verbose:int ->
  ?n_jobs:[ `Int of int | `None ] ->
  estimator:Py.Object.t ->
  unit ->
  t

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.

See glossary entry for :term:`cross-validation estimator`.

Read more in the :ref:`User Guide <rfe>`.

Parameters
----------
estimator : object
    A supervised learning estimator with a ``fit`` method that provides information about feature importance either through a ``coef_`` attribute or through a ``feature_importances_`` attribute.

step : int or float, optional (default=1)
    If greater than or equal to 1, then ``step`` corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then ``step`` corresponds to the percentage (rounded down) of features to remove at each iteration. Note that the last iteration may remove fewer than ``step`` features in order to reach ``min_features_to_select``.

min_features_to_select : int (default=1)
    The minimum number of features to be selected. This number of features will always be scored, even if the difference between the original feature count and ``min_features_to_select`` isn't divisible by ``step``.

cv : int, cross-validation generator or an iterable, optional
    Determines the cross-validation splitting strategy. Possible inputs for cv are:

      • None, to use the default 5-fold cross-validation,
      • integer, to specify the number of folds,
      • :term:`CV splitter`,
      • an iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if ``y`` is binary or multiclass, :class:`sklearn.model_selection.StratifiedKFold` is used. If the estimator is not a classifier or if ``y`` is neither binary nor multiclass, :class:`sklearn.model_selection.KFold` is used.

    Refer to the :ref:`User Guide <cross_validation>` for the various cross-validation strategies that can be used here.

    .. versionchanged:: 0.22
        ``cv`` default value of None changed from 3-fold to 5-fold.

scoring : string, callable or None, optional (default=None)
    A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``.

verbose : int (default=0)
    Controls verbosity of output.

n_jobs : int or None, optional (default=None)
    Number of cores to run in parallel while fitting across folds. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

Attributes
----------
n_features_ : int
    The number of selected features with cross-validation.

support_ : array of shape [n_features]
    The mask of selected features.

ranking_ : array of shape [n_features]
    The feature ranking, such that ``ranking_[i]`` corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

grid_scores_ : array of shape [n_subsets_of_features]
    The cross-validation scores such that ``grid_scores_[i]`` corresponds to the CV score of the i-th subset of features.

estimator_ : object
    The external estimator fit on the reduced dataset.

Notes
-----
The size of ``grid_scores_`` is equal to ``ceil((n_features - min_features_to_select) / step) + 1``, where step is the number of features removed at each iteration.

Allows NaN/Inf in the input if the underlying estimator does as well.
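
For example (values chosen purely for illustration), with 10 input features, ``min_features_to_select=1`` and ``step=1``, ``grid_scores_`` has ``ceil((10 - 1) / 1) + 1 = 10`` entries. The same computation as a small OCaml helper, not part of this module's API:

  (* Expected length of grid_scores_, following the formula in the Notes above. *)
  let expected_grid_scores ~n_features ~min_features_to_select ~step =
    int_of_float (ceil (float_of_int (n_features - min_features_to_select) /. float_of_int step)) + 1
  (* expected_grid_scores ~n_features:10 ~min_features_to_select:1 ~step:1 = 10 *)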

Examples
--------
The following example shows how to retrieve the 5 informative features in the Friedman #1 dataset, which are not known a priori.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False, False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
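
An equivalent call through these bindings might look like the following minimal sketch, written against the vals documented on this page. How ``estimator`` (a ``Py.Object.t`` wrapping, say, the linear-kernel SVR from the Python example) and the ``Ndarray.t`` inputs ``x`` and ``y`` are obtained is left as an assumption.

  (* Sketch only: estimator, x and y stand in for the SVR and the Friedman #1
     data of the Python example above. *)
  let select_features ~(estimator : Py.Object.t) ~(x : Ndarray.t) ~(y : Ndarray.t) : t =
    let selector = create ~step:(`Int 1) ~cv:(`Int 5) ~estimator () in
    fit ~x:(`Ndarray x) ~y selector

The fitted value can then be inspected with ``support_`` and ``ranking_``, mirroring ``selector.support_`` and ``selector.ranking_`` above.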

See also
--------
RFE : Recursive feature elimination

References
----------

.. [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., "Gene selection for cancer classification using support vector machines", Mach. Learn., 46(1-3), 389--422, 2002.

val decision_function : x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] -> t -> Ndarray.t

Compute the decision function of ``X``.

Parameters
----------
X : array-like or sparse matrix of shape (n_samples, n_features)
    The input samples. Internally, it will be converted to ``dtype=np.float32`` and if a sparse matrix is provided to a sparse ``csr_matrix``.

Returns
-------
score : array, shape = [n_samples, n_classes] or [n_samples]
    The decision function of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`. Regression and binary classification produce an array of shape [n_samples].

val fit : ?groups:[ `Ndarray of Ndarray.t | `None ] -> x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] -> y:Ndarray.t -> t -> t

Fit the RFE model and automatically tune the number of selected features.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training vector, where `n_samples` is the number of samples and `n_features` is the total number of features.

y : array-like of shape (n_samples,)
    Target values (integers for classification, real numbers for regression).

groups : array-like of shape (n_samples,) or None
    Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a "Group" :term:`cv` instance (e.g., :class:`~sklearn.model_selection.GroupKFold`).

val fit_transform : ?y:Ndarray.t -> ?fit_params:(string * Py.Object.t) list -> x:Ndarray.t -> t -> Ndarray.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
----------
X : numpy array of shape [n_samples, n_features]
    Training set.

y : numpy array of shape [n_samples]
    Target values.

**fit_params : dict
    Additional fit parameters.

Returns
-------
X_new : numpy array of shape [n_samples, n_features_new]
    Transformed array.

val get_params : ?deep:bool -> t -> Py.Object.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.

val get_support : ?indices:bool -> t -> Ndarray.t

Get a mask, or integer index, of the features selected.

Parameters
----------
indices : boolean (default False)
    If True, the return value will be an array of integers, rather than a boolean mask.

Returns
-------
support : array
    An index that selects the retained features from a feature vector. If `indices` is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If `indices` is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
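
A small usage sketch against this signature; the fitted ``selector`` argument is assumed to come from ``fit``:

  (* Default call returns the boolean mask; ~indices:true returns the integer
     indices of the retained features instead. *)
  let selected (selector : t) : Ndarray.t * Ndarray.t =
    let mask = get_support selector in
    let idx = get_support ~indices:true selector in
    (mask, idx)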

val inverse_transform : x:Ndarray.t -> t -> Ndarray.t

Reverse the transformation operation.

Parameters
----------
X : array of shape [n_samples, n_selected_features]
    The input samples.

Returns
-------
X_r : array of shape [n_samples, n_original_features]
    `X` with columns of zeros inserted where features would have been removed by :meth:`transform`.
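
As a sketch of the round trip (the fitted ``selector`` and the input ``x`` are assumed; ``transform`` is documented further down this page), reducing and then inverting recovers the original width, with zero columns where features were dropped:

  (* Reduce to the selected features, then map back to the original feature
     space; removed columns come back as zeros. *)
  let round_trip (selector : t) (x : Ndarray.t) : Ndarray.t =
    let x_reduced = transform ~x selector in
    inverse_transform ~x:x_reduced selector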

val predict : x:Ndarray.t -> t -> Ndarray.t

Reduce X to the selected features and then predict using the underlying estimator.

Parameters
----------
X : array of shape [n_samples, n_features]
    The input samples.

Returns
-------
y : array of shape [n_samples]
    The predicted target values.

val predict_log_proba : x:Ndarray.t -> t -> Ndarray.t

Predict class log-probabilities for X.

Parameters
----------
X : array of shape [n_samples, n_features]
    The input samples.

Returns
-------
p : array of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`.

val predict_proba : x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] -> t -> Ndarray.t

Predict class probabilities for X.

Parameters
----------
X : array-like or sparse matrix of shape (n_samples, n_features)
    The input samples. Internally, it will be converted to ``dtype=np.float32`` and if a sparse matrix is provided to a sparse ``csr_matrix``.

Returns
-------
p : array of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:`classes_`.
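
Dense and sparse inputs are both accepted through the polymorphic variant on ``x``; a sketch, assuming a fitted ``selector``:

  (* Pass a dense Ndarray or a CSR sparse matrix via the variant tag. *)
  let proba_dense (selector : t) (x : Ndarray.t) : Ndarray.t =
    predict_proba ~x:(`Ndarray x) selector

  let proba_sparse (selector : t) (x : Csr_matrix.t) : Ndarray.t =
    predict_proba ~x:(`SparseMatrix x) selector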

val score : x:Ndarray.t -> y:Ndarray.t -> t -> Py.Object.t

Reduce X to the selected features and then return the score of the underlying estimator.

Parameters
----------
X : array of shape [n_samples, n_features]
    The input samples.

y : array of shape [n_samples]
    The target values.

val set_params : ?params:(string * Py.Object.t) list -> t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.
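
A sketch of updating a nested parameter in the ``<component>__<parameter>`` form described above; the "estimator__C" key is an assumption that only applies when the wrapped estimator actually exposes a ``C`` parameter (e.g. an SVR):

  (* Py.Float.of_float converts the OCaml float into the Python object
     expected by set_params. *)
  let set_estimator_c (selector : t) (c : float) : t =
    set_params ~params:["estimator__C", Py.Float.of_float c] selector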

val transform : x:Ndarray.t -> t -> Ndarray.t

Reduce X to the selected features.

Parameters
----------
X : array of shape [n_samples, n_features]
    The input samples.

Returns
-------
X_r : array of shape [n_samples, n_selected_features]
    The input samples with only the selected features.

val n_features_ : t -> int

Attribute n_features_: see constructor for documentation

val support_ : t -> Ndarray.t

Attribute support_: see constructor for documentation

val ranking_ : t -> Ndarray.t

Attribute ranking_: see constructor for documentation

val grid_scores_ : t -> Ndarray.t

Attribute grid_scores_: see constructor for documentation

val estimator_ : t -> Py.Object.t

Attribute estimator_: see constructor for documentation

val to_string : t -> string

Return a human-readable string representation of the object.

val show : t -> string

Return a human-readable string representation of the object.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.
