package sklearn

val get_py : string -> Py.Object.t

Get an attribute of this module as a Py.Object.t. This is useful to pass a Python function to another function.

module AdditiveChi2Sampler : sig ... end
module BaseEstimator : sig ... end
module Nystroem : sig ... end
module RBFSampler : sig ... end
module SkewedChi2Sampler : sig ... end
module TransformerMixin : sig ... end
val as_float_array : ?copy:bool -> ?force_all_finite:[ `Bool of bool | `Allow_nan ] -> x:Arr.t -> unit -> Arr.t

Converts an array-like to an array of floats.

The new dtype will be np.float32 or np.float64, depending on the original type. Depending on the copy argument, the function either creates a copy or modifies its argument.

Parameters ---------- X : array-like, sparse matrix

copy : bool, optional If True, a copy of X will be created. If False, a copy may still be returned if X's dtype is not a floating point type.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf and np.nan in X. The possibilities are:

  • True: Force all values of X to be finite.
  • False: accept both np.inf and np.nan in X.
  • 'allow-nan': accept only np.nan values in X. Values cannot be infinite.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.

Returns ------- XT : array, sparse matrix An array of floating-point type (np.float32 or np.float64).
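The dtype rules above can be illustrated with a minimal sketch against the underlying Python function. It assumes scikit-learn and NumPy are installed; the variable names are illustrative only.

```python
import numpy as np
from sklearn.utils import as_float_array

# 64-bit integer input: the result is a new float64 array.
X_int = np.array([[1, 2], [3, 4]], dtype=np.int64)
X_float = as_float_array(X_int)

# float32 input keeps its dtype; copy=True returns a separate array.
X_f32 = np.array([[1.0, 2.0]], dtype=np.float32)
X_f32_copy = as_float_array(X_f32, copy=True)

# With copy=False, an input that is already floating point is returned as-is.
X_f32_same = as_float_array(X_f32, copy=False)

print(X_float.dtype, X_f32_copy.dtype, X_f32_same is X_f32)
```

From OCaml, the same call goes through `as_float_array ~copy:true ~x ()` as declared in the signature above.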

val check_array : ?accept_sparse:[ `S of string | `Bool of bool | `StringList of string list ] -> ?accept_large_sparse:bool -> ?dtype: [ `S of string | `Dtype of Py.Object.t | `TypeList of Py.Object.t | `None ] -> ?order:[ `F | `C ] -> ?copy:bool -> ?force_all_finite:[ `Bool of bool | `Allow_nan ] -> ?ensure_2d:bool -> ?allow_nd:bool -> ?ensure_min_samples:int -> ?ensure_min_features:int -> ?warn_on_dtype:bool -> ?estimator:[ `S of string | `Estimator of Py.Object.t ] -> array:Py.Object.t -> unit -> Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters ---------- array : object Input object to check / convert.

accept_sparse : string, boolean or list/tuple of strings (default=False) Strings representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

.. versionadded:: 0.20

dtype : string, type, list of types or None (default="numeric") Data type of result. If None, the dtype of the input is preserved. If "numeric", dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf and np.nan in array. The possibilities are:

  • True: Force all values of array to be finite.
  • False: accept both np.inf and np.nan in array.
  • 'allow-nan': accept only np.nan values in array. Values cannot be infinite.

For object dtyped data, only np.nan is checked and not np.inf.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.

ensure_2d : boolean (default=True) Whether to raise a value error if array is not 2D.

allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0 disables this check.

warn_on_dtype : boolean or None, optional (default=None) Raise DataConversionWarning if the dtype of the input data structure does not match the requested dtype, causing a memory copy.

.. deprecated:: 0.21 ``warn_on_dtype`` is deprecated in version 0.21 and will be removed in 0.23.

estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns ------- array_converted : object The converted and validated array.
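As a rough illustration of the default checks described above (2D, non-empty, finite), here is a small sketch against the underlying Python function. It assumes scikit-learn and NumPy are installed and only uses parameters whose names are listed in this section.

```python
import numpy as np
from sklearn.utils import check_array

# A nested list of numbers is converted to a 2D NumPy array.
X = check_array([[1, 2], [3, 4]])

# A 1D input is rejected because ensure_2d defaults to True.
try:
    check_array([1, 2, 3])
    rejected_1d = False
except ValueError:
    rejected_1d = True

# NaN values are rejected by the default finiteness check.
try:
    check_array(np.array([[1.0, np.nan]]))
    rejected_nan = False
except ValueError:
    rejected_nan = True

# Relaxing ensure_2d lets the same 1D input through.
v = check_array([1, 2, 3], ensure_2d=False)

print(X.shape, rejected_1d, rejected_nan, v.shape)
```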

val check_is_fitted : ?attributes:[ `S of string | `Arr of Arr.t | `StringList of string list ] -> ?msg:string -> ?all_or_any:[ `Callable of Py.Object.t | `PyObject of Py.Object.t ] -> estimator:Py.Object.t -> unit -> Py.Object.t

Perform is_fitted validation for estimator.

Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message.

This utility is meant to be used internally by estimators themselves, typically in their own predict / transform methods.

Parameters ---------- estimator : estimator instance. estimator instance for which the check is performed.

attributes : str, list or tuple of str, default=None Attribute name(s) given as string or a list/tuple of strings Eg.: ``["coef_", "estimator_", ...], "coef_"``

If `None`, `estimator` is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.

msg : string The default error message is, "This %(name)s instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."

For custom messages if "%(name)s" is present in the message string, it is substituted for the estimator name.

Eg. : "Estimator, %(name)s, must be fitted before sparsifying".

all_or_any : callable, {all, any}, default all Specify whether all or any of the given attributes must exist.

Returns ------- None

Raises ------ NotFittedError If the attributes are not found.
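A minimal sketch of the check before and after fitting, using the underlying Python function. It assumes a scikit-learn version in which `attributes` may be omitted, as documented above. `RBFSampler` is used only because it is one of the estimators listed in this module; the random input data is illustrative.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.kernel_approximation import RBFSampler
from sklearn.utils.validation import check_is_fitted

sampler = RBFSampler(n_components=10, random_state=0)

# Before fit: no attribute with a trailing underscore exists yet.
try:
    check_is_fitted(sampler)
    fitted_before = True
except NotFittedError:
    fitted_before = False

# After fit: the check passes silently.
sampler.fit(np.random.RandomState(0).rand(5, 3))
check_is_fitted(sampler)
fitted_after = True

print(fitted_before, fitted_after)
```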

val check_random_state : seed:[ `I of int | `RandomState of Py.Object.t | `None ] -> unit -> Py.Object.t

Turn seed into a np.random.RandomState instance

Parameters ---------- seed : None | int | instance of RandomState If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
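The four cases listed above (None, int, RandomState instance, anything else) can be sketched as follows against the underlying Python function, assuming scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.utils import check_random_state

# int seed: each call builds a new RandomState, so draws are reproducible.
rng_a = check_random_state(42)
rng_b = check_random_state(42)
same_draws = np.array_equal(rng_a.rand(3), rng_b.rand(3))

# Existing RandomState instance: returned unchanged.
existing = np.random.RandomState(7)
passthrough = check_random_state(existing) is existing

# None: the global RandomState used by np.random.
global_rng = check_random_state(None)

# Anything else raises ValueError.
try:
    check_random_state("not a seed")
    bad_seed_rejected = False
except ValueError:
    bad_seed_rejected = True

print(same_draws, passthrough, bad_seed_rejected)
```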

val pairwise_kernels : ?y:Arr.t -> ?metric:[ `S of string | `Callable of Py.Object.t ] -> ?filter_params:bool -> ?n_jobs:int -> ?kwds:(string * Py.Object.t) list -> x:[ `Arr of Arr.t | `Otherwise of Py.Object.t ] -> unit -> Py.Object.t

Compute the kernel between arrays X and optional array Y.

This method takes either a vector array or a kernel matrix, and returns a kernel matrix. If the input is a vector array, the kernels are computed. If the input is a kernel matrix, it is returned instead.

This method provides a safe way to take a kernel matrix as input, while preserving compatibility with many other algorithms that take a vector array.

If Y is given (default is None), then the returned matrix is the pairwise kernel between the arrays from both X and Y.

Valid values for metric are: 'additive_chi2', 'chi2', 'linear', 'poly', 'polynomial', 'rbf', 'laplacian', 'sigmoid', 'cosine'

Read more in the :ref:`User Guide <metrics>`.

Parameters ---------- X : array [n_samples_a, n_samples_a] if metric == "precomputed", or, [n_samples_a, n_features] otherwise Array of pairwise kernels between samples, or a feature array.

Y : array [n_samples_b, n_features] A second feature array only if X has shape [n_samples_a, n_features].

metric : string, or callable The metric to use when calculating kernel between instances in a feature array. If metric is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. If metric is "precomputed", X is assumed to be a kernel matrix. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from :mod:`sklearn.metrics.pairwise` are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.

filter_params : boolean Whether to filter invalid parameters or not.

n_jobs : int or None, optional (default=None) The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel.

``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

**kwds : optional keyword parameters Any further parameters are passed directly to the kernel function.

Returns ------- K : array [n_samples_a, n_samples_a] or [n_samples_a, n_samples_b] A kernel matrix K such that K_{i, j} is the kernel between the ith and jth vectors of the given matrix X, if Y is None. If Y is not None, then K_{i, j} is the kernel between the ith array from X and the jth array from Y.

Notes ----- If metric is 'precomputed', Y is ignored and X is returned.
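The string, callable, and "precomputed" forms of `metric` can be compared in a short sketch against the underlying Python function. It assumes scikit-learn and NumPy are installed. The helper `dot_kernel` and the sample arrays are hypothetical; `gamma` is forwarded to the RBF kernel through `**kwds`.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 0.0], [2.0, 2.0]])

# String metric, X only: a square (n_samples_a, n_samples_a) kernel matrix.
K_xx = pairwise_kernels(X, metric="rbf", gamma=0.5)

# String metric with Y: a (n_samples_a, n_samples_b) kernel matrix.
K_xy = pairwise_kernels(X, Y, metric="linear")

# Callable metric: receives one row from each array, returns a single number.
def dot_kernel(a, b):
    return float(np.dot(a, b))

K_callable = pairwise_kernels(X, Y, metric=dot_kernel)

# "precomputed": X is treated as a kernel matrix and handed back.
K_pre = pairwise_kernels(K_xx, metric="precomputed")

print(K_xx.shape, K_xy.shape)
```

The callable form is evaluated pair by pair, so for large inputs the string form is usually the faster choice.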

val safe_sparse_dot : ?dense_output:Py.Object.t -> a:Arr.t -> b:Py.Object.t -> unit -> Arr.t

Dot product that handles the sparse matrix case correctly.

Parameters ---------- a : array or sparse matrix

b : array or sparse matrix

dense_output : boolean, (default=False) When False, ``a`` and ``b`` both being sparse will yield sparse output. When True, output will always be a dense array.

Returns ------- dot_product : array or sparse matrix sparse if ``a`` and ``b`` are sparse and ``dense_output=False``.
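The effect of `dense_output` can be seen in a small sketch against the underlying Python function, assuming scikit-learn, SciPy, and NumPy are installed. The two matrices are illustrative.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.extmath import safe_sparse_dot

a = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
b = sparse.csr_matrix(np.array([[0.0, 3.0], [4.0, 0.0]]))

# Both inputs sparse, dense_output left at False: the product stays sparse.
sparse_result = safe_sparse_dot(a, b)

# dense_output=True: the product is returned as a dense NumPy array.
dense_result = safe_sparse_dot(a, b, dense_output=True)

print(sparse.issparse(sparse_result), type(dense_result).__name__)
```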

val svd : ?full_matrices:Py.Object.t -> ?compute_uv:Py.Object.t -> ?overwrite_a:Py.Object.t -> ?check_finite:Py.Object.t -> ?lapack_driver:Py.Object.t -> a:Py.Object.t -> unit -> Arr.t

Singular Value Decomposition.

Factorizes the matrix `a` into two unitary matrices ``U`` and ``Vh``, and a 1-D array ``s`` of singular values (real, non-negative) such that ``a == U @ S @ Vh``, where ``S`` is a suitably shaped matrix of zeros with main diagonal ``s``.

Parameters ---------- a : (M, N) array_like Matrix to decompose.

full_matrices : bool, optional If True (default), `U` and `Vh` are of shape ``(M, M)``, ``(N, N)``. If False, the shapes are ``(M, K)`` and ``(K, N)``, where ``K = min(M, N)``.

compute_uv : bool, optional Whether to compute also ``U`` and ``Vh`` in addition to ``s``. Default is True.

overwrite_a : bool, optional Whether to overwrite `a`; may improve performance. Default is False.

check_finite : bool, optional Whether to check that the input matrix contains only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs.

lapack_driver : 'gesdd', 'gesvd', optional Whether to use the more efficient divide-and-conquer approach (``'gesdd'``) or general rectangular approach (``'gesvd'``) to compute the SVD. MATLAB and Octave use the ``'gesvd'`` approach. Default is ``'gesdd'``.

.. versionadded:: 0.18

Returns ------- U : ndarray Unitary matrix having left singular vectors as columns. Of shape ``(M, M)`` or ``(M, K)``, depending on `full_matrices`.

s : ndarray The singular values, sorted in non-increasing order. Of shape (K,), with ``K = min(M, N)``.

Vh : ndarray Unitary matrix having right singular vectors as rows. Of shape ``(N, N)`` or ``(K, N)`` depending on `full_matrices`.

For ``compute_uv=False``, only ``s`` is returned.

Raises ------ LinAlgError If SVD computation does not converge.

See also -------- svdvals : Compute singular values of a matrix.

diagsvd : Construct the Sigma matrix, given the vector s.

Examples --------
>>> import numpy as np
>>> from scipy import linalg
>>> m, n = 9, 6
>>> a = np.random.randn(m, n) + 1.j*np.random.randn(m, n)
>>> U, s, Vh = linalg.svd(a)
>>> U.shape, s.shape, Vh.shape
((9, 9), (6,), (6, 6))

Reconstruct the original matrix from the decomposition:

>>> sigma = np.zeros((m, n))
>>> for i in range(min(m, n)):
...     sigma[i, i] = s[i]
>>> a1 = np.dot(U, np.dot(sigma, Vh))
>>> np.allclose(a, a1)
True

Alternatively, use ``full_matrices=False`` (notice that the shape of ``U`` is then ``(m, n)`` instead of ``(m, m)``):

>>> U, s, Vh = linalg.svd(a, full_matrices=False)
>>> U.shape, s.shape, Vh.shape
((9, 6), (6,), (6, 6))
>>> S = np.diag(s)
>>> np.allclose(a, np.dot(U, np.dot(S, Vh)))
True

>>> s2 = linalg.svd(a, compute_uv=False)
>>> np.allclose(s, s2)
True
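The `lapack_driver` option is not exercised by the examples above. A brief sketch, assuming SciPy and NumPy are installed and using an arbitrary seeded real matrix, checks that both drivers yield the same singular values up to floating-point tolerance:

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
a = rng.randn(5, 3)

# Singular values only, computed with each LAPACK driver.
s_gesdd = linalg.svd(a, compute_uv=False, lapack_driver="gesdd")
s_gesvd = linalg.svd(a, compute_uv=False, lapack_driver="gesvd")

# Both drivers should agree numerically.
drivers_agree = np.allclose(s_gesdd, s_gesvd)

# Singular values come back sorted in non-increasing order.
sorted_desc = bool(np.all(np.diff(s_gesdd) <= 0))

print(drivers_agree, sorted_desc)
```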
