package scipy

val get_py : string -> Py.Object.t

Get an attribute of this module as a Py.Object.t. This is useful to pass a Python function to another function.

module ClusterError : sig ... end

module Deque : sig ... end

val cdist : 
  ?metric:[ `Callable of Py.Object.t | `S of string ] ->
  ?kwargs:(string * Py.Object.t) list ->
  xa:[> `Ndarray ] Np.Obj.t ->
  xb:[> `Ndarray ] Np.Obj.t ->
  Py.Object.t list ->
  [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Compute distance between each pair of the two collections of inputs.

See Notes for common calling conventions.

Parameters
----------
XA : ndarray
    An :math:`m_A` by :math:`n` array of :math:`m_A` original observations
    in an :math:`n`-dimensional space. Inputs are converted to float type.
XB : ndarray
    An :math:`m_B` by :math:`n` array of :math:`m_B` original observations
    in an :math:`n`-dimensional space. Inputs are converted to float type.
metric : str or callable, optional
    The distance metric to use. If a string, the distance function can be
    'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation',
    'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon',
    'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
    'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath',
    'sqeuclidean', 'wminkowski', 'yule'.
*args : tuple
    Deprecated. Additional arguments should be passed as keyword arguments.
**kwargs : dict, optional
    Extra arguments to `metric`: refer to each metric's documentation for a
    list of all possible arguments.

    Some possible arguments:

    p : scalar
        The p-norm to apply for Minkowski, weighted and unweighted.
        Default: 2.

    w : ndarray
        The weight vector for metrics that support weights (e.g., Minkowski).

    V : ndarray
        The variance vector for standardized Euclidean.
        Default: ``var(vstack([XA, XB]), axis=0, ddof=1)``

    VI : ndarray
        The inverse of the covariance matrix for Mahalanobis.
        Default: ``inv(cov(vstack([XA, XB]).T)).T``

    out : ndarray
        The output array. If not None, the distance matrix Y is stored in
        this array. Note: this argument is metric-independent and will
        become a regular keyword argument in a future scipy version.
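The defaults for `V` and `VI` can be checked on a tiny example. The following plain-Python sketch is illustrative only. `variance_vector`, `seuclidean` and `mahalanobis` are hypothetical helpers that are not part of scipy or of these bindings. The inputs are assumed to be small lists of rows, and the Mahalanobis helper takes the inverse covariance `VI` as an argument rather than inverting a covariance matrix.

```python
import math


def variance_vector(xa, xb):
    """Hypothetical helper: default V for 'seuclidean'.

    Per-column sample variance (ddof=1) of the rows of XA and XB stacked
    together, i.e. var(vstack([XA, XB]), axis=0, ddof=1).
    """
    rows = list(xa) + list(xb)
    n = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / (n - 1))
    return variances


def seuclidean(u, v, var):
    # Each squared difference is scaled by the variance of that component.
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(u, v, var)))


def mahalanobis(u, v, inv_cov):
    # sqrt((u - v) VI (u - v)^T) for an explicitly supplied inverse covariance VI.
    d = [a - b for a, b in zip(u, v)]
    q = sum(d[i] * inv_cov[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(q)


xa = [[0.0, 0.0], [2.0, 0.0]]
xb = [[0.0, 4.0], [2.0, 4.0]]
var = variance_vector(xa, xb)              # column variances 4/3 and 16/3
print(seuclidean(xa[0], xb[1], var))
# With VI = diag(1 / V), Mahalanobis reduces to the standardized Euclidean distance.
diag_inv = [[1 / var[0], 0.0], [0.0, 1 / var[1]]]
print(mahalanobis(xa[0], xb[1], diag_inv))
```

Both calls print the same value, about 2.449 (sqrt(6)). This is because a diagonal `VI` built from the reciprocal variances makes the Mahalanobis quadratic form identical to the standardized Euclidean sum.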

Returns
-------
Y : ndarray
    An :math:`m_A` by :math:`m_B` distance matrix is returned. For each
    :math:`i` and :math:`j`, the metric ``dist(u=XA[i], v=XB[j])`` is
    computed and stored in the :math:`ij` th entry.

Raises
------
ValueError
    An exception is thrown if `XA` and `XB` do not have the same number of
    columns.
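The following minimal pure-Python sketch models what `cdist` computes. It helps picture the shape of `Y` and the cost of a Python callable metric. `cdist_sketch` is a hypothetical name, and scipy's real implementation, which runs compiled loops for string metrics, is not reproduced here. The call log shows that a callable metric is invoked once per pair, i.e. :math:`m_A \cdot m_B` times.

```python
def cdist_sketch(xa, xb, metric):
    """Hypothetical pure-Python model of cdist: Y[i][j] = metric(XA[i], XB[j])."""
    widths = {len(row) for row in xa} | {len(row) for row in xb}
    if len(widths) != 1:
        raise ValueError("XA and XB must have the same number of columns")
    return [[metric(u, v) for v in xb] for u in xa]


calls = []


def cityblock(u, v):
    # Record every invocation so we can count how often a callable metric runs.
    calls.append((tuple(u), tuple(v)))
    return sum(abs(a - b) for a, b in zip(u, v))


# Same data as the unit-cube example in the Examples section: 8 corners vs. 1 point.
corners = [[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)]
point = [[0.1, 0.2, 0.4]]
y = cdist_sketch(corners, point, cityblock)
print(len(y), len(y[0]))   # 8 1  (an m_A by m_B result)
print(len(calls))          # 8    (one call per pair)
```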

Notes
-----
The following are common calling conventions:

1. ``Y = cdist(XA, XB, 'euclidean')``

Computes the distance between each point in `XA` and each point in `XB` using Euclidean distance (2-norm) as the distance metric. The points are arranged as :math:`n`-dimensional row vectors in the matrices `XA` and `XB`.

2. ``Y = cdist(XA, XB, 'minkowski', p=2.)``

Computes the distances using the Minkowski distance :math:`||u-v||_p` (:math:`p`-norm) where :math:`p \geq 1`.

3. ``Y = cdist(XA, XB, 'cityblock')``

Computes the city block or Manhattan distance between the points.

4. ``Y = cdist(XA, XB, 'seuclidean', V=None)``

Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors ``u`` and ``v`` is

.. math::

   \sqrt{\sum_i {(u_i-v_i)^2 / V[i]}}

``V`` is the variance vector; ``V[i]`` is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.

5. ``Y = cdist(XA, XB, 'sqeuclidean')``

Computes the squared Euclidean distance :math:`||u-v||_2^2` between the vectors.

6. ``Y = cdist(XA, XB, 'cosine')``

Computes the cosine distance between vectors u and v,

.. math::

   1 - \frac{u \cdot v}{{||u||}_2 {||v||}_2}

where :math:`||*||_2` is the 2-norm of its argument ``*``, and :math:`u \cdot v` is the dot product of :math:`u` and :math:`v`.

7. ``Y = cdist(XA, XB, 'correlation')``

Computes the correlation distance between vectors u and v. This is

.. math::

   1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}
            {{||(u - \bar{u})||}_2 {||(v - \bar{v})||}_2}

where :math:`\bar{v}` is the mean of the elements of vector v,
and :math:`x \cdot y` is the dot product of :math:`x` and :math:`y`.

8. ``Y = cdist(XA, XB, 'hamming')``

   Computes the normalized Hamming distance, or the proportion of
   those vector elements between two n-vectors ``u`` and ``v``
   which disagree. To save memory, the matrices ``XA`` and ``XB``
   can be of type boolean.

9. ``Y = cdist(XA, XB, 'jaccard')``

   Computes the Jaccard distance between the points. Given two
   vectors, ``u`` and ``v``, the Jaccard distance is the
   proportion of those elements ``u[i]`` and ``v[i]`` that
   disagree where at least one of them is non-zero.

10. ``Y = cdist(XA, XB, 'chebyshev')``

   Computes the Chebyshev distance between the points. The
   Chebyshev distance between two n-vectors ``u`` and ``v`` is the
   maximum absolute difference between their respective elements. More
   precisely, the distance is given by

   .. math::

      d(u,v) = \max_i { |u_i-v_i| }.

11. ``Y = cdist(XA, XB, 'canberra')``

   Computes the Canberra distance between the points. The
   Canberra distance between two points ``u`` and ``v`` is

   .. math::

     d(u,v) = \sum_i \frac{ |u_i-v_i| }
                          { |u_i|+|v_i| }.

12. ``Y = cdist(XA, XB, 'braycurtis')``

   Computes the Bray-Curtis distance between the points. The
   Bray-Curtis distance between two points ``u`` and ``v`` is


   .. math::

        d(u,v) = \frac{\sum_i (|u_i-v_i|)}
                      {\sum_i (|u_i+v_i|)}

13. ``Y = cdist(XA, XB, 'mahalanobis', VI=None)``

   Computes the Mahalanobis distance between the points. The
   Mahalanobis distance between two points ``u`` and ``v`` is
   :math:`\sqrt{(u-v)(1/V)(u-v)^T}` where :math:`(1/V)` (the ``VI``
   variable) is the inverse covariance. If ``VI`` is not None,
   ``VI`` will be used as the inverse covariance matrix.

14. ``Y = cdist(XA, XB, 'yule')``

   Computes the Yule distance between the boolean
   vectors. (see `yule` function documentation)

15. ``Y = cdist(XA, XB, 'matching')``

   Synonym for 'hamming'.

16. ``Y = cdist(XA, XB, 'dice')``

   Computes the Dice distance between the boolean vectors. (see
   `dice` function documentation)

17. ``Y = cdist(XA, XB, 'kulsinski')``

   Computes the Kulsinski distance between the boolean
   vectors. (see `kulsinski` function documentation)

18. ``Y = cdist(XA, XB, 'rogerstanimoto')``

   Computes the Rogers-Tanimoto distance between the boolean
   vectors. (see `rogerstanimoto` function documentation)

19. ``Y = cdist(XA, XB, 'russellrao')``

   Computes the Russell-Rao distance between the boolean
   vectors. (see `russellrao` function documentation)

20. ``Y = cdist(XA, XB, 'sokalmichener')``

   Computes the Sokal-Michener distance between the boolean
   vectors. (see `sokalmichener` function documentation)

21. ``Y = cdist(XA, XB, 'sokalsneath')``

   Computes the Sokal-Sneath distance between the vectors. (see
   `sokalsneath` function documentation)


22. ``Y = cdist(XA, XB, 'wminkowski', p=2., w=w)``

   Computes the weighted Minkowski distance between the
   vectors. (see `wminkowski` function documentation)

23. ``Y = cdist(XA, XB, f)``

   Computes the distance between each vector in XA and each vector
   in XB using the user-supplied 2-arity function f. For example,
   Euclidean distance between the vectors could be computed
   as follows::

     dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))

   Note that you should avoid passing a reference to one of
   the distance functions defined in this library. For example::

     dm = cdist(XA, XB, sokalsneath)

   would calculate the pairwise distances between the vectors in
   XA and XB using the Python function `sokalsneath`. This would
   result in sokalsneath being called :math:`m_A \cdot m_B` times,
   which is inefficient. Instead, call the optimized C version by
   passing the metric name as a string::

     dm = cdist(XA, XB, 'sokalsneath')
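Several formulas in the list above map directly onto code. The plain-Python functions below are a sketch for checking small cases by hand, not scipy's compiled implementations. They do no input validation. Two conventions in them are assumptions of this sketch: `canberra` treats 0/0 terms as 0, and `jaccard` assumes at least one position where an entry is non-zero.

```python
import math


def minkowski(u, v, p=2.0):
    # (sum_i |u_i - v_i|^p)^(1/p); p=1 gives cityblock, p=2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def chebyshev(u, v):
    # Largest absolute componentwise difference.
    return max(abs(a - b) for a, b in zip(u, v))


def cosine(u, v):
    # 1 - (u . v) / (||u||_2 ||v||_2)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def correlation(u, v):
    # Cosine distance between the mean-centred vectors.
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


def canberra(u, v):
    # sum_i |u_i - v_i| / (|u_i| + |v_i|); 0/0 terms are skipped (assumed convention).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if abs(a) + abs(b) != 0)


def braycurtis(u, v):
    # sum_i |u_i - v_i| / sum_i |u_i + v_i|
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den


def hamming(u, v):
    # Fraction of positions where u and v disagree.
    return sum(a != b for a, b in zip(u, v)) / len(u)


def jaccard(u, v):
    # Among positions where at least one entry is non-zero, the fraction that disagree.
    relevant = [(a, b) for a, b in zip(u, v) if a != 0 or b != 0]
    return sum(a != b for a, b in relevant) / len(relevant)


u, v = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
for f in (chebyshev, cosine, canberra, braycurtis, hamming, jaccard):
    print(f.__name__, f(u, v))
print("minkowski p=1", minkowski(u, v, p=1))
print("correlation", correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```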

Examples
--------
Find the Euclidean distances between four 2-D coordinates:

>>> import numpy as np
>>> from scipy.spatial import distance
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])


Find the Manhattan distance from a 3-D point to the corners of the unit
cube:

>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])

val kmeans : 
  ?iter:int ->
  ?thresh:float ->
  ?check_finite:bool ->
  obs:[> `Ndarray ] Np.Obj.t ->
  k_or_guess:[ `Ndarray of [> `Ndarray ] Np.Obj.t | `I of int ] ->
  unit ->
  [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t * float

Performs k-means on a set of observation vectors forming k clusters.

The k-means algorithm adjusts the classification of the observations into clusters and updates the cluster centroids until the position of the centroids is stable over successive iterations. In this implementation of the algorithm, the stability of the centroids is determined by comparing the absolute value of the change in the average Euclidean distance between the observations and their corresponding centroids against a threshold. This yields a code book mapping centroids to codes and vice versa.
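The loop described above can be sketched in plain Python. Each pass assigns every observation to its nearest centroid, measures the mean distance, and moves the centroids to the means of their members. The loop stops once that mean distance changes by no more than `thresh`. This is a minimal illustration that assumes the initial centroids are supplied. `kmeans_single_run` is a hypothetical name, and dropping centroids that end up with no members is an assumed policy of this sketch, not documented behaviour of `kmeans`.

```python
import math


def kmeans_single_run(obs, centroids, thresh=1e-5):
    """Hypothetical sketch of one k-means run with a distortion-change stop rule."""
    prev = math.inf
    while True:
        # 1. Assign each observation to its nearest centroid (its "code").
        codes = [min(range(len(centroids)), key=lambda k: math.dist(o, centroids[k]))
                 for o in obs]
        # 2. Average (non-squared) Euclidean distance to the assigned centroids.
        distortion = sum(math.dist(o, centroids[c]) for o, c in zip(obs, codes)) / len(obs)
        # 3. Move each centroid to the mean of its members; centroids that lost
        #    all their members are dropped (an assumption of this sketch).
        updated = []
        for k in range(len(centroids)):
            members = [o for o, c in zip(obs, codes) if c == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
        centroids = updated
        # 4. Stop once the average distance changes by no more than thresh.
        if abs(prev - distortion) <= thresh:
            return centroids, distortion
        prev = distortion


obs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
book, mean_dist = kmeans_single_run(obs, [[0.0, 0.0], [10.0, 10.0]])
print(book, mean_dist)   # [[0.0, 0.5], [10.0, 10.5]] 0.5
```

On this data the first pass moves the centroids to [0.0, 0.5] and [10.0, 10.5]. The second pass measures the same mean distance (0.5), so the loop stops.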

Parameters
----------
obs : ndarray
    Each row of the M by N array is an observation vector. The columns are
    the features seen during each observation. The features must be
    whitened first with the `whiten` function.
k_or_guess : int or ndarray
    The number of centroids to generate. A code is assigned to each
    centroid, which is also the row index of the centroid in the code_book
    matrix generated.

    The initial k centroids are chosen by randomly selecting observations
    from the observation matrix. Alternatively, passing a k by N array
    specifies the initial k centroids.
iter : int, optional
    The number of times to run k-means, returning the codebook with the
    lowest distortion. This argument is ignored if initial centroids are
    specified with an array for the ``k_or_guess`` parameter. This
    parameter does not represent the number of iterations of the k-means
    algorithm.
thresh : float, optional
    Terminates the k-means algorithm if the change in distortion since the
    last k-means iteration is less than or equal to `thresh`.
check_finite : bool, optional
    Whether to check that the input matrices contain only finite numbers.
    Disabling may give a performance gain, but may result in problems
    (crashes, non-termination) if the inputs do contain infinities or NaNs.
    Default: True

Returns
-------
codebook : ndarray
    A k by N array of k centroids. The i'th centroid ``codebook[i]`` is
    represented with the code i. The centroids and codes generated
    represent the lowest distortion seen, not necessarily the globally
    minimal distortion.
distortion : float
    The mean (non-squared) Euclidean distance between the observations
    passed and the centroids generated. Note the difference to the standard
    definition of distortion in the context of the k-means algorithm, which
    is the sum of the squared distances.
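The sketch below computes both notions of distortion for a fixed assignment, so the difference is concrete. `distortion_measures` is a hypothetical helper, and the observations, centroids and codes are made-up inputs.

```python
import math


def distortion_measures(obs, centroids, codes):
    """Hypothetical helper contrasting the two notions of distortion."""
    dists = [math.dist(o, centroids[c]) for o, c in zip(obs, codes)]
    mean_distance = sum(dists) / len(dists)    # the quantity kmeans reports
    sum_squared = sum(d * d for d in dists)    # the usual k-means objective
    return mean_distance, sum_squared


obs = [[0.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
centroids = [[0.0, 1.0], [0.0, 0.0]]
codes = [0, 0, 1]
print(distortion_measures(obs, centroids, codes))
```

The three distances are 1, 1 and 5. The mean distance is therefore 7/3, while the sum of squared distances is 27.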

See Also
--------
kmeans2 : a different implementation of k-means clustering with more
    methods for generating initial centroids but without using a distortion
    change threshold as a stopping criterion.
whiten : must be called prior to passing an observation matrix to kmeans.
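Whitening rescales each feature (column) so that it has unit variance across observations. The minimal sketch below makes two assumptions: it uses the population standard deviation (ddof=0), and it leaves a zero-variance column unscaled instead of dividing by zero. `whiten_sketch` is a hypothetical name, not the library function.

```python
import math


def whiten_sketch(obs):
    """Hypothetical re-implementation: rescale each feature to unit variance."""
    n = len(obs)
    scales = []
    for col in zip(*obs):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)  # population std (ddof=0)
        # Assumed handling: a constant column is left unscaled instead of dividing by 0.
        scales.append(std if std != 0 else 1.0)
    # The data are only divided by the standard deviation; they are not centred.
    return [[x / s for x, s in zip(row, scales)] for row in obs]


print(whiten_sketch([[1.0, 10.0], [3.0, 10.0]]))   # [[1.0, 10.0], [3.0, 10.0]]
```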

Examples
--------
>>> import numpy as np
>>> from numpy import array
>>> from scipy.cluster.vq import vq, kmeans, whiten
>>> import matplotlib.pyplot as plt
>>> features  = array([[ 1.9,2.3],
...                    [ 1.5,2.5],
...                    [ 0.8,0.6],
...                    [ 0.4,1.8],
...                    [ 0.1,0.1],
...                    [ 0.2,1.8],
...                    [ 2.0,0.5],
...                    [ 0.3,1.5],
...                    [ 1.0,1.0]])
>>> whitened = whiten(features)
>>> book = np.array((whitened[0],whitened[2]))
>>> kmeans(whitened,book)
(array([[ 2.3110306 ,  2.86287398],    # random
       [ 0.93218041,  1.24398691]]), 0.85684700941625547)

>>> from numpy import random
>>> random.seed((1000,2000))
>>> codes = 3
>>> kmeans(whitened,codes)
(array([[ 2.3110306 ,  2.86287398],    # random
       [ 1.32544402,  0.65607529],
       [ 0.40782893,  2.02786907]]), 0.5196582527686241)

>>> # Create 50 datapoints in two clusters a and b
>>> pts = 50
>>> a = np.random.multivariate_normal([0, 0], [[4, 1], [1, 4]], size=pts)
>>> b = np.random.multivariate_normal([30, 10],
...                                   [[10, 2], [2, 1]],
...                                   size=pts)
>>> features = np.concatenate((a, b))
>>> # Whiten data
>>> whitened = whiten(features)
>>> # Find 2 clusters in the data
>>> codebook, distortion = kmeans(whitened, 2)
>>> # Plot whitened data and cluster centers in red
>>> plt.scatter(whitened[:, 0], whitened[:, 1])
>>> plt.scatter(codebook[:, 0], codebook[:, 1], c='r')
>>> plt.show()

val kmeans2 : 
  ?iter:int ->
  ?thresh:float ->
  ?minit:string ->
  ?missing:string ->
  ?check_finite:bool ->
  data:[> `Ndarray ] Np.Obj.t ->
  k:[ `Ndarray of [> `Ndarray ] Np.Obj.t | `I of int ] ->
  unit ->
  [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t
  * [ `ArrayLike | `Ndarray | `Object ] Np.Obj.t

Classify a set of observations into k clusters using the k-means algorithm.

The algorithm attempts to minimize the Euclidean distance between observations and centroids. Several initialization methods are included.

Parameters
----------
data : ndarray
    An 'M' by 'N' array of 'M' observations in 'N' dimensions or a length
    'M' array of 'M' 1-D observations.
k : int or ndarray
    The number of clusters to form as well as the number of centroids to
    generate. If the `minit` initialization string is 'matrix', or if an
    ndarray is given instead, it is interpreted as the initial centroids to
    use instead.
iter : int, optional
    Number of iterations of the k-means algorithm to run. Note that this
    differs in meaning from the ``iter`` parameter to the `kmeans` function.
thresh : float, optional
    (not used yet)
minit : str, optional
    Method for initialization. Available methods are 'random', 'points',
    '++' and 'matrix':

    'random': generate k centroids from a Gaussian with mean and variance
    estimated from the data.

    'points': choose k observations (rows) at random from data for the
    initial centroids.

    '++': choose k observations according to the kmeans++ method (careful
    seeding).

    'matrix': interpret the k parameter as a k by N (or length k array for
    1-D data) array of initial centroids.
missing : str, optional
    Method to deal with empty clusters. Available methods are 'warn' and
    'raise':

    'warn': give a warning and continue.

    'raise': raise a ClusterError and terminate the algorithm.
check_finite : bool, optional
    Whether to check that the input matrices contain only finite numbers.
    Disabling may give a performance gain, but may result in problems
    (crashes, non-termination) if the inputs do contain infinities or NaNs.
    Default: True

Returns
-------
centroid : ndarray
    A 'k' by 'N' array of centroids found at the last iteration of k-means.
label : ndarray
    ``label[i]`` is the code or index of the centroid the i'th observation
    is closest to.
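The '++' option refers to k-means++ seeding. The sketch below shows the textbook procedure. The first centroid is an observation picked uniformly at random. Each further centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. `kmeans_pp_init` is a hypothetical name, and this is not necessarily identical to scipy's internal code. It assumes the data contain at least k distinct observations.

```python
import random


def kmeans_pp_init(data, k, rng=None):
    """Hypothetical sketch of k-means++ seeding (the '++' minit option)."""
    rng = rng if rng is not None else random.Random()
    # First centroid: an observation chosen uniformly at random.
    centroids = [list(rng.choice(data))]
    while len(centroids) < k:
        # Squared distance from every observation to its nearest chosen centroid.
        d2 = [min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
              for p in data]
        # Next centroid: drawn with probability proportional to that squared
        # distance, so far-away observations are favoured.
        idx = rng.choices(range(len(data)), weights=d2, k=1)[0]
        centroids.append(list(data[idx]))
    return centroids


data = [[0.0, 0.0], [0.0, 0.0], [100.0, 100.0]]
print(kmeans_pp_init(data, 2, random.Random(0)))
```

In this example the two copies of the origin can never both be chosen. If the first pick is the origin, both copies get weight zero and the far point is taken next. If the first pick is the far point, its own weight drops to zero and an origin copy is taken. The seeds therefore always cover both groups.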