Floating point number utilities.
This module defines a few useful constants, functions, predicates and comparisons on floating point numbers. The printers output a lossless textual representation of floats.
Quick recall on OCaml's floating point representation.
The constant e.
The constant pi.
The greatest positive floating point number with a fractional part (the float before 2^52). Any number outside [-max_frac_float;max_frac_float] is an integer.
The greatest positive floating point number (2^53) such that any integer in the range [-max_int_arith;max_int_arith] is represented exactly. Integer arithmetic can be performed exactly in this interval.
Note. If applicable, a function taking NaNs returns a NaN unless otherwise specified.
random min len () is a random float in the interval [min;min+len] (min defaults to 0.). Uses the standard library's default Random state for the generation.
Warning. The float generated by a given state may change in future versions of the library.
val srandom : Random.State.t -> ?min:float -> len:float -> unit -> float
srandom state min len () is like random but uses state for the generation.
Warning. The float generated by a given state may change in future versions of the library.
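A minimal sketch of how such generators could be written on top of the standard library, assuming uniform sampling with Random.State.float and Random.float. It is an illustration that follows the srandom signature shown above, not the library's actual implementation.

```ocaml
(* Sketch only: assumes uniform sampling via Random.State.float,
   which returns a float between 0. and its bound (inclusive). *)
let srandom state ?(min = 0.) ~len () =
  min +. Random.State.float state len

(* Same idea with the standard library's default state. *)
let random ?(min = 0.) ~len () =
  min +. Random.float len

(* Two states built from the same seed yield the same float. *)
let a = srandom (Random.State.make [| 42 |]) ~min:1. ~len:2. ()
let b = srandom (Random.State.make [| 42 |]) ~min:1. ~len:2. ()
```

Passing an explicit state, as in the last two lines, is what makes a sequence of draws reproducible.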
step edge x is 0. if x < edge and 1. otherwise. The result is undefined on NaNs.
smooth_step e0 e1 x is 0. if x <= e0, 1. if x >= e1 and cubic Hermite interpolation between 0. and 1. otherwise. The result is undefined on NaNs.
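The two definitions above can be sketched as follows. The sketch assumes the usual cubic Hermite polynomial 3t^2 - 2t^3 on the normalized parameter t = (x - e0) / (e1 - e0); the source does not spell out the polynomial, so treat that choice as an assumption.

```ocaml
(* Sketch only, not the library's code. *)
let step edge x = if x < edge then 0. else 1.

let smooth_step e0 e1 x =
  if x <= e0 then 0. else
  if x >= e1 then 1. else
  (* Normalize x to t in ]0;1[, then evaluate 3t^2 - 2t^3. *)
  let t = (x -. e0) /. (e1 -. e0) in
  t *. t *. (3. -. 2. *. t)
```

With these definitions, smooth_step 0. 1. 0.5 evaluates to 0.5, the midpoint of the transition.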
fmax x y is y if x < y and x otherwise. If exactly one of x or y is NaN, returns the other argument. If both are NaNs, returns NaN.
fmin x y is x if x < y and y otherwise. If exactly one of x or y is NaN, returns the other argument. If both are NaNs, returns NaN.
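The NaN handling of fmax and fmin can be sketched as below. The test x <> x is true only when x is a NaN; the code is an illustration of the documented behaviour, not the library's implementation.

```ocaml
(* Sketch only: a NaN argument is skipped in favour of the other one. *)
let fmax x y =
  if x <> x then y            (* x is NaN: return y (NaN if y is NaN too) *)
  else if y <> y then x       (* y is NaN: return x *)
  else if x < y then y else x

let fmin x y =
  if x <> x then y
  else if y <> y then x
  else if x < y then x else y
```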
clamp min max x is min if x < min, max if x > max and x otherwise. The result is undefined on NaNs and if min > max.
remap x0 x1 y0 y1 v applies to v the affine transform that maps x0 to y0 and x1 to y1. If the transform is undefined (x0 = x1 and y0 <> y1) the function returns y0 for any v.
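A minimal sketch of the affine transform described for remap, using positional arguments in the order given above (the library's actual signature, possibly with labels, is not shown in this text).

```ocaml
(* Sketch only: maps x0 to y0 and x1 to y1, linearly in between. *)
let remap x0 x1 y0 y1 v =
  if x0 = x1 then y0 else     (* degenerate source interval: return y0 *)
  y0 +. (v -. x0) /. (x1 -. x0) *. (y1 -. y0)

(* Example: the midpoint of [0;10] lands on the midpoint of [0;100]. *)
let mid = remap 0. 10. 0. 100. 5.
```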
round x is the integer nearest to x. Ties are rounded towards positive infinity. If x is an infinity, returns x.
Note. If the absolute magnitude of x is an integer strictly greater than max_frac_float, round x = x may be false.
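One way to obtain this behaviour is floor (x +. 0.5). The sketch below assumes that formulation (the source does not state how round is implemented) and shows how the caveat in the note can arise: above 2^52 the addition of 0.5 is itself rounded.

```ocaml
(* Sketch only: assumes round is computed as floor (x + 0.5). *)
let round x = floor (x +. 0.5)
let int_of_round x = truncate (round x)

(* 2^52 + 1 is an odd integer above max_frac_float. Adding 0.5 is a tie
   between x and x + 1 and resolves to the even neighbour, x + 1. *)
let x = 4503599627370497.
let caveat = round x <> x
```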
int_of_round x is truncate (round x). The result is undefined on NaNs and infinities.
round_dfrac d x rounds x to the dth decimal fractional digit. Ties are rounded towards positive infinity. If x is an infinity, returns x. The result is only defined for 0 <= d <= 16.
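A hedged sketch of decimal rounding by scaling: multiply by 10^d, round with ties towards positive infinity, and divide back. The helper pow10 is hypothetical, and the library may well use a different method.

```ocaml
(* Hypothetical helper: 10^d computed by repeated multiplication. *)
let rec pow10 d = if d <= 0 then 1. else 10. *. pow10 (d - 1)

(* Sketch only: scale, round half up, scale back. *)
let round_dfrac d x =
  if x -. x <> 0. then x else   (* infinities (and NaNs) are returned as is *)
  let m = pow10 d in
  floor (x *. m +. 0.5) /. m
```

Because x *. m is itself rounded to a float, inputs that sit extremely close to a tie may round either way; the sketch makes no attempt to correct for that.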
round_dsig d x rounds the normalized decimal significand of x to the dth decimal fractional digit. Ties are rounded towards positive infinity. The result is NaN on infinities. The result is only defined for 0 <= d <= 16.
Warning. The current implementation overflows on large x and d.
round_zero eps x is 0. if abs_float x < eps and x otherwise. The result is undefined if eps is NaN.
chop eps x is round x if abs_float (x -. round x) < eps and x otherwise. The result is undefined if eps is NaN.
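The two cleanup functions above translate almost directly into code. The sketch assumes a labelled ~eps argument and a round defined as floor (x +. 0.5); both are assumptions, not the library's confirmed definitions.

```ocaml
(* Assumed rounding: nearest integer, ties towards positive infinity. *)
let round x = floor (x +. 0.5)

(* Sketch only: flush values within eps of zero to 0. *)
let round_zero ~eps x = if abs_float x < eps then 0. else x

(* Sketch only: snap x to the nearest integer when within eps of it. *)
let chop ~eps x =
  let r = round x in
  if abs_float (x -. r) < eps then r else x
```

Such functions are typically applied to the output of a numerical computation to remove tiny residues, e.g. chop ~eps:1e-9 (2. +. 1e-12) yields 2.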
succ x is the floating point value just after x towards positive infinity. Returns in particular:
- infinity on infinity.
- -max_float on neg_infinity.
- min_sub_float on 0. or -0.
pred x is -. succ (-.x), i.e. the floating point value before x towards negative infinity.
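The behaviour listed above can be reproduced by stepping the IEEE-754 bit pattern by one. The sketch below uses the hypothetical names fsucc and fpred to avoid shadowing the integer succ and pred of the standard library; it illustrates the technique and is not the library's implementation.

```ocaml
(* Sketch only: adjacent floats have adjacent bit patterns. *)
let fsucc x =
  if x <> x then x                              (* NaN stays NaN *)
  else if x = infinity then infinity
  else if x = 0. then Int64.float_of_bits 1L    (* min_sub_float, for 0. and -0. *)
  else
    let bits = Int64.bits_of_float x in
    if x > 0.
    then Int64.float_of_bits (Int64.succ bits)  (* positive: magnitude grows *)
    else Int64.float_of_bits (Int64.pred bits)  (* negative: magnitude shrinks *)

let fpred x = -. fsucc (-. x)
```

Recent versions of the OCaml standard library also provide Float.succ and Float.pred.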
nan payload is a NaN whose 51 lower significand bits are defined by the 51 lower (or fewer, as int allows) bits of payload.
nan_payload x is the 51 lower significand bits (or fewer, as int allows) of the NaN x.
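A minimal sketch of payload handling through Int64 bit patterns. It assumes the result is a positive quiet NaN (exponent all ones, bit 51 set); the name nan_with is hypothetical, and the library's own choices for the sign and quiet bits are not stated here.

```ocaml
(* Mask selecting the 51 low significand bits. *)
let payload_mask = Int64.(pred (shift_left 1L 51))

(* Assumed base pattern: sign 0, exponent 2047, quiet bit (bit 51) set. *)
let quiet_nan_bits = 0x7FF8_0000_0000_0000L

(* Sketch only: build a NaN that carries [payload]. *)
let nan_with payload =
  Int64.float_of_bits
    (Int64.logor quiet_nan_bits
       (Int64.logand (Int64.of_int payload) payload_mask))

(* Sketch only: read the payload back. *)
let nan_payload x =
  Int64.to_int (Int64.logand (Int64.bits_of_float x) payload_mask)
```

On platforms where int is narrower than 51 bits, Int64.to_int truncates the payload, which matches the "as int allows" caveat.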
is_zero eps x is true if abs_float x < eps and false otherwise. The result is undefined if eps is NaN.
equal_tol eps x y is true iff |x - y| <= eps * max (1,|x|,|y|). On special values the function behaves like compare x y = 0. The condition turns into an absolute tolerance test for small magnitudes and a relative tolerance test for large magnitudes.
compare_tol ~eps x y is 0 iff equal_tol ~eps x y is true and Pervasives.compare x y otherwise.
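The tolerance test and the derived comparison can be sketched as follows. The helper is_finite is hypothetical, "special values" is read here as infinities and NaNs, and compare is the standard library's polymorphic comparison (Pervasives.compare in older OCaml versions).

```ocaml
(* Hypothetical helper: true for anything but infinities and NaNs. *)
let is_finite x = x -. x = 0.

(* Sketch only: absolute test near zero, relative test for large values. *)
let equal_tol ~eps x y =
  if not (is_finite x && is_finite y) then compare x y = 0 else
  abs_float (x -. y) <= eps *. max 1. (max (abs_float x) (abs_float y))

let compare_tol ~eps x y =
  if equal_tol ~eps x y then 0 else compare x y
```

For example, with eps = 1e-9 the values 1e12 and 1e12 + 1 are deemed equal (relative regime), while 1. and 1.1 are not.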
val pp : Format.formatter -> float -> unit
pp ppf x prints a lossless textual representation of x on ppf.
- Normals: "[-]0x1.<f>p<e>" where <f> is the significand bits in hexadecimal and <e> the unbiased exponent in decimal.
- Subnormals: "[-]0x0.<f>p-1022" where <f> is the significand bits in hexadecimal.
- NaNs: "[-]nan(0x<p>)" where <p> is the payload in hexadecimal.
- Infinities and zeros: "[-]inf" and "[-]0.".
This format should be compatible with recent implementations of strtod and hence with float_of_string (but negative NaNs seem to be problematic to get back).
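To illustrate why a hexadecimal representation is lossless, the sketch below uses the standard library's "%h" conversion rather than this module's pp. Its exact output differs from the formats listed above, but it round-trips through float_of_string in the same way. The names to_hex and round_trips are hypothetical.

```ocaml
(* Sketch only: hexadecimal output via the standard Printf "%h" conversion. *)
let to_hex x = Printf.sprintf "%h" x

(* Parsing the hexadecimal text gives back exactly the same float. *)
let round_trips x = float_of_string (to_hex x) = x

(* By contrast, a short decimal rendering loses information. *)
let lossy = float_of_string (Printf.sprintf "%.3f" 0.1234567) <> 0.1234567
```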
floats
An OCaml float is an IEEE-754 64-bit double-precision binary floating point number. The 64 bits are laid out as follows:
+----------------+-----------------------+-------------------------+
| sign s (1 bit) | exponent e (11 bits)  | significand t (52 bits) |
+----------------+-----------------------+-------------------------+
63|62                                  52|51                      0|
The value represented depends on s, e and t :
sign  exponent      significand  value represented            meaning
-------------------------------------------------------------------------
s     0             0            (-1)^s * 0                   zero
s     0             t <> 0       (-1)^s * 0.t * 2^-1022       subnormal
s     0 < e < 2047  t            (-1)^s * 1.t * 2^(e - 1023)  normal
s     2047          0            (-1)^s * infinity            infinity
s     2047          t <> 0       NaN                          not a number
There are two zeros, a positive and a negative one, but both are deemed equal by = and Pervasives.compare. A NaN is never equal (=) to itself or to another NaN; however, Pervasives.compare asserts any NaN to be equal to itself and to any other NaN.
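These comparison rules can be checked directly. The snippet uses Stdlib.compare, the name under which Pervasives.compare is available in recent OCaml versions.

```ocaml
(* Both zeros compare equal under = and under compare. *)
let zeros_equal = (0. = (-0.))
let zeros_compare = Stdlib.compare 0. (-0.)

(* A NaN is not = to itself, but compare treats NaNs as equal. *)
let nan_equal = (nan = nan)
let nan_compare = Stdlib.compare nan nan

(* The sign of a zero is still observable through division. *)
let zeros_differ = (1. /. 0. = infinity) && (1. /. (-0.) = neg_infinity)
```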
The bit layout of a float can be converted to an int64 and back using Int64.bits_of_float and Int64.float_of_bits.
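As an illustration of the layout, the sketch below splits a float into its sign, biased exponent and significand fields. The function name decode is hypothetical.

```ocaml
(* Sketch only: extract (s, e, t) from the 64-bit pattern of a float. *)
let decode x =
  let bits = Int64.bits_of_float x in
  let s = Int64.to_int (Int64.shift_right_logical bits 63) in
  let e =
    Int64.to_int (Int64.logand (Int64.shift_right_logical bits 52) 0x7FFL)
  in
  let t = Int64.logand bits 0xF_FFFF_FFFF_FFFFL in   (* 52 low bits *)
  (s, e, t)

(* Converting to bits and back is the identity on non-NaN floats. *)
let same x = Int64.float_of_bits (Int64.bits_of_float x) = x
```

For instance, decode 1. is (0, 1023, 0L): sign 0, biased exponent 1023 (unbiased 0) and an all-zero significand.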
Bit 51 of a NaN distinguishes quiet (bit set) from signaling (bit cleared) NaNs; the remaining 51 lower bits of the significand are the NaN's payload, which can be used to store diagnostic information. These features don't seem to be used in OCaml.
The significand of a floating point number is made of 53 binary digits (don't forget the implicit digit); this corresponds to log10(2^53) ~ 16 decimal digits.
Only float values in the interval ]-2^52;2^52[ may have a fractional part. Float.max_frac_float is the greatest positive float with a fractional part.
Any integer value in the interval [-2^53;2^53] can be represented exactly by a float value. Integer arithmetic performed in this interval is exact. Float.max_int_arith is 2^53.
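Both limits can be observed with plain float arithmetic. The sketch recomputes the two constants locally with ldexp instead of relying on the module's values; has_frac is a hypothetical helper.

```ocaml
(* 2^53, and the float just before 2^52 (spacing is 0.5 below 2^52). *)
let max_int_arith = ldexp 1. 53
let max_frac_float = ldexp 1. 52 -. 0.5

(* Hypothetical helper: does x have a fractional part? *)
let has_frac x = floor x <> x

(* Beyond 2^53 consecutive integers are no longer all representable:
   2^53 + 1 rounds back to 2^53. *)
let saturates = max_int_arith +. 1. = max_int_arith
```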