UTF-8 encoded strings
Invalid(error, text)
Exception raised when an invalid UTF-8 encoded string is encountered. text is the faulty text and error is a description of the first error in text.
Exception raised when trying to access a character which is outside the bounds of a string.
val check : t -> check_result
check str checks that str is a valid UTF-8 encoded string.
val validate : t -> int
Same as check, but raises an exception if the argument is not valid text; otherwise returns the length of the string.
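To make the contract of validate concrete, here is a rough model written with the standard library only (OCaml >= 4.14). It is a sketch, not the library's implementation: validate_sketch and Invalid_sketch are hypothetical names, and it assumes that "length" means the number of code points.

```ocaml
(* Hypothetical stand-in for the library's Invalid exception. *)
exception Invalid_sketch of string

(* Walk the string one decoded code point at a time; return the number of
   code points, or raise at the first malformed sequence. *)
let validate_sketch (s : string) : int =
  let n = String.length s in
  let rec loop ofs count =
    if ofs >= n then count
    else
      let d = String.get_utf_8_uchar s ofs in
      if Uchar.utf_decode_is_valid d
      then loop (ofs + Uchar.utf_decode_length d) (count + 1)
      else
        raise (Invalid_sketch (Printf.sprintf "invalid sequence at byte %d" ofs))
  in
  loop 0 0
```

With this model, a two-byte character such as U+00E9 counts as one unit of length, not two.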
val next_error : t -> int -> int * int * string
next_error str ofs returns (ofs', count, msg), where ofs' is the offset of the start of the first invalid sequence at or after ofs in str, count is the number of Unicode characters between ofs and ofs' (exclusive), and msg is an error message. If there is no error before the end of the string, then ofs' is String.length str and msg is the empty string.
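The shape of the returned triple can be illustrated with a standard-library model (OCaml >= 4.14). This is a sketch under assumptions: next_error_sketch is a hypothetical name and the message text is invented for illustration; the library's actual messages are not shown in this page.

```ocaml
(* Scan from byte offset [ofs]; stop at the first malformed sequence or at
   the end of the string. *)
let next_error_sketch (s : string) (ofs : int) : int * int * string =
  let n = String.length s in
  let rec loop i count =
    if i >= n then (n, count, "")
    else
      let d = String.get_utf_8_uchar s i in
      if Uchar.utf_decode_is_valid d
      then loop (i + Uchar.utf_decode_length d) (count + 1)
      else (i, count, "invalid UTF-8 sequence") (* hypothetical message *)
  in
  loop ofs 0
```

A caller can resume scanning after an error by calling the function again with an offset just past the faulty byte.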
val singleton : Stdlib.Uchar.t -> t
singleton ch creates a string of length 1 containing only the given character.
val make : int -> Stdlib.Uchar.t -> t
make n ch creates a string of length n filled with ch.
val init : int -> (int -> Stdlib.Uchar.t) -> t
init n f returns the concatenation of singleton (f 0), singleton (f 1), ..., singleton (f (n - 1)).
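The construction performed by init can be modelled with Buffer.add_utf_8_uchar from the standard library. The name init_sketch is hypothetical, and this is only an illustration of the result, not of how the library builds it.

```ocaml
(* Append the UTF-8 encoding of f 0, f 1, ..., f (n - 1) in order. *)
let init_sketch (n : int) (f : int -> Uchar.t) : string =
  let buf = Buffer.create 16 in
  for i = 0 to n - 1 do
    Buffer.add_utf_8_uchar buf (f i)
  done;
  Buffer.contents buf

(* Three consecutive letters starting at 'a'. *)
let alphabet = init_sketch 3 (fun i -> Uchar.of_int (Char.code 'a' + i))
```

Under the same model, make n ch corresponds to init_sketch n (fun _ -> ch).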
val rev_init : int -> (int -> Stdlib.Uchar.t) -> t
rev_init n f returns the concatenation of singleton (f (n - 1)), ..., singleton (f 1), singleton (f 0).
val length : t -> int
Returns the length of the given string.
val get : t -> int -> Stdlib.Uchar.t
get str idx returns the character at index idx in str.
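Because UTF-8 is a variable-width encoding, a character index is not a byte offset, so indexed access has to walk the string. The following standard-library model (OCaml >= 4.14) shows the idea; get_sketch is a hypothetical name, and it raises Invalid_argument where the library uses its own out-of-bounds exception.

```ocaml
(* Return the idx-th code point of [s], counting characters rather than
   bytes. Takes time proportional to idx. *)
let get_sketch (s : string) (idx : int) : Uchar.t =
  let n = String.length s in
  let rec loop ofs k =
    if ofs >= n then invalid_arg "get_sketch: index out of bounds"
    else
      let d = String.get_utf_8_uchar s ofs in
      if k = 0 then Uchar.utf_decode_uchar d
      else loop (ofs + Uchar.utf_decode_length d) (k - 1)
  in
  if idx < 0 then invalid_arg "get_sketch: index out of bounds"
  else loop 0 idx
```

When traversing a whole string, the offset-based functions further down avoid repeating this walk for every access.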
sub str ofs len returns the sub-string of str starting at ofs and of length len.
break str pos returns the sub-strings before and after pos in str. It is more efficient than creating two sub-strings with sub.
remove str pos len removes the len characters at position pos in str.
replace str pos len repl replaces the len characters at position pos in str by repl.
concat sep l returns the concatenation of all strings of l separated by sep.
rev_concat sep l returns the concatenation of all strings of l in reverse order separated by sep.
val explode : t -> Stdlib.Uchar.t list
explode str returns the list of all characters of str.
val rev_explode : t -> Stdlib.Uchar.t list
rev_explode str returns the list of all characters of str in reverse order.
val implode : Stdlib.Uchar.t list -> t
implode l returns the concatenation of all characters of l.
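explode and implode are inverses of each other on valid text. A standard-library model (OCaml >= 4.14) makes the round trip visible; explode_sketch and implode_sketch are hypothetical names and the code is an illustration only.

```ocaml
(* Decode every code point of [s], left to right. *)
let explode_sketch (s : string) : Uchar.t list =
  let n = String.length s in
  let rec loop ofs acc =
    if ofs >= n then List.rev acc
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  loop 0 []

(* Encode a list of code points back into a UTF-8 string. *)
let implode_sketch (l : Uchar.t list) : string =
  let buf = Buffer.create 16 in
  List.iter (Buffer.add_utf_8_uchar buf) l;
  Buffer.contents buf
```

In this model the accumulator is already in reverse order before the final List.rev, which suggests why a reversed variant can be cheaper than reversing afterwards.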
val rev_implode : Stdlib.Uchar.t list -> t
rev_implode l is the same as implode (List.rev l) but more efficient.
val iter : (Stdlib.Uchar.t -> unit) -> t -> unit
iter f str applies f on all characters of str starting from the left.
val rev_iter : (Stdlib.Uchar.t -> unit) -> t -> unit
rev_iter f str applies f on all characters of str starting from the right.
val fold : (Stdlib.Uchar.t -> 'a -> 'a) -> t -> 'a -> 'a
fold f str acc applies f on all characters of str starting from the left, accumulating a value.
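The argument order of fold (function, string, initial accumulator) can be seen in a standard-library model (OCaml >= 4.14). fold_sketch is a hypothetical name; the sketch only mirrors the documented signature.

```ocaml
(* Left-to-right fold over the code points of [s]. *)
let fold_sketch (f : Uchar.t -> 'a -> 'a) (s : string) (acc : 'a) : 'a =
  let n = String.length s in
  let rec loop ofs acc =
    if ofs >= n then acc
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (f (Uchar.utf_decode_uchar d) acc)
  in
  loop 0 acc

(* Count the characters that fall outside the ASCII range. *)
let count_non_ascii (s : string) : int =
  fold_sketch (fun u n -> if Uchar.to_int u > 127 then n + 1 else n) s 0
```

Consing onto a list accumulator in this model yields the characters in reverse order, since the leftmost character is added first.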
val rev_fold : (Stdlib.Uchar.t -> 'a -> 'a) -> t -> 'a -> 'a
rev_fold f str acc applies f on all characters of str starting from the right, accumulating a value.
rev_map f str maps all characters of str with f in reverse order.
map_concat f str maps all characters of str with f and concatenates the results.
rev_map_concat f str maps all characters of str with f in reverse order and concatenates the results.
rev_filter f str filters characters of str with f in reverse order.
filter_map f str filters and maps characters of str with f.
rev_filter_map f str filters and maps characters of str with f in reverse order.
filter_map_concat f str filters and maps characters of str with f and concatenates the results.
rev_filter_map_concat f str filters and maps characters of str with f in reverse order and concatenates the results.
val for_all : (Stdlib.Uchar.t -> bool) -> t -> bool
for_all f text returns whether all characters of text satisfy the predicate f.
val exists : (Stdlib.Uchar.t -> bool) -> t -> bool
exists f text returns whether at least one character of text satisfies f.
val count : (Stdlib.Uchar.t -> bool) -> t -> int
count f text returns the number of characters of text satisfying f.
strip ?predicate text returns text without its leading and trailing characters that match predicate. predicate defaults to testing whether the given character has the `White_Space Unicode property. For example:
strip "\n foo\n " = "foo"
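The trimming logic can be sketched with the standard library (OCaml >= 4.14). This model is an assumption-laden illustration: strip_sketch and is_ascii_space are hypothetical names, the default predicate only covers ASCII whitespace instead of the full White_Space property, and the input is assumed to be valid UTF-8.

```ocaml
(* Simplified default predicate: ASCII whitespace only. *)
let is_ascii_space (u : Uchar.t) : bool =
  match Uchar.to_int u with
  | 0x09 | 0x0A | 0x0B | 0x0C | 0x0D | 0x20 -> true
  | _ -> false

let strip_sketch ?(predicate = is_ascii_space) (s : string) : string =
  let n = String.length s in
  (* Advance past leading characters that match the predicate. *)
  let rec skip_left ofs =
    if ofs >= n then n
    else
      let d = String.get_utf_8_uchar s ofs in
      if predicate (Uchar.utf_decode_uchar d)
      then skip_left (ofs + Uchar.utf_decode_length d)
      else ofs
  in
  (* Move back over continuation bytes (10xxxxxx) to a character start. *)
  let rec char_start i =
    if i > 0 && (Char.code s.[i]) land 0xC0 = 0x80 then char_start (i - 1)
    else i
  in
  (* Retreat past trailing characters that match the predicate. *)
  let rec skip_right start stop =
    if stop <= start then start
    else
      let p = char_start (stop - 1) in
      let d = String.get_utf_8_uchar s p in
      if predicate (Uchar.utf_decode_uchar d) then skip_right start p else stop
  in
  let start = skip_left 0 in
  let stop = skip_right start n in
  String.sub s start (stop - start)
```

Restricting skip_left or skip_right to a no-op gives the right-only and left-only variants respectively.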
lstrip ?predicate text is the same as strip but it only removes characters at the left of text.
rstrip ?predicate text is the same as strip but it only removes characters at the right of text.
add buf ch is the same as Buffer.add_string buf (singleton ch) but is more efficient.
val escaped_char : Stdlib.Uchar.t -> t
escaped_char ch returns a string containing ch or an escaped version of ch if:
- ch is a control character (code < 32)
- ch is the character with code 127
- ch is a non-ASCII, non-alphabetic character
It uses the syntax \xXX, \uXXXX, \UXXXXXX or a specific escape sequence \n, \r, ...
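A simplified model of these rules, using only the standard library, might look as follows. Several details are assumptions that this page does not confirm: the name escaped_char_sketch, the inclusion of \t among the specific escapes, upper-case hexadecimal digits, and the choice of syntax by code-point range. Testing the alphabetic property needs Unicode data, so this sketch escapes every non-ASCII character, which is stricter than the documented behaviour.

```ocaml
let escaped_char_sketch (u : Uchar.t) : string =
  let code = Uchar.to_int u in
  match code with
  | 0x0A -> "\\n"
  | 0x0D -> "\\r"
  | 0x09 -> "\\t" (* assumed; the text only lists \n, \r, ... *)
  | _ when code < 32 || code = 127 -> Printf.sprintf "\\x%02X" code
  | _ when code < 128 -> String.make 1 (Char.chr code)
  | _ when code <= 0xFFFF -> Printf.sprintf "\\u%04X" code
  | _ -> Printf.sprintf "\\U%06X" code
```

The three hexadecimal forms differ only in width: two digits for a single byte, four for the Basic Multilingual Plane, and six for everything above it.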
add_escaped_char buf ch is the same as Buffer.add_string buf (escaped_char ch) but a bit more efficient.
val add_escaped : Stdlib.Buffer.t -> t -> unit
add_escaped buf text is the same as Buffer.add_string buf (escaped text) but a bit more efficient.
val escaped_string : Uutf.encoding -> string -> t
escaped_string enc str escapes the string str, which is encoded with encoding enc. If decoding str with enc fails, it escapes all non-printable bytes of str with the syntax \yAB.
add_escaped_string buf enc text is the same as Buffer.add_string buf (escaped_string enc text) but a bit more efficient.
val next : t -> int -> int
next str ofs returns the offset of the next character in str.
val prev : t -> int -> int
prev str ofs returns the offset of the previous character in str.
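Offsets here are byte offsets, and moving between characters amounts to skipping UTF-8 continuation bytes (those of the form 10xxxxxx). The following standard-library sketch illustrates that idea under the assumption of valid input; next_sketch and prev_sketch are hypothetical names, and they raise Invalid_argument where the library uses its own out-of-bounds exception.

```ocaml
(* A continuation byte has its two top bits set to 10. *)
let is_continuation (c : char) : bool = (Char.code c) land 0xC0 = 0x80

(* Offset of the character following the one that starts at [ofs]. *)
let next_sketch (s : string) (ofs : int) : int =
  if ofs < 0 || ofs >= String.length s then invalid_arg "next_sketch";
  let rec skip i =
    if i < String.length s && is_continuation s.[i] then skip (i + 1) else i
  in
  skip (ofs + 1)

(* Offset of the character preceding the one that starts at [ofs]. *)
let prev_sketch (s : string) (ofs : int) : int =
  if ofs <= 0 || ofs > String.length s then invalid_arg "prev_sketch";
  let rec back i =
    if i > 0 && is_continuation s.[i] then back (i - 1) else i
  in
  back (ofs - 1)
```

On valid text the two functions undo each other: stepping forward from a character start and then back returns to the same offset.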
val extract : t -> int -> Stdlib.Uchar.t
extract str ofs returns the code-point at offset ofs in str.
val extract_next : t -> int -> Stdlib.Uchar.t * int
extract_next str ofs returns the code-point at offset ofs in str and the offset of the next character.
val extract_prev : t -> int -> Stdlib.Uchar.t * int
extract_prev str ofs returns the code-point at the previous offset in str and this offset.
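The pairs returned by extract_next and extract_prev can be modelled with String.get_utf_8_uchar (OCaml >= 4.14). The names below are hypothetical and the code assumes valid UTF-8; it is meant only to show what each component of the result refers to.

```ocaml
(* Code point starting at [ofs], and the offset just after it. *)
let extract_next_sketch (s : string) (ofs : int) : Uchar.t * int =
  let d = String.get_utf_8_uchar s ofs in
  (Uchar.utf_decode_uchar d, ofs + Uchar.utf_decode_length d)

(* Code point ending just before [ofs], and the offset where it starts. *)
let extract_prev_sketch (s : string) (ofs : int) : Uchar.t * int =
  let rec back i =
    if i > 0 && (Char.code s.[i]) land 0xC0 = 0x80 then back (i - 1) else i
  in
  let start = back (ofs - 1) in
  (Uchar.utf_decode_uchar (String.get_utf_8_uchar s start), start)
```

Threading the returned offset into the next call gives a traversal that never rescans the string from the beginning.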
These functions do not check that the given offset is inside the bounds of the given string.
val unsafe_next : t -> int -> int
unsafe_next str ofs returns the offset of the next character in str.
val unsafe_prev : t -> int -> int
unsafe_prev str ofs returns the offset of the previous character in str.
val unsafe_extract : t -> int -> Stdlib.Uchar.t
unsafe_extract str ofs returns the code-point at offset ofs in str.
val unsafe_extract_next : t -> int -> Stdlib.Uchar.t * int
unsafe_extract_next str ofs returns the code-point at offset ofs in str and the offset of the next character.
val unsafe_extract_prev : t -> int -> Stdlib.Uchar.t * int
unsafe_extract_prev str ofs returns the code-point at the previous offset in str and this offset.