package batteries
Module BatText
Heavyweight strings ("ropes")
This module implements ropes as described in Boehm, H., Atkinson, R., and Plass, M. 1995. Ropes: an alternative to strings. Softw. Pract. Exper. 25, 12 (Dec. 1995), 1315-1330.
Ropes are an alternative to strings which support efficient operations:
- determining the length of a rope in constant time
- appending or prepending a small rope to an arbitrarily large one in amortized constant time
- concat, substring, insert, remove operations in amortized logarithmic time
- access to and modification of ropes in logarithmic time
Functional nature and persistence
All operations are non-destructive: the original rope is never modified. When a new rope is returned as the result of an operation, it will share as much data as possible with its "parent". For instance, if a rope of length n undergoes m operations (assume n >> m) like set, append or prepend, the modified rope will only require O(m) space in addition to that taken by the original one.
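To see why the extra space is only O(m), consider a toy model of ropes in plain OCaml. The rope type, length and append below are hypothetical illustrations, not BatText's representation or API: appending builds one new node that points at the original rope instead of copying it.

```ocaml
(* Toy rope model, not BatText's internal representation:
   leaves hold flat strings, inner nodes cache their total length. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int  (* left child, right child, total length *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

(* Appending copies neither argument: the new node just points at both. *)
let append l r = Node (l, r, length l + length r)

let () =
  let original = Leaf (String.make 1_000_000 'a') in
  let modified = append original (Leaf "!") in
  Printf.printf "original = %d chars, modified = %d chars\n"
    (length original) (length modified);
  (* The left child of the result is physically the original rope, so the
     extra memory is one Node and one one-character Leaf. *)
  match modified with
  | Node (left, _, _) -> assert (left == original)
  | Leaf _ -> assert false
```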
However, ropes are an amortized data structure, and using them persistently can easily degrade their amortized time bounds. They are thus mainly intended to be used ephemerally. In some cases, it is possible to use ropes persistently with the same amortized bounds by explicitly rebalancing the ropes to be reused, using balance. Special care must be taken to avoid calling balance too frequently; in the limit, calling balance after each modification would defeat the purpose of amortization.
Limitations
The length of ropes is limited to approximately 700 Mb on 32-bit architectures and 220 Gb on 64-bit architectures.
The type of the rope.
Raised when an operation violates the bounds of the rope.
Maximum length of the rope (number of UTF-8 characters).
Creation and conversions
of_string s returns a rope corresponding to the UTF-8 encoded string s.
of_uchar c returns a rope containing exactly the character c.
make i c returns a rope of length i consisting of i copies of the character c; it is similar to String.make.
explode s returns the list of characters in the rope s.
implode cs returns a rope resulting from concatenating the characters in the list cs.
Properties
Returns the length of the rope in O(1) time. This is the number of UTF-8 characters, not the number of bytes.
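Because length counts characters rather than bytes, it can differ from String.length on the underlying UTF-8 string. The sketch below uses only the standard library; utf8_length is a hypothetical helper that assumes well-formed UTF-8 and counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```ocaml
(* Count code points in a (valid) UTF-8 string by skipping continuation
   bytes, which always have the bit pattern 10xxxxxx. *)
let utf8_length (s : string) : int =
  let count = ref 0 in
  String.iter
    (fun c -> if Char.code c land 0xC0 <> 0x80 then incr count)
    s;
  !count

let () =
  let s = "na\xc3\xafve" in  (* "naïve": the ï is encoded on two bytes *)
  (* Prints "bytes = 6, characters = 5" *)
  Printf.printf "bytes = %d, characters = %d\n" (String.length s) (utf8_length s)
```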
balance r returns a balanced copy of the rope r. Note that ropes are automatically rebalanced when their height exceeds a given threshold, but balance allows invoking that operation explicitly.
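As a rough picture of what rebalancing does, the toy model below (plain OCaml, hypothetical names) turns a deep, left-leaning rope into one of minimal height over the same leaves. Its scheme, collecting the leaves and rebuilding the tree by splitting in the middle, is deliberately simple and is not necessarily the algorithm BatText uses.

```ocaml
(* Toy rope model; BatText's real representation and rebalancing differ. *)
type rope = Leaf of string | Node of rope * rope * int

let length = function Leaf s -> String.length s | Node (_, _, n) -> n
let append l r = Node (l, r, length l + length r)

let rec depth = function
  | Leaf _ -> 0
  | Node (l, r, _) -> 1 + max (depth l) (depth r)

(* Leaves in left-to-right order. *)
let leaves r =
  let rec go acc = function
    | Leaf _ as leaf -> leaf :: acc
    | Node (left, right, _) -> go (go acc right) left
  in
  go [] r

(* Rebuild a tree of minimal height over the same leaves (same contents). *)
let balance r =
  let arr = Array.of_list (leaves r) in
  let rec build lo hi =
    if hi - lo = 1 then arr.(lo)
    else
      let mid = (lo + hi) / 2 in
      append (build lo mid) (build mid hi)
  in
  build 0 (Array.length arr)

let to_string r =
  let b = Buffer.create (length r) in
  let rec go = function
    | Leaf s -> Buffer.add_string b s
    | Node (left, right, _) -> go left; go right
  in
  go r;
  Buffer.contents b

let () =
  (* Appending one character at a time produces a left-leaning "vine". *)
  let vine = ref (Leaf "a") in
  for _i = 1 to 15 do vine := append !vine (Leaf "a") done;
  let balanced = balance !vine in
  (* Prints "depth before = 15, after = 4" *)
  Printf.printf "depth before = %d, after = %d\n" (depth !vine) (depth balanced);
  assert (to_string balanced = to_string !vine)
```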
Operations
append r u concatenates the ropes r and u. In general, it operates in O(log(min n1 n2)) amortized time, where n1 and n2 are the lengths of r and u. Small ropes are treated specially and can be appended or prepended in amortized O(1) time.
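The special treatment of small ropes can be sketched with a toy model: when both pieces are short, their characters are copied into a single flat leaf instead of allocating a new node, which keeps repeated small appends from deepening the tree. The threshold leaf_size and the two merge rules are assumptions for illustration only; BatText's actual thresholds are internal details.

```ocaml
(* Toy rope model; the threshold and merge rules are hypothetical. *)
type rope = Leaf of string | Node of rope * rope * int

let leaf_size = 8  (* hypothetical "small rope" threshold *)

let length = function Leaf s -> String.length s | Node (_, _, n) -> n
let node l r = Node (l, r, length l + length r)

let append l r =
  match l, r with
  (* Two small leaves: copy them into one flat leaf instead of adding a node. *)
  | Leaf a, Leaf b when String.length a + String.length b <= leaf_size ->
      Leaf (a ^ b)
  (* A small leaf appended to a node whose right child is a small leaf:
     merge it into that child so the tree does not get deeper. *)
  | Node (ll, Leaf a, _), Leaf b
    when String.length a + String.length b <= leaf_size ->
      node ll (Leaf (a ^ b))
  | _ -> node l r

let rec count_leaves = function
  | Leaf _ -> 1
  | Node (l, r, _) -> count_leaves l + count_leaves r

let () =
  let r = ref (Leaf "") in
  String.iter (fun c -> r := append !r (Leaf (String.make 1 c))) "hello, world";
  (* Prints "length = 12, leaves = 2": 12 appends, but only two leaves. *)
  Printf.printf "length = %d, leaves = %d\n" (length !r) (count_leaves !r)
```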
append_char c r returns a new rope with the character c at the end, in amortized O(1) time.
prepend_char c r returns a new rope with the character c at the beginning, in amortized O(1) time.
get r n returns the (n+1)th character from the rope r; i.e. get r 0 returns the first character. Operates in worst-case O(log size) time.
set r n c returns a copy of rope r where the (n+1)th character has been set to c. See also get. Operates in worst-case O(log size) time.
sub r m n returns a sub-rope of r containing all characters whose indexes range from m to m + n - 1 (included). Operates in worst-case O(log size) time.
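A sketch of why get, set and sub take time proportional to the height of the tree, again on a hypothetical toy rope rather than BatText itself: each inner node caches its length, so an index can be routed left or right without scanning. The Out_of_bounds exception here is a locally defined stand-in.

```ocaml
(* Toy rope model, not BatText's implementation. Each operation follows one
   root-to-leaf path (at most two for sub), so its cost grows with the depth
   of the tree, i.e. logarithmically when the tree is balanced. *)
type rope = Leaf of string | Node of rope * rope * int

exception Out_of_bounds  (* stand-in for the module's bounds exception *)

let length = function Leaf s -> String.length s | Node (_, _, n) -> n
let node l r = Node (l, r, length l + length r)

let rec get r n =
  match r with
  | Leaf s -> if n < 0 || n >= String.length s then raise Out_of_bounds else s.[n]
  | Node (l, rt, _) ->
      let ll = length l in
      if n < ll then get l n else get rt (n - ll)

(* Only nodes on the path to index n are rebuilt; other subtrees are shared. *)
let rec set r n c =
  match r with
  | Leaf s ->
      if n < 0 || n >= String.length s then raise Out_of_bounds
      else Leaf (String.mapi (fun i x -> if i = n then c else x) s)
  | Node (l, rt, _) ->
      let ll = length l in
      if n < ll then node (set l n c) rt else node l (set rt (n - ll) c)

(* Characters m .. m + n - 1; subtrees fully inside the range are reused. *)
let rec sub r m n =
  if m < 0 || n < 0 || m + n > length r then raise Out_of_bounds
  else if m = 0 && n = length r then r
  else
    match r with
    | Leaf s -> Leaf (String.sub s m n)
    | Node (l, rt, _) ->
        let ll = length l in
        if m + n <= ll then sub l m n
        else if m >= ll then sub rt (m - ll) n
        else node (sub l m (ll - m)) (sub rt 0 (m + n - ll))

let to_string r =
  let b = Buffer.create (length r) in
  let rec go = function
    | Leaf s -> Buffer.add_string b s
    | Node (left, right, _) -> go left; go right
  in
  go r;
  Buffer.contents b

let () =
  let r = node (Leaf "Hello, ") (node (Leaf "rope ") (Leaf "world")) in
  let r' = set r 7 'R' in
  (* r is unchanged; r' differs only at index 7. Prints "r R rope". *)
  Printf.printf "%c %c %s\n" (get r 7) (get r' 7) (to_string (sub r 7 4))
```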
insert n r u returns a copy of the u rope where r has been inserted between the characters with index n and n + 1 in the original rope. The length of the new rope is length u + length r. Operates in amortized O(log(size r) + log(size u)) time.
remove m n r returns the rope resulting from deleting the characters with indexes ranging from m to m + n - 1 (included) from the original rope r. The length of the new rope is length r - n. Operates in amortized O(log(size r)) time.
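The index arithmetic of remove can be checked against ordinary strings: the result keeps the first m characters and everything from position m + n on. The function below is a hypothetical string-level model (one byte per character, so ASCII only), not BatText.remove.

```ocaml
(* ASCII string model of the documented contract; [remove] here is a
   hypothetical helper on plain strings, not BatText.remove. *)
let remove m n s =
  let len = String.length s in
  if m < 0 || n < 0 || m + n > len then invalid_arg "remove"
  else String.sub s 0 m ^ String.sub s (m + n) (len - m - n)

let () =
  let s = "persistent rope" in
  let s' = remove 0 11 s in
  (* Prints "rope"; the new length is length s - n = 15 - 11 = 4. *)
  print_endline s';
  assert (String.length s' = String.length s - 11)
```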
Iteration
iter f r applies f to all the characters in the rope r, in order.
Operates like iter, but also passes the index of the character to the given function.
range_iter f m n r applies f to all the characters whose indices k satisfy m <= k < m + n. It is thus equivalent to iter f (sub r m n), but does not create an intermediary rope. range_iter operates in worst-case O(n + log m) time, which improves on the O(n log m) bound from an explicit loop using get.
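The sketch below shows the idea behind range_iter on a toy rope (hypothetical type and function, assuming the range is within bounds): it descends to the start of the range once and then visits only the overlapping leaves, instead of calling get, which would restart from the root for every index.

```ocaml
(* Toy rope model; BatText's own traversal is internal. *)
type rope = Leaf of string | Node of rope * rope * int

let length = function Leaf s -> String.length s | Node (_, _, n) -> n
let node l r = Node (l, r, length l + length r)

(* Apply f to characters m .. m + n - 1 (assumed in bounds). The walk skips
   subtrees entirely to the left of the range, then visits only the leaves
   that overlap it; no intermediate rope is allocated. *)
let rec range_iter f m n r =
  if n > 0 then
    match r with
    | Leaf s -> for k = m to m + n - 1 do f s.[k] done
    | Node (l, rt, _) ->
        let ll = length l in
        if m >= ll then range_iter f (m - ll) n rt
        else begin
          let n_left = min n (ll - m) in
          range_iter f m n_left l;
          range_iter f 0 (n - n_left) rt
        end

let () =
  let r = node (Leaf "abc") (node (Leaf "def") (Leaf "gh")) in
  range_iter print_char 2 4 r;  (* prints "cdef" *)
  print_newline ()
```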
As range_iter, but also passes f the index of each character, computed as base plus the character's index in the subrope defined by the next two arguments.
fold f a r computes f (... (f (f a r0) r1) ...) rN-1, where rn = get r n and N = length r.
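The nesting above means f sees the characters from left to right, with the accumulator threaded through. A string-level model (ASCII only; fold here is a hypothetical helper over plain strings) makes the order visible:

```ocaml
(* String model of the fold order (ASCII only); [fold] is hypothetical here. *)
let fold f a s =
  let acc = ref a in
  String.iter (fun c -> acc := f !acc c) s;
  !acc

let () =
  (* f is applied to r0 first, so consing builds the characters in reverse. *)
  let rev = fold (fun acc c -> c :: acc) [] "abc" in
  assert (rev = ['c'; 'b'; 'a']);
  (* A typical use: counting characters that satisfy a predicate. *)
  let vowels =
    fold (fun n c -> if String.contains "aeiou" c then n + 1 else n) 0 "ropes"
  in
  Printf.printf "%d\n" vowels  (* prints 2 *)
```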
init l f returns the rope of length l with the characters f 0, f 1, f 2, ..., f (l-1).
map f s returns a rope where all characters c in s have been replaced by f c.
filter_map f l calls (f a0) (f a1) ... (f an), where a0 .. an are the characters of l. It returns the rope of the characters bi such that f ai = Some bi (when f returns None, the corresponding character of l is discarded).
filter f s returns a copy of rope s in which only characters c such that f c = true remain.
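Here are string-level models of filter_map and filter (ASCII only; both are hypothetical helpers on plain strings). They also show that filter is the special case of filter_map that keeps characters unchanged:

```ocaml
(* ASCII string models of filter_map and filter; both names are hypothetical
   string functions here, not the BatText ones. *)
let filter_map f s =
  let b = Buffer.create (String.length s) in
  String.iter (fun c -> match f c with Some c' -> Buffer.add_char b c' | None -> ()) s;
  Buffer.contents b

(* filter is the special case where kept characters are left unchanged. *)
let filter p s = filter_map (fun c -> if p c then Some c else None) s

let () =
  (* Drop digits and upper-case the remaining letters. *)
  let f c =
    if c >= '0' && c <= '9' then None else Some (Char.uppercase_ascii c)
  in
  print_endline (filter_map f "r0pe5");               (* prints "RPE" *)
  print_endline (filter (fun c -> c <> ' ') "a b c")  (* prints "abc" *)
```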
Enumerates the rope's characters.
Enumerates the rope's characters, in reverse order.
Converts the enumeration into a rope.
Finding
index s c returns the position of the leftmost occurrence of character c in rope s.
index_from r i c returns the character number of the first occurrence of character c in rope r at or after position i. index s c is equivalent to index_from s 0 c.
rindex s c returns the position of the rightmost occurrence of character c in rope s.
Same as rindex, but searches backward starting at the character position given as second argument. rindex s c is equivalent to rindex_from s (length s - 1) c.
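The standard library's String module has byte-level functions with the same shape, which gives a convenient model of the inclusive starting positions. The example uses String, not BatText, and counts bytes rather than UTF-8 characters; the String versions raise Not_found when the character is absent.

```ocaml
(* Standard-library analogues on plain strings: "banana" = b0 a1 n2 a3 n4 a5 *)
let s = "banana"

let () =
  (* index: leftmost occurrence; index_from s i: leftmost at position >= i *)
  assert (String.index s 'a' = 1);
  assert (String.index_from s 0 'a' = String.index s 'a');
  assert (String.index_from s 2 'a' = 3);
  (* rindex: rightmost occurrence; rindex_from s i: rightmost at position <= i *)
  assert (String.rindex s 'a' = 5);
  assert (String.rindex_from s (String.length s - 1) 'a' = String.rindex s 'a');
  assert (String.rindex_from s 4 'a' = 3);
  (* An absent character raises Not_found in the String versions. *)
  assert (try ignore (String.index s 'z'); false with Not_found -> true);
  print_endline "all index checks passed"
```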
contains s c tests if character c appears in the rope s.
contains_from s start c tests if character c appears in the subrope of s starting from start to the end of s.
rcontains_from s stop c tests if character c appears in the subrope of s starting from the beginning of s to index stop (included).
find s x returns the starting index of the first occurrence of rope x within rope s.
Note: this implementation is optimized for short ropes.
find_from s ofs x behaves as find s x but starts searching at offset ofs. find s x is equivalent to find_from s 0 x.
rfind s x returns the starting index of the last occurrence of rope x within rope s.
Note: this implementation is optimized for short ropes.
rfind_from s ofs x behaves as rfind s x but starts searching backward at offset ofs. rfind s x is equivalent to rfind_from s (length s - 1) x.
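One straightforward way to realize the find contracts is a naive scan that tries every candidate position, which is cheap when the pattern is short. The helpers below are hypothetical string-level models (ASCII only), not BatText's implementation; raising Not_found on a miss, and omitting bounds checks on ofs, are assumptions of this sketch.

```ocaml
(* Does x occur in s starting exactly at position ofs? *)
let matches_at s ofs x =
  let lx = String.length x in
  ofs >= 0 && ofs + lx <= String.length s && String.sub s ofs lx = x

(* Leftmost match starting at or after ofs. *)
let find_from s ofs x =
  let rec loop i =
    if i + String.length x > String.length s then raise Not_found
    else if matches_at s i x then i
    else loop (i + 1)
  in
  loop ofs

let find s x = find_from s 0 x

(* Rightmost match: scan backwards from the last position where x fits. *)
let rfind s x =
  let rec loop i =
    if i < 0 then raise Not_found
    else if matches_at s i x then i
    else loop (i - 1)
  in
  loop (String.length s - String.length x)

let () =
  let s = "abcabc" in
  (* prints "1 4 4" *)
  Printf.printf "%d %d %d\n" (find s "bc") (find_from s 2 "bc") (rfind s "bc")
```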
starts_with s x returns true if s starts with x, false otherwise.
ends_with s x returns true if the rope s ends with x, false otherwise.
exists str sub returns true if sub is a subrope of str, false otherwise.
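String-level models of the three predicates (ASCII only; the names mirror the functions above, but these are hypothetical helpers on plain strings):

```ocaml
let starts_with s x =
  String.length x <= String.length s
  && String.sub s 0 (String.length x) = x

let ends_with s x =
  let ls = String.length s and lx = String.length x in
  lx <= ls && String.sub s (ls - lx) lx = x

(* exists: does sub occur anywhere in str? Try every start position. *)
let exists str sub =
  let ls = String.length str and lsub = String.length sub in
  let rec loop i =
    i + lsub <= ls && (String.sub str i lsub = sub || loop (i + 1))
  in
  loop 0

let () =
  (* prints "true true true" *)
  Printf.printf "%b %b %b\n"
    (starts_with "ropes" "rop") (ends_with "ropes" "es") (exists "persistent" "sist")
```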
tail r pos returns the rope containing all but the first pos characters of r.
Returns the rope with all leading and trailing occurrences of the characters in chars removed. By default, chars is " \t\r\n".
Returns the same rope but without the first character. Does nothing if the rope is empty.
Returns the same rope but without the last character. Does nothing if the rope is empty.
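A sketch of tail and of the stripping behaviour on plain strings. Both functions are hypothetical ASCII-only models; the optional ?chars argument mirrors the chars parameter and default described above, but its exact form in BatText is an assumption.

```ocaml
(* All but the first pos characters. *)
let tail s pos = String.sub s pos (String.length s - pos)

(* Remove any characters from [chars] at both ends (hypothetical helper). *)
let strip ?(chars = " \t\r\n") s =
  let len = String.length s in
  let i = ref 0 and j = ref len in
  while !i < len && String.contains chars s.[!i] do incr i done;
  while !j > !i && String.contains chars s.[!j - 1] do decr j done;
  String.sub s !i (!j - !i)

let () =
  print_endline (tail "prefix:value" 7);              (* prints "value" *)
  print_endline ("[" ^ strip "  padded\t\n" ^ "]");   (* prints "[padded]" *)
  print_endline (strip ~chars:"-" "--x--")            (* prints "x" *)
```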
slice ?first ?last s returns a "slice" of the rope which corresponds to the characters s.[first], s.[first+1], ..., s.[last-1]. Note that the character at index last is not included! If first is omitted it defaults to the start of the rope, i.e. index 0, and if last is omitted it defaults to point just past the end of s, i.e. length s. Thus, slice s is equivalent to copy s.
Negative indexes are interpreted as counting from the end of the rope. For example, slice ~last:(-2) s will return the rope s, but without the last two characters.
This function never raises any exceptions. If the indexes are out of bounds they are automatically clipped.
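The slicing rules (negative positions count from the end, everything is then clipped, and the character at last is excluded) fit in a few lines. This slice is a hypothetical model on plain ASCII strings, not BatText.slice:

```ocaml
let slice ?(first = 0) ?last s =
  let len = String.length s in
  let last = match last with None -> len | Some l -> l in
  (* Negative positions count from the end; then clip into [0, len]. *)
  let clip i = max 0 (min len (if i < 0 then len + i else i)) in
  let i = clip first and j = clip last in
  if i >= j then "" else String.sub s i (j - i)

let () =
  print_endline (slice ~last:(-2) "ropes!!");      (* prints "ropes" *)
  print_endline (slice ~first:1 ~last:4 "ropes");  (* prints "ope" *)
  print_endline (slice ~first:100 "ropes")         (* empty line: clipped, no exception *)
```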
splice s off len rep returns the rope in which the section of s indicated by off and len has been cut and replaced by rep.
Negative indices are interpreted as counting from the end of the rope.
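A string-level model of splice (hypothetical helper, ASCII only). A negative off is interpreted from the end as documented; rejecting a section that runs past the end is an assumption of this sketch, since that case is not specified above.

```ocaml
let splice s off len rep =
  let n = String.length s in
  (* Negative offsets count from the end of the string. *)
  let off = if off < 0 then n + off else off in
  if off < 0 || len < 0 || off + len > n then invalid_arg "splice"
  else String.sub s 0 off ^ rep ^ String.sub s (off + len) (n - off - len)

let () =
  print_endline (splice "hello world" 6 5 "ropes");  (* prints "hello ropes" *)
  print_endline (splice "file.txt" (-3) 3 "md")      (* prints "file.md" *)
```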
fill s start len c returns the rope in which characters number start to start + len - 1 of s have been replaced by c.
blit src srcoff dst dstoff len returns a copy of dst in which len characters have been copied from rope src, starting at character number srcoff, to rope dst, starting at character number dstoff. It works correctly even if src and dst are the same rope, and the source and destination chunks overlap.
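Because these operations return new ropes instead of mutating dst, overlapping regions are harmless: every character is read from the unmodified source. The models below (hypothetical helpers on plain ASCII strings, bounds checks omitted) make that concrete; a naive in-place left-to-right copy of the same overlapping blit would produce "ababab" instead.

```ocaml
(* Characters start .. start + len - 1 become c; the input is not mutated. *)
let fill s start len c =
  String.mapi (fun i x -> if i >= start && i < start + len then c else x) s

(* Every character of the result is read from the original [src], so
   overlapping source and destination regions need no special care. *)
let blit src srcoff dst dstoff len =
  String.mapi
    (fun i x ->
      if i >= dstoff && i < dstoff + len then src.[srcoff + i - dstoff] else x)
    dst

let () =
  print_endline (fill "rope" 1 2 '*');  (* prints "r**e" *)
  (* Overlapping blit of a string onto itself, shifting "abcd" right by 2. *)
  let s = "abcdef" in
  print_endline (blit s 0 s 2 4)        (* prints "ababcd" *)
```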
concat sep sl concatenates the list of ropes sl, inserting the separator rope sep between each.
replace ~str ~sub ~by returns a tuple consisting of a boolean and a rope where the first occurrence of the rope sub within str has been replaced by the rope by. The boolean is true if a substitution has taken place, false otherwise.
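A string-level model of replace (hypothetical helper, ASCII only) that returns the same (bool, result) pair and touches only the first occurrence:

```ocaml
let replace ~str ~sub ~by =
  let ls = String.length str and lsub = String.length sub in
  (* Position of the first occurrence of sub, if any. *)
  let rec search i =
    if i + lsub > ls then None
    else if String.sub str i lsub = sub then Some i
    else search (i + 1)
  in
  match search 0 with
  | None -> (false, str)
  | Some i ->
      (true, String.sub str 0 i ^ by ^ String.sub str (i + lsub) (ls - i - lsub))

let () =
  let changed, s = replace ~str:"a-b-c" ~sub:"-" ~by:"+" in
  Printf.printf "%b %s\n" changed s  (* prints "true a+b-c" *)
```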
Splitting around
rsplit s sep splits the rope s at the last occurrence of sep.
nsplit s sep splits the rope s into a list of ropes which are separated by sep. nsplit "" _ returns the empty list. If the separator is not found, it returns a list containing only the rope s. If two occurrences of the separator are consecutive (with nothing in between), the empty rope is added in the sequence. For example, nsplit "a//b/" "/" is ["a"; ""; "b"; ""].
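A string-level model of nsplit that follows the three rules above and reproduces the documented example (hypothetical helper, ASCII only; the separator is assumed non-empty, since an empty one would never advance the scan):

```ocaml
let nsplit s sep =
  let ls = String.length s and lsep = String.length sep in
  if lsep = 0 then invalid_arg "nsplit"
  else if ls = 0 then []  (* nsplit "" _ = [] *)
  else begin
    let pieces = ref [] and start = ref 0 and i = ref 0 in
    while !i + lsep <= ls do
      if String.sub s !i lsep = sep then begin
        (* Close the current piece; consecutive separators yield "". *)
        pieces := String.sub s !start (!i - !start) :: !pieces;
        i := !i + lsep;
        start := !i
      end else incr i
    done;
    List.rev (String.sub s !start (ls - !start) :: !pieces)
  end

let () =
  (* prints: ["a"; ""; "b"; ""] *)
  print_endline
    ("[" ^ String.concat "; " (List.map (Printf.sprintf "%S") (nsplit "a//b/" "/")) ^ "]")
```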
Prints a rope to the given out_channel.
Read one Unicode character from a UTF-8 encoded input.
Read up to n characters from a UTF-8 encoded input.
Read a line of UTF-8 encoded text.
Read the whole contents of a UTF-8 encoded input.
Write one uchar to a UTF-8 encoded output.
Write a text onto a UTF-8 encoded output.
Write one line onto a UTF-8 encoded output, followed by a \n.
Offer the lines of a UTF-8 encoded input as an enumeration.
Offer the characters of a UTF-8 encoded input as an enumeration.
Write the text on the given output channel.