package dune-private-libs
Module Dune_re
Regular expression
Compiled regular expression
Compilation and execution of a regular expression
Compile a regular expression into an executable version that can be used to match strings, e.g. with exec.
Return the number of capture groups (including the one corresponding to the entire regexp).
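As a minimal sketch of compiling once and counting groups: it assumes the upstream re library is installed and exposed as Re, as in the examples on this page (Dune_re is dune's private copy). The names date, date_re and n_groups are illustrative only.

```ocaml
(* Build the expression once, compile it once, then reuse the compiled value. *)
let date : Re.t =
  Re.(seq [ group (repn digit 4 (Some 4)); char '-'; group (repn digit 2 (Some 2)) ])

let date_re : Re.re = Re.compile date

(* Two explicit groups, plus group 0 for the entire regexp. *)
let n_groups = Re.group_count date_re

let () = Printf.printf "%d groups\n" n_groups
```

Compiling is the costly step, so the compiled value is usually bound once at module level and shared by every call that needs it.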
exec re str searches str for a match of the compiled expression re, and returns the matched groups if any.
More specifically, when a match exists, exec returns a match that starts at the earliest position possible. If multiple such matches are possible, the one specified by the match semantics described below is returned.
Examples:
# let regex = Re.compile Re.(seq [str "//"; rep print ]);;
val regex : re = <abstr>
# Re.exec regex "// a C comment";;
- : Re.substrings = <abstr>
# Re.exec regex "# a C comment?";;
Exception: Not_found
# Re.exec ~pos:1 regex "// a C comment";;
Exception: Not_found
Similar to exec, but returns an option instead of using an exception.
Examples:
# let regex = Re.compile Re.(seq [str "//"; rep print ]);;
val regex : re = <abstr>
# Re.exec_opt regex "// a C comment";;
- : Re.substrings option = Some <abstr>
# Re.exec_opt regex "# a C comment?";;
- : Re.substrings option = None
# Re.exec_opt ~pos:1 regex "// a C comment";;
- : Re.substrings option = None
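The sessions above only show the abstract result. A short sketch of getting the matched text out of it, assuming the re library is available as Re and using Re.Group.get on group 0 (the whole match); first_comment is a hypothetical helper name:

```ocaml
let comment_re = Re.compile Re.(seq [ str "//"; rep print ])

(* Return the text of the first "//" comment in [line], if there is one. *)
let first_comment (line : string) : string option =
  match Re.exec_opt comment_re line with
  | Some groups -> Some (Re.Group.get groups 0) (* group 0 is the whole match *)
  | None -> None

let () =
  match first_comment "x = 1; // set x" with
  | Some text -> print_endline text
  | None -> print_endline "no comment"
```

Matching on the option keeps the no-match case explicit, where exec would raise Not_found instead.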
Similar to exec, but returns true if the expression matches, and false if it doesn't. This function is more efficient than calling exec or exec_opt and ignoring the returned group.
Examples:
# let regex = Re.compile Re.(seq [str "//"; rep print ]);;
val regex : re = <abstr>
# Re.execp regex "// a C comment";;
- : bool = true
# Re.execp ~pos:1 regex "// a C comment";;
- : bool = false
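Because execp returns a plain bool, it fits directly into predicates. A small sketch, assuming the re library is available as Re; the lines list is made-up sample data:

```ocaml
let comment_re = Re.compile Re.(seq [ str "//"; rep print ])

(* Hypothetical sample input. *)
let lines = [ "let x = 1"; "// a comment"; "y // trailing" ]

(* Keep only the lines that contain a "//" comment somewhere. *)
let commented = List.filter (fun line -> Re.execp comment_re line) lines

let () = List.iter print_endline commented
```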
More detailed version of execp. `Full is equivalent to true, while `Mismatch and `Partial are equivalent to false, but `Partial indicates the input string could be extended to create a match.
Examples:
# let regex = Re.compile Re.(seq [bos; str "// a C comment"]);;
val regex : re = <abstr>
# Re.exec_partial regex "// a C comment here.";;
- : [ `Full | `Mismatch | `Partial ] = `Full
# Re.exec_partial regex "// a C comment";;
- : [ `Full | `Mismatch | `Partial ] = `Partial
# Re.exec_partial regex "//";;
- : [ `Full | `Mismatch | `Partial ] = `Partial
# Re.exec_partial regex "# a C comment?";;
- : [ `Full | `Mismatch | `Partial ] = `Mismatch
val exec_partial_detailed :
?pos:int ->
?len:int ->
re ->
string ->
[ `Full of Group.t | `Partial of int | `Mismatch ]
More detailed version of exec_opt. `Full group is equivalent to Some group, while `Mismatch and `Partial _ are equivalent to None, but `Partial position indicates that the input string could be extended to create a match, and that no match could start in the input string before the given position. This makes it possible to avoid searching the entire input again when more of it becomes available, by passing the given position as the ?pos argument.
High Level Operations
Repeatedly calls exec on the given string, starting at the given position and covering the given length.
Examples:
# let regex = Re.compile Re.(seq [str "my"; blank; word(rep alpha)]);;
val regex : re = <abstr>
# Re.all regex "my head, my shoulders, my knees, my toes ...";;
- : Re.substrings list = [<abstr>; <abstr>; <abstr>; <abstr>]
# Re.all regex "My head, My shoulders, My knees, My toes ...";;
- : Re.substrings list = []
Same as all, but extracts the matched substring rather than returning the whole group. This basically iterates over matched strings.
Examples:
# let regex = Re.compile Re.(seq [str "my"; blank; word(rep alpha)]);;
val regex : re = <abstr>
# Re.matches regex "my head, my shoulders, my knees, my toes ...";;
- : string list = ["my head"; "my shoulders"; "my knees"; "my toes"]
# Re.matches regex "My head, My shoulders, My knees, My toes ...";;
- : string list = []
# Re.matches regex "my my my my head my 1 toe my ...";;
- : string list = ["my my"; "my my"]
# Re.matches ~pos:2 regex "my my my my head my +1 toe my ...";;
- : string list = ["my my"; "my head"]
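To make the difference between all and matches concrete, here is a sketch that runs both on the same input. It assumes the re library is available as Re and uses Re.Group.offset to read the (start, stop) positions of each match; number_re and input are illustrative names.

```ocaml
let number_re = Re.compile Re.(rep1 digit)

let input = "a1b22c333"

(* matches gives the matched text directly. *)
let texts = Re.matches number_re input

(* all gives groups, which can also report where each match sits. *)
let spans = List.map (fun g -> Re.Group.offset g 0) (Re.all number_re input)

let () =
  List.iter2
    (fun text (start, stop) -> Printf.printf "%s at [%d, %d)\n" text start stop)
    texts spans
```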
split re s splits s into chunks separated by re. It yields the chunks themselves, not the separator.
Examples:
# let regex = Re.compile (Re.char ',');;
val regex : re = <abstr>
# Re.split regex "Re,Ocaml,Jerome Vouillon";;
- : string list = ["Re"; "Ocaml"; "Jerome Vouillon"]
# Re.split regex "No commas in this sentence.";;
- : string list = ["No commas in this sentence."]
# Re.split ~pos:3 regex "1,2,3,4. Commas go brrr.";;
- : string list = ["3"; "4. Commas go brrr."]
split_full re s splits s into chunks separated by re. It yields the chunks along with the separators. For instance this can be used with a whitespace-matching re such as "[\t ]+".
Examples:
# let regex = Re.compile (Re.char ',');;
val regex : re = <abstr>
# Re.split_full regex "Re,Ocaml,Jerome Vouillon";;
- : Re.split_token list =
[`Text "Re"; `Delim <abstr>; `Text "Ocaml"; `Delim <abstr>;
`Text "Jerome Vouillon"]
# Re.split_full regex "No commas in this sentence.";;
- : Re.split_token list = [`Text "No commas in this sentence."]
# Re.split_full ~pos:3 regex "1,2,3,4. Commas go brrr.";;
- : Re.split_token list =
[`Delim <abstr>; `Text "3"; `Delim <abstr>; `Text "4. Commas go brrr."]
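The whitespace case mentioned above can be sketched with the combinators, assuming the re library is available as Re. Here Re.(rep1 (set "\t ")) stands in for "[\t ]+", and the tokens from split_full are glued back together to show that they cover the whole input:

```ocaml
let ws = Re.compile Re.(rep1 (set "\t "))

(* split drops the separators. *)
let words = Re.split ws "a  b\tc"

(* split_full keeps them, so concatenating the tokens rebuilds the input. *)
let rebuilt =
  Re.split_full ws "a  b\tc"
  |> List.map (function
       | `Text s -> s
       | `Delim g -> Re.Group.get g 0)
  |> String.concat ""

let () = List.iter print_endline words
```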
val replace :
?pos:int ->
?len:int ->
?all:bool ->
re ->
f:(Group.t -> string) ->
string ->
string
replace ~all re ~f s iterates on s, and replaces every occurrence of re with f substring where substring is the current match. If all = false, then only the first occurrence of re is replaced.
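A short sketch of replace with a callback, assuming the re library is available as Re; shout is a hypothetical helper that upper-cases whatever group 0 matched:

```ocaml
let word_re = Re.compile Re.(rep1 alpha)

(* The callback receives the groups of the current match. *)
let shout (g : Re.Group.t) : string = String.uppercase_ascii (Re.Group.get g 0)

let all_upper = Re.replace word_re ~f:shout "hello big world"
let first_upper = Re.replace ~all:false word_re ~f:shout "hello big world"

let () =
  print_endline all_upper;
  print_endline first_upper
```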
replace_string ~all re ~by s iterates on s, and replaces every occurrence of re with by. If all = false, then only the first occurrence of re is replaced.
Examples:
# let regex = Re.compile (Re.char ',');;
val regex : re = <abstr>
# Re.replace_string regex ~by:";" "[1,2,3,4,5,6,7]";;
- : string = "[1;2;3;4;5;6;7]"
# Re.replace_string regex ~all:false ~by:";" "[1,2,3,4,5,6,7]";;
- : string = "[1;2,3,4,5,6,7]"
String expressions (literal match)
Basic operations on regular expressions
Alternative. alt [] is equivalent to empty.
By default, the leftmost match is preferred (see match semantics below).
repn re i j matches re at least i times and at most j times, bounds included. j = None means no upper bound.
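The bounds of repn are easiest to see on an anchored expression. A minimal sketch, assuming the re library is available as Re; two_to_four and at_least_two are illustrative names:

```ocaml
(* Two to four digits, anchored at both ends of the input. *)
let two_to_four = Re.compile Re.(seq [ bos; repn digit 2 (Some 4); eos ])

(* Two or more digits: None removes the upper bound. *)
let at_least_two = Re.compile Re.(seq [ bos; repn digit 2 None; eos ])

let () =
  List.iter
    (fun s ->
      Printf.printf "%s: %b %b\n" s
        (Re.execp two_to_four s)
        (Re.execp at_least_two s))
    [ "1"; "12"; "1234"; "12345" ]
```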
String, line, word
We define a word as a sequence of latin1 letters, digits and underscore.
Beginning of string. This differs from start because it matches the beginning of the input string even when using ~pos arguments:
let b = execp (compile (seq [ bos; str "a" ])) "aa" ~pos:1 in
assert (not b)
Initial position. This differs from bos because it takes into account the ~pos arguments:
let b = execp (compile (seq [ start; str "a" ])) "aa" ~pos:1 in
assert b
Match semantics
A regular expression frequently matches a string in multiple ways. For instance exec (compile (opt (str "a"))) "ab" can match "" or "a". Match semantics can be modified with the functions below, allowing one to choose which of these is preferable.
By default, the leftmost branch of alternations is preferred, and repetitions are greedy.
Note that the existence of matches cannot be changed by specifying match semantics. seq [ bos; str "a"; non_greedy (opt (str "b")); eos ] will match when applied to "ab". However if seq [ bos; str "a"; non_greedy (opt (str "b")) ] is applied to "ab", it will match "a" rather than "ab".
Also note that multiple match semantics can conflict. In this case, the one executed earlier takes precedence. For instance, any match of shortest (seq [ bos; group (rep (str "a")); group (rep (str "a")); eos ]) will always have an empty first group. Conversely, if we use longest instead of shortest, the second group will always be empty.
Longest match semantics. That is, matches will match as many bytes as possible. If multiple choices match the maximum number of bytes, the one respecting the inner match semantics is preferred.
First match semantics for alternations (not repetitions). That is, matches will prefer the leftmost branch of the alternation that matches the text.
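The effect of these modifiers can be sketched by comparing what group 0 captures under each one. This assumes the re library is available as Re; whole is a hypothetical helper that compiles an expression and returns the text of the entire match.

```ocaml
(* Compile [re], run it on [s], and return the text of the whole match. *)
let whole (re : Re.t) (s : string) : string =
  Re.Group.get (Re.exec (Re.compile re) s) 0

(* Repetitions are greedy by default, and non_greedy reverses that. *)
let greedy = whole Re.(seq [ str "a"; rep (str "b") ]) "abbb"
let lazy_ = whole Re.(seq [ str "a"; non_greedy (rep (str "b")) ]) "abbb"

(* Alternations prefer the leftmost branch unless longest is requested. *)
let first_branch = whole Re.(alt [ str "a"; str "ab" ]) "ab"
let longest_branch = whole Re.(longest (alt [ str "a"; str "ab" ])) "ab"

let () =
  List.iter print_endline [ greedy; lazy_; first_branch; longest_branch ]
```

In every case a match exists; only the choice among the possible matches changes.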
Groups (or submatches)
Delimit a group. The group is considered as matching if it is used at least once (it may be used multiple times if it is nested inside rep, for instance). If it is used multiple times, the last match is what gets captured.
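A sketch of reading groups back by index, assuming the re library is available as Re. kv_re and parse are illustrative names, and the key=value input format is made up for the example.

```ocaml
let kv_re =
  Re.compile Re.(seq [ bos; group (rep1 alpha); char '='; group (rep1 digit); eos ])

(* Group 1 is the key and group 2 the value; group 0 would be the whole match. *)
let parse (s : string) : (string * int) option =
  match Re.exec_opt kv_re s with
  | Some g -> Some (Re.Group.get g 1, int_of_string (Re.Group.get g 2))
  | None -> None

(* A group inside a repetition keeps only its last match. *)
let last_letter = Re.Group.get (Re.exec (Re.compile Re.(rep1 (group alpha))) "abc") 1

let () = print_endline last_letter
```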
When matching against nest e, only the group matching in the last match of e will be considered as matching.
For instance:
let re = compile (rep1 (nest (alt [ group (str "a"); str "b" ]))) in
let group = Re.exec re "ab" in
assert (Group.get_opt group 1 = None);
(* same thing but without [nest] *)
let re = compile (rep1 (alt [ group (str "a"); str "b" ])) in
let group = Re.exec re "ab" in
assert (Group.get_opt group 1 = Some "a");
Mark a regexp. The markid can then be used to know if this regexp was used.
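Marks are handy for telling which branch of an alternation matched without adding capture groups. A minimal sketch, assuming the re library is available as Re and that Re.Mark.test takes the match result followed by the mark; kind is a hypothetical classifier.

```ocaml
(* mark returns the mark identifier together with the marked expression. *)
let num_mark, num = Re.mark Re.(rep1 digit)
let word_mark, word = Re.mark Re.(rep1 alpha)

let token_re = Re.compile (Re.alt [ num; word ])

(* Classify the first token found in [s] by checking which mark was hit. *)
let kind (s : string) : string =
  match Re.exec_opt token_re s with
  | Some g when Re.Mark.test g num_mark -> "number"
  | Some g when Re.Mark.test g word_mark -> "word"
  | _ -> "other"

let () = List.iter (fun s -> print_endline (kind s)) [ "42"; "abc"; "!!" ]
```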
Character sets
Predefined character sets
Case modifiers
Case sensitive matching. Note that this works on latin1, not ascii and not utf8.
Case insensitive matching. Note that this works on latin1, not ascii and not utf8.
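The two modifiers can be nested, with the innermost one winning for the part it wraps. A sketch assuming the re library is available as Re; ci and mixed are illustrative names:

```ocaml
(* Case-insensitive match of the whole input against "hello". *)
let ci = Re.compile Re.(seq [ bos; no_case (str "hello"); eos ])

(* "id" in any case, followed by an upper-case "X" exactly. *)
let mixed =
  Re.compile Re.(seq [ bos; no_case (seq [ str "id"; case (str "X") ]); eos ])

let () =
  Printf.printf "%b %b %b\n"
    (Re.execp ci "HeLLo")
    (Re.execp mixed "IDX")
    (Re.execp mixed "idx")
```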
Internal debugging
Alias for pp_re. Deprecated.
Experimental functions
witness r generates a string s such that execp (compile r) s is true.
Be warned that this function is buggy because it ignores zero-width assertions like beginning of words. As a result it can generate incorrect results.
Deprecated functions
Same as Group.offset. Deprecated.
Same as Group.all_offset. Deprecated.
Same as Group.test. Deprecated.
Same as Mark.all. Deprecated.