package angstrom

  1. Overview
  2. Docs

Module AngstromSource

Parser combinators built for speed and memory-efficiency.

Angstrom is a parser-combinator library that provides monadic and applicative interfaces for constructing parsers with unbounded lookahead. Its parsers can consume input incrementally, whether in a blocking or non-blocking environment. To achieve efficient incremental parsing, Angstrom offers both a buffered and unbuffered interface to input streams, with the Unbuffered interface enabling zero-copy IO. With these features and low-level iteration parser primitives like take_while and skip_while, Angstrom makes it easy to write efficient, expressive, and reusable parsers suitable for high-performance applications.

Sourcetype +'a t

A parser for values of type 'a.

Basic parsers

Sourceval peek_char : char option t

peek_char accepts any char and returns it, or returns None if the end of input has been reached.

This parser does not advance the input. Use it for lookahead.

Sourceval peek_char_fail : char t

peek_char_fail accepts any char and returns it. If end of input has been reached, it will fail.

This parser does not advance the input. Use it for lookahead.

Sourceval peek_string : int -> string t

peek_string n accepts exactly n characters and returns them as a string. If there is not enough input, it will fail.

This parser does not advance the input. Use it for lookahead.

Sourceval char : char -> char t

char c accepts c and returns it.

Sourceval not_char : char -> char t

not_char accepts any character that is not c and returns the matched character.

Sourceval any_char : char t

any_char accepts any character and returns it.

Sourceval satisfy : (char -> bool) -> char t

satisfy f accepts any character for which f returns true and returns the accepted character.

Sourceval string : string -> string t

string s accepts s exactly and returns it.

Sourceval string_ci : string -> string t

string_ci s accepts s, ignoring case, and returns the matched string, preserving the case of the original input.

Sourceval skip : (char -> bool) -> unit t

skip f accepts any character for which f returns true and discards the accepted character. skip f is equivalent to satisfy f but discards the accepted character.

Sourceval skip_while : (char -> bool) -> unit t

skip_while f accepts input as long as f returns true and discards the accepted characters.

Sourceval take : int -> string t

take n accepts exactly n characters of input and returns them as a string.

Sourceval take_while : (char -> bool) -> string t

take_while f accepts input as long as f returns true and returns the accepted characters as a string.

This parser does not fail. If f returns false on the first character, it will return the empty string.

Sourceval take_while1 : (char -> bool) -> string t

take_while f accepts input as long as f returns true and returns the accepted characters as a string.

This parser requires that f return true for at least one character of input, and will fail otherwise.

Sourceval take_till : (char -> bool) -> string t

take_till f accepts input as long as f returns false and returns the accepted characters as a string.

This parser does not fail. If f returns true on the first character, it will return the empty string.

Sourceval advance : int -> unit t

advance n advances the input n characters, failing if the remaining input is less than n.

Sourceval end_of_line : unit t

end_of_input accepts either a line feed \n, or a carriage return followed by a line feed \r\n and returns unit.

Sourceval at_end_of_input : bool t

at_end_of_input returns whether the end of the end of input has been reached. This parser always succeeds.

Sourceval end_of_input : unit t

end_of_input succeeds if all the input has been consumed, and fails otherwise.

Sourceval scan : 'state -> ('state -> char -> 'state option) -> (string * 'state) t

scan init f consumes until f returns None. Returns the final state before None and the accumulated string

Sourceval scan_state : 'state -> ('state -> char -> 'state option) -> 'state t

scan init f Like scan but only returns the final state before None. Much more efficient than scan

Sourceval scan_string : 'state -> ('state -> char -> 'state option) -> string t

scan init f Like scan but discards the final state and returns the accumulated string

Sourceval any_uint8 : int t

any_uint8 accepts any byte and returns it as an unsigned int8.

Sourceval any_int8 : int t

any_int8 accepts any byte and returns it as a signed int8.

Sourcemodule BE : sig ... end

Big endian parsers

Sourcemodule LE : sig ... end

Little endian parsers

Combinators

Sourceval option : 'a -> 'a t -> 'a t

option v p runs p, returning the result of p if it succeeds and v if it fails.

Sourceval list : 'a t list -> 'a list t

list ps runs each p in ps in sequence, returning a list of results of each p.

Sourceval count : int -> 'a t -> 'a list t

count n p runs p n times, returning a list of the results.

Sourceval many : 'a t -> 'a list t

many p runs p zero or more times and returns a list of results from the runs of p.

Sourceval many1 : 'a t -> 'a list t

many1 p runs p one or more times and returns a list of results from the runs of p.

Sourceval many_till : 'a t -> 'b t -> 'a list t

many_till p e runs parser p zero or more times until action e succeeds and returns the list of result from the runs of p.

Sourceval sep_by : 'a t -> 'b t -> 'b list t

sep_by s p runs p zero or more times, interspersing runs of s in between.

Sourceval sep_by1 : 'a t -> 'b t -> 'b list t

sep_by1 s p runs p one or more times, interspersing runs of s in between.

Sourceval skip_many : 'a t -> unit t

skip_many p runs p zero or more times, discarding the results.

Sourceval skip_many1 : 'a t -> unit t

skip_many1 p runs p one or more times, discarding the results.

Sourceval fix : ('a t -> 'a t) -> 'a t

fix f computes the fixpoint of f and runs the resultant parser. The argument that f receives is the result of fix f, which f must use, paradoxically, to define fix f.

fix is useful when constructing parsers for inductively-defined types such as sequences, trees, etc. Consider for example the implementation of the many combinator defined in this library:

let many p =
  fix (fun m ->
    (cons <$> p <*> m) <|> return [])

many p is a parser that will run p zero or more times, accumulating the result of every run into a list, returning the result. It's defined by passing fix a function. This function assumes its argument m is a parser that behaves exactly like many p. You can see this in the expression comprising the left hand side of the alternative operator <|>. This expression runs the parser p followed by the parser m, and after which the result of p is cons'd onto the list that m produces. The right-hand side of the alternative operator provides a base case for the combinator: if p fails and the parse cannot proceed, return an empty list.

Another way to illustrate the uses of fix is to construct a JSON parser. Assuming that parsers exist for the basic types such as false, true, null, strings, and numbers, the question then becomes how to define a parser for objects and arrays? Both contain values that are themselves JSON values, so it seems as though it's impossible to write a parser that will accept JSON objects and arrays before writing a parser for JSON values as a whole.

This is the exact situation that fix was made for. By defining the parsers for arrays and objects within the function that you pass to fix, you will gain access to a parser that you can use to parse JSON values, the very parser you are defining!

let json =
  fix (fun json ->
    let arr = char '[' *> sep_by (char ',') json <* char ']' in
    let obj = char '{' *> ... json ... <* char '}' in
    choice [str; num; arr json, ...])

Alternatives

Sourceval (<|>) : 'a t -> 'a t -> 'a t

p <|> q runs p and returns the result if succeeds. If p fails, then the input will be reset and q will run instead.

Sourceval choice : 'a t list -> 'a t

choice ts runs each parser in ts in order until one succeeds and returns that result.

Sourceval (<?>) : 'a t -> string -> 'a t

p <?> name associates name with the parser p, which will be reported in the case of failure.

Sourceval commit : unit t

commit prevents backtracking beyond the current position of the input, allowing the manager of the input buffer to reuse the preceding bytes for other purposes.

The Unbuffered parsing interface will report directly to the caller the number of bytes committed to the when returning a Unbuffered.state.Partial state, allowing the caller to reuse those bytes for any purpose. The Buffered will keep track of the region of committed bytes in its internal buffer and reuse that region to store additional input when necessary.

Monadic/Applicative interface

Sourceval return : 'a -> 'a t

return v creates a parser that will always succeed and return v

Sourceval fail : string -> 'a t

fail msg creates a parser that will always fail with the message msg

Sourceval (>>=) : 'a t -> ('a -> 'b t) -> 'b t

p >>= f creates a parser that will run p, pass its result to f, run the parser that f produces, and return its result.

Sourceval (>>|) : 'a t -> ('a -> 'b) -> 'b t

p >>| f creates a parser that will run p, and if it succeeds with result v, will return f v

Sourceval (<*>) : ('a -> 'b) t -> 'a t -> 'b t

f <*> p is equivalent to f >>= fun f -> p >>| f.

Sourceval (<$>) : ('a -> 'b) -> 'a t -> 'b t

f <$> p is equivalent to p >>| f

Sourceval (*>) : 'a t -> 'b t -> 'b t

p *> q runs p, discards its result and then runs q, and returns its result.

Sourceval (<*) : 'a t -> 'b t -> 'a t

p <* q runs p, then runs q, discards its result, and returns the result of p.

Sourceval lift : ('a -> 'b) -> 'a t -> 'b t
Sourceval lift2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t
Sourceval lift3 : ('a -> 'b -> 'c -> 'd) -> 'a t -> 'b t -> 'c t -> 'd t
Sourceval lift4 : ('a -> 'b -> 'c -> 'd -> 'e) -> 'a t -> 'b t -> 'c t -> 'd t -> 'e t

The liftn family of functions promote functions to the parser monad. For any of these functions, the following equivalence holds:

liftn f p1 ... pn = f <$> p1 <*> ... <*> pn

These functions are more efficient than using the applicative interface directly, mostly in terms of memory allocation but also in terms of speed. Prefer them over the applicative interface, even when the arity of the function to be lifted exceeds the maximum n for which there is an implementation for liftn. In other words, if f has an arity of 5 but only lift4 is provided, do the following:

lift4 f m1 m2 m3 m4 <*> m5

Even with the partial application, it will be more efficient than the applicative implementation.

Running

Sourceval parse_bigstring : 'a t -> bigstring -> ('a, string) Result.result

parse_bigstring t bs runs t on bs. The parser will receive an `Eof after all of bs has been consumed. For use-cases requiring that the parser be fed input incrementally, see the Buffered and Unbuffered modules below.

Sourceval parse_string : 'a t -> string -> ('a, string) Result.result

parse_string t bs runs t on bs. The parser will receive an `Eof after all of bs has been consumed. For use-cases requiring that the parser be fed input incrementally, see the Buffered and Unbuffered modules below.

Sourcemodule Buffered : sig ... end

Buffered parsing interface.

Sourcemodule Unbuffered : sig ... end

Unbuffered parsing interface.

OCaml

Innovation. Community. Security.