package batteries

A community-maintained standard library extension

Sources

v3.9.0.tar.gz

Module BatGenlex

A generic lexical analyzer.

This module implements a simple "standard" lexical analyzer, presented as a function from character streams to token streams. It implements roughly the lexical conventions of OCaml, but is parameterized by the set of keywords of your language.

Example: a lexer suitable for a desk calculator is obtained by

     let lexer = make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"]  

The associated parser would be a function from token stream to, for instance, int, and would have rules such as:

  let rec parse_expr = parser
      [< 'Int n >] -> n
    | [< 'Kwd "("; n = parse_expr; 'Kwd ")" >] -> n
    | [< n1 = parse_expr; n2 = parse_remainder n1 >] -> n2
  and parse_remainder n1 = parser
      [< 'Kwd "+"; n2 = parse_expr >] -> n1+n2
    | ...
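
The rules above use the `parser` stream syntax, which is a camlp4/camlp5 syntax extension rather than plain OCaml. As a rough illustration of the same idea without that extension, here is a minimal sketch that parses a plain list of tokens. The local `token` type is a hypothetical two-constructor mirror of the module's token type, and the function `parse_atom` and the grammar split are assumptions made for this example, not part of BatGenlex.

```ocaml
(* Hypothetical local mirror of two token constructors; real code would
   use the token type exported by BatGenlex instead. *)
type token = Kwd of string | Int of int

(* Grammar assumed for this sketch:
     expr      ::= atom remainder
     atom      ::= Int | "(" expr ")"
     remainder ::= "+" expr | <empty>
   Each function returns the computed value and the unconsumed tokens. *)
let rec parse_expr tokens =
  let n1, rest = parse_atom tokens in
  parse_remainder n1 rest

and parse_atom = function
  | Int n :: rest -> (n, rest)
  | Kwd "(" :: rest ->
    (match parse_expr rest with
     | n, Kwd ")" :: rest' -> (n, rest')
     | _ -> failwith "expected )")
  | _ -> failwith "expected an integer or ("

and parse_remainder n1 = function
  | Kwd "+" :: rest ->
    let n2, rest' = parse_expr rest in
    (n1 + n2, rest')
  | rest -> (n1, rest)

(* Tokens for the input: 1 + (2 + 3) *)
let result =
  fst (parse_expr [Int 1; Kwd "+"; Kwd "("; Int 2; Kwd "+"; Int 3; Kwd ")"])
```

Parsing an atom before looking for a remainder avoids the left recursion that a direct transcription of the third rule would introduce.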
  • author Jacques Garrigue
  • author David Teller

type token = Genlex.token =
  | Kwd of string
  | Ident of string
  | Int of int
  | Float of float
  | String of string
  | Char of char

The type of tokens. The lexical classes are: Int and Float for integer and floating-point numbers; String for string literals, enclosed in double quotes; Char for character literals, enclosed in single quotes; Ident for identifiers (either sequences of letters, digits, underscores and quotes, or sequences of "operator characters" such as +, *, etc.); and Kwd for keywords (either identifiers or single "special characters" such as (, }, etc.).

val make_lexer : string list -> char Stream.t -> token Stream.t

Construct the lexer function. The first argument is the list of keywords. An identifier s is returned as Kwd s if s belongs to this list, and as Ident s otherwise. A special character s is returned as Kwd s if s belongs to this list, and causes a lexical error (exception Parse_error) otherwise. Blanks and newlines are skipped. Comments delimited by (* and *) are skipped as well, and can be nested.
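
The keyword rule for identifiers can be illustrated with a small sketch. This is not the library's implementation: the local `token` type and the `classify` function are hypothetical names introduced only to show how membership in the keyword list decides between Kwd and Ident.

```ocaml
(* Hypothetical mirror of two constructors of the token type. *)
type token = Kwd of string | Ident of string

(* classify keywords s returns Kwd s when s is in the keyword list,
   and Ident s otherwise. *)
let classify keywords s =
  if List.mem s keywords then Kwd s else Ident s

let keywords = ["let"; "="; "+"]

let a = classify keywords "let"  (* a listed keyword *)
let b = classify keywords "x"    (* an ordinary identifier *)
```

A linear `List.mem` lookup keeps the sketch short; a real lexer would more likely use a hash table for the keyword set.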

type lexer_error =
  | IllegalCharacter of char
  | NotReallyAChar
  | NotReallyAnEscape
  | EndOfStream

exception LexerError of lexer_error * int

type t

A lexer.

val of_list : string list -> t

Create a lexer from a list of keywords.

val to_stream_filter : t -> char Stream.t -> token Stream.t

Apply the lexer to a stream.

val to_enum_filter : t -> char BatEnum.t -> token BatEnum.t

Apply the lexer to an enum.

val to_lazy_list_filter : t -> char BatLazyList.t -> token BatLazyList.t

Apply the lexer to a lazy list.
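
As a usage sketch for the filters above, the following builds a lexer with of_list and applies it to an enum of characters. It assumes the batteries library is installed and linked; `BatString.enum` and `BatList.of_enum` are helpers from other Batteries modules, and the keyword list and input string are made up for illustration.

```ocaml
(* Assumes the batteries library is available at build time. *)

(* A lexer whose only keywords are "+" and the two parentheses. *)
let lexer = BatGenlex.of_list ["+"; "("; ")"]

(* Turn a string into an enum of characters, lex it, and collect
   the resulting tokens into a list. *)
let tokens : BatGenlex.token list =
  BatString.enum "1 + 2"
  |> BatGenlex.to_enum_filter lexer
  |> BatList.of_enum

(* Print one token per line using the module's own printer. *)
let () =
  List.iter (fun tok -> print_endline (BatGenlex.string_of_token tok)) tokens
```

The stream and lazy-list variants have the same shape: only the type of the character source and of the resulting token sequence changes.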

val string_of_token : token -> string

Extending to other languages

module Languages : sig ... end