package emile



Sources

emile-0.3.tbz
sha256=8a560f3c030e241c1a09deb3da179f37257c905ad03d2dd67bfd48172470c793
md5=57c23427a11e5ac5a2e37f9b21477b0b


Module Emile

Emile module: a parser of e-mail addresses.

type raw =
  | Quoted_printable of string
  | Base64 of [ `Dirty of string | `Clean of string | `Wrong_padding ]

As part of a phrase (identifier), an e-mail address can contain an encoded string. The standards describe two kinds of encoding:

  • Quoted-Printable: inserts hexadecimal values with the = escape character.
  • Base64: a string encoded in MIME's Base64.

The parser already decodes the raw value, so the client can use it as is.
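Because the parser returns content that is already decoded, a client can usually just pattern-match on raw. The following is a minimal sketch that assumes the emile library is linked. raw_contents is a hypothetical helper. Treating `Dirty content as usable is an assumption about what a client might want; Emile does not prescribe it.

```ocaml
(* Hypothetical helper: extract the decoded text carried by an Emile.raw. *)
let raw_contents : Emile.raw -> string option = function
  | Emile.Quoted_printable s -> Some s
  (* Assumption: the client accepts `Dirty content as is. *)
  | Emile.Base64 (`Clean s | `Dirty s) -> Some s
  (* Padding was wrong: nothing trustworthy to return. *)
  | Emile.Base64 `Wrong_padding -> None
```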

type word = [
  | `Atom of string
  | `String of string
]

The local part of an e-mail address is composed of two kinds of words:

  • `Atom is a plain string, kept as is.
  • `String is a string surrounded by double quotes, which allows white-space.

The second kind is sanitized: the surrounding double quotes are removed.

type local = word list

Local part of an e-mail address.
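The following sketch renders word and local back as text, which makes their structure concrete. word_to_string and local_to_string are hypothetical helpers; for display, Emile already provides pp_word and pp_local. The sketch makes two assumptions. First, `String content has had its quoted-pairs unescaped, so the helper re-escapes double quotes and backslashes. Second, the words of a local part are joined by dots, as in the dot-atom and obs-local-part rules of RFC 5322.

```ocaml
(* Hypothetical helper: render a word back in RFC 5322 syntax. *)
let word_to_string : Emile.word -> string = function
  | `Atom s -> s
  | `String s ->
    (* Emile removed the surrounding quotes: restore them and escape
       any double quote or backslash inside the content (assumption:
       the content is stored unescaped). *)
    let b = Buffer.create (String.length s + 2) in
    Buffer.add_char b '"';
    String.iter
      (fun c ->
        if c = '"' || c = '\\' then Buffer.add_char b '\\';
        Buffer.add_char b c)
      s;
    Buffer.add_char b '"';
    Buffer.contents b

(* Hypothetical helper: a local part is a dot-separated sequence of words. *)
let local_to_string (l : Emile.local) : string =
  String.concat "." (List.map word_to_string l)
```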

type addr =
  | IPv4 of Ipaddr.V4.t
  | IPv6 of Ipaddr.V6.t
  | Ext of string * string

The subset of domains described by RFC5321, which contains three kinds of address:

  • IPv4: a valid IPv4 address
  • IPv6: a valid IPv6 address
  • Ext (ldh, value): an extended kind of domain, identified by the ldh tag, whose value is value

IPv4 and IPv6 addresses are parsed by Ipaddr. The extended kind Ext must be resolved by the client.
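As an illustration of the three addr cases, the sketch below turns an addr into the text that would appear between the brackets of an RFC5321 address literal. addr_to_string is a hypothetical helper. It uses Ipaddr's to_string functions, and it prints Ext in the RFC5321 tag:content shape without interpreting it, since resolving Ext is left to the client.

```ocaml
(* Hypothetical helper: print an Emile.addr as address-literal content. *)
let addr_to_string : Emile.addr -> string = function
  | Emile.IPv4 ip -> Ipaddr.V4.to_string ip
  | Emile.IPv6 ip -> "IPv6:" ^ Ipaddr.V6.to_string ip
  (* Emile leaves Ext uninterpreted; we only print it as tag:content. *)
  | Emile.Ext (ldh, value) -> ldh ^ ":" ^ value
```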

type domain = [
  | `Domain of string list
  | `Addr of addr
  | `Literal of string
]

Domain part of an e-mail address. A domain is one of the following: one of the kinds from RFC5321 (see addr), a domain as described by RFC5322, or a `Literal. A `Literal is the last-resort, best-effort value accepted as a domain.

Emile does not resolve domains.
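Because Emile does not resolve domains, a client that wants to look one up (for instance with an MX query) first has to turn it into a host name. Below is a minimal sketch. hostname_of_domain is a hypothetical helper, and the sketch assumes that `Domain holds the dot-separated labels in order.

```ocaml
(* Hypothetical helper: get a host name a resolver could be given. *)
let hostname_of_domain : Emile.domain -> string option = function
  (* Assumption: the list holds the dot-separated labels in order. *)
  | `Domain labels -> Some (String.concat "." labels)
  (* Already an address, or a literal we cannot interpret. *)
  | `Addr _ | `Literal _ -> None
```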

type phrase = [ `Dot | `Word of word | `Encoded of string * raw ] list

A phrase is a sequence of words that associates a name with an e-mail address or with a group of e-mail addresses. An `Encoded value is not normalized to the charset it specifies: the encoded string is only decoded, as is. For example, an `Encoded value can declare the KOI-8 encoding (a Cyrillic charset). Emile does not check that the value is a valid KOI-8 string, nor does it normalize it to Unicode. It just decodes it as is.
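The sketch below flattens a phrase into display text. It makes the point above visible: encoded words come back as decoded bytes, and their charset is ignored. phrase_to_text is hypothetical. It assumes the string in `Encoded is the charset name. Joining elements with a single space is a simplification, not RFC-accurate rendering.

```ocaml
(* Hypothetical helper: flatten a phrase into display text. Encoded
   words are returned as decoded bytes; their charset is ignored. *)
let phrase_to_text (p : Emile.phrase) : string =
  let piece = function
    | `Dot -> "."
    | `Word (`Atom s) | `Word (`String s) -> s
    | `Encoded (_charset, Emile.Quoted_printable s) -> s
    | `Encoded (_charset, Emile.Base64 (`Clean s | `Dirty s)) -> s
    | `Encoded (_charset, Emile.Base64 `Wrong_padding) -> "" in
  (* Simplification: elements are joined with a single space. *)
  String.concat " " (List.map piece p)
```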

type mailbox = {
  name : phrase option;
  local : local;
  domain : domain * domain list;
}

A mailbox is an e-mail address. It contains an optional name (see phrase), a local part (see local) and one or more domains.

type group = {
  group : phrase;
  mailboxes : mailbox list;
}

A group is a named set of mailboxes.

type address = local * (domain * domain list)

A basic e-mail address.
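A mailbox carries everything an address does, plus an optional name. Converting a mailbox into an address therefore only drops the name. Here is a minimal sketch with a hypothetical helper:

```ocaml
(* Hypothetical helper: drop the display name of a mailbox to obtain the
   basic address (local part and domains). *)
let address_of_mailbox (m : Emile.mailbox) : Emile.address =
  (m.Emile.local, m.Emile.domain)
```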

type set = [
  | `Mailbox of mailbox
  | `Group of group
]

Emile's set type: either a singleton (a single mailbox) or a set of e-mail addresses (a group).
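Code that consumes recipients often only needs the mailboxes, whether each one came alone or inside a group. Here is a sketch with a hypothetical mailboxes_of_set helper:

```ocaml
(* Hypothetical helper: flatten both cases of a set into a mailbox list. *)
let mailboxes_of_set : Emile.set -> Emile.mailbox list = function
  | `Mailbox m -> [ m ]
  | `Group g -> g.Emile.mailboxes
```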

Pretty-printer

val pp_addr : addr Fmt.t
val pp_domain : domain Fmt.t
val pp_word : word Fmt.t
val pp_local : local Fmt.t
val pp_raw : raw Fmt.t
val pp_phrase : phrase Fmt.t
val pp_mailbox : mailbox Fmt.t
val pp_group : group Fmt.t
val pp_address : address Fmt.t
val pp_set : set Fmt.t

Equal & Compare

type 'a equal = 'a -> 'a -> bool
type 'a compare = 'a -> 'a -> int
val case_sensitive : string -> string -> int

Alias of String.compare.

val case_insensitive : string -> string -> int

case_insensitive a b maps both values with String.lowercase_ascii and compares them with String.compare. UTF-8 values are not case-folded.
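Here is a quick sketch of the difference between the two comparators. same_ignoring_ascii_case is a hypothetical wrapper. Only ASCII letters are folded, so the UTF-8 strings for É and é below remain different.

```ocaml
(* Hypothetical wrapper: true when two strings differ at most in ASCII case. *)
let same_ignoring_ascii_case a b = Emile.case_insensitive a b = 0

(* Domain-like strings that differ only in ASCII case. *)
let ci = same_ignoring_ascii_case "Example.COM" "example.com"
let cs = Emile.case_sensitive "Example.COM" "example.com" = 0

(* "\xc3\x89" is É and "\xc3\xa9" is é in UTF-8: lowercase_ascii
   leaves these non-ASCII bytes untouched. *)
let utf8 = same_ignoring_ascii_case "\xc3\x89" "\xc3\xa9"
```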

val equal_word : compare:string compare -> word equal

equal_word ~compare a b tests whether word a and word b are semantically equal. compare specifies how to compare two strings (i.e. case-sensitively or not).

val compare_word : ?case_sensitive:bool -> word compare

compare_word ?case_sensitive a b compares word a and word b semantically. According to the standards, words SHOULD be compared case-sensitively. The client can change this behaviour with ?case_sensitive (default is true).
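The sketch below shows how the choice of comparator changes the result for two atoms that differ only in case. The values are illustrative.

```ocaml
(* Two atoms that differ only in ASCII case. *)
let john = `Atom "John"
let john_lower = `Atom "john"

(* equal_word with each of the two string comparators. *)
let equal_ci = Emile.equal_word ~compare:Emile.case_insensitive john john_lower
let equal_cs = Emile.equal_word ~compare:Emile.case_sensitive john john_lower

(* compare_word with ?case_sensitive:false overrides the default (true). *)
let order_ci = Emile.compare_word ~case_sensitive:false john john_lower
```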

val equal_raw : compare:string compare -> raw equal

equal_raw ~compare a b tests whether raw a and raw b are semantically equal. Semantic equality compares the decoded contents. A Base64 raw can therefore be equal to a Quoted_printable raw if and only if their strings are equal.

val compare_raw : compare:string compare -> raw compare

compare_raw ~compare a b compares raw a and raw b semantically.

val equal_phrase : phrase equal

equal_phrase a b tests whether phrase a and phrase b are semantically equal. Elements of the phrases are compared case-insensitively, and the order of elements matters.

val compare_phrase : phrase compare

compare_phrase a b compares phrase a and phrase b semantically.

val equal_addr : addr equal

equal_addr a b tests whether addr a and addr b are semantically equal. An IPv4 address may be considered equal to an IPv6 address. For the extended kind, the kind and the value are compared strictly (with Pervasives.compare).

val compare_addr : addr compare

compare_addr a b compares addr a and addr b. IPv6 takes priority, then IPv4, and finally Ext.

val equal_domain : domain equal

equal_domain a b tests whether domain a and domain b are semantically equal. Domains are not resolved. Two `Domain values that point to the same IPv4/IPv6 address could be semantically equal, but equal_domain cannot detect it.

val compare_domain : domain compare

compare_domain a b compares domain a and domain b. `Domain takes priority, then `Literal, and finally `Addr. The comparison between two `Literal values, and between the parts of two `Domain values, is case-insensitive.

val equal_domains : (domain * domain list) equal

equal_domains a b applies equal_domain to the ordered domains (see compare_domain) of a and b.

val compare_domains : (domain * domain list) compare

compare_domains a b compares the ordered list of domains of a with the ordered list of domains of b.

val equal_local : ?case_sensitive:bool -> local equal

equal_local ?case_sensitive a b tests whether local a and local b are semantically equal. The standards note that the local part SHOULD be case-sensitive. The client can choose this behaviour with ?case_sensitive.

val compare_local : ?case_sensitive:bool -> local compare

compare_local ?case_sensitive a b compares local a and local b semantically. The user can decide whether the comparison is case-sensitive (with ?case_sensitive).
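For example, take two local parts that differ only in case (the values are illustrative). Passing ?case_sensitive explicitly makes the policy visible at the call site.

```ocaml
let a : Emile.local = [ `Atom "John"; `Atom "Doe" ]
let b : Emile.local = [ `Atom "john"; `Atom "doe" ]

(* Case-sensitive, as the standards recommend: the local parts differ. *)
let strict = Emile.equal_local ~case_sensitive:true a b

(* Case-insensitive: John.Doe and john.doe are treated as the same. *)
let relaxed = Emile.equal_local ~case_sensitive:false a b
```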

val equal_mailbox : ?case_sensitive:bool -> mailbox equal

equal_mailbox ?case_sensitive a b tests whether mailbox a and mailbox b are semantically equal. The user can decide whether the local part is compared case-sensitively (with ?case_sensitive). If exactly one of a and b has a name, a and b are considered equal when they have the same local part and the same domain(s). Otherwise, their names (phrases) are compared as well.

val compare_mailbox : ?case_sensitive:bool -> mailbox compare

compare_mailbox ?case_sensitive a b compares mailbox a and mailbox b semantically. The local part is compared first, then the domain part, and finally the optional name.
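Since compare_mailbox has the usual 'a -> 'a -> int shape once ?case_sensitive is applied, it can be passed directly to standard-library functions. The sketch below deduplicates a recipient list with List.sort_uniq and treats local parts case-insensitively. dedup_mailboxes is a hypothetical helper, and the case-insensitive choice is a policy decision, not a default.

```ocaml
(* Hypothetical helper: remove duplicate mailboxes from a recipient list,
   treating local parts case-insensitively. The result is sorted. *)
let dedup_mailboxes (l : Emile.mailbox list) : Emile.mailbox list =
  List.sort_uniq (Emile.compare_mailbox ~case_sensitive:false) l
```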

val compare_group : group compare

compare_group a b compares group a and group b. The group names are compared first, then the ordered lists of mailboxes.

val equal_group : group equal

equal_group a b tests whether group a and group b are semantically equal. The group names are compared first, then the ordered lists of mailboxes.

val compare_address : address compare

compare_address a b compares address a and address b semantically.

val equal_address : address equal

equal_address a b tests whether address a and address b are semantically equal.

val equal_set : set equal

equal_set a b tests whether set a and set b are semantically equal.

val compare_set : set compare

compare_set a b compares set a and set b.

val strictly_equal_set : set equal

A structural equality function on set.

Parsers

If you don't want a headache, you should move on.

module Parser : sig ... end

This is an aggregation of the rules used to parse an e-mail address. The goal of this documentation is to show the relations between the RFCs, their updates, and the final description of the parts needed to parse an e-mail address.

Decoders

Emile provides four kinds of parsers for e-mail addresses:

  • List.of_string* is the most general parser. It is used to parse the To: field of an e-mail. This field is a list of sets, and each set is either a single e-mail address or a named group of e-mail addresses.

    This parser is used in Emile's tests.

  • address_of_string* parses e-mail addresses of the form local-part@domain. This is the most common case (in your mind) when a client parses an e-mail address. However, this parser handles neither named e-mail addresses nor e-mail addresses with multiple domains.
  • set_of_string* parses either a named group of e-mail addresses (group) or an optionally named e-mail address (mailbox). In contrast to address_of_string, this parser handles e-mail addresses with multiple domains.
  • of_string* is the most general parser for a single e-mail address. It behaves like set_of_string without named groups: it handles named e-mail addresses and e-mail addresses with multiple domains. Clients should use this function if they do not know the exact format of the input.
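The sketch below shows which parser accepts which shape of input. It uses only functions whose signatures are listed at the end of this page. The inputs are illustrative, and each value is simply a result to inspect.

```ocaml
(* A bare local-part@domain: address_of_string is enough. *)
let bare = Emile.address_of_string "john@example.com"

(* A named mailbox is not handled by address_of_string... *)
let named_as_address = Emile.address_of_string "John Doe <john@example.com>"

(* ...but of_string handles it. *)
let named = Emile.of_string "John Doe <john@example.com>"

(* A named group of addresses needs set_of_string. *)
let team = Emile.set_of_string "Team: john@example.com, jane@example.com;"
```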

Each parser comes in three variants: of_string, of_string_with_crlf and of_string_raw. The first one is the easiest to understand: it takes your string and tries to extract an e-mail address (or a set, or a list of sets).

The second one is more general. In an e-mail, an e-mail address is delimited by a double CRLF, which stops the folding-whitespace rule. This matters because an e-mail header can span multiple lines. The of_string function is therefore a special case of of_string_with_crlf: it appends a double CRLF to your string to make sure the parser stops somewhere.
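Given the description above, the two calls below should be equivalent. The first lets of_string add the terminator; the second supplies it to of_string_with_crlf. The input is illustrative.

```ocaml
(* of_string appends the double CRLF itself. *)
let implicit = Emile.of_string "John <john@example.com>"

(* of_string_with_crlf expects the caller to provide the terminator,
   as it would appear at the end of a header block. *)
let explicit = Emile.of_string_with_crlf "John <john@example.com>\r\n\r\n"
```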

The last one, of_string_raw, may interest clients who want to integrate Emile inside their own parser. It processes only a slice of your string and returns how many bytes it consumed to extract the e-mail address. Internally, it also appends a CRLF to stop the parser, but it does not count that CRLF in the number of bytes consumed.

If you want to use Emile inside an existing parser, your e-mail address should be delimited or surrounded by specific characters. For example, an e-mail address can appear in this form: <local@domain>. Here, the address is surrounded by < and >. You should extract the string between them and pass it to address_of_string, which does not allow < and > in an e-mail address.
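The bracket case can be sketched as follows. address_in_brackets is a hypothetical helper, and its `No_brackets and `Parse tags are made up for this example. It takes the text between the first < and the last > and passes that text to address_of_string.

```ocaml
(* Hypothetical helper: parse the address found between angle brackets. *)
let address_in_brackets (s : string) =
  match String.index_opt s '<', String.rindex_opt s '>' with
  | Some i, Some j when i < j ->
    (* address_of_string rejects < and >, so pass only the inside. *)
    let inner = String.sub s (i + 1) (j - i - 1) in
    Result.map_error (fun e -> `Parse e) (Emile.address_of_string inner)
  | _ -> Error `No_brackets
```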

In other cases, your e-mail address can look like John <local@domain>\n. Here, the address is delimited by \n, and you should use of_string, which extracts the name (John) and the associated e-mail address.
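The second case is sketched below with a hypothetical display_name_of_line helper. Trimming the trailing \n before calling of_string is an assumption that keeps the example simple. A real integration would instead let its own parser consume the delimiter.

```ocaml
(* Hypothetical helper: return the display name of a "Name <local@domain>"
   line, or None if there is no name or the line does not parse. *)
let display_name_of_line (line : string) : Emile.phrase option =
  match Emile.of_string (String.trim line) with
  | Ok { Emile.name; _ } -> name
  | Error _ -> None
```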

As these examples show, extracting an e-mail address is not easy. An address can take different forms, and the client needs to figure out exactly what it needs. These parsers can also fail for non-obvious reasons. In that case, the client unfortunately needs to understand the standards to find out where the problem is.

Alternatively, if you are comfortable with Angstrom, Emile exposes its (admittedly indigestible) Angstrom parsers in the Parser module.

type error = [
  | `Invalid of string * string list
  | `Incomplete
]
val pp_error : error Fmt.t

pp_error ppf err prints an error.
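The sketch below turns a parse failure into a readable message. explain is a hypothetical helper. It relies on Fmt.t being Format.formatter -> 'a -> unit, so pp_error works directly with Format.asprintf.

```ocaml
(* Hypothetical helper: None on success, otherwise the printed error. *)
let explain (s : string) : string option =
  match Emile.address_of_string s with
  | Ok _ -> None
  | Error e -> Some (Format.asprintf "%a" Emile.pp_error e)
```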

module List : sig ... end
val address_of_string_with_crlf : string -> (address, error) result
val address_of_string : string -> (address, error) result
val address_of_string_raw : string -> int -> int -> (address * int, error) result
val set_of_string_with_crlf : string -> (set, error) result
val set_of_string : string -> (set, error) result
val set_of_string_raw : string -> int -> int -> (set * int, error) result
val of_string_with_crlf : string -> (mailbox, error) result
val of_string : string -> (mailbox, error) result
val of_string_raw : string -> int -> int -> (mailbox * int, error) result