package emile


This is an aggregation of rules used to parse an e-mail address. The goal of this documentation is to show relations between RFCs, updates, and final description of parts needed to parse an e-mail address.

Admittedly, most of this part is copied from the RFCs to explain what we implement. For a client, it is boring and indigestible (but necessary) reading. We provide these implementations only for people who know exactly what they need, and to avoid duplicated code.

The best advice about this module is simply to ignore it and move on, which is what I wanted to do while writing this documentation.

val is_vchar : char -> bool

From RFC5234 (used in RFC5322).

VCHAR = %x21-7E ; visible (printing) characters
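As an illustration of the rule above, a one-byte predicate can be written directly from the range. This is a minimal sketch, not the code of this package; the name is_vchar_sketch is hypothetical.

```ocaml
(* Hypothetical stand-in for is_vchar, written from VCHAR = %x21-7E. *)
let is_vchar_sketch (c : char) : bool =
  match c with
  | '\x21' .. '\x7e' -> true (* visible (printing) characters *)
  | _ -> false
```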

val is_obs_no_ws_ctl : char -> bool

From RFC5322.

obs-NO-WS-CTL = %d1-8 /       ; US-ASCII control
                %d11 /        ; characters that do not
                %d12 /        ; include the carriage
                %d14-31 /     ; return, line feed, and
                %d127         ; white space characters
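The decimal ranges above translate to a check on the byte code. The following is only a sketch under that reading of the ABNF; is_obs_no_ws_ctl_sketch is a hypothetical name, not the function exported here.

```ocaml
(* Hypothetical sketch of obs-NO-WS-CTL: control characters other than
   HTAB (9), LF (10), CR (13) and NUL (0), plus DEL (127). *)
let is_obs_no_ws_ctl_sketch (c : char) : bool =
  match Char.code c with
  | n when n >= 1 && n <= 8 -> true
  | 11 | 12 -> true
  | n when n >= 14 && n <= 31 -> true
  | 127 -> true
  | _ -> false
```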

val is_ctext : char -> bool

From RFC822.

ctext = <any CHAR excluding "(",     ; => may be folded
         ")", BACKSLASH & CR, & including
         linear-white-space>

From RFC1522 (occurrences).

From RFC2047 § Appendix.

From RFC2822.

ctext = NO-WS-CTL /     ; Non white space controls
        %d33-39 /       ; The rest of the US-ASCII
        %d42-91 /       ; characters not including "(",
        %d93-126        ; ")", or BACKSLASH

From RFC5322.

ctext = %d33-39 /       ; Printable US-ASCII
        %d42-91 /       ; characters not including
        %d93-126 /      ; "(", ")", or BACKSLASH
        obs-ctext
obs-ctext = obs-NO-WS-CTL

Update from RFC 2822
+ Removed NO-WS-CTL from ctext

From RFC5335.

ctext =/ UTF8-xtra-char
UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail /
         %xE1-EC 2(UTF8-tail) /
         %xED %x80-9F UTF8-tail /
         %xEE-EF 2(UTF8-tail)
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) /
         %xF1-F3 3( UTF8-tail ) /
         %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF

From RFC6532.

ctext =/ UTF8-non-ascii

@note about UTF-8: decoding is out of scope here, because we check only one byte at a time.

@note about compliance with RFC1522: this is out of scope here, because we check only one byte at a time.
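To make the one-byte view concrete, here is a sketch of the RFC5322 ctext rule restricted to US-ASCII. The name is_ctext_sketch and the optional obsolete flag are assumptions made for illustration; whether the real is_ctext accepts obs-ctext is not stated in this page.

```ocaml
(* Hypothetical one-byte check for RFC5322 ctext.
   [obsolete] (an assumption of this sketch) toggles obs-ctext,
   that is obs-NO-WS-CTL. No UTF-8 handling is attempted. *)
let is_ctext_sketch ?(obsolete = true) (c : char) : bool =
  let n = Char.code c in
  let printable =
    (n >= 33 && n <= 39) || (n >= 42 && n <= 91) || (n >= 93 && n <= 126)
  in
  let obs_no_ws_ctl =
    (n >= 1 && n <= 8) || n = 11 || n = 12 || (n >= 14 && n <= 31) || n = 127
  in
  printable || (obsolete && obs_no_ws_ctl)
```

With ~obsolete:false the predicate keeps only the three printable ranges, which corresponds to the "Removed NO-WS-CTL from ctext" update.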

val is_qtext : char -> bool

From RFC822.

qtext = <any CHAR excepting DQUOTE,     ; => may be folded
         BACKSLASH & CR, and including
         linear-white-space>

From RFC2822.

qtext = NO-WS-CTL /     ; Non white space controls
        %d33 /          ; The rest of the US-ASCII
        %d35-91 /       ; characters not including BACKSLASH
        %d93-126        ; or the quote character

From RFC5322.

qtext = %d33 /          ; Printable US-ASCII
        %d35-91 /       ; characters not including
        %d93-126 /      ; BACKSLASH or the quote character
        obs-qtext
obs-qtext = obs-NO-WS-CTL

From RFC5335 (see is_ctext about UTF8-xtra-char).

utf8-qtext = qtext / UTF8-xtra-char

From RFC6532.

qtext =/ UTF8-non-ascii

@note about UTF-8: decoding is out of scope here, because we check only one byte at a time.
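The RFC5322 form of qtext can be sketched the same way as ctext. The name is_qtext_sketch is hypothetical, and the sketch always accepts obs-qtext, which is an assumption.

```ocaml
(* Hypothetical one-byte check for RFC5322 qtext (US-ASCII only). *)
let is_qtext_sketch (c : char) : bool =
  match Char.code c with
  | 33 -> true
  | n when n >= 35 && n <= 91 -> true
  | n when n >= 93 && n <= 126 -> true
  | n ->
    (* obs-qtext = obs-NO-WS-CTL *)
    (n >= 1 && n <= 8) || n = 11 || n = 12 || (n >= 14 && n <= 31) || n = 127
```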

val is_atext : char -> bool

The ABNF of atext is not explicit in RFC822, but its ancestor can be found here.

atom = 1*<any CHAR except specials, SPACE and CTLs>

From RFC2822.

atext = ALPHA / DIGIT /     ; Any character except controls,
        "!" / "#" /         ; SP, and specials.
        "$" / "%" /         ; Used for atoms
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"

From RFC5322.

atext = ALPHA / DIGIT /     ; Printable US-ASCII
        "!" / "#" /         ; characters not including
        "$" / "%" /         ; specials. Used for atoms.
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"

From RFC5335 (see is_ctext about UTF8-xtra-char).

utf8-atext = ALPHA / DIGIT /
             "!" / "#" /     ; Any character except
             "$" / "%" /     ; controls, SP, and specials.
             "&" / "'" /     ; Used for atoms.
             "*" / "+" /
             "-" / "/" /
             "=" / "?" /
             "^" / "_" /
             "`" / "{" /
             "|" / "}" /
             "~" /
             UTF8-xtra-char

From RFC6532.

atext =/ UTF8-non-ascii

@note about UTF-8: decoding is out of scope here, because we check only one byte at a time.
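The RFC5322 alternation for atext maps naturally onto a pattern match. This is a sketch with a hypothetical name (is_atext_sketch), limited to US-ASCII as the note above says.

```ocaml
(* Hypothetical one-byte check for RFC5322 atext. *)
let is_atext_sketch (c : char) : bool =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true (* ALPHA / DIGIT *)
  | '!' | '#' | '$' | '%' | '&' | '\'' | '*' | '+' | '-' | '/' | '=' | '?'
  | '^' | '_' | '`' | '{' | '|' | '}' | '~' -> true
  | _ -> false (* specials, SP, controls, non-ASCII bytes *)
```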

val is_wsp : char -> bool

From RFC822.

LWSP-char = SPACE / HTAB ; semantics = SPACE

In RFC2822 and RFC5322, we did not find any occurrence of LWSP-char. It is replaced by WSP (defined in RFC5234).

val is_quoted_pair : char -> bool

From RFC822.

quoted-pair = BACKSLASH CHAR     ; may quote any char

CHAR is case-sensitive

From RFC2822.

quoted-pair = (BACKSLASH text) / obs-qp
text = %d1-9 /         ; Characters excluding CR and LF
       %d11 /
       %d12 /
       %d14-127 /
       obs-text
obs-text = *LF *CR *(obs-char *LF *CR)
obs-char = %d0-9 / %d11 /      ; %d0-127 except CR and
           %d12 / %d14-127     ; LF
obs-qp = BACKSLASH (%d0-127)

From RFC5322.

quoted-pair = (BACKSLASH (VCHAR / WSP)) / obs-qp
obs-qp = BACKSLASH (%d0 / obs-NO-WS-CTL / LF / CR)

From RFC5335 (see is_ctext about UTF8-xtra-char).

utf8-text = %d1-9 /         ; all UTF-8 characters except
            %d11-12 /       ; US-ASCII NUL, CR, and LF
            %d14-127 /
            UTF8-xtra-char
utf8-quoted-pair = (BACKSLASH utf8-text) / obs-qp

@note this function is fun _chr -> true.

@note RFC5322 (the latest e-mail specification) does not mention an update from RFC2822. RFC6532 does not mention an update of quoted-pair. This implementation follows RFC5322 without Unicode support.
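As a worked check of the RFC5322 rule, the sketch below spells out which byte may follow BACKSLASH when obs-qp is taken into account. The name is_quoted_pair_rfc5322 is hypothetical. The union of the alternatives covers every value from 0 to 127, which is consistent with the note that the real function accepts any byte; unlike that function, this sketch rejects bytes above 127.

```ocaml
(* Hypothetical check of the byte after BACKSLASH, per RFC5322:
   quoted-pair = (BACKSLASH (VCHAR / WSP)) / obs-qp
   obs-qp      = BACKSLASH (%d0 / obs-NO-WS-CTL / LF / CR) *)
let is_quoted_pair_rfc5322 (c : char) : bool =
  let n = Char.code c in
  let vchar = n >= 0x21 && n <= 0x7e in
  let wsp = (n = 32) || (n = 9) in
  let obs_no_ws_ctl =
    (n >= 1 && n <= 8) || n = 11 || n = 12 || (n >= 14 && n <= 31) || n = 127
  in
  let obs_qp = n = 0 || obs_no_ws_ctl || n = 10 || n = 13 in
  vchar || wsp || obs_qp
```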

val is_dtext : char -> bool

From RFC822.

dtext = <any CHAR excluding "[",     ; => may be folded
         "]", BACKSLASH & CR, & including
         linear-white-space>

From RFC2822.

dtext = NO-WS-CTL /     ; Non white space controls
        %d33-90 /       ; The rest of the US-ASCII
        %d94-126        ; characters not including "[",
                        ; "]", or BACKSLASH

From RFC5322.

+ Removed NO-WS-CTL from dtext

dtext = %d33-90 /       ; Printable US-ASCII
        %d94-126 /      ; characters not including
        obs-dtext       ; "[", "]", or BACKSLASH
obs-dtext = obs-NO-WS-CTL / quoted-pair

@note quoted-pair cannot be processed here, because we handle only one byte.
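An alternative way to encode such rules is a table of inclusive ranges. The sketch below does this for RFC5322 dtext; all names are hypothetical, and the quoted-pair branch of obs-dtext is left out for the reason given in the note.

```ocaml
(* Hypothetical range tables (inclusive decimal bounds). *)
let dtext_ranges = [ (33, 90); (94, 126) ]

(* obs-NO-WS-CTL, reachable through obs-dtext. *)
let obs_no_ws_ctl_ranges = [ (1, 8); (11, 12); (14, 31); (127, 127) ]

(* True when the code of [c] falls in one of [ranges]. *)
let in_ranges ranges (c : char) : bool =
  let n = Char.code c in
  List.exists (fun (lo, hi) -> lo <= n && n <= hi) ranges

(* One-byte dtext check; quoted-pair needs two bytes and is not handled. *)
let is_dtext_sketch (c : char) : bool =
  in_ranges dtext_ranges c || in_ranges obs_no_ws_ctl_ranges c
```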

val quoted_pair : char Angstrom.t
val fws : (bool * bool * bool) Angstrom.t

From RFC822.

From RFC2822 § 3.2.3 & RFC2822 § 4.2.

White space characters, including white space used in folding (described in section 2.2.3), may appear between many elements in header field bodies. Also, strings of characters that are treated as comments may be included in structured field bodies as characters enclosed in parentheses. The following defines the folding white space (FWS) and comment constructs.

Strings of characters enclosed in parentheses are considered comments so long as they do not appear within a "quoted-string", as defined in section 3.2.5. Comments may nest.

There are several places in this standard where comments and FWS may be freely inserted. To accommodate that syntax, an additional token for "CFWS" is defined for places where comments and/or FWS can occur. However, where CFWS occurs in this standard, it MUST NOT be inserted in such a way that any line of a folded header field is made up entirely of WSP characters and nothing else.

FWS = ([*WSP CRLF] 1*WSP) /     ; Folding white space
      obs-FWS

In the obsolete syntax, any amount of folding white space MAY be inserted where the obs-FWS rule is allowed. This creates the possibility of having two consecutive "folds" in a line, and therefore the possibility that a line which makes up a folded header field could be composed entirely of white space.

obs-FWS = 1*WSP *(CRLF 1*WSP)

From RFC5322 § 3.2.2 & RFC5322 § 4.2.

White space characters, including white space used in folding (described in section 2.2.3), may appear between many elements in header field bodies. Also, strings of characters that are treated as comments may be included in structured field bodies as characters enclosed in parentheses. The following defines the folding white space (FWS) and comment constructs.

Strings of characters enclosed in parentheses are considered comments so long as they do not appear within a "quoted-string", as defined in section 3.2.4. Comments may nest.

There are several places in this specification where comments and FWS may be freely inserted. To accommodate that syntax, an additional token for "CFWS" is defined for places where comments and/or FWS can occur. However, where CFWS occurs in this specification, it MUST NOT be inserted in such a way that any line of a folded header field is made up entirely of WSP characters and nothing else.

FWS = ([*WSP CRLF] 1*WSP) / obs-FWS     ; Folding white space

In the obsolete syntax, any amount of folding white space MAY be inserted where the obs-FWS rule is allowed. This creates the possibility of having two consecutive "folds" in a line, and therefore the possibility that a line which makes up a folded header field could be composed entirely of white space.

obs-FWS = 1*WSP *(CRLF 1*WSP)
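To illustrate the non-obsolete branch of FWS, here is a small hand-written matcher over a string, using only the standard library. It is a sketch: the names are hypothetical, obs-FWS is not handled, and it returns only the offset after the match rather than the (bool * bool * bool) value produced by fws.

```ocaml
(* WSP = SP / HTAB *)
let is_wsp_sketch (c : char) : bool = c = ' ' || c = '\t'

(* Skip *WSP from index [i]; returns the first index that is not WSP. *)
let rec skip_wsp (s : string) (i : int) : int =
  if i < String.length s && is_wsp_sketch s.[i] then skip_wsp s (i + 1) else i

(* FWS = ([*WSP CRLF] 1*WSP), matched at offset [i].
   Returns [Some j] where [j] is the offset just after the match. *)
let fws_sketch (s : string) (i : int) : int option =
  let len = String.length s in
  let j = skip_wsp s i in
  if j + 1 < len && s.[j] = '\r' && s.[j + 1] = '\n' then begin
    (* A fold: CRLF must be followed by at least one WSP. *)
    let k = skip_wsp s (j + 2) in
    if k > j + 2 then Some k
    else if j > i then Some j (* no fold; only the leading 1*WSP matches *)
    else None
  end
  else if j > i then Some j
  else None
```

For example, fws_sketch "\r\n x" 0 consumes the fold and the following space, while fws_sketch "\r\nx" 0 fails because no WSP follows the CRLF.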

val obs_fws : (bool * bool * bool) Angstrom.t

See fws.

val comment : unit Angstrom.t

From RFC822.

comment = "(" *(ctext / quoted-pair / comment) ")"

From RFC2822.

ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"

From RFC5322.

ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
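Because comment refers to itself through ccontent, comments may nest. The sketch below shows one way to find the end of a nested comment by tracking the parenthesis depth. It is a simplification with a hypothetical name: it honours quoted-pair (a backslash hides the next byte) but does not validate ctext or FWS.

```ocaml
(* Hypothetical matcher: if [s] has a comment starting at [i], return
   [Some j] where [j] is the offset just after its closing ")". *)
let comment_sketch (s : string) (i : int) : int option =
  let len = String.length s in
  (* [go j depth] scans from [j] with [depth] parentheses still open. *)
  let rec go j depth =
    if j >= len then None (* unterminated comment *)
    else
      match s.[j] with
      | '(' -> go (j + 1) (depth + 1)
      | ')' -> if depth = 1 then Some (j + 1) else go (j + 1) (depth - 1)
      | '\\' -> if j + 1 < len then go (j + 2) depth else None
      | _ -> go (j + 1) depth
  in
  if i < len && s.[i] = '(' then go (i + 1) 1 else None
```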

val cfws : unit Angstrom.t

From RFC822, see fws and obs_fws.

From RFC2822.

CFWS = *([FWS] comment) (([FWS] comment) / FWS)

From RFC5322.

val qcontent : string Angstrom.t

From RFC822.

quoted-string = DQUOTE *(qtext/quoted-pair) DQUOTE     ; Regular qtext or
                                                       ; quoted chars.

From RFC2822.

qcontent = qtext / quoted-pair

From RFC5322.

qcontent = qtext / quoted-pair

From RFC5335.

utf8-qcontent = utf8-qtext / utf8-quoted-pair
qcontent = utf8-qcontent

val quoted_string : string Angstrom.t

From RFC822.

quoted-string = DQUOTE *(qtext/quoted-pair) DQUOTE     ; Regular qtext or
                                                       ; quoted chars.

From RFC2047.

+ An 'encoded-word' MUST NOT appear within a 'quoted-string'

From RFC2822.

quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]

A quoted-string is treated as a unit. That is, quoted-string is identical to atom, semantically. Since a quoted-string is allowed to contain FWS, folding is permitted. Also note that since quoted-pair is allowed in a quoted-string, the quote and backslash characters may appear in a quoted-string so long as they appear as a quoted-pair.

Semantically, neither the optional CFWS outside of the quote characters nor the quote characters themselves are part of the quoted-string; the quoted-string is what is contained between the two quote characters. As stated earlier, the BACKSLASH in any quoted-pair and the CRLF in any FWS/CFWS that appears within the quoted-string are semantically "invisible" and therefore not part of the quoted-string either.

@note in other words, space(s) in FWS are "visible" between DQUOTE.

From RFC5322.

quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]

@note currently, this implementation has a bug with multiple spaces in a quoted-string. We need to update fws to count how many spaces we skip.
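The semantics quoted from RFC2822 (the BACKSLASH of a quoted-pair and the CRLF of a fold are invisible, spaces are visible) can be illustrated with a small decoder. This is a sketch with a hypothetical name, not this package's parser: it ignores the surrounding CFWS and does not validate qtext.

```ocaml
(* Hypothetical decoder: if [s] has a quoted-string starting at [i],
   return its semantic content and the offset after the closing DQUOTE. *)
let quoted_string_sketch (s : string) (i : int) : (string * int) option =
  let len = String.length s in
  let buf = Buffer.create 16 in
  let rec go j =
    if j >= len then None (* missing closing DQUOTE *)
    else
      match s.[j] with
      | '"' -> Some (Buffer.contents buf, j + 1)
      | '\\' ->
        (* quoted-pair: drop the BACKSLASH, keep the quoted byte *)
        if j + 1 < len then (Buffer.add_char buf s.[j + 1]; go (j + 2))
        else None
      | '\r' when j + 1 < len && s.[j + 1] = '\n' ->
        go (j + 2) (* the CRLF of a fold is not part of the content *)
      | c -> Buffer.add_char buf c; go (j + 1)
  in
  if i < len && s.[i] = '"' then go (i + 1) else None
```

Every space between the quotes is kept as-is, so several consecutive spaces survive decoding in this sketch.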

val atom : string Angstrom.t

From RFC822.

From RFC2822.

atom = [CFWS] 1*atext [CFWS]

From RFC5322.

atom = [CFWS] 1*atext [CFWS]

From RFC5335.

utf8-atom = [CFWS] 1*utf8-atext [CFWS]

val word : word Angstrom.t

From RFC822.

word = atom / quoted-string

From RFC2822.

word = atom / quoted-string

From RFC5322.

word = atom / quoted-string

val dot_atom_text : string list Angstrom.t

From RFC2822.

dot-atom-text = 1*atext *("." 1*atext)

From RFC5322.

dot-atom-text = 1*atext *("." 1*atext)
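The rule says a dot-atom-text is one or more non-empty runs of atext separated by single dots. A sketch of that check on a whole string follows; the names are hypothetical, and the string list result only mirrors the shape of the dot_atom_text type above.

```ocaml
(* Hypothetical one-byte atext check (RFC5322, US-ASCII only). *)
let is_atext_sketch (c : char) : bool =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
  | '!' | '#' | '$' | '%' | '&' | '\'' | '*' | '+' | '-' | '/' | '=' | '?'
  | '^' | '_' | '`' | '{' | '|' | '}' | '~' -> true
  | _ -> false

(* True when [p] matches 1*atext. *)
let all_atext (p : string) : bool =
  let ok = ref (p <> "") in
  String.iter (fun c -> if not (is_atext_sketch c) then ok := false) p;
  !ok

(* dot-atom-text = 1*atext *("." 1*atext), applied to the whole of [s]. *)
let dot_atom_text_sketch (s : string) : string list option =
  let parts = String.split_on_char '.' s in
  if List.for_all all_atext parts then Some parts else None
```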

val dot_atom : string list Angstrom.t

From RFC2822.

dot-atom = [CFWS] dot-atom-text [CFWS]

From RFC5322.

dot-atom = [CFWS] dot-atom-text [CFWS]

val local_part : local Angstrom.t

From RFC822.

From RFC2822 § 3.4.1 & RFC2822 § 4.4.

From RFC5322 § 3.4.1 & RFC5322 § 4.4.

local-part = dot-atom / quoted-string / obs-local-part
obs-local-part = word *("." word)

val obs_local_part : local Angstrom.t
val domain_literal : string Angstrom.t

From RFC822.

From RFC2822.

domain-literal = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]

From RFC5322.

domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

val obs_domain : string list Angstrom.t
val domain : domain Angstrom.t

From RFC822 § 6.1, RFC822 § 6.2.1, RFC822 § 6.2.2 & RFC822 § 6.2.3.

From RFC2822 § 3.4.1 & RFC2822 § 4.4.

domain = dot-atom / domain-literal / obs-domain
obs-domain = atom *("." atom)

From RFC5322 § 3.4.1 & RFC5322 § 4.4.

domain = dot-atom / domain-literal / obs-domain
obs-domain = atom *("." atom)

@note from RFC5322, we should accept any domain as `Literal and let the user resolve it. Currently, we fail when we catch a `Literal and make a best effort to follow RFC5321. But failing may be inconvenient (or not?).

val id_left : local Angstrom.t

From RFC2822 § 3.6.4 & RFC2822 § 4.5.4.

obs-id-left = local-part
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
id-left = dot-atom-text / no-fold-quote / obs-id-left

From RFC5322 § 3.6.4 & RFC5322 § 4.5.4.

id-left = dot-atom-text / obs-id-left
obs-id-left = local-part

val no_fold_literal : string Angstrom.t

From RFC2822.

no-fold-literal = "[" *(dtext / quoted-pair) "]"

From RFC5322.

no-fold-literal = "[" *dtext "]"

val id_right : domain Angstrom.t

From RFC2822 § 3.6.4 & RFC2822 § 4.5.4.

id-right = dot-atom-text / no-fold-literal / obs-id-right
obs-id-right = domain

From RFC5322 § 3.6.4 & RFC5322 § 4.5.4.

id-right = dot-atom-text / no-fold-literal / obs-id-right
obs-id-right = domain

val msg_id : (local * domain) Angstrom.t

From RFC822 § 4.1 & RFC822 § 6.1.

addr-spec = local-part "@" domain     ; global address
msg-id = "<" addr-spec ">"            ; Unique message id

From RFC2822.

From RFC5322.

val addr_spec : mailbox Angstrom.t

From RFC822.

addr-spec = local-part "@" domain ; global address

From RFC2822.

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain. The locally interpreted string is either a quoted-string or a dot-atom. If the string can be represented as a dot-atom (that is, it contains no characters other than atext characters or "." surrounded by atext characters), then the dot-atom form SHOULD be used and the quoted-string form SHOULD NOT be used. Comments and folding white space SHOULD NOT be used around the "@" in the addr-spec.

addr-spec = local-part "@" domain

From RFC5322.

val angle_addr : mailbox Angstrom.t

From RFC822.

The ABNF of angle-addr is not explicit in RFC 822, but its ancestor can be found here, as a part of mailbox:

mailbox = addr-spec              ; simple address
        / phrase route-addr      ; name & addr-spec

From RFC2822 § 3.4 & RFC2822 § 4.4.

From RFC5322 § 3.4 & RFC5322 § 4.4.

val obs_domain_list : domain list Angstrom.t
val obs_route : domain list Angstrom.t
val obs_angle_addr : mailbox Angstrom.t
val phrase : phrase Angstrom.t

From RFC822.

phrase = 1*word ; Sequence of words

From RFC2047 § 2 & RFC2047 § 5.

From RFC2822 § 3.2.6 & RFC2822 § 4.1.

From RFC5322 § 3.2.5 & RFC5322 § 4.1.

val obs_phrase : phrase Angstrom.t

See phrase.

val display_name : phrase Angstrom.t

From RFC822.

mailbox = addr-spec              ; simple address
        / phrase route-addr      ; name & addr-spec

From RFC2822.

From RFC5322.

name-addr = [display-name] angle-addr
display-name = phrase

val mailbox : mailbox Angstrom.t

From RFC822.

mailbox = addr-spec              ; simple address
        / phrase route-addr      ; name & addr-spec

From RFC2822.

mailbox = name-addr / addr-spec

From RFC5322.

mailbox = name-addr / addr-spec
