package mrmime
Sources
sha256=0fe37cf06ef25a0a0fef7e485b836188824528d1425743cc5c479bc281262b31
sha512=f15bf2cee0998c162886a512a680a415529db39dbdea241c910c1fae7cad5c1bb94afe13ee247ac52149349f08dc922d4d4f37920fe54a0a0e6d0ab40ee2170e
Description
Parser and generator of mail in OCaml
Published: 26 Nov 2020
README
Mr. MIME (Multipurpose Internet Mail Extensions)
mrmime is a library to parse and generate mail according to several RFCs:
RFC822: Standard For The Format of ARPA Internet Text Messages
RFC2822: Internet Message Format
RFC5321: Simple Mail Transfer Protocol
RFC5322: Internet Message Format
RFC2045: MIME Part One: Format of Internet Message Bodies
RFC2046: MIME Part Two: Media Types
RFC2047: MIME Part Three: Message-Header Extensions for Non-ASCII Text
RFC2049: MIME Part Five: Conformance Criteria and Examples
RFC6532: Internationalized Email Headers
mrmime is built on angstrom and parses mail on a best-effort basis. Run against a large corpus of mails (2 billion), mrmime is able to parse all of them - however, results can diverge from what you expect.
On the other hand, mrmime is able to generate valid mail from an OCaml description. Generation follows some rules:
the produced stream emits one line at a time
we make a best effort to limit lines to 78 characters
we follow RFC6532 and emit UTF-8 mail
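The first two rules can be illustrated with a small sketch that uses only the standard library. It is not how mrmime implements generation: `fold` is a hypothetical helper that greedily packs words into lines of at most `limit` characters (CRLF excluded), starts continuation lines with a space, and terminates each line with CRLF. A single word longer than the limit is still emitted on its own line, which is the "best effort" part.

```ocaml
(* Hypothetical helper, not part of mrmime: greedily wrap [words] so that
   each line holds at most [limit] characters (CRLF excluded) whenever
   possible. Continuation lines start with a space. *)
let fold ?(limit = 78) (words : string list) : string list =
  let rec go current acc = function
    | [] -> List.rev (if current = "" then acc else current :: acc)
    | w :: rest ->
      if current = "" then go w acc rest
      else if String.length current + 1 + String.length w <= limit
      then go (current ^ " " ^ w) acc rest
      else go (" " ^ w) (current :: acc) rest in
  (* Emit line by line, each line terminated by CRLF. *)
  List.map (fun line -> line ^ "\r\n") (go "" [] words)
```

For example, `fold ~limit:10 ["Subject:"; "hello"; "world"]` yields three lines, the last two starting with a space.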
How to parse a mail?
There are different ways to parse a mail, and the right one depends on what you want. In some cases, you are interested only in the header part. In others, you probably want the bodies. We decided to separate these tasks into 2 distinct APIs to fit different constraints.
For example, if you want to extract only the header, you probably care about memory consumption - for instance, when you implement an SMTP server where only the header is of interest.
A stream API is provided for this case, and with it we were able to implement a DKIM checker which needs only one pass to verify your mail.
On the other hand, if you want to extract the bodies of your mail, the parser provided is not a stream parser, because we need to extract bodies from a multipart mail. An explanation of how to use it is given in this document.
Parse only the header part
For many purposes, we are mostly interested in parsing only the header part of a mail. In this case, the Hd sub-module should be what you want.
A complex example of Hd is available in the ocaml-dkim project, which extracts the DKIM signature from the header.
let dkim_signature = Mrmime.Field_name.v "DKIM-Signature"

let extract_dkim () =
  let open Mrmime in
  let tmp = Bytes.create 0x1000 in
  let buffer = Bigstringaf.create 0x1000 in
  let decoder = Hd.decoder buffer in
  let rec decode () = match Hd.decode decoder with
    | `Field field ->
      ( match Location.prj field with
        | Field.Field (field_name, Unstructured, v)
          when Field_name.equal field_name dkim_signature ->
          (* Print the DKIM-Signature field and stop. *)
          Fmt.pr "%a: %a\n%!" Field_name.pp dkim_signature Unstructured.pp v
        | _ -> decode () )
    | `Malformed err -> failwith err
    | `End _rest -> () (* end of the header *)
    | `Await ->
      (* The decoder needs more input: read the next chunk from stdin. *)
      let len = input stdin tmp 0 (Bytes.length tmp) in
      ( match Hd.src decoder (Bytes.unsafe_to_string tmp) 0 len with
        | Ok () -> decode ()
        | Error (`Msg err) -> failwith err ) in
  decode ()
This little snippet parses, from stdin, a mail encoded with CRLF end-of-line (so you should convert your mail to this newline convention). When it reaches a DKIM field, it prints its well-parsed value (in our case, an unstructured value). Other corresponds to other fields - a DKIM signature can appear there when we fail to parse its value as an unstructured value.
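If your mail uses bare LF line endings, it has to be converted to CRLF first. Here is a minimal sketch of such a conversion, using only the standard library. `to_crlf` is a hypothetical helper and not part of mrmime. It works on a whole string and leaves existing CRLF sequences untouched.

```ocaml
(* Hypothetical helper, not part of mrmime: rewrite every bare LF as CRLF
   and keep existing CRLF sequences as they are. *)
let to_crlf (s : string) : string =
  let buf = Buffer.create (String.length s + 16) in
  String.iteri
    (fun i c ->
      if c = '\n' && (i = 0 || s.[i - 1] <> '\r')
      then Buffer.add_string buf "\r\n"
      else Buffer.add_char buf c)
    s;
  Buffer.contents buf
```

Note that this version looks at the previous character of the same string, so if you feed the decoder chunk by chunk, a CR at the very end of one chunk followed by an LF at the start of the next would need extra state to be handled correctly.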
Parse entirely a mail
Of course, the initial goal of mrmime is to parse an entire mail. In this case, you should use the Mail sub-module, which provides an angstrom parser.
Bodies can be large, and if you want to store them yourself, we provide an API which expects consumers to consume the bodies (and store them, for example, in UNIX files).
A complex example is available in ptt, which extracts bodies and saves them into UNIX files. For this we use:
val stream : emitters:(Header.t -> (string option -> unit) * 'id) -> (Header.t * 'id t) Angstrom.t
This calls emitters for each part of your mail. The parser properly decodes each part (according to its Content-Transfer-Encoding) and feeds the decoded input to your consumer.
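To make the shape of emitters more concrete, here is a minimal sketch of a callback that fits the signature above. It uses only the standard library and collects each part into an in-memory buffer instead of a UNIX file. It ignores the header it receives, and it assumes that `None` marks the end of a part, which the signature suggests but this document does not state.

```ocaml
(* A sketch of an [emitters] callback. For each part, it returns a
   consumer and an identifier; here the identifier is the [Buffer.t]
   that collects the decoded body of that part. *)
let emitters _header =
  let buf = Buffer.create 0x1000 in
  let consumer = function
    | Some chunk -> Buffer.add_string buf chunk
    | None -> () (* assumption: [None] marks the end of the part *) in
  (consumer, buf)
```

With this choice, `'id` is `Buffer.t`. A version that writes to files would open an output channel instead of creating a buffer, and close it on `None`.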
How to emit a mail?
mrmime is able to generate a mail from an OCaml description of it. There are several ways to craft information such as an address or the Content-Type field for a specific part.
Many sub-modules of mrmime provide a way to construct a piece of information, such as the subject of your mail or its recipients. For example, the Mailbox sub-module provides an easy way to construct an address:
let romain_calascibetta =
  let open Mrmime.Mailbox in
  Local.[ w "romain"; w "calascibetta" ] @ Domain.(domain, [ a "x25519"; a "net" ])
The documentation is written to help you construct many of these values. Of course, Header is the module used to construct a header:
let header =
  let open Mrmime in
  Field.[ Field (Field_name.subject, Unstructured,
                 Unstructured.Craft.(compile [ v "Simple"; sp 1; v "Email" ]))
        ; Field (Field_name.v "To", Addresses, [ `Mailbox romain_calascibetta ])
        ; Field (Field_name.date, Date, (Date.of_ptime ~zone:GMT (Ptime_clock.now ()))) ]
  |> Header.of_list
Then, Header provides a to_stream function which emits your header line by line (with the CRLF newline convention) - mostly so that it can be plugged into an SMTP pipe.
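Consuming such a stream could look like the sketch below. It assumes the stream is a pull function of type `unit -> string option` that returns `None` once it is exhausted. That shape is an assumption here, so check the Header documentation for the exact type. `drain` and `stream_of_list` are hypothetical helpers, and `stream_of_list` only stands in for a real header stream.

```ocaml
(* Assumption: a stream is a pull function of type [unit -> string option]
   which returns [None] once it is exhausted. *)
let rec drain (stream : unit -> string option) (write : string -> unit) =
  match stream () with
  | Some line -> write line; drain stream write
  | None -> ()

(* Stand-in for a real header stream, for illustration only. *)
let stream_of_list (lines : string list) : unit -> string option =
  let rest = ref lines in
  fun () ->
    match !rest with
    | [] -> None
    | x :: r -> rest := r; Some x
```

For instance, `drain (stream_of_list [ "Subject: Simple Email\r\n" ]) print_string` writes the line to standard output. In a real program, `write` would push each line to your SMTP connection.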
Finally, for a multipart mail, the Mt sub-module is the most interesting: it builds a part from a stream (from a file or from standard input) associated with Content fields (like Content-Transfer-Encoding). mrmime takes care of encoding your stream (base64 or quoted-printable).
A complex example of how to use the Mt module is available in the facteur project, which is able to send a multipart mail.
Encoding
A real effort was made to treat all inputs/outputs of mrmime as UTF-8 strings. This is achieved through some underlying packages:
rosetta as universal unifier to Unicode
uuuu as mapper from ISO-8859 to Unicode
coin as mapper from KOI8-{U,R} to Unicode
yuscii as mapper from UTF-7 to Unicode
The SMTP protocol constrains bodies to use only 7 bits per byte (a historical limitation). For this reason, encodings such as quoted-printable or base64 are used to encode bodies and respect this limitation. mrmime uses:
Status of the project
mrmime is really experimental. Because it aims to cover many concerns (encoding, multipart), the API is likely to change often. We reached a first version because we are able to send a well-formed multipart mail with it - however, it is possible to hit weird cases where mrmime emits invalid mail.
The same advice applies to the parser: the mail format is not really respected by implementations in many cases, and the parser may fail on some mails for a weird reason.
Of course, feedback is welcome to improve it. So you can use it, but you should not expect industrial quality - I mean, not yet. So play with it, and enjoy your hacking!
Dependencies (17)
Dev Dependencies (4)
Used by (7)
Conflicts
None