package bin

  1. Overview
  2. Docs
A DSL to describe binary formats

Install

Dune Dependency

Authors

Maintainers

Sources

bstr-0.0.1.tbz
sha256=05701fb60e9f9b419cc2d3989cf58c60eba1b5e7a5d08fba7f9d92f29ea3a46e
sha512=2199910457d8dedeefe3ffa3e4ef7a4f0845bb66459bcf3951d6197ec5f32a0c49c29874341d7fc4ee026c82a95a0c8aaaea084840b10530f6c00a6120960174

Description

Published: 28 Apr 2025

README

Bstr, Slice & Bin

This small set of libraries offers a homogeneous API between 2 types and their derivations with the slice type, as well as a small DSL for decoding "packets" (such as ARP or DNS) without too much difficulty.

The aim is to homogenize the 2 types bytes and bigstring and to derive them with a slice type, giving the user all the levers needed to manipulate byte sequences, whether in the form of a bigstring or bytes. The slice view avoids copying when it comes to decoding a packet and extracting a sub-part. The slice also applies to bigstrings, whose Bigarray.Array1.sub is more expensive.

This set of libraries is a synthesis of astring (which offers a range of useful functions as well as slice), cstruct (which offers a similar API for bigstrings), bigstringaf (which offers some other useful functions), the standard OCaml library and repr for decoding/encoding these values into OCaml records/variants.

About API

Here is an overview of the functions offered by bstr compared to other libraries:

bstr

cstruct

bigstringaf

slice.bstr

overlap

memcpy

memmove

fast sub

fast blit

release GC lock

fast contains

Fast sub

sub is perhaps the most useful operation for a bigarray. In fact, unlike bytes and strings, sub offers a view (equivalent or smaller) of a bigarray without making a copy. If, for example, you need to decode^1 a large sequence of bytes (without having the notion of a "stream"), it may be useful to use the sub operation to decode the information byte by byte and avoid copying throughout the decoding process.

The implementation of sub proposed by Bstr is a little different from that of the standard OCaml library. In fact, it is specialized for a bigarray of dimension 1 containing bytes. In fact, the Bigarray.Array1.sub function is a little more generic and Bstr takes the opportunity to "specialize" the function according to our type.

However, according to the representation proposed by Cstruct, Cstruct.sub remains the fastest operation compared to Bstr and Bigstringaf. If you want to have the same performance as Cstruct, the specialized Slice module for Bstr.t values is equivalent.

Here is a comparative table of the sub function between all implementations (AMD Ryzen 9 7950X 16-Core Processor):

bigstringaf

bstr

cstruct

slice

sub

20.0 ns

17.8ns

2.8ns

2.4ns

Fast blit

blit from a string or a bytes is a little faster than Bigstringaf and Cstruct. The difference basically lies in the fact that Bstr.t uses other "tags" to describe the FFI with the C memcpy function (specifically the [@untagged] tag).

Here is a comparative table of the blit_from_string function between all the implementations:

bigstringaf

bstr

cstruct

blit_from_string

5.1ns

4.3ns

4.7ns

mmaped or not? (GC lock)

There are 2 ways to copy bytes between two bigarrays:

  • the "mmaped" version ({memcpy,memmove}_mmaped)
  • the simple version ({memcpy,memmove})

The first is quite specific because it releases the GC lock after a certain number of bytes (4096) have been copied. This can be advantageous if you want to make a large copy between two bigarrays in parallel in a Thread.

If we specify mmaped, it is because the copy between two bigarrays, one of which may come from Unix.map_file, can also take time (and we may want to do it in parallel in a Thread) since it involves reading/writing on the disk.

let copy_to_file bstr filename () =
  let len = Bstr.length bstr in
  let fd = Unix.openfile filename Unix.[ O_WRONLY ] 0o644 in
  let dst = Unix.map_file fd Bigarray.char Bigarray.c_layout false [| len |] in
  let dst = Bigarray.array1_of_genarray dst in
  Bstr.memcpy_mmaped bstr ~src_off:0 dst ~dst_off:0 ~len

let () =
  let th = Thread.create (copy_to_file bstr filename) () in
  (* do something else in true parallel of [copy_to_file]. *)
  (* the GC will not interrupt [th] during the copy. *)
  Thread.join th

The simple version does not release the GC lock and only applies the desired function (memmove or memcpy).

memmove or memcpy?

Bstr.blit always uses the memmove function. However, it can be advantageous to use memcpy in a fairly specific case: when you know that the source refers to a memory area that is not shared with the destination.

To find out, you can use the Bstr.overlap function, which checks whether or not the two bigarrays given have a common memory area.

Dependencies (3)

  1. slice = version
  2. dune >= "3.5.0"
  3. ocaml >= "4.14.0"

Dev Dependencies

None

Used by

None

Conflicts

None

OCaml

Innovation. Community. Security.