package bap-std


Disassembled program.

A project contains the data that we were able to reconstruct during disassembly, semantic analysis, and any number of other analyses.

In fact, a project allows arbitrary data to be associated with memory regions and program terms, and even attached globally to the project itself. It can therefore be seen as a knowledge base of deeply interconnected facts.

Besides delivering information from bap to passes, it can also be used as a communication medium between different passes (see Working with project).

type t = project
type input

IO interface to a project data structure.

include Regular.Std.Data.S with type t := t
type info = string * [ `Ver of string ] * string option
val version : string
val size_in_bytes : ?ver:string -> ?fmt:string -> t -> int
val of_bytes : ?ver:string -> ?fmt:string -> Regular.Std.bytes -> t
val to_bytes : ?ver:string -> ?fmt:string -> t -> Regular.Std.bytes
val blit_to_bytes : ?ver:string -> ?fmt:string -> Regular.Std.bytes -> t -> int -> unit
val of_bigstring : ?ver:string -> ?fmt:string -> Core_kernel.Std.bigstring -> t
val to_bigstring : ?ver:string -> ?fmt:string -> t -> Core_kernel.Std.bigstring
val blit_to_bigstring : ?ver:string -> ?fmt:string -> Core_kernel.Std.bigstring -> t -> int -> unit
module Io : sig ... end
module Cache : sig ... end
val add_reader : ?desc:string -> ver:string -> string -> t Regular.Std.reader -> unit
val add_writer : ?desc:string -> ver:string -> string -> t Regular.Std.writer -> unit
val available_readers : unit -> info list
val default_reader : unit -> info
val set_default_reader : ?ver:string -> string -> unit
val with_reader : ?ver:string -> string -> (unit -> 'a) -> 'a
val available_writers : unit -> info list
val default_writer : unit -> info
val set_default_writer : ?ver:string -> string -> unit
val with_writer : ?ver:string -> string -> (unit -> 'a) -> 'a
val default_printer : unit -> info option
val set_default_printer : ?ver:string -> string -> unit
val with_printer : ?ver:string -> string -> (unit -> 'a) -> 'a
val find_reader : ?ver:string -> string -> t Regular.Std.reader option
val find_writer : ?ver:string -> string -> t Regular.Std.writer option
val create : ?disassembler:string -> ?brancher:brancher source -> ?symbolizer:symbolizer source -> ?rooter:rooter source -> ?reconstructor:reconstructor source -> input -> t Core_kernel.Std.Or_error.t

create input creates a project from the provided input source. The reconstruction is a multi-pass process driven by the following input variables, provided by the user:

  • brancher decides instruction successors;
  • rooter decides function starts;
  • symbolizer decides function names;
  • reconstructor provides the algorithm for symtab reconstruction.

The project is built incrementally and iteratively until a fixpoint is reached. The fixpoint is reached when information stops flowing from the input variables.

The overall algorithm can be depicted with the following diagram, where boxes denote data and ovals denote processes:

               +---------+   +---------+   +---------+
               | brancher|   |code/data|   |  rooter |
               +----+----+   +----+----+   +----+----+
                    |             |             |
                    |             v             |
                    |        -----------        |
                    +------>(   disasm  )<------+
                             -----+-----
                                  |
                                  v
              +----------+   +---------+   +----------+
              |symbolizer|   |   CFG   |   | reconstr |
              +-----+----+   +----+----+   +----+-----+
                    |             |             |
                    |             v             |
                    |        -----------        |
                    +------>(  reconstr )<------+
                             -----+-----
                                  |
                                  v
                             +---------+
                             |  symtab |
                             +----+----+
                                  |
                                  v
                             -----------
                            (  lift IR  )
                             -----+-----
                                  |
                                  v
                             +---------+
                             | program |
                             +---------+

The input variables are represented as streams of values. Basically, they can be viewed as cells that depend on some input. When the input changes, the value is recomputed and passed to the stream. Circular dependencies are allowed, so a rooter may actually depend on the program term. In the case of circular dependencies, the above algorithm is run iteratively until a fixpoint is reached. The criterion for the fixpoint is that no data needs to be recomputed. Data must be recomputed when its input has changed or itself needs to be recomputed.
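The fixpoint criterion can be illustrated with a toy model in plain OCaml. This is only a sketch of the idea, not BAP's implementation: the `calls` table, the `step` function, and the use of an integer set as a stand-in for the program are all assumptions made for illustration. A "rooter" that depends on the current program proposes new function starts, and the pass is rerun until its output equals its input.

```ocaml
(* Toy model: the "program" is the set of function starts known so far. *)
module Ints = Set.Make (Int)

(* Hypothetical call graph: the function at a given address calls these. *)
let calls = function
  | 0 -> [ 10; 20 ]
  | 10 -> [ 30 ]
  | 20 -> [ 10 ]
  | _ -> []

(* One pass: every known start contributes the starts it calls. *)
let step roots =
  Ints.fold
    (fun addr acc -> Ints.union acc (Ints.of_list (calls addr)))
    roots roots

(* Rerun the pass until nothing changes, i.e., until no data needs to
   be recomputed. Returns the final set and the number of passes run. *)
let rec fixpoint ?(passes = 1) roots =
  let roots' = step roots in
  if Ints.equal roots roots' then (roots, passes)
  else fixpoint ~passes:(passes + 1) roots'

let () =
  let roots, passes = fixpoint (Ints.singleton 0) in
  Printf.printf "%d roots after %d passes\n" (Ints.cardinal roots) passes
```

Starting from a single root at address 0, the model discovers the remaining starts in two passes, and a third pass confirms that nothing changes.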

User-provided input can depend on any information, but a good start is the information provided by the Info module. It contains several variables that are guaranteed to be defined in the process of reconstruction.

For example, let's assume that a create_source function requires a filename as its input to create a source t. Then the source can be created as easily as:

Stream.map Input.file ~f:create_source

As a more complex example, let's assume that a source now requires both arch and file to be known. We can combine two different streams of information with the merge function:

Stream.merge Input.file Input.arch ~f:create_source, where create_source is a function of type: string -> arch -> t.

If the source requires more than two arguments, then Stream.Variadic, which is a generalization of the merge function, can be used. Suppose that a source of information requires three inputs: filename, architecture and compiler name. Then we first define a list of arguments,

let args = Stream.Variadic.(args Input.arch $ Input.file $ Compiler.name)

and apply them to our function create_source:

Stream.Variadic.(apply ~f:create_source args)

The sources specified in the examples above will call create_source when all arguments change. This is the expected behavior for the arch and file variables, since they do not change during the program computation. Mixing constant and non-constant (with respect to a computation) variables is not that easy, but can still be achieved using the either and parse combinators. For example, let's assume that a source requires arch and cfg as its input:

Stream.either Input.arch Input.cfg |>
Stream.parse ~init:nil ~f:(fun create -> function
    | First arch -> None, create_source arch
    | Second cfg -> Some (create cfg), create)

In the example, we parse the stream that contains either architectures or control flow graphs with a state of type cfg -> t Or_error.t. Every time the architecture changes (i.e., a new project is started), we recreate our state by calling the create_source function. Since we can't prove that the architecture will be decided before the cfg, or decided at all, we need to provide an initial nil function. It can either return a bottom value, e.g., let nil _ = Or_error.error_string "expected arch"

or just provide empty information.
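The state-threading behavior described above can be tried out without BAP using a list-based stand-in for the stream. The sketch below is an illustration under stated assumptions: the `either` type, the list-based `parse`, and the string-valued `create_source` are hypothetical and only mimic the shape of the combinators discussed here.

```ocaml
(* Events on the combined stream: a new architecture or a new cfg.
   Both are plain strings in this toy model. *)
type ('a, 'b) either = First of 'a | Second of 'b

(* Hypothetical source constructor: fixes the arch and returns a
   function that builds a source description from a cfg. *)
let create_source arch = fun cfg -> Ok (Printf.sprintf "%s:%s" arch cfg)

(* Initial state, used while no architecture has been seen yet. *)
let nil _cfg = Error "expected arch"

(* List-based stand-in for a stream parser: thread a state through the
   events and collect the values for which [f] returns [Some _]. *)
let parse events ~init ~f =
  let step (state, out) event =
    match f state event with
    | None, state -> (state, out)
    | Some x, state -> (state, x :: out)
  in
  let _, out = List.fold_left step (init, []) events in
  List.rev out

let sources =
  parse
    [ Second "cfg0"; First "arm"; Second "cfg1"; First "x86"; Second "cfg2" ]
    ~init:nil
    ~f:(fun create -> function
      | First arch -> (None, create_source arch)
      | Second cfg -> (Some (create cfg), create))
```

The first cfg arrives before any architecture, so it is handled by nil and yields an error. Each later cfg is handled by the state created from the most recent architecture.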

val arch : t -> arch

arch project reveals the architecture of a loaded file

val disasm : t -> disasm

disasm project returns results of disassembling

val program : t -> program term

program project returns a program lifted into IR

val with_program : t -> program term -> t

with_program project program updates a project program

val symbols : t -> symtab

symbols t returns reconstructed symbol table

val with_symbols : t -> symtab -> t

with_symbols project symbols updates project symbols

val storage : t -> dict

returns an attribute storage of the project

val with_storage : t -> dict -> t

updates the attribute storage

val memory : t -> value memmap

memory t returns the memory as an interval tree marked with arbitrary values.

val tag_memory : t -> mem -> 'a tag -> 'a -> t

tag_memory project region tag value tags a given region of memory in project with a given tag and value. Example: Project.tag_memory project tainted color red

val substitute : t -> mem -> string tag -> string -> t

substitute p region tag value is like tag_memory, but it will also apply substitutions in the provided string value, as per OCaml standard library's Buffer.add_substitute function.

Example:

Project.substitute project region comment "$symbol starts at $symbol_addr"

The following substitutions are supported:

  • $section{_name,_addr,_min_addr,_max_addr} - name of region of file to which it belongs. For example, in ELF this name will correspond to the section name
  • $symbol{_name,_addr,_min_addr,_max_addr} - name or address of the symbol to which this memory belongs
  • $asm - assembler listing of the memory region
  • $bil - BIL code of the tagged memory region
  • $block{_name,_addr,_min_addr,_max_addr} - name or address of a basic block to which this region belongs
  • $min_addr, $addr - starting address of a memory region
  • $max_addr - address of the last byte of a memory region.
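
The substitution syntax is that of Buffer.add_substitute from the OCaml standard library, so it can be tried in isolation. In the sketch below the `lookup` table and its values are hypothetical stand-ins for the facts that a project knows about a memory region; only Buffer.add_substitute itself is the real function referred to above.

```ocaml
(* Hypothetical lookup table standing in for the facts known about a
   memory region. Unknown variables are left as they were written. *)
let lookup = function
  | "symbol" -> "main"
  | "symbol_addr" -> "0x4005d0"
  | var -> "$" ^ var

(* Expand every $variable in [template] using [lookup]. *)
let expand template =
  let buf = Buffer.create 64 in
  Buffer.add_substitute buf lookup template;
  Buffer.contents buf

let () = print_endline (expand "$symbol starts at $symbol_addr")
```

With this table, the template from the example expands to "main starts at 0x4005d0". Note that $symbol_addr is read as one variable name, since underscores are part of an identifier.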
val with_memory : t -> value memmap -> t

with_memory project memory updates the project memory. It is recommended to use tag_memory and substitute instead of this function, if possible.

Extensible record

A project can also be viewed as an extensible record, where one can store arbitrary values. For example,

let p = Project.set project color `green

This sets the field color to the value `green.

val set : t -> 'a tag -> 'a -> t

set project field value sets a field to the given value. If the field was already set, then the new value overrides the old one. Otherwise the field is added.

val get : t -> 'a tag -> 'a option

get project field returns the value of the field if it exists

val has : t -> 'a tag -> bool

has project field checks whether the field exists. Useful for fields of type unit, which are effectively isomorphic to bool fields, e.g., if Project.has project mark

val del : t -> 'a tag -> t

del project attr removes an attribute from a project

module Info : sig ... end

Information obtained during project reconstruction.

module Input : sig ... end

Input information.

Registering passes

To add a new pass, call one of the following register_* functions.

type pass
val register_pass : ?autorun:bool -> ?runonce:bool -> ?deps:string list -> ?name:string -> (t -> t) -> unit

register_pass ?autorun ?runonce ?deps ?name pass registers a pass over a project.

If autorun is true, then the host program will run this pass automatically. If runonce is true, then the pass will be run only once for a given project; each repeated attempt to run the pass will be ignored. The runonce parameter defaults to false when autorun is false, and to true otherwise.

The deps parameter is a list of dependencies. Each dependency is the name of a pass that should be run before this pass. The dependencies will be run in the specified order every time the pass is run.

To get access to command line arguments, use Plugin.argv.
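
The interaction between autorun, runonce and deps can be sketched with a small self-contained model. It is not BAP's implementation: the registry, the `run_pass` function, the list-of-names "project", the required `~name` label, and the choice to skip the dependencies of an already finished runonce pass are all assumptions made for illustration.

```ocaml
(* Toy project: the log of pass names that were run on it. *)
type project = string list

type pass = {
  deps : string list;
  runonce : bool;
  run : project -> project;
  mutable finished : bool; (* simplification: tracked per pass, not per project *)
}

let registry : (string, pass) Hashtbl.t = Hashtbl.create 16

let register_pass ?(autorun = false) ?runonce ?(deps = []) ~name run =
  (* As documented: runonce defaults to the value of autorun. *)
  let runonce = Option.value runonce ~default:autorun in
  Hashtbl.replace registry name { deps; runonce; run; finished = false }

(* Run the dependencies in the specified order, then the pass itself.
   Assumption of this model: a finished runonce pass is skipped entirely. *)
let rec run_pass project name =
  let pass = Hashtbl.find registry name in
  if pass.runonce && pass.finished then project
  else begin
    let project = List.fold_left run_pass project pass.deps in
    pass.finished <- true;
    pass.run project
  end

let log name project = project @ [ name ]

let () =
  register_pass ~autorun:true ~name:"a" (log "a");
  register_pass ~deps:[ "a" ] ~name:"b" (log "b")

(* Run pass "b" twice on the same project. *)
let trace = run_pass (run_pass [] "b") "b"
```

In this model pass "a" is run once as a dependency of "b" and ignored on the second attempt, because autorun implies runonce, while "b" itself is run both times.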

val register_pass' : ?autorun:bool -> ?runonce:bool -> ?deps:string list -> ?name:string -> (t -> unit) -> unit

register_pass' pass registers a pass that doesn't modify the project and is run only for its side effects (see register_pass).

val passes : unit -> pass list

passes () returns all currently registered passes.

val find_pass : string -> pass option

find_pass name returns a pass with the given name.

type second = float

time duration in seconds

module Pass : sig ... end

A program analysis pass.
