from_file filename creates a project from a provided input source. The reconstruction is a multi-pass process driven by the following input variables, provided by a user:
- brancher: decides instruction successors;
- rooter: decides function starts;
- symbolizer: decides function names;
- reconstructor: provides an algorithm for symtab reconstruction.
The project is built incrementally and iteratively until a fixpoint is reached. The fixpoint is reached when information stops flowing from the input variables.
The overall algorithm can be depicted with the following diagram, where boxes denote data and ovals denote processes:
   +---------+   +---------+   +---------+
   | brancher|   |code/data|   |  rooter |
   +----+----+   +----+----+   +----+----+
        |             |             |
        |             v             |
        |        -----------        |
        +------>(  disasm   )<------+
                 -----+-----
                      |
                      v
  +----------+   +---------+   +----------+
  |symbolizer|   |   CFG   |   | reconstr |
  +-----+----+   +----+----+   +----+-----+
        |             |             |
        |             v             |
        |        -----------        |
        +------>( reconstr  )<------+
                 -----+-----
                      |
                      v
                 +---------+
                 |  symtab |
                 +----+----+
                      |
                      v
                 -----------
                (  lift IR  )
                 -----+-----
                      |
                      v
                 +---------+
                 | program |
                 +---------+
The input variables are represented as streams of values. They can be viewed as cells that depend on some input: when the input changes, the value is recomputed and passed to the stream. Circular dependencies are allowed, so a rooter may actually depend on the program term. In case of circular dependencies, the above algorithm is run iteratively until a fixpoint is reached. The criterion for the fixpoint is that no data needs to be recomputed; data must be recomputed when its input has changed or itself needs to be recomputed.
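The fixpoint criterion can be illustrated with a small standalone sketch. It is a toy model, not the library's implementation: the calls table, step, and fixpoint are hypothetical names introduced here, and a plain set of addresses stands in for the rooter's output. Each pass recomputes the set of function starts from the previous pass, and iteration stops once a pass changes nothing.

```ocaml
(* Toy model only: none of these names belong to the library. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* Hypothetical "program": the call targets found when disassembling
   from a given function start. *)
let calls = function
  | 0 -> [10; 20]
  | 10 -> [30]
  | 20 -> [10]
  | _ -> []

(* One pass: every known root contributes the targets it calls. *)
let step roots =
  IntSet.fold
    (fun root acc ->
      List.fold_left (fun acc target -> IntSet.add target acc) acc (calls root))
    roots roots

(* Repeat passes until the input of a pass equals its output,
   i.e. nothing needs to be recomputed any more. *)
let rec fixpoint roots =
  let roots' = step roots in
  if IntSet.equal roots roots' then roots else fixpoint roots'

let discovered = IntSet.elements (fixpoint (IntSet.singleton 0))
```

Starting from the single root 0, the sketch needs three passes: the first two add new roots, and the third changes nothing, which ends the iteration.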
User-provided input can depend on any information, but a good starting point is the information provided by the Info module. It contains several variables that are guaranteed to be defined in the process of reconstruction.
For example, let's assume that a create_source function requires a filename as its input to create a source t. Then the source can be created as easily as:
    Stream.map Input.file ~f:create_source
As a more complex example, let's assume that a source now requires that both arch and file are known. We can combine two different streams of information with the merge function:
    Stream.merge Input.file Input.arch ~f:create_source
where create_source is a function of type string -> arch -> t.
If the source requires more than two arguments, then Stream.Variadic, which is a generalization of the merge function, can be used. Suppose that a source of information requires three inputs: filename, architecture, and compiler name. Then we first define a list of arguments,
    let args = Stream.Variadic.(args Input.arch $ Input.file $ Compiler.name)
and apply them to our function create_source:
    Stream.Variadic.(apply ~f:create_source args)
The sources specified in the examples above will call create_source when all arguments change. This is the expected behavior for the arch and file variables, since they do not change during the program computation. Mixing constant and non-constant (with respect to a computation) variables is not as easy, but can still be achieved using the either and parse combinators. For example, let's assume that a source requires arch and cfg as its input:
    Stream.either Input.arch Input.cfg |>
    Stream.parse ~init:nil ~f:(fun create -> function
        | First arch -> None, create_source arch
        | Second cfg -> Some (create cfg), create)
In this example, we parse a stream that contains either architectures or control flow graphs, with a state of type cfg -> t Or_error.t. Every time the architecture changes (i.e., a new project is started), we recreate the state by calling the create_source function. Since we cannot prove that the architecture will be decided before the cfg, or decided at all, we need to provide an initial nil function. It can either return a bottom value, e.g., let nil _ = Or_error.error_string "expected arch", or just provide empty information.
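The state threading performed by parse can be modelled on plain lists. The sketch below is an assumption-laden toy, not the library's Stream: the either type, the list-based parse, and the string-valued arch, cfg, and source are all stand-ins introduced for illustration, and the standard result type replaces Or_error.t. It shows that a cfg arriving before any arch is answered by nil, while later cfgs are handled by the function built from the most recent arch.

```ocaml
(* Toy stand-ins; none of this is the library's Stream implementation. *)
type ('a, 'b) either = First of 'a | Second of 'b

(* List-based model of parse: thread a state through the events and
   collect only the outputs that are Some. *)
let parse events ~init ~f =
  let _, outputs =
    List.fold_left
      (fun (state, outputs) event ->
        match f state event with
        | None, state -> (state, outputs)
        | Some x, state -> (state, x :: outputs))
      (init, []) events
  in
  List.rev outputs

(* Initial state: no architecture has been seen yet. *)
let nil _cfg = Error "expected arch"

(* Hypothetical source constructor: arch and cfg are plain strings here. *)
let create_source arch cfg = Ok (arch ^ ":" ^ cfg)

let sources =
  parse
    [Second "cfg0"; First "x86"; Second "cfg1"; Second "cfg2"]
    ~init:nil
    ~f:(fun create -> function
      | First arch -> (None, create_source arch)
      | Second cfg -> (Some (create cfg), create))
```

In this model the First event produces no output and only replaces the state, which is why four events yield three results.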