normalize ?normalize_exp xs
produces a normalized BIL program with the same^1
semantics but in the BIL normalized form (BNF). There are two normalized forms, both described below. The first form (BNF1) is more readable, the second form (BNF2) is more strict, but sometimes yields a code, that is hard for a human to comprehend. The BNF1
is the default, to request BNF2
pass normalize_exp:true
.
Precondition: xs
is well-typed.
The BIL First Normalized Form (BNF1) is a subset of the BIL language, where expressions have the following properties:
- Memory load expressions can be only applied to a memory. This effectively disallows creation of temporary memory regions, and requires all store operations to be committed via the assignment operation. Also, this provides a guarantee, that store expressions will not occur in integer assignments, jmp destinations, and conditional expressions, leaving them valid only in an assignment statement where the rhs has type mem_t. This is effectively the same as make the
Load
constructor to have type (Load (var,exp,endian,size)
).
- No load or store expressions in the following positions: 1. the right-hand side of the let expression; 2. address or value subexpressions of the store expression; 3. storage or address subexpressions of the load expression;
The BIL Second Normalized Form (BNF2) is a subset of the BNF1 (in a sense that all BNF2 programs are also in BNF1). This form puts the following restrictions:
- No let expressions - new variables can be created only with the Move instruction.
All memory operations have sizes equal to one byte. Thus the size and endianness can be ignored in analysis. During the normalization, the following rewrites are performed
let x = <expr> in ... x ... => ... <expr> ...
x[a,el]:n => x[a+n-1] @ ... @ x[a]
x[a,be]:n => x[a] @ ... @ x[a+n-1]
m[a,el]:n <- x => (...((m[a] <- x<0>)[a+1] <- x<1>)...)[a+n-1] <- x<n-1>
m[a,be]:n <- x => (...((m[a] <- x<n-1>)[a+1] <- x<n>)...)[a+n-1] <- x<0>
(x[a] <- b)[c] => m := x[a] <- b; m[c]
^1
: The normalization procedure may duplicate expressions that might be considered non-generative. For example,
let x = m[a] in x + x
is rewritten to m[a] + m[a]
. Given a concrete semantics of a memory (for example, if memory is mapped to a device register that changes every times it is read) this expression may have different value. It will also have different effect (such as two memory accesses, page faults etc).
However, in the formal semantics of BAP we do not consider effects, and treat all expressions as side-effect free, thus the above transformation, are preserving the semantics.