Clang_AST - Raw Clang AST, in OCaml
The definition of the AST types returned by Clang_parser.
Tested with Clang 5.0 up to 11.0.1. It was initially based on Clang 4. Although we tried to keep it up to date, some features introduced in later Clang versions may be missing (especially for C++, which is not generally tested).
We support C and a subset of C++.
Not necessarily supported:
attributes
C++ coroutines
C++ structured exception handling
Visual C++ specific (#pragma comment, etc.)
Objective-C
OpenMP
OpenCL
features added in Clang 5 and later
A function, enum, struct, etc. can be declared several times and defined at most once in a translation unit. Clang distinguishes every declaration and definition in the AST. We don't: either the element is effectively defined, and we always expose the definition, or it is not, and we return the declarations. This may change in the future if it turns out to be a bad idea (we could do as Clang does and always expose each and every declaration, with an indirect pointer to the definition, if any).
Nodes that can be referenced from several points of the AST (such as declarations) are tagged with an identifier which is unique across a translation unit.
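The merging policy described above can be sketched as follows. This is only an illustration: the `decl` record and the `exposed` function are hypothetical stand-ins, not types or functions of Clang_AST.

```ocaml
(* Hypothetical, simplified stand-in for a declaration node;
   these are NOT the actual Clang_AST types. *)
type decl = {
  decl_name : string;
  decl_is_definition : bool;
  decl_line : int;
}

(* If the entity is defined, keep only the definition;
   otherwise keep every declaration that was seen. *)
let exposed (decls : decl list) : decl list =
  match List.find_opt (fun d -> d.decl_is_definition) decls with
  | Some def -> [def]
  | None -> decls

(* Two sample redeclarations of the same function "f". *)
let f_decl = { decl_name = "f"; decl_is_definition = false; decl_line = 1 }
let f_def = { decl_name = "f"; decl_is_definition = true; decl_line = 5 }
```

With these sample values, `exposed [f_decl; f_def]` keeps only `f_def`, while `exposed [f_decl]` returns the lone declaration unchanged.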
type lang =
| Lang_C
| Lang_CXX
(*
Language in extern declaration.
*)
type name = {
name_print : string;
(*
human-readable name for the declaration, even if it is one of the special kinds of names (C++ constructor, etc.)
*)
name_qualified : string;
(*
fully-qualified human-readable name (with namespace)
Whether this is an anonymous struct or union. To be an anonymous struct or union, it must have been declared without a name and there must be no objects of this type declared
*)
record_is_complete : bool;
(*
true if this can be considered a complete type.
*)
record_is_valid : bool;
(*
true if it has no semantic error and thus has valid layout information; otherwise, size and offset information are set to 0
(C++) Represents a C++ base or member initializer. This is part of a constructor initializer that initializes one non-static member variable or one base class.
(C++) Represents the dependent type named by a dependently-scoped typename using declaration, e.g. using typename Base<T>::foo; Template instantiation turns these into the underlying type.
(C++) ref-qualifier associated with this function type
*)
}
Represents a prototype with parameter type info, e.g. 'int foo(int)' or 'int foo(void)'. 'void' is represented as having no parameters, not as having a single void parameter.
C array with a specified size that is not an integer-constant-expression
*)
| Size_Incomplete
(*
C array with an unspecified size
*)
| Size_Dependent
(*
(C++) array with dependent size
*)
Array size
and array_size_modifier =
| Size_Normal
(*
no modifier
*)
| Size_Static
(*
static keyword, as in int array[static 2]
*)
| Size_Star
(*
star modifier, as in int array[*]
*)
Array size modifier
and ref_qualifier =
| RQ_None
(*
No ref-qualifier was provided
*)
| RQ_LValue
(*
An lvalue ref-qualifier was provided (&)
*)
| RQ_RValue
(*
An rvalue ref-qualifier was provided (&&)
*)
(C++) The kind of C++11 ref-qualifier associated with a function type. This determines whether a member function's "this" object can be an lvalue, rvalue, or neither.
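As a small illustration of how the qualifier might be consumed, here is a sketch that maps it back to source syntax. The type is copied locally so the snippet compiles on its own; `string_of_ref_qualifier` and `with_ref_qualifier` are hypothetical helpers, not part of this module.

```ocaml
(* Local copy of the type documented above, so the snippet is self-contained. *)
type ref_qualifier =
  | RQ_None
  | RQ_LValue
  | RQ_RValue

(* Source-level spelling of each qualifier (empty when there is none). *)
let string_of_ref_qualifier = function
  | RQ_None -> ""
  | RQ_LValue -> "&"
  | RQ_RValue -> "&&"

(* Hypothetical helper: append the qualifier to an already-printed
   member function signature. *)
let with_ref_qualifier (signature : string) (rq : ref_qualifier) : string =
  match string_of_ref_qualifier rq with
  | "" -> signature
  | q -> signature ^ " " ^ q
```

For example, `with_ref_qualifier "void f()" RQ_RValue` yields `"void f() &&"`, the form of a member function that can only be called on an rvalue object.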
Represents an enum. In C++11, enums can be forward-declared with a fixed underlying type, and in C we allow them to be forward-declared with no underlying type as an extension.
Variadic atomic builtins: __atomic_exchange, __atomic_fetch_*, __atomic_load, __atomic_store, and __atomic_compare_exchange_*, for the similarly-named C++11 instructions, and __c11 variants for <stdatomic.h>.
A builtin binary operation expression such as "x + y" or "x <= y". The operands will already have been converted to appropriate types (e.g., by performing promotions or conversions). Note that assignment is a binary operator, not a compound assignment operator.
Represents a C99 designated initializer expression. A designated initializer expression (C99 6.7.8) contains one or more designators (which can be field designators, array designators, or GNU array-range designators) followed by an expression that initializes the field or element(s) that the designators refer to.
NOTE: Designators may not be useful for us. If I understand correctly, Clang's semantic analysis removes them by generating explicit nested list of initializers and putting designated initializers at the correct place. Similarly, default value for missing initializers are put explicitly...
A generic selection (C11 6.5.1.1) contains an unevaluated controlling expression, followed by one or more generic associations. Each generic association specifies a type name and an expression, or "default" and an expression (in which case it is known as a default generic association). The type and value of the generic selection are identical to those of its result expression, which is defined as the expression in the generic association with a type name that is compatible with the type of the controlling expression, or the expression in the default generic association if no types are compatible.
*)
| GNUNullExpr
(*
Implements the GNU __null extension, which is a name for a null pointer constant that has integral type (e.g., int or long) and is the same size and alignment as a pointer.
We support imaginary integer and floating point literals, like "1.0i". We represent these as a wrapper around FloatingLiteral and IntegerLiteral, and have a Complex type whose element type matches the subexpression.
*)
| ImplicitValueInitExpr
(*
Represents an implicitly-generated value initialization of an object of a given type. Implicit value initializations occur within semantic initializer list expressions (InitListExpr) as placeholders for subobject initializations not explicitly specified by the user.
Structure and union members. X->F and X.F. In C++, also method or overloaded operator.
*)
| NoInitExpr
(*
Represents a place-holder for an object not to be initialized by anything. This only makes sense when it appears as part of an updater of a DesignatedInitUpdateExpr.
An expression which accesses a pseudo-object l-value. A pseudo-object is an abstract object, accesses to which are translated to calls. The pseudo-object expression has a syntactic form, which shows how the expression was actually written in the source code, and a semantic form, which is a series of expressions to be executed in order which detail how the operation is actually evaluated. Optionally, one of the semantic forms may also provide a result value for the expression.
This is the GNU Statement Expression extension: ({int X=4; X;}). The StmtExpr contains a single CompoundStmt node, which it evaluates and takes the value of the last subexpression. A StmtExpr is always an r-value; values "returned" out of a StmtExpr will be copied.
Expression with either a type or (unevaluated) expression operand. For expression operands, we only keep its type as it is all that matters for these operators.
(C++) Represents binding an expression to a temporary. This ensures the destructor is called for the temporary. It should only be needed for non-POD, non-trivially destructible class types.
*)
| CXXBoolLiteralExpr of bool
(*
(C++) A boolean literal, per (C++ lex.bool Boolean literals).
(C++) A default argument (C++ dcl.fct.default). This wraps up a function call argument that was created from the corresponding parameter's default argument, when the call did not explicitly supply arguments for all of the parameters.
(C++) A use of a default initializer in a constructor or in aggregate initialization. This wraps a use of a C++ default initializer (technically, a brace-or-equal-initializer for a non-static data member) when it is implicitly used in a mem-initializer-list in a constructor (C++11 [class.base.init]p8) or in aggregate initialization (C++1y [dcl.init.aggr]p7).
(C++) Represents a C++ member access expression where the actual member referenced could not be resolved because the base expression or the member name was dependent.
(C++) Represents a folding of a pack over an operator. This expression is always dependent and represents a pack expansion of the forms: ( expr op ... ) ( ... op expr ) ( expr op ... op expr )
(C++) Represents a call to an inherited base class constructor from an inheriting constructor. This call implicitly forwards the arguments from the enclosing context (an inheriting constructor) to the specified inherited base class constructor.
(C++) Represents a C++11 noexcept expression (C++ expr.unary.noexcept). The noexcept expression tests whether a given expression might throw. Its result is a boolean constant.
*)
| CXXNullPtrLiteralExpr
(*
(C++) The null pointer literal (C++11 lex.nullptr). Introduced in C++11, the only literal of type nullptr_t is nullptr.
(C++) Represents a C++ pseudo-destructor (C++ expr.pseudo). A pseudo-destructor is an expression that looks like a member access to a destructor of a scalar type, except that scalar types don't have destructors.
*)
| CXXScalarValueInitExpr
(*
(C++) An expression "T()" which creates a value-initialized rvalue of type T, which is a non-class type. See (C++98 5.2.3p2).
(C++) A C++ typeid expression (C++ expr.typeid), which gets the type_info that corresponds to the supplied type, or the (possibly dynamic) type of the supplied expression.
(C++) Represents a reference to a function parameter pack that has been substituted but not yet expanded. When a pack expansion contains multiple parameter packs at different levels, this node is used to represent a function parameter pack at an outer level which we have already substituted to refer to expanded parameters, but where the containing pack expansion cannot yet be expanded.
Array subscripting (normalized: we always put the array expression first and the index expression last, although the source may state '4[A]' and not 'A[4]')
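The normalization can be pictured with the sketch below. The `ty` and `expr` types and the `normalize_subscript` function are hypothetical and much simpler than the real AST; the only assumption is that the operand with pointer or array type is the one placed first.

```ocaml
(* Hypothetical miniature types, for illustration only. *)
type ty =
  | T_int
  | T_ptr of ty
  | T_array of ty * int

type expr = {
  text : string;  (* how the operand was written in the source *)
  ty : ty;        (* its type *)
}

(* Return (array, index) whatever the order used in the source:
   the operand of pointer or array type always comes first. *)
let normalize_subscript (a : expr) (b : expr) : expr * expr =
  match a.ty with
  | T_ptr _ | T_array _ -> (a, b)
  | T_int -> (b, a)

let arr = { text = "A"; ty = T_array (T_int, 10) }
let idx = { text = "4"; ty = T_int }
```

Both `normalize_subscript arr idx` and `normalize_subscript idx arr` return the pair `(arr, idx)`.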
and atomic_expr = {
atomic_op : int;
(*
Kind of atomic builtin operator. TODO: use a variant.
type of the computed result, before converted back to lvalue type
*)
}
For compound assignments (e.g. +=), we keep track of the type the operation is performed in. Due to the semantics of these operators, the operands are promoted, the arithmetic performed, an implicit conversion back to the result type done, then the assignment takes place. This captures the intermediate type which the computation is done in.
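To see why the intermediate type matters, consider the C fragment `unsigned char x = 200; x += 100;`. The sketch below simulates it with plain OCaml integers, assuming an 8-bit `unsigned char`; `wrap_u8` and `compound_add_u8` are hypothetical names used only for this illustration.

```ocaml
(* Conversion back to an 8-bit unsigned result type (reduction modulo 256). *)
let wrap_u8 (n : int) : int = n land 0xff

(* Simulates "lhs += rhs" where lhs is an unsigned char:
   1. both operands are promoted to int (here, OCaml's int),
   2. the addition is performed in that intermediate type,
   3. the result is implicitly converted back to unsigned char,
   4. the converted value is what gets assigned. *)
let compound_add_u8 (lhs : int) (rhs : int) : int =
  let intermediate = lhs + rhs in
  wrap_u8 intermediate
```

Here the addition is computed as 300 in the intermediate type, and the value stored back is 300 mod 256 = 44.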
(C++) if this is a call to an overloaded operator, give its name
*)
}
Represents a function call. In C++, this also represents a method call or an overloaded operator call, in which case the callee is a MemberExpr containing both the object argument (possibly an implicit this made explicit) and the member function. We don't expose Clang's CXXMemberCallExpr and CXXOperatorCallExpr, which are redundant.
and unary_operator =
| UO_PostInc
(*
++
*)
| UO_PostDec
(*
--
*)
| UO_PreInc
(*
++
*)
| UO_PreDec
(*
--
*)
| UO_AddrOf
(*
&
*)
| UO_Deref
(*
*
*)
| UO_Plus
(*
+
*)
| UO_Minus
(*
-
*)
| UO_Not
(*
~
*)
| UO_LNot
(*
!
*)
| UO_Real
(*
__real extension
*)
| UO_Imag
(*
__imag extension
*)
| UO_Extension
(*
__extension__ marker
*)
| UO_Coawait
(*
(C++) coroutines co_await operator
*)
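A possible way to print these operators is sketched below. The type is copied locally so the snippet compiles on its own; `symbol_of_unary_operator`, `is_postfix` and `print_unary` are hypothetical helpers, and the operand is assumed to be already rendered as a string.

```ocaml
(* Local copy of the type documented above, so the snippet is self-contained. *)
type unary_operator =
  | UO_PostInc | UO_PostDec | UO_PreInc | UO_PreDec
  | UO_AddrOf | UO_Deref | UO_Plus | UO_Minus | UO_Not | UO_LNot
  | UO_Real | UO_Imag | UO_Extension | UO_Coawait

(* Source-level spelling of each operator. *)
let symbol_of_unary_operator = function
  | UO_PostInc | UO_PreInc -> "++"
  | UO_PostDec | UO_PreDec -> "--"
  | UO_AddrOf -> "&"
  | UO_Deref -> "*"
  | UO_Plus -> "+"
  | UO_Minus -> "-"
  | UO_Not -> "~"
  | UO_LNot -> "!"
  | UO_Real -> "__real"
  | UO_Imag -> "__imag"
  | UO_Extension -> "__extension__"
  | UO_Coawait -> "co_await"

(* Only the post-increment and post-decrement forms follow their operand. *)
let is_postfix = function
  | UO_PostInc | UO_PostDec -> true
  | _ -> false

(* Render an operator applied to an already-printed operand;
   keyword-like operators need a separating space. *)
let print_unary (op : unary_operator) (operand : string) : string =
  let sym = symbol_of_unary_operator op in
  if is_postfix op then operand ^ sym
  else
    match op with
    | UO_Real | UO_Imag | UO_Extension | UO_Coawait -> sym ^ " " ^ operand
    | _ -> sym ^ operand
```

Note that the pre and post forms share the same symbol, so the constructor, not the symbol, tells them apart.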
and cast_kind =
| CStyleCast
(*
An explicit cast in C (C99 6.5.4) or a C-style cast in C++ (C++ expr.cast), which uses the syntax (Type)expr
*)
| CXXFunctionalCast
(*
(C++) An explicit C++ type conversion that uses "functional" notation (C++ expr.type.conv).
*)
| CXXConstCast
(*
(C++) A C++ const_cast
*)
| CXXDynamicCast
(*
(C++) A C++ dynamic_cast
*)
| CXXReinterpretCast
(*
(C++) A C++ reinterpret_cast
*)
| CXXStaticCast
(*
(C++) A C++ static_cast
*)
| ImplicitCast
(*
Implicit type conversions, which have no direct representation in the original source code
if this initializer list initializes an array with more elements than there are initializers in the list, specifies an expression to be used for value initialization of the rest of the elements
*)
}
Describes an initializer list, which can be used to initialize objects of different types, including struct/class/union types, arrays, and vectors.
(C++) A reference to a name which we were able to look up during parsing but could not resolve to a specific declaration. This arises in several ways: we might be waiting for argument-dependent lookup; the name might resolve to an overloaded function; or the lookup might have included a function template. These never include UnresolvedUsingValueDecls, which are always class members and therefore appear only in UnresolvedMemberLookupExprs.
(C++) Represents a C++ member access expression for which lookup produced a set of overloaded functions. The member access may be explicit or implicit. In the final AST, an explicit access always becomes a MemberExpr. An implicit access may become either a MemberExpr or a DeclRefExpr, depending on whether the member is static.
whether the lambda is mutable, meaning that any captured values can be modified
*)
lambda_has_explicit_parameters : bool;
(*
whether this lambda has an explicit parameter list vs. an implicit (empty) parameter list.
*)
lambda_has_explicit_result_type : bool;
(*
whether this lambda had its result type explicitly specified
*)
}
(C++) A C++ lambda expression, which produces a function object (of unspecified type) that can be invoked later. C++11 lambda expressions can capture local variables, either by copying the values of those local variables at the time the function object is constructed (not when it is called!) or by holding a reference to the local variable. These captures can occur either implicitly or can be written explicitly between the square brackets ([...]) that start the lambda expression. C++1y introduces a new form of "capture" called an init-capture that includes an initializing expression (rather than capturing a variable), and which can never occur implicitly.
declaration of the local variable being captured, if any
*)
lambda_capture_is_implicit : bool;
(*
whether this was an implicit capture (not written between the square brackets introducing the lambda)
*)
lambda_capture_is_pack_expansion : bool;
(*
whether this capture is a pack expansion, which captures a function parameter pack
*)
}
Describes the capture of a variable or of this, or of a C++1y init-capture.
and lambda_capture_default =
| LCD_None
| LCD_ByCopy
| LCD_ByRef
(*
(C++) The default, if any, capture method for a lambda expression.
*)
and lambda_capture_kind =
| LCK_This
(*
Capturing the *this object by reference.
*)
| LCK_StarThis
(*
Capturing the *this object by copy
*)
| LCK_ByCopy
(*
Capturing by copy (a.k.a., by value)
*)
| LCK_ByRef
(*
Capturing by reference.
*)
| LCK_VLAType
(*
Capturing variable-length array type.
*)
The different capture forms in a lambda introducer. C++11 allows capture of this, or of local variables by copy or by reference. C++1y also allows "init-capture", where the initializer is an expression.
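The sketch below shows how capture information of this kind could be turned back into a lambda introducer. The two variant types are copied locally so the snippet compiles on its own; the `capture` record is a hypothetical simplification (a kind, an optional variable name and an implicit flag), and `string_of_capture` and `introducer` are illustrative helpers only. Init-captures are not handled.

```ocaml
(* Local copies of the types documented above. *)
type lambda_capture_default = LCD_None | LCD_ByCopy | LCD_ByRef

type lambda_capture_kind =
  | LCK_This | LCK_StarThis | LCK_ByCopy | LCK_ByRef | LCK_VLAType

(* Hypothetical simplified capture record, not the actual one. *)
type capture = {
  kind : lambda_capture_kind;
  var : string option;  (* name of the captured variable, if any *)
  implicit : bool;      (* true if not written between the brackets *)
}

(* Spelling of one capture, or None if it has no written form here. *)
let string_of_capture (c : capture) : string option =
  match c.kind, c.var with
  | LCK_This, _ -> Some "this"
  | LCK_StarThis, _ -> Some "*this"
  | LCK_ByCopy, Some v -> Some v
  | LCK_ByRef, Some v -> Some ("&" ^ v)
  | (LCK_ByCopy | LCK_ByRef), None -> None
  | LCK_VLAType, _ -> None

(* Rebuild the introducer: capture default first, then explicit captures. *)
let introducer (default : lambda_capture_default) (captures : capture list) =
  let d =
    match default with
    | LCD_None -> []
    | LCD_ByCopy -> ["="]
    | LCD_ByRef -> ["&"]
  in
  let explicit = List.filter (fun c -> not c.implicit) captures in
  "[" ^ String.concat ", " (d @ List.filter_map string_of_capture explicit) ^ "]"
```

For instance, a by-copy default with an explicit by-reference capture of `x` and an implicit by-copy capture of `y` is rendered as `[=, &x]`, since implicit captures are not written in the source.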
template arguments provided as part of this template-id
*)
}
(C++) Represents a C++ member access expression where the actual member referenced could not be resolved because the base expression or the member name was dependent.
typeid(expr) form: expression argument; None for typeid(type)
*)
typeid_is_potentially_evaluated : bool;
(*
determine whether this typeid has a type operand which is potentially evaluated, per C++11 [expr.typeid]p3
*)
}
(C++) A C++ typeid expression (C++ expr.typeid), which gets the type_info that corresponds to the supplied type, or the (possibly dynamic) type of the supplied expression. This represents code like typeid(int) or typeid( *objPtr ).
(C++) A qualified reference to a name whose declaration cannot yet be resolved. DependentScopeDeclRefExpr is similar to DeclRefExpr in that it expresses a reference to a declaration such as X<T>::value. The difference, however, is that a DependentScopeDeclRefExpr node is used only within C++ templates when the qualification (e.g., X<T>::) refers to a dependent type. In this case, X<T>::value cannot resolve to a declaration because the declaration will differ from one instantiation of X<T> to the next. Therefore, DependentScopeDeclRefExpr keeps track of the qualifier (X<T>::) and the name of the entity being referenced ("value"). Such expressions will instantiate to a DeclRefExpr once the declaration can be found.
(C++) Represents an expression – generally a full-expression – that introduces cleanups to be run at the end of the sub-expression's evaluation. The most common source of expression-introduced cleanups is temporary objects in C++, but several other kinds of expressions can create cleanups, including basically every call in ARC that returns an Objective-C pointer. This expression also tracks whether the sub-expression contains a potentially-evaluated block literal. The lifetime of a block literal is the extent of the enclosing scope.
(C++) This represents a block literal declaration, which is like an unnamed FunctionDecl. For example: ^{ statement-body } or ^(int arg1, float arg2){ statement-body }
declaration which triggered the lifetime-extension of this temporary, if any.
*)
materialize_is_bound_to_lvalue_reference : bool;
(*
whether this materialized temporary is bound to an lvalue reference; otherwise, it's bound to an rvalue reference
*)
}
(C++) Represents a prvalue temporary that is written into memory so that a reference can bind to it. Prvalue expressions are materialized when they need to have an address in memory for a reference to bind to. This happens when binding a reference to the result of a conversion, e.g., const int &r = 1.0; Here, 1.0 is implicitly converted to an int. That resulting int is then materialized via a MaterializeTemporaryExpr, and the reference binds to the temporary. MaterializeTemporaryExprs are always glvalues (either an lvalue or an xvalue, depending on the kind of reference binding to it), maintaining the invariant that references always bind to glvalues. Reference binding and copy-elision can both extend the lifetime of a temporary. When either happens, the expression will also track the declaration which is responsible for the lifetime extension.
number of expansions that will be produced when this pack expansion is instantiated, if already known
*)
}
(C++) Represents a C++11 pack expansion that produces a sequence of expressions. A pack expansion expression contains a pattern (which itself is an expression) followed by an ellipsis.
a type that was preceded by the 'template' keyword
*)
| Name_specifier_Global
(*
the global specifier '::'
*)
(C++) Nested name specifiers are the prefixes to qualified namespaces. For example, "foo::" in "foo::x" is a nested name specifier. Nested name specifiers are made up of a sequence of specifiers, each of which can be a namespace, type, identifier (for dependent names), decltype specifier, or the global specifier ('::'). The last two specifiers can only appear at the start of a nested-namespace-specifier.
a template template parameter pack that has been substituted for a template template argument pack, but has not yet been expanded into individual arguments
*)
and initialization_style =
| New_NoInit
(*
New-expression has no initializer as written
*)
| New_CallInit
(*
New-expression has a C++98 paren-delimited initializer
*)
| New_ListInit
(*
New-expression has a C++11 list-initializer
*)
(C++) Initialization style for new expression (C++).
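The three styles correspond to three ways of writing a new-expression, as the following sketch shows. The type is copied locally so the snippet compiles on its own; `print_new` is a hypothetical helper that takes the type name and the initializer arguments as already-printed strings.

```ocaml
(* Local copy of the type documented above, so the snippet is self-contained. *)
type initialization_style =
  | New_NoInit
  | New_CallInit
  | New_ListInit

(* Render a new-expression from its style, type name and printed arguments. *)
let print_new (style : initialization_style) (ty : string)
    (args : string list) : string =
  let args = String.concat ", " args in
  match style with
  | New_NoInit -> "new " ^ ty                       (* no initializer written *)
  | New_CallInit -> "new " ^ ty ^ "(" ^ args ^ ")"  (* C++98 parentheses *)
  | New_ListInit -> "new " ^ ty ^ "{" ^ args ^ "}"  (* C++11 braces *)
```

For example, `print_new New_CallInit "Point" ["1"; "2"]` yields `"new Point(1, 2)"`.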
Adaptor for mixing declarations with statements and expressions. CompoundStmt mixes statements, expressions and declarations (variables, types). Another example is ForStmt, where the first statement can be an expression or a declaration.
(C++) This represents C++0x [stmt.ranged]'s ranged for statement, represented as 'for (range-declarator : range-expression)'. TODO: expose also the desugared form begin/end/cond/inc/loopvar?