Previously on this blog, we’ve talked about how MSRC automates the root cause analysis of vulnerabilities reported and found. After doing this, our next step is variant analysis: finding and investigating any variants of the vulnerability. It’s important that we find all such variants and patch them simultaneously, otherwise we bear the risk of these being exploited in the wild. In this post, I’d like to explain the automation we use in variant finding.
For the past year or so, we’ve been augmenting our manual code review processes with Semmle, a third-party static analysis environment. It compiles code to a relational database (the snapshot database – a combination of database and source code), which is queried using Semmle QL, a declarative, object-oriented query language designed for program analysis.
The basic workflow is that, after root cause analysis, we write queries to find code patterns that are semantically similar to the original vulnerability. Any results are triaged as usual and provided to our engineering teams for a fix to be developed. Also, the queries are placed in a central repository to be re-run periodically by MSRC and other security teams. This way, we can scale our variant finding over time and across multiple codebases.
In addition to variant analysis, we’ve been using QL proactively, in our security reviews of source code. This will be the topic of a future blog post. For now, let’s look at some real-world examples inspired by MSRC cases.
Incorrect integer overflow checks
This first case is a bug that’s straightforward to define, but would be tedious to find variants of in a large codebase.
The code below shows a common idiom for detecting overflow on the addition of unsigned integers:
if (x + y < x) { // handle integer overflow }
Unfortunately, this does not work properly when the width of the integer type is small enough to be subject to integral promotion. For example, if x
and y
were both unsigned short
, when compiled, the above code would end up being equivalent to (unsigned int)x + y < x
, making this overflow check ineffective.
Here’s a QL query that matches this code pattern:
`import cpp
from AddExpr a, Variable v, RelationalOperation r where a.getAnOperand() = v.getAnAccess() and r.getAnOperand() = v.getAnAccess() and r.getAnOperand() = a and forall(Expr op | op = a.getAnOperand() | op.getType().getSize() < 4) and not a.getExplicitlyConverted().getType().getSize() < 4 select r, “Useless overflow check due to integral promotion”`
In the from
clause, we define the variables, and their types, to be used in the rest of the query. AddExpr
, Variable
, and RelationalOperation
are QL classes representing various sets of entities in the snapshot database, e.g. RelationalOperation
covers every expression with a relational operation (less than, greater than, etc.)
The where
clause is the meat of the query, using logical connectives and quantifiers to define how to relate the variables. Breaking it down, this means that the addition expression and the relational operation need the same variable as one of their operands (x
in the example code above):
a.getAnOperand() = v.getAnAccess() and r.getAnOperand() = v.getAnAccess()
The other operand of the relational operation must be the addition:
r.getAnOperand() = a
Both operands of the addition must have a width less than 32 bits (4 bytes):
forall(Expr op | op = a.getAnOperand() | op.getType().getSize() < 4)
But if there is an explicit cast on the addition expression, we’re not interested if it’s less than 32 bits:
not a.getExplicitlyConverted().getType().getSize() < 4
(This is so a check like (unsigned short)(x + y) < x
doesn’t get flagged by the query.)
Finally, the select
clause defines the result set.
This vulnerability was originally reported in Chakra (the JavaScript engine of Edge), where the consequence of that particular ineffective overflow check was memory corruption. The query matched the original vulnerability but no additional variants in Chakra. However, we discovered several when applying this exact query to other Windows components.
Unsafe use of SafeInt
An alternative to rolling your own integer overflow checks is to use a library with such checks built in. SafeInt is a C++ template class that overrides arithmetic operators to throw an exception where overflow is detected.
Here’s an example of how to use it correctly:
int x, y, z; // ... z = SafeInt<int>(x) + y; |
But here is an example of how it can be unintentionally misused – the expression passed to the constructor may already have overflowed:
int x, y, z; // ... z = SafeInt<int>(x + y); |
How to write a query to detect this? In the previous example, our query only used built-in QL classes. For this one, let’s start by defining our own. For this, we choose one or more QL classes to subclass from (with extends
), and define a characteristic predicate which specifies those entities in the snapshot database that are matched by the class:
class SafeInt extends Type { SafeInt() { this.getUnspecifiedType().getName().matches("SafeInt<%") } }
The QL class Type
represents the set of all types in the snapshot database. For the QL class SafeInt
, we subset this to just types with a name that begins with “SafeInt<
”, thus indicating that they are instantiations of the SafeInt
template class. The getUnspecifiedType()
predicate is used to disregard typedefs and type specifiers such as const
.
Next, we define the subset of expressions that could potentially overflow. Most arithmetic operations can overflow, but not all; this uses QL’s instanceof
operator to define which ones. We use a recursive definition because we need expressions such as (x+1)/y
to be included, even though x/y
is not.
`class PotentialOverflow extends Expr { PotentialOverflow() { (this instanceof BinaryArithmeticOperation // match x+y x-y x*y and not this instanceof DivExpr // but not x/y and not this instanceof RemExpr) // or x%y
or (this instanceof UnaryArithmeticOperation // match x++ x– ++x –x -x and not this instanceof UnaryPlusExpr) // but not +x
// recursive definitions to capture potential overflow in // operands of the operations excluded above or this.(BinaryArithmeticOperation).getAnOperand() instanceof PotentialOverflow or this.(UnaryPlusExpr).getOperand() instanceof PotentialOverflow } }`
Finally, we relate these two classes in a query:
from PotentialOverflow po, SafeInt si where po.getParent().(Call).getTarget().(Constructor).getDeclaringType() = si select po, po + " may overflow before being converted to " + si
.(Call)
and .(Constructor)
are examples of an inline cast, which, similar to instanceof
, is another way of restricting which QL classes match. In this case we are saying that, given an expression that may overflow, we’re only interested if its parent expression is some sort of call. Furthermore, we only want to know if the target of that call is a constructor, and if it’s a constructor for some SafeInt
.
Like the previous example, this was a query that provided a number of actionable results across multiple codebases.
JavaScript re-entrancy to use-after-free
This next example was a vulnerability in Edge caused by re-entrancy into JavaScript code.
Edge defines many functions that can be called from JavaScript. This model function illustrates the essence of the vulnerability:
`HRESULT SomeClass::vulnerableFunction(Var_args, UINT argCount, Var_ retVal) { // get first argument - // from Chakra, acquire pointer to array BYTE* pBuffer; UINT bufferSize; hr = Jscript::GetTypedArrayBuffer(args[1], &pBuffer, &bufferSize);
// get second argument – // from Chakra, obtain an integer value int someValue; hr = Jscript::VarToInt(args[2], &someValue);
// perform some operation on the array acquired previously doSomething(pBuffer, bufferSize);
…`
The problem was that when Edge calls back into Chakra, e.g. during VarToInt
, arbitrary JavaScript code may be executed. The above function could be exploited by passing it a JavaScript object that overrides valueOf
to free the buffer, so when VarToInt
returns, pBuffer
is a dangling pointer:
``var buf = new ArrayBuffer(length); var arr = new Uint8Array(buf);
var param = {} param.valueOf = function() {
/free buf
(code to actually do this would be defined elsewhere)/
neuter(buf); // neuter buf
by e.g. posting it to a web worker
gc(); // trigger garbage collection
return 0;
};
vulnerableFunction(arr, param);``
The specific pattern we’re looking for with QL is therefore: acquisition of a pointer from GetTypedArrayBuffer
, followed by a call to some Chakra function that may execute JavaScript, followed by some use of the pointer.
For the array buffer pointer, we match on the calls to GetTypedArrayBuffer
, where the second argument (getArgument
of Call
is zero-indexed) is an address-of expression (&
), and take its operand:
class TypedArrayBufferPointer extends Expr { TypedArrayBufferPointer() { exists(Call c | c.getTarget().getName() = "GetTypedArrayBuffer"and this = c.getArgument(1).(AddressOfExpr).getOperand()) } }
The exists
logical quantifier is used here to introduce a new variable (c
) into the scope.
There are several Chakra API functions that could be used for JavaScript re-entrancy. Rather than defining them by name, we can identify the internal Chakra function that performs this task, and use QL to figure this out from the call graph:
`// examine call graph to match any function that may eventually call MethodCallToPrimitive predicate mayExecJsFunction(Function f) { exists(Function g | f.calls+(g) and g.hasName(“MethodCallToPrimitive”) }
// this defines a call to any of the above functionsclass MayExecJsCall extends FunctionCall { MayExecJsCall() { mayExecJsFunction(this) } }`
The “+
” suffix of the calls
predicate specifies a transitive closure – it applies the predicate to itself until there is a match. This permits a concisely defined exploration of the function call graph.
Finally, this query brings these QL class definitions together, relating them by control flow:
from TypedArrayBufferPointer def, MayExecJsCall call, VariableAccess use, Variable v where v.getAnAccess() = def and v.getAnAccess() = use and def.getASuccessor+() = call and call.getASuccessor+() = use select use, "Call to " + call + " between definition " + def + " and use " + use
The predicate getASuccessor()
specifies the next statement or expression in the control flow. Therefore, using e.g. call.getASuccessor+() = use
will follow the control flow graph from call
until there is a match to use
. The diagram below illustrates this:
This query uncovered four variants in addition to the originally reported vulnerability, all of them assessed as critical severity.
That’s all for now. The next instalment will cover using QL for data flow analysis and taint tracking, with examples from our security review of an Azure firmware component.
Steven Hunter, MSRC Vulnerabilities & Mitigations team