The heap in user mode has a number of different measures built in to make exploiting heap overrun vulnerabilities more challenging. Similar checks have been in debug versions of the kernel pool for some time to aid driver debugging. Windows 7 RC is the first version of Windows with some of these integrity checks turned on in release builds.
Background
The vast majority of MSRC cases have historically been in user mode Windows components and applications. It should therefore come as no surprise that this is where MSEC has focused its efforts in developing security mitigation techniques to make it harder for attackers to exploit software vulnerabilities reliably.
Mitigations such as stack protection (/GS), Data Execution Prevention (DEP), Heap Protection, Address Space Layout Randomization (ASLR) and Structured Exception Handler Overwrite Protection (SEHOP) have made reliably exploiting any vulnerabilities that do exist far more challenging.
Over the last couple of years, the proportion of security bulletins affecting the Windows kernel has been increasing, from under 5% in 2007 to just over 10% in 2008 [Bulletins]. We therefore decided to look again at possible mitigations in the kernel. A number of the bulletins addressed pool overruns (for example MS07-017, MS08-001, MS08-007, MS08-030), so we reconsidered the case for and against including pool validation checks in free builds. Irrespective of whether a particular vulnerability would otherwise have resulted in denial of service or code execution, a mitigation that detects the pool overrun ensures the outcome is the former.
Pool Overruns
The pool in the kernel is analogous to the heap in user mode. It is designed to be a high performance dynamic memory manager. There are two types of pool memory:
- Nonpaged pool is memory that is guaranteed always to reside in physical memory and can therefore be accessed at any time from any IRQL. It is a very scarce resource.
- Paged pool is memory that can be paged in and out of physical memory. It cannot be accessed from Dispatch level or above since page faults cannot be satisfied at these levels.
Pool memory is allocated differently depending on the size of the block requested. Allocations of up to 256 bytes use lookaside lists, which are singly linked lists of blocks of the same size. Allocations from 256 bytes up to 4080 bytes use doubly linked lists of blocks. Both of these have a granularity of 8 bytes. Larger allocations return a whole number of pages. Each block is preceded by an 8-byte header, which contains information about the size of the block, the size of the previous block and the type of pool memory. If a block is free, its header is followed by a LIST_ENTRY structure, which contains pointers to the next and previous free blocks.
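The size classes above can be sketched in a few lines of C. This is only an illustration of the thresholds quoted in the text, not the pool manager's actual code: the names are hypothetical, the 4096-byte page size is an assumption, and since the text gives 256 bytes as the boundary for both list types, the sketch assumes a request of exactly 256 bytes still goes to a lookaside list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the thresholds are the ones quoted in the text. */
enum pool_path { POOL_LOOKASIDE, POOL_FREE_LIST, POOL_WHOLE_PAGES };

#define POOL_GRANULARITY 8u     /* block sizes are multiples of 8 bytes */
#define LOOKASIDE_MAX    256u   /* assumption: 256 itself uses a lookaside list */
#define FREE_LIST_MAX    4080u  /* largest request served from the doubly linked lists */
#define SIM_PAGE_SIZE    4096u  /* assumption: 4 KB pages */

/* Pick the allocation path for a request of `bytes` bytes. */
static enum pool_path classify_request(size_t bytes)
{
    if (bytes <= LOOKASIDE_MAX) return POOL_LOOKASIDE;
    if (bytes <= FREE_LIST_MAX) return POOL_FREE_LIST;
    return POOL_WHOLE_PAGES;
}

/* Round a small request up to the next multiple of the 8-byte granularity. */
static size_t round_to_granularity(size_t bytes)
{
    return (bytes + POOL_GRANULARITY - 1) & ~(size_t)(POOL_GRANULARITY - 1);
}

/* Number of whole pages returned for a large request. */
static size_t pages_needed(size_t bytes)
{
    return (bytes + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE;
}
```

Under these assumptions a 13-byte request is rounded up to 16 bytes and served from a lookaside list, while a 5000-byte request takes two whole pages.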
Pool overruns usually occur as secondary effects of arithmetic errors such as integer overflows. This could be in the calculation for the amount of memory to allocate (as in MS08-001) or in the validation code that ensures a buffer is sufficiently large (as in MS08-007).
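To make the arithmetic concrete, here is a minimal sketch of how a size calculation can wrap. It is not the code from either bulletin; the function names are hypothetical and it assumes a 32-bit length computed as count times element size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unchecked: the product is reduced modulo 2^32, so a large count can
 * produce a tiny allocation size. */
static uint32_t alloc_size_unchecked(uint32_t count, uint32_t elem_size)
{
    return (uint32_t)(count * elem_size);
}

/* Checked: refuse the request if the multiplication would overflow.
 * Returns true and stores the size in *out only when it is safe. */
static bool alloc_size_checked(uint32_t count, uint32_t elem_size, uint32_t *out)
{
    if (elem_size != 0 && count > UINT32_MAX / elem_size) {
        return false; /* count * elem_size does not fit in 32 bits */
    }
    *out = count * elem_size;
    return true;
}
```

With a count of 0x20000001 and an element size of 8, the unchecked version returns 8. A caller that allocates 8 bytes and then copies 0x20000001 elements into the block overruns it, which is the kind of secondary effect described above.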
Like heap overruns, pool overruns are commonly exploited by using the unlinking operation when an entry is removed from a doubly linked list to perform two arbitrary overwrites.
BOOLEAN RemoveEntryList(IN PLIST_ENTRY Entry)
{
    PLIST_ENTRY Blink;
    PLIST_ENTRY Flink;
    Flink = Entry->Flink; // what
    Blink = Entry->Blink; // where
    Blink->Flink = Flink; // *(where) = what
    Flink->Blink = Blink; // *(what+4) = where
    return (BOOLEAN)(Flink == Blink);
}
When a block is freed, if either of the adjacent blocks is also free, then the two free blocks are merged into a single larger block. Merging unlinks the existing free block and leads to the arbitrary write. There are a number of conditions that need to be met for this to happen reliably, but essentially the attacker needs to construct a fake header for the next block and supply a LIST_ENTRY with the desired what/where values. This technique is described in several of the references, with [SoBeIt05] and [Kortch08] specifically focussing on the Pool.
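The effect of unlinking a forged entry can be reproduced harmlessly in user mode. The sketch below uses a stand-in structure (DEMO_LIST_ENTRY, a hypothetical name) rather than the real kernel types, and both "what" and "where" point at ordinary local variables, so it shows the two writes without corrupting anything.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode stand-in for LIST_ENTRY (hypothetical name). */
typedef struct DEMO_LIST_ENTRY {
    struct DEMO_LIST_ENTRY *Flink;
    struct DEMO_LIST_ENTRY *Blink;
} DEMO_LIST_ENTRY;

/* The unlink operation with no validation, as in the listing above. */
static void unsafe_unlink(DEMO_LIST_ENTRY *Entry)
{
    DEMO_LIST_ENTRY *Flink = Entry->Flink; /* what  */
    DEMO_LIST_ENTRY *Blink = Entry->Blink; /* where */
    Blink->Flink = Flink;                  /* first write:  *(where) = what */
    Flink->Blink = Blink;                  /* second write: what->Blink = where */
}

/* Returns 1 if unlinking a forged entry performed both chosen writes. */
static int demo_unlink_write(void)
{
    DEMO_LIST_ENTRY what  = { NULL, NULL };
    DEMO_LIST_ENTRY where = { NULL, NULL };
    /* The forged entry was never on any list; its pointers are simply
     * whatever the "attacker" placed after the fake header. */
    DEMO_LIST_ENTRY fake  = { &what, &where };

    unsafe_unlink(&fake);

    return where.Flink == &what && what.Blink == &where;
}
```

In a real exploit "where" would be the address of something worth overwriting, and "what" would have to be an address that is itself writable, because the second write lands there.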
Safe Unlinking
Safe unlinking is a very simple idea. In any valid doubly-linked list the following should always hold:
Entry->Flink->Blink == Entry->Blink->Flink == Entry
Adding these checks to the code gives us the following.
BOOLEAN RemoveEntryList(IN PLIST_ENTRY Entry)
{
    PLIST_ENTRY Blink;
    PLIST_ENTRY Flink;
    Flink = Entry->Flink;
    Blink = Entry->Blink;
    if (Flink->Blink != Entry) KeBugCheckEx(...);
    if (Blink->Flink != Entry) KeBugCheckEx(...);
    Blink->Flink = Flink;
    Flink->Blink = Blink;
    return (BOOLEAN)(Flink == Blink);
}
Checking that these conditions hold before performing the unlinking operation makes it possible to detect the memory corruption at the earliest opportunity. Pool corruption should always be considered a fatal error, hence the Bug Check. This is the same check that has been in the user-mode heap since Windows XP SP2, while more extensive measures were introduced in Windows Vista [Marinescu06].
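The check is easy to try out in user mode. The following sketch is a simulation under stated assumptions, not the kernel implementation: SIM_LIST_ENTRY and safe_unlink are hypothetical names, and a return value of -1 stands in for the Bug Check so that the outcome can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode stand-in for LIST_ENTRY (hypothetical name). */
typedef struct SIM_LIST_ENTRY {
    struct SIM_LIST_ENTRY *Flink;
    struct SIM_LIST_ENTRY *Blink;
} SIM_LIST_ENTRY;

/* Unlink Entry only if both neighbours point back at it.
 * Returns 0 on success, -1 where the kernel would Bug Check. */
static int safe_unlink(SIM_LIST_ENTRY *Entry)
{
    SIM_LIST_ENTRY *Flink = Entry->Flink;
    SIM_LIST_ENTRY *Blink = Entry->Blink;

    if (Flink->Blink != Entry || Blink->Flink != Entry) {
        return -1; /* corruption detected, nothing has been written */
    }
    Blink->Flink = Flink;
    Flink->Blink = Blink;
    return 0;
}

/* A well-formed two-element circular list unlinks normally. */
static int demo_valid_list(void)
{
    SIM_LIST_ENTRY head, a;
    head.Flink = &a;    head.Blink = &a;
    a.Flink    = &head; a.Blink    = &head;

    return safe_unlink(&a) == 0 && head.Flink == &head && head.Blink == &head;
}

/* A forged entry is rejected and neither target is modified. */
static int demo_forged_entry(void)
{
    SIM_LIST_ENTRY what  = { NULL, NULL };
    SIM_LIST_ENTRY where = { NULL, NULL };
    SIM_LIST_ENTRY fake  = { &what, &where };

    return safe_unlink(&fake) == -1 && where.Flink == NULL && what.Blink == NULL;
}
```

The forged entry fails because the location it names as "what" does not already contain a pointer back to the entry. For the check to pass, an attacker would need that to be true of both targets.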
Security
This simple check blocks the most common exploit technique for pool overruns. It doesn’t mean pool overruns are impossible to exploit, but it significantly increases the work for an attacker. As one security researcher put it, “[safe unlinking] makes pool overruns immeasurably harder to exploit”. One of the main goals of mitigations is to remove generic exploit vectors. Work done on exploiting one pool overrun should not buy an attacker anything when it comes to exploiting a different one. Safe unlinking clearly meets this goal.
Reliability
Safe unlinking also has benefits from a reliability point of view. Since the vast majority of pool corruptions will already result in a Bug Check, we would not expect the mitigation to increase the overall number of crashes. What it will do is Bug Check as soon as a pool overrun is detected, preventing further memory corruption with more unpredictable consequences.
Performance
One of the prime concerns about adding checks to a high performance piece of code like the pool management is whether it will affect speed. In practice this check adds no more than 8 instructions to the binary in each place it occurs; this is not enough to make a noticeable difference.
A more serious concern would be if the check might require a paging operation, which is very expensive in terms of performance. In this case, the only memory accessed is already touched by the code, so there is no difference in the paging requirements.
Running live performance tests bears the theory out, and there is no measurable performance difference between builds with and without the checks.
Compatibility
Another concern with mitigations is that some applications may rely on particular behavior of the system that the mitigation prevents. In this case, it is safe to say that pool corruption is always bad. There are no exceptions.
Conclusion
We have already seen a significant benefit from the inclusion of safe unlinking in the user mode heap, and we believe this value will carry over into kernel mode.
- Peter Beck, MSEC Security Science
*Postings are provided “AS IS” with no warranties, and confer no rights.*
References
[Bulletins] MS07-017, MS07-066, MS07-067, MS08-001, MS08-004, MS08-007, MS08-025, MS08-030, MS08-036, MS08-061, MS08-063, MS08-066
[Solar00] Solar Designer. JPEG COM Marker Processing Vulnerability in Netscape Browsers, Bugtraq, Jul 2000
[Maxx01] MaXX. Vudo malloc tricks, Phrack 57, Aug 2001
[Anon01] Anonymous. Once upon a free(), Phrack 57, Aug 2001
[SoBeIt05] SoBeIt. How to exploit Windows kernel memory pool, Xcon 2005
[Marinescu06] Adrian Marinescu. Windows Vista Heap Management Enhancements, Blackhat Vegas 2006
[Kortch08] Kostya Kortchinsky. Real World Kernel Pool Exploitation, SyScan 08 Hong Kong
[SRDblog] MS08-001 (part 3) – The case of the IGMP network critical, January 2008