One of Microsoft’s longstanding strategies for improving software security continues to involve investing in defensive technologies that make it difficult and costly for attackers to exploit vulnerabilities. These solutions generally have a broad and long-lasting impact on software security because they focus on eliminating classes of vulnerabilities or breaking the exploitation primitives that attackers rely on. This approach also improves software security over the long run because it shifts the focus away from the hand-to-hand combat of finding, fixing, and servicing individual vulnerabilities and instead accepts the fact that complex software will inevitably have vulnerabilities.
To further emphasize our commitment to this strategy and to cast a wider net for defensive ideas, Microsoft awarded the BlueHat Prize in 2012 and subsequently started the ongoing Microsoft Defense Bounty in June 2013, which has offered up to $50,000 USD for novel defensive solutions. Last month, we announced that we will now award up to $100,000 USD for qualifying Microsoft Defense Bounty submissions. This increase further affirms the value that we place on these types of defensive solutions, and we’re hopeful it will help encourage more research into practical defenses.
In this blog post, we want to take the opportunity to explain how we evaluate defensive solutions and describe the characteristics that we look for in a good defense. We evaluate solutions along a few key dimensions: robustness, performance, compatibility, agility, and adoptability. Keeping these dimensions in mind when developing a defense should increase the likelihood of it being deemed a good candidate for the Microsoft Defense Bounty, and it will also go a long way toward the defense being integrated and adopted in practice.
Criteria for evaluating defensive solutions
Robustness
The first and most important criterion deals with the security impact of the defense. After all, a defense must have an appreciable impact on making it difficult and costly to exploit vulnerabilities in order to be worth pursuing.
We evaluate robustness in terms of:
- The impact the defense will have on modern classes of vulnerabilities and/or exploits. A good defense should eliminate a common vulnerability class or break a key exploitation technique or primitive used by modern exploits (a concrete illustration follows this list).
- The level of difficulty that attackers will face when adapting to the defense. A good defense should include a rigorous analysis of the limitations of the defense and how attackers are likely to adapt to it. Defenses that offer only a small impediment to attackers are unlikely to qualify.
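To make the distinction between a vulnerability class and an exploitation primitive concrete, consider the classic stack buffer overflow sketched below. This is an illustration of our own, not part of the bounty criteria: the vulnerability class is the unbounded copy, and the primitive a defense might break is the attacker’s control of the return address, which is what stack cookies such as the /GS compiler protection do by detecting the corruption before the return address is used.

```c
#include <string.h>

/* Illustrative only: a stack buffer overflow, one of the classic memory
 * safety vulnerability classes. The exploitation primitive is control of
 * the return address stored beyond 'buf' on the stack. A defense such as
 * a stack cookie (/GS) breaks this primitive by placing a secret value
 * between the locals and the return address and checking it before the
 * function returns, so a linear overwrite is caught before it is used. */
void parse_request(const char *input)
{
    char buf[16];
    strcpy(buf, input); /* no bounds check: an input of 16 or more
                           characters corrupts the adjacent stack */
}
```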
Performance
The second most important criterion deals with the impact the defense is expected to have on performance. Our customers expect Windows and the applications that run on it to be highly responsive and performant. In most cases, the scenarios where we are most interested in applying defenses (e.g. web browsers) are the same places where high performance is expected. As such, it is critical that defenses have minimal impact on performance and that the robustness of a defense justifies any potential performance costs.
Since performance impact is measured across multiple dimensions, it is not possible to distill the requirements down into a single allowed regression percentage. Instead, we evaluate performance in context using the following guideposts:
- Impact on industry standard benchmarks. There are various industry standard benchmarks that evaluate performance in common application workloads (e.g. browser DOM/JS benchmarks). Although SPEC CPU benchmarks can provide a good baseline for comparing defense solutions, we find that it is critical to evaluate performance impact under real-world application workloads.
- Impact on runtime performance. This is measured in terms of CPU time and elapsed time either in the context of benchmarks or in common application scenarios (e.g. navigating to top websites in a browser). Defenses with low impact on runtime performance will rate higher in our assessment.
- Impact on memory performance. This is measured in terms of how the defense affects various aspects of memory footprint, including commit, working set, and code size. Defenses with low impact on memory performance will rate higher in our assessment. (A minimal measurement sketch follows this list.)
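As a rough illustration of the runtime and memory dimensions, the sketch below times a workload and reports peak working set on Windows. It is a minimal harness of our own devising, not the methodology we use internally; real evaluations rely on the industry-standard benchmarks and application scenarios described above. The name `workload_under_test` is a placeholder for the code path affected by the defense.

```c
#include <windows.h>
#include <psapi.h>   /* GetProcessMemoryInfo; link with psapi.lib */
#include <stdio.h>

/* Minimal sketch: time a workload and report its peak working set. */
static void workload_under_test(void)
{
    /* ... the code path affected by the defense goes here ... */
}

int main(void)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    workload_under_test();
    QueryPerformanceCounter(&end);

    double elapsed_ms = (double)(end.QuadPart - start.QuadPart)
                        * 1000.0 / (double)freq.QuadPart;

    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));

    printf("elapsed: %.3f ms, peak working set: %lu KB\n",
           elapsed_ms, (unsigned long)(pmc.PeakWorkingSetSize / 1024));
    return 0;
}
```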
Compatibility
One of the reasons that Windows has been an extremely successful platform is the care that has been taken to retain binary compatibility with applications. As such, it is critical that defenses retain compatibility with existing applications or that there is a path for enabling the defense in an opt-in fashion. Rebuilding the world (e.g. all binaries that run on Windows) is generally not an option for us. Defenses are therefore expected to be 100% compatible in order to rate highly in our assessment.
In particular, we evaluate compatibility in terms of the following:
- Binary interoperability. Any defense must be compatible with legacy applications/binaries, or it must support enabling the defense on an opt-in basis (the sketch after this list shows one way opt-in status can be recorded in image metadata). If an opt-in model is pursued, the defense must generally support legacy binaries (such as legacy DLLs) being loaded by an application that enables the defense. Where the defense requires binaries to be rebuilt in order to be protected, the protected binaries must still load on legacy versions of Windows that may not support the defense at runtime.
- ABI compliance. Related to the above, any defense that alters code generation or runtime interfaces must comply with the ABI (e.g. it cannot break calling conventions or other established contracts), such as the documented x64 ABI for Windows.
- No false positives. Defenses must not make use of heuristics or other logic that may be prone to false positives (and thus result in application compatibility issues).
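Opt-in is typically expressed in image metadata rather than in code. As a simplified sketch of our own (assuming a well-formed 64-bit PE image and a recent SDK that defines IMAGE_DLLCHARACTERISTICS_GUARD_CF; production tooling needs far more validation), the snippet below reads the DllCharacteristics flags that record whether a binary opted into mitigations such as ASLR, DEP, and Control Flow Guard.

```c
#include <windows.h>
#include <stdio.h>

/* Simplified sketch: report the opt-in mitigation flags recorded in a
 * 64-bit PE image's optional header. Assumes a well-formed image. */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <image.exe|image.dll>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    IMAGE_DOS_HEADER dos;
    IMAGE_NT_HEADERS64 nt;
    if (fread(&dos, sizeof(dos), 1, f) != 1 ||
        dos.e_magic != IMAGE_DOS_SIGNATURE ||
        fseek(f, dos.e_lfanew, SEEK_SET) != 0 ||
        fread(&nt, sizeof(nt), 1, f) != 1 ||
        nt.Signature != IMAGE_NT_SIGNATURE) {
        fprintf(stderr, "not a PE image?\n");
        fclose(f);
        return 1;
    }

    WORD dc = nt.OptionalHeader.DllCharacteristics;
    printf("ASLR (DYNAMIC_BASE): %s\n",
           (dc & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) ? "yes" : "no");
    printf("DEP  (NX_COMPAT):    %s\n",
           (dc & IMAGE_DLLCHARACTERISTICS_NX_COMPAT) ? "yes" : "no");
    printf("CFG  (GUARD_CF):     %s\n",
           (dc & IMAGE_DLLCHARACTERISTICS_GUARD_CF) ? "yes" : "no");

    fclose(f);
    return 0;
}
```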
Agility
Given the importance of binary compatibility and the long-term implications of design decisions, we also need to ensure that we retain as much flexibility as possible when it comes to making changes to defenses in the future. Accordingly, we pay close attention to the agility of the design and implementation associated with a defense. Defenses that have good properties in terms of agility are likely to rate higher in our assessment.
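One common pattern that buys this kind of flexibility in Windows-style binary interfaces is a self-sizing, versioned structure. The hypothetical sketch below (our illustration; DEFENSE_POLICY is not an actual Windows structure) shows how a defense’s policy block could grow in later releases without breaking binaries compiled against the older layout.

```c
#include <windows.h>

/* Hypothetical illustration of an agile binary interface: callers set
 * cbSize to the structure size they were compiled against, so later
 * releases can append fields without breaking older binaries. */
typedef struct _DEFENSE_POLICY {
    ULONG cbSize;   /* sizeof(DEFENSE_POLICY) as seen by the caller */
    ULONG Flags;    /* opt-in bits; unknown bits should be rejected */
    /* v2+ fields are appended here; the consumer checks cbSize
     * before reading past the v1 layout */
} DEFENSE_POLICY;
```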
Adoptability
All defenses carry some cost that dictates how easy it will be to build them and integrate them into the platform or applications. This means we must take into account the engineering cost associated with building the defense, and we must assess the taxes it may impose on developers and system operators when it comes to making use of the defense in practice. For example, defenses that require developers to make code changes or system operators to manage complex configurations are less desirable. Defenses that have low engineering costs and minimize the friction of enabling them are likely to rate higher in our assessment.
Conclusion
The criteria above are intended to provide some transparency and insight into the guidelines we use when evaluating the properties of a defense, both internally at Microsoft and for the Microsoft Defense Bounty program. We certainly set a high bar in terms of what we expect from a defensive solution, but we believe we have good reasons for doing so, grounded both in the modern threat landscape and in our customers’ expectations.
We strongly encourage anyone with a passion for software security to move “beyond the bugs” and explore opportunities to invest time and energy into developing novel defenses. Aside from being a challenging and stimulating problem space, there is now also the potential to receive up to $100,000 USD for your efforts in this direction through the Microsoft Defense Bounty program. The impact that these defenses can have on reducing the risk associated with software vulnerabilities and helping keep people safe is huge.
Matt Miller
Microsoft Security Response Center