This is the third and last in a series of posts that looks at how Microsoft responds to elevated threats to customers through the Microsoft Security Response Center’s (MSRC) Software and Services Incident Response Plan (SSIRP). Our previous posts discussed how Microsoft protects customers against elevated threats and the anatomy of a SSIRP incident. In this blog post we’ll provide some recommendations for building your own incident response process, drawing upon our nearly two decades of experience in security incident response.
As the threat landscape continues to evolve, we are always learning and adjusting our incident response approach. A post-incident review of each security incident that our teams manage provides insights into how we can evolve our services and products to be more secure, improve our response processes and respond faster, and help keep customers more secure. This ensures we are continually reviewing and updating our response processes to keep in step with an evolving security landscape.
After nearly two decades in incident response, we can share some of the best practices and lessons that have come from this experience. These apply to organizations both big and small and are relevant to all security response teams. Some of these we learned the hard way, when there was no established practice across the industry and things didn’t go exactly to plan. Any organization looking to establish its own incident response plan can benefit from the best practices below:
Plan. Have a plan and a process ready before any response is needed. Refer to the NIST publication Computer Security Incident Handling Guide (800-61 Rev 2) for a detailed description of what such a plan should look like. The document is intended to assist organizations in establishing computer security incident response capabilities and handling incidents efficiently and effectively.
Stakeholder support. Formalize your plan and get executive and other stakeholder support for your incident response plan. Your plan will only be as effective as they allow it to be.
Practice. Exercise your response process before you need it. ‘Tabletop’ simulations allow you to safely run through a mock incident to uncover any deficiencies in processes, assumptions, and differences of understanding across teams, and develop collateral needed to communicate effectively to both customers and executives.
Leadership. Make sure there is clear accountability for who is leading the incident response process – in SSIRP we call this the Crisis Lead. The Crisis Lead’s primary role is to lead, direct, coordinate, and adjust the response plan as needed. They need to have an in-depth knowledge of the incident response process and to be included in any side discussions that may occur regarding the incident. Your response runs the risk of being ineffective if the Crisis Lead doesn’t have all the necessary context and isn’t involved in all the discussions through the response process.
Empower. Ensure your incident response teams have the autonomy to move fast within the bounds of the approved process and understand when to seek executive approval for extraordinary action.
Communication. Communication should be coordinated within the incident response process. All communication – including executive, employee, and customer communication – should be coordinated through the Crisis Lead, who is accountable for incident resolution. Without this, communications will often be incomplete or inaccurate and only serve to confuse. Clear, accurate communication builds confidence in the incident response process, maintains trust with customers, protects your brand, and is essential for a fast, effective response.
Collaborate. Take a holistic approach and involve teams early. In addition to engineering teams, public relations, customer communications, customer support and legal teams may need to contribute to the incident. Bringing them in early – ideally from the start – allows them to better understand context and move quickly with good judgement.
Multithread. Split your incident response into workstreams when necessary. Large or complex incident response events should be split into separate workstreams. For example, move all engineering work, along with the appropriate teams, to an Engineering workstream. Similarly, customer support, customer communications, and public relations should form a separate Communications workstream. The Crisis Lead should be part of every workstream for coordination of the response effort.
Synch. Hold regular meetings for each workstream and for the overall response effort. Separate meetings should periodically take place for each workstream, in addition to a regular all-participant meeting where each of the workstreams reports back, so that participants gain valuable context about what is happening in the overall response effort.
Learn. Undertake a Post-Incident Review. Remember that the job isn’t finished the moment the problem is mitigated and communicated to customers. Understanding the root cause of the issue and considering how response could be improved enables you to drive durable improvements to systems, technologies, and processes that drive security for your organization and customers.
When considering cybersecurity and incident response, it truly is *“Better to have, and not need, than to need, and not have”* (F. Kafka). Crisis handling is more efficient when stakeholders are all working from the same well-rehearsed playbook.
Interested in learning more about how Microsoft’s SSIRP process works to protect customers? If you are attending Black Hat this year, check out Eric Doerr’s presentation The Enemy Within: Modern Supply Chain Attacks, which will share more experiences from the SSIRP team about how we responded to a software supply chain attack.
Simon Pope, Director of Incident Response, Microsoft Security Response Center (MSRC)