The recent global technology outage reminded the world of the critical need for documented emergency procedures. The massive outage had millions of IT teams scrambling to respond.
In our post-mortem meetings, we imagined the craziness and messiness that occurred in organizations whose crisis management documentation was haphazardly stored across SharePoint, Google Docs, and desktops.
Picture the pandemonium as teams frantically searched for the right steps and tried to figure out which version was the latest and who was in charge of which problem-solving steps. And what about the organizational knowledge lost during the last round of layoffs or when key team members retired? How did those companies get through this?
Luckily, your Comprose team had no need to panic since our own emergency procedure manual was already built and quickly accessible in our own policy management software, Zavanta. Our experts in IT immediately jumped into action with a sense of urgency, but not panic. Having a single source of truth in Zavanta eliminated any worry about where to find what we needed and assured us we were using the most recent versions of the emergency procedures and policies.
If you and your teams were affected and the process did not go as expected, we want to share our best practices to help you be better prepared the next time a major outage occurs.
Culture of Compliance
It should not be surprising that, as a company offering policy management software designed to help clients prepare for risk and demonstrate compliance, we have also developed a strong culture of compliance within our own team.
This may be a big leap if you do not currently have a compliance-focused culture. Our best practices are explained in this context. Consider addressing each of these components that support a future culture of compliance:
- Leadership Commitment
- Policies and Procedures
- Audits, Training, and Monitoring
- Open Communication
- Enforcement and Accountability
- Continuous Improvement
Leadership Commitment
The executive leadership team of Comprose has prioritized compliance, ethical behavior, and integrity from day one. We have invested the time to create all our policies and procedures and to review them regularly. And through the SOC 2 and ISO certifications, we have committed resources to obtaining and maintaining compliance.
This may be the perfect time to encourage your leaders to begin a stronger focus on compliance.
It’s critical for your leaders and managers to commit the appropriate resources needed to assess and meet your compliance needs. Subject matter experts need to be motivated to take the time to write your policies and procedures. People across the organization may be involved in reviewing and approving documents. And all employees must understand that regularly accessing and reading the content is a priority.
Clear Policies and Procedures
At Comprose, we “practice what we preach” and have invested personnel resources into writing our own policies and procedures across the entire organization. Current Zavanta clients likely have a leg up on this as well.
Surprisingly, many of our clients do not currently include their IT teams as users in Zavanta. This may be the time to expand. (Feel free to share this blog with your IT managers!) This is exactly the kind of scenario that calls for a structured, written, tested process: one that happens only occasionally but is mission critical!
Because we have been so thorough, our IT response to the outage was purposeful and organized. We take our Master Service Agreements (MSAs) with our clients seriously and are committed to honoring the standards we agreed to. In this case, we know our clients trust us to limit any outage to a maximum of 10 hours.
Currently, our documentation states we have an RTO (recovery time objective) of 10 hours, meaning we commit to restoring service within 10 hours of an outage. In practice, we would usually fail over well before then.
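For readers who want to track an RTO during an incident, here is a minimal Python sketch of the arithmetic. The 10-hour figure comes from this article; the function names and the timestamps are hypothetical and chosen only for illustration. This is not how Zavanta or our IT team calculates anything, just one way to express the check.

```python
from datetime import datetime, timedelta

# Recovery time objective stated in this article.
RTO = timedelta(hours=10)


def within_rto(outage_start: datetime, service_restored: datetime,
               rto: timedelta = RTO) -> bool:
    """Return True if service came back within the recovery time objective."""
    if service_restored < outage_start:
        raise ValueError("service_restored must not precede outage_start")
    return service_restored - outage_start <= rto


def rto_remaining(outage_start: datetime, now: datetime,
                  rto: timedelta = RTO) -> timedelta:
    """Time left before the RTO is breached (negative once it has been)."""
    return rto - (now - outage_start)


# Hypothetical timestamps, not taken from the actual incident.
start = datetime(2024, 1, 1, 6, 0)
print(within_rto(start, start + timedelta(hours=3, minutes=30)))  # True
print(rto_remaining(start, start + timedelta(hours=3)))           # 7:00:00
```

A running countdown like `rto_remaining` can be a useful line item in the status updates an action team records during an outage.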
Every employee is required to save a copy of our emergency procedure manual locally or print a paper copy as a backup in case of an Internet outage. The manual is made up of the following procedures in Zavanta, which were easily combined using Zavanta’s manual maker:
- Emergency Contact Procedure
- Business Continuity Plan Summary
- Business Continuity and Disaster Recovery Plan
- Zavanta After Hours Rotation
- Submitting a SaaS Tool Outage
- Mitigating a Root Account Compromise
Regular Audits, Training, and Monitoring
Our IT team has regular training sessions and meetings to ensure everyone understands their responsibilities related to the emergency procedures. During our initial ISO 27001 and SOC 2 certifications and the subsequent renewals, these processes were also reviewed and updated as necessary. And now with both ISO and SOC 2 moving to a yearlong live audit, everyone absolutely must be 100% with the program.
We also practice tabletop drills and failover twice a year and publish the results for our clients for full transparency. We test our own procedures, utilize our checklist feature, and update as necessary.
Open Communication
Our IT team communicated with all employees immediately once the problem was identified and frequently throughout the resolution process. In addition, the action team documented their status every 30 minutes, and this detail is available in a post-incident cause analysis artifact for full transparency. This timeline demonstrates that we followed our defined process and will be used in future audits.
We also communicated with our customers through a status page on the Zavanta application every 30 to 60 minutes, sharing what we knew and our progress on restoring services.
Enforcement and Accountability
Zavanta is our hub for accountability. Read Verify is one key feature that shows compliance and provides a future audit trail. As the individual procedures were written, we created a workflow to ensure everyone on the IT team read and understood the content.
Zavanta also fully tracks the entire review, edit, and approval processes. Everything is date and time stamped by user. In addition, our IT team holds itself accountable through the tabletop drills and practices mentioned above.
Continuous Improvement
This last section is how we learn, grow, innovate, and improve. Within several days of the incident, we gathered groups of employees to talk about the process and discuss lessons learned and corrective actions.
The IT team uses a Zavanta picklist for document categories, which is a huge time saver in this post-incident analysis. It’s easy to filter on IT-Emergency or IT-SOC2 or IT-ISO27001. This allows Zavanta to quickly filter through hundreds of documents related to the situation under review.
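To illustrate the idea of category filtering outside of any particular tool, here is a small Python sketch. It does not use or represent Zavanta's own features or API. The `Document` class, the `filter_by_category` function, the "Marketing Style Guide" title, and the category assignments are all hypothetical; only the category names and the two procedure titles come from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A hypothetical document record tagged with one or more categories."""
    title: str
    categories: frozenset


def filter_by_category(documents, wanted):
    """Return documents tagged with at least one of the wanted categories."""
    wanted = set(wanted)
    return [doc for doc in documents if doc.categories & wanted]


# Illustrative records; the category assignments are assumptions.
docs = [
    Document("Emergency Contact Procedure", frozenset({"IT-Emergency"})),
    Document("Mitigating a Root Account Compromise",
             frozenset({"IT-Emergency", "IT-SOC2"})),
    Document("Marketing Style Guide", frozenset({"Marketing"})),
]

for doc in filter_by_category(docs, ["IT-Emergency", "IT-ISO27001"]):
    print(doc.title)
```

The point of the sketch is that consistent category tags, applied when a document is written, are what make a fast post-incident review possible later.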
We found 40 related documents for review. All of them will be tracked in Zavanta. Each will have a new review and approval date and a full revision history log, and each will be pushed to the external compliance provider we use for ISO and SOC 2 certifications.
Any client needs this level of tracking for any system!
We are also in the process of adding a new procedure regarding client communications from our marketing team. During the outage, we were able to post a special web page with details. After the outage, we emailed our clients with a summary of our actions. Our new procedure will include a new proactive email to clients as soon as the problem is identified.
Marketing is another department that can benefit from using Zavanta. We offer more ideas in this article: Marketing SOP Examples to Help Boost Productivity.
Continuous improvement isn’t limited to incidents like this. If you are working towards a culture of compliance, you need to embrace continuous improvement throughout the year.
Zavanta provides the tools for you to automatically schedule regular reviews of your policies and SOPs. You can schedule these in advance and let the system handle the notifications. This is especially helpful for keeping up with changes in regulations.
ROI on Compliance?
What is the cost of this one outage for your organization? How much revenue may have been lost for every hour you were down? Can you quantify the lost productivity?
Now compare that to the cost of Zavanta or the time it takes to create your emergency procedures.
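If it helps to put numbers on those questions, here is a rough Python sketch of the arithmetic. The formula (lost revenue plus lost productivity) is a simplifying assumption, the function and parameter names are hypothetical, and every figure in the example is a placeholder rather than data from any real organization.

```python
def outage_cost(hours_down, revenue_per_hour, affected_employees,
                loaded_hourly_rate, productivity_loss=1.0):
    """Rough outage cost: lost revenue plus lost productivity.

    productivity_loss is the fraction of each affected employee's time
    lost during the outage (1.0 means they could do no work at all).
    """
    lost_revenue = hours_down * revenue_per_hour
    lost_productivity = (hours_down * affected_employees
                         * loaded_hourly_rate * productivity_loss)
    return lost_revenue + lost_productivity


# Placeholder figures for illustration only.
cost = outage_cost(hours_down=4, revenue_per_hour=5000,
                   affected_employees=50, loaded_hourly_rate=60,
                   productivity_loss=0.5)
print(cost)  # 26000.0
```

Swap in your own estimates. Even a conservative figure gives you something concrete to weigh against the time it takes to document your emergency procedures.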
In our case, we were quickly back to business-as-usual working for our clients. We honored our MSA and RTO of 10 hours. We used our own software to successfully weather the storm and write our own Zavanta case study.
If you are a current client and would like to learn more about using Zavanta in your IT department, please talk to your CSM. If you are not yet a client, reach out so we can start a conversation.
Additional Resources