Application unavailable/slow requests

Incident Report for ProcedureFlow

Postmortem

On April 28, 2020, between 4:33 PM and 4:39 PM Eastern, we experienced slow requests, connectivity problems, and an intermittent outage lasting about 5 minutes that affected all customers. We're very sorry this occurred and would like to share with you what happened, what we learned, and what we're doing to prevent incidents like this from happening in the future.

This incident was caused by a cascade of problems: reduced capacity during maintenance, normal daily mid-peak load, poor failure modes for certain low-priority requests, and a number of slow, unoptimized requests that blocked other requests.

We were making maintenance-related changes in response to an incident the previous week. During that incident, we attributed the root cause to the deployment we were making at the time. That assumption was wrong, and when we made these changes, the same issue happened again.

We've written a separate postmortem for the previous incident, which describes what we learned from it and the steps we're taking to keep it from recurring.

We know how much you rely on ProcedureFlow to help your business succeed. Having two incidents in one week is not something we're proud of, but we will continue to analyze this event for opportunities to serve you better and earn the trust you place in us.

Posted May 04, 2020 - 17:55 UTC

Resolved

This incident has been resolved. Once we understand the full scope of this issue, we'll provide a postmortem.

Posted Apr 28, 2020 - 21:02 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Apr 28, 2020 - 20:44 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Apr 28, 2020 - 20:40 UTC

Investigating

We are currently investigating this issue.

Posted Apr 28, 2020 - 20:37 UTC