503 errors on statics
Incident Report for Fasterize
Postmortem

Description

Between 10:18 and 11:20 UTC+2, the static resources of some clients returned 503 errors. Internet users did not necessarily see these errors, but some sites may have displayed broken pages because of the missing objects, especially for users who did not already have these objects in their browser cache.

Facts and Timeline

  • 10:18: manual update of one of our components
  • 10:28: first alert
  • 10:36: Start of bypass of the CDN layer for the impacted domains
  • 10:52: All impacted domains bypass the CDN layer. Due to DNS propagation delays, errors continued until 11:20
  • 13:42: Start of reconnection of impacted domains to the CDN
  • 14:04: Impacted domains are reconnected to the CDN

Analysis

The incident was caused by an update to one of our components that was not supposed to be related to the production stack. As a side effect of this update, an execution role needed by the edge processes on the CDN layer was removed.

Metrics

Severity: level 2 (site degradation, performance problem and/or broken feature that is difficult to work around, impacting a significant number of users)

Time To Detect: 10 min

Time To Resolve: 60 min
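
As a rough cross-check, both durations can be derived from the timeline above. The sketch below assumes the incident starts with the 10:18 update, detection is the 10:28 first alert, and resolution is 11:20, when the last errors were observed. The function and variable names are illustrative only.

```python
from datetime import datetime

FMT = "%H:%M"

def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two same-day HH:MM times."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)

# Timestamps taken from the timeline (all UTC+2, same day).
incident_start = "10:18"  # manual update (assumed start of the incident)
first_alert = "10:28"     # first alert
last_errors = "11:20"     # errors stop after DNS propagation

time_to_detect = minutes_between(incident_start, first_alert)
time_to_resolve = minutes_between(incident_start, last_errors)
print(time_to_detect, time_to_resolve)  # prints "10 62"
```

Under these assumptions the resolve time comes out at 62 minutes, which is consistent with the roughly 60 minutes reported.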

Impacts

Only a few sites (fewer than 10) were impacted.

Countermeasures

  • Short-term

    • adjust alerting on edge processes to improve diagnosis
    • adjust alert level on 5xx errors viewed from the CDN layer
  • Mid-term

    • secure the execution role of edge processes
    • make it easier to disconnect a specific customer from the CDN layer
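
To illustrate the short-term item on 5xx alerting, here is a minimal sketch of an error-rate check over a sliding window of recent responses. It is not our actual alerting setup: the class name `ErrorRateAlert` and the parameters `window_size`, `threshold` and `min_samples` are hypothetical and chosen only for the example.

```python
from collections import deque

class ErrorRateAlert:
    """Flag when the share of 5xx responses in a sliding window is too high."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05,
                 min_samples: int = 20):
        # Keep only the most recent `window_size` status codes.
        self.statuses = deque(maxlen=window_size)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, status: int) -> bool:
        """Record one HTTP status code; return True if the alert should fire."""
        self.statuses.append(status)
        if len(self.statuses) < self.min_samples:
            return False  # not enough data to judge yet
        errors = sum(1 for s in self.statuses if 500 <= s <= 599)
        return errors / len(self.statuses) > self.threshold
```

For example, with `ErrorRateAlert(window_size=10, threshold=0.2, min_samples=5)`, eight 200 responses followed by three 503 responses fire the alert on the third 503, once 3 of the last 10 responses are errors.
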
Posted May 22, 2020 - 16:46 CEST

Resolved
Everything is now back to normal.
A post-mortem will follow in the next few hours.
Sorry for the inconvenience :-(
Posted May 22, 2020 - 15:47 CEST
Monitoring
The problem has been fixed for all impacted customers.
We are monitoring errors to confirm that everything is back to normal.
Posted May 22, 2020 - 14:08 CEST
Update
For some customers, statics are no longer served by the CDN layer. We are actively working to fix this.
In the meantime, websites are being served normally.
Posted May 22, 2020 - 11:05 CEST
Identified
Some 503 errors have occurred for static assets on the CDN layer. This was limited to some customers.
Posted May 22, 2020 - 11:01 CEST
This incident affected: Acceleration.