Performance degradation
Incident Report for Fasterize


On Thursday, October 19th, between 4:55 PM UTC+2 and 6:25 PM UTC+2, the Fasterize European platform was unable to optimize web pages for all customers. The original, unoptimized versions of the pages were delivered instead.

We discovered that, between 4:45 PM UTC+2 and 5:50 PM UTC+2, a specific request caused a failure in the Fasterize engine during optimization and left the process in a non-functional state.

The number of functional processes then decreased until it fell below a critical threshold. Our engine then automatically switched to a degraded mode in which pages were no longer optimized and were served as-is, without delay.
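The switch to degraded mode described above can be illustrated with a minimal sketch. Every name here (`Mode`, `select_mode`, `serve`) and the threshold value used in the usage example are assumptions made for illustration; this is not the actual Fasterize engine code.

```python
# Hypothetical sketch: names and threshold values are illustrative assumptions,
# not the real Fasterize implementation.
from enum import Enum
from typing import Callable


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"


def select_mode(functional_processes: int, critical_threshold: int) -> Mode:
    """Return DEGRADED once the functional process count falls below the threshold."""
    if functional_processes < critical_threshold:
        return Mode.DEGRADED
    return Mode.NORMAL


def serve(original_page: str, mode: Mode, optimize: Callable[[str], str]) -> str:
    """In degraded mode, skip optimization and return the original page as-is."""
    if mode is Mode.DEGRADED:
        return original_page
    return optimize(original_page)
```

For example, with an assumed threshold of 4 processes, `select_mode(3, 4)` yields `Mode.DEGRADED`, and `serve` then returns the page untouched instead of calling the optimizer.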

At 5:29 PM UTC+2, the on-call team manually added capacity to the platform to return it to a stable state, but this did not durably improve the situation. Starting at 6:15 PM UTC+2, the optimization processes gradually resumed handling traffic. The engine then returned to its normal mode of operation.

To prevent any further incidents, the request has been excluded from optimization, and a fix for the optimization engine is being developed.

Action plan

Short term:

  • Fix the engine so that it can optimize the offending request without crashing

Medium term:

  • Review the health check system at the engine level to automatically restart non-functional processes
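The medium-term item above, automatically restarting non-functional processes, could look roughly like the sketch below. The `Worker` type, the `probe` callback, and the in-memory "restart" are all hypothetical stand-ins; a real supervisor would kill and respawn an operating-system process.

```python
# Hypothetical sketch of a health-check pass; not the Fasterize health check system.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Worker:
    name: str
    healthy: bool = True
    restarts: int = 0


def restart_unhealthy(workers: List[Worker], probe: Callable[[Worker], bool]) -> List[str]:
    """Probe each worker and restart those that fail; return the restarted names."""
    restarted = []
    for worker in workers:
        if not probe(worker):
            # Stand-in for killing and respawning the real process.
            worker.healthy = True
            worker.restarts += 1
            restarted.append(worker.name)
    return restarted
```

Running this pass on a fixed interval would keep the functional process count from decaying, as long as the probe can actually detect the non-functional state.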
Posted Oct 23, 2023 - 23:31 CEST

This incident was resolved at 18:25 (Paris time). A post-mortem will follow.
Posted Oct 20, 2023 - 09:22 CEST
We're monitoring the results, but everything looks fine. The issue seems to be related to a schema change in a storage component (to be confirmed after the RCA).
Posted Oct 19, 2023 - 18:49 CEST
We have mitigated the issue. Performance is back to normal. We are still investigating the root cause.
Posted Oct 19, 2023 - 18:35 CEST
We are currently experiencing issues on our European infrastructure, and a fix is in progress. There is a slight impact on acceleration: some pages may be slower, and some optimizations are disabled.
Posted Oct 19, 2023 - 18:04 CEST
This incident affected: Acceleration.