Service Outage

Incident Report for Frontegg

Postmortem

Root Cause Analysis (RCA): DDOS Attack Incident

Incident Summary

On May 23, 2025, between 16:53 and 17:16 UTC, our service in the Europe region experienced a temporary outage due to a sophisticated DDOS attack. Despite mitigation efforts by Cloudflare, the scale and speed of the attack overwhelmed our system's autoscaling capabilities, leaving the service unavailable for approximately 23 minutes.

Timeline of Events:

  • 16:53 UTC: DDOS attack begins.
  • 16:54 UTC: Monitoring system alerts the on-call team.
  • 17:03 UTC: On-call team identifies the DDOS attack.
  • 17:10 UTC: Attack characteristics scoped.
  • 17:15 UTC: Blocking and rate limit rules applied.
  • 17:16 UTC: Service recovers.
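
The intervals implied by the timeline above can be worked out with a short script. This is only an illustrative sketch: the timestamps are copied from the timeline, while the `TIMELINE` list, the shortened event labels, and the `minutes_since_start` helper are names introduced here for the example and are not part of any Frontegg tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all UTC, May 23, 2025).
# The event labels are shortened paraphrases used only for this example.
TIMELINE = [
    ("16:53", "attack begins"),
    ("16:54", "monitoring alerts on-call team"),
    ("17:03", "DDOS attack identified"),
    ("17:10", "attack characteristics scoped"),
    ("17:15", "blocking and rate limit rules applied"),
    ("17:16", "service recovers"),
]


def minutes_since_start(timeline):
    """Return (event, whole minutes elapsed since the first event) pairs."""
    start = datetime.strptime(timeline[0][0], "%H:%M")
    return [
        (event, int((datetime.strptime(ts, "%H:%M") - start).total_seconds() // 60))
        for ts, event in timeline
    ]


elapsed = minutes_since_start(TIMELINE)
for event, minutes in elapsed:
    print(f"+{minutes:>2} min  {event}")
```

Read this way, the alert fired 1 minute after the attack began, the attack was identified at 10 minutes, mitigation rules were applied at 22 minutes, and service recovered at 23 minutes.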

Root Cause

The attack's high volume and rapid escalation exceeded our system's ability to scale automatically in time, causing service disruption.

Incident Resolution & Next Steps:

To resolve the incident, we took the following actions:

  • We successfully blocked malicious traffic and hardened our defenses.
  • Preventive measures are being implemented: enhancing our CDN and infrastructure autoscaling, adding automated tools to identify attacks faster, and strengthening DDOS protection in collaboration with Cloudflare.
Posted May 28, 2025 - 07:20 UTC

Resolved

This incident has been resolved.

Timeline of Events:
16:53 UTC: DDOS attack begins.
16:54 UTC: Monitoring system detects service degradation and alerts the on-call team.
17:03 UTC: On-call team identifies that a DDOS attack is ongoing.
17:10 UTC: On-call team scopes the characteristics of the attack (volume, source IPs, and traffic patterns).
17:15 UTC: The on-call team applies blocking and rate limit rules on Cloudflare to mitigate the attack.
17:16 UTC: System recovers and service is restored.
Posted May 23, 2025 - 17:16 UTC