Frontegg services are in degraded state

Incident Report for Frontegg

Postmortem

Executive summary:

On Monday 28/06/2021, at around 14:40 UTC, we experienced an issue with our central managed service cache, which is hosted on our cloud provider. The issue was caused by host patching of our node, which left nodes unhealthy and made user refresh tokens inaccessible. The managed service cache returned to a healthy status at around 15:02 UTC.

Impact:

The incident impacted any user attempting to log into Frontegg's portal or to refresh a user token within their application.

Mitigation and resolution:

To resolve the issue quickly, we initiated our process of failing over to a secondary managed service instance. In parallel, we contacted our cloud provider regarding the active managed service instance. Before the failover completed, the initial managed service instance returned to a healthy status.

Preventive steps:

We have been in touch with our cloud provider and have further scaled up our central managed service cache so that multiple nodes are available at all times, to prevent the issue from recurring.

Additionally, we are implementing hot failover mechanisms to maintain the uptime of the platform.
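
As an illustration only, the idea behind a hot failover for a token cache can be sketched as follows. This is a minimal sketch under stated assumptions and does not describe Frontegg's actual implementation: the names FailoverCache, InMemoryCache and CacheUnavailable are hypothetical, and the in-memory class stands in for a real managed cache client. Writes go to every reachable instance, so a read can be served by the secondary as soon as the primary stops answering.

```python
class CacheUnavailable(Exception):
    """Raised by a cache client when its backing node cannot be reached."""


class InMemoryCache:
    """Hypothetical stand-in for a real cache client; keeps the sketch runnable."""

    def __init__(self):
        self.data = {}
        self.healthy = True

    def get(self, key):
        if not self.healthy:
            raise CacheUnavailable("node is unhealthy")
        return self.data.get(key)

    def set(self, key, value):
        if not self.healthy:
            raise CacheUnavailable("node is unhealthy")
        self.data[key] = value


class FailoverCache:
    """Writes to every reachable instance; reads from the first that answers."""

    def __init__(self, primary, secondary):
        self.instances = [primary, secondary]

    def set(self, key, value):
        written = 0
        for cache in self.instances:
            try:
                cache.set(key, value)
                written += 1
            except CacheUnavailable:
                continue  # skip the unhealthy instance, keep the other in sync
        if written == 0:
            raise CacheUnavailable("no cache instance accepted the write")

    def get(self, key):
        for cache in self.instances:
            try:
                return cache.get(key)
            except CacheUnavailable:
                continue  # fall through to the next instance
        raise CacheUnavailable("no cache instance is reachable")


primary, secondary = InMemoryCache(), InMemoryCache()
cache = FailoverCache(primary, secondary)
cache.set("refresh:user-1", "token-abc")

primary.healthy = False  # simulate the primary becoming unhealthy
token = cache.get("refresh:user-1")  # served by the secondary instance
```

The trade-off in this sketch is that every write is duplicated, in exchange for reads that survive the loss of one instance without waiting for a manual failover to complete.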

Posted Jul 01, 2021 - 12:43 UTC

Resolved

The issue is closed. An RCA report will be published in the coming days.

Posted Jun 28, 2021 - 15:34 UTC

Update

We are continuing to monitor for any further issues.

Posted Jun 28, 2021 - 15:10 UTC

Monitoring

Services are back to an operational state. Our teams are closely monitoring the status.

Posted Jun 28, 2021 - 15:07 UTC

Investigating

We are currently investigating this issue.

Posted Jun 28, 2021 - 14:47 UTC

This incident affected: User authentication.