On Monday, 28/06/2021, at around 14:40 UTC, we experienced an issue with our central managed service cache, which is hosted by our cloud provider. The issue was caused by host patching of our node, which left nodes unhealthy and made user refresh tokens inaccessible. The managed service cache returned to a healthy status at around 15:02 UTC.
The incident impacted any user attempting to log in to Frontegg's portal or to refresh a user token within their application.
To resolve the issue quickly, we initiated our process for failing over to a secondary managed service instance. In parallel, we contacted our cloud provider regarding the active managed service instance. Before the failover completed, the initial managed service instance returned to a healthy status.
We have been in touch with our cloud provider and have further scaled up our central managed service cache so that multiple nodes are available at all times, to prevent the issue from recurring.
Additionally, we are implementing hot failover mechanisms to maintain the uptime of the platform.
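For readers curious what a hot failover for token lookups can look like, the following is a minimal illustrative sketch, not a description of our production implementation. It assumes two cache instances that are both kept warm by writing to each, with reads falling back to the secondary when the primary is unreachable. The names `InMemoryCache` and `HotFailoverCache`, and the key format used in the example, are hypothetical.

```python
class InMemoryCache:
    """Hypothetical stand-in for a single managed cache instance."""

    def __init__(self):
        self._data = {}
        self.healthy = True  # toggled to simulate an unhealthy node

    def get(self, key):
        if not self.healthy:
            raise ConnectionError("cache instance is unhealthy")
        return self._data.get(key)

    def set(self, key, value):
        if not self.healthy:
            raise ConnectionError("cache instance is unhealthy")
        self._data[key] = value


class HotFailoverCache:
    """Writes to both instances; reads fall back to the secondary."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def set(self, key, value):
        # Write to every reachable instance so the secondary stays warm.
        written = 0
        for instance in (self.primary, self.secondary):
            try:
                instance.set(key, value)
                written += 1
            except ConnectionError:
                continue
        if written == 0:
            raise ConnectionError("no cache instance accepted the write")

    def get(self, key):
        # Prefer the primary; use the secondary only if the primary fails.
        try:
            return self.primary.get(key)
        except ConnectionError:
            return self.secondary.get(key)
```

In this sketch, a refresh token stored while both instances are healthy remains readable if the primary later becomes unhealthy, because the secondary already holds a copy. A real deployment would also need to handle replication lag, expiry, and resynchronising an instance that missed writes while it was down, none of which are covered here.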