Microsoft Incident - Azure Cosmos DB - Mitigated - Performance Degradation for multiple Azure services in North Europe

Incident Report for Graphisoft

Update

What happened?

Between 08:51 and 10:15 UTC on 01 April 2025, we identified customer impact resulting from a power event in the North Europe region that impacted Microsoft Entra ID, Virtual Machines, Virtual Machine Scale Sets, Storage, Azure Cosmos DB, Azure Database for PostgreSQL flexible servers, Azure ExpressRoute, Azure Site Recovery, Service Bus, Azure Cache for Redis, Azure SQL Database, Application Gateway, and Azure NetApp Files. We can confirm that all affected services have now recovered.

What do we know so far?

During a power maintenance event, a failure in an uninterruptible power supply (UPS) system led to temporary power loss in a single datacenter in Physical Availability Zone 2 of the North Europe region, affecting multiple devices. Power has now been fully restored and all affected services have recovered.

How did we respond?

  • 08:51 UTC on 1 April 2025 – Customer impact identified from an ongoing power maintenance event.
  • 09:05 UTC on 1 April 2025 – Power was restored to affected devices.
  • 09:20 UTC on 1 April 2025 – Outage declared and customers notified via the Azure Portal. Affected dependent services identified.
  • 09:40 UTC on 1 April 2025 – Dependent services report recovery.
  • 10:15 UTC on 1 April 2025 – Full mitigation confirmed.

What happens next?

  • Our team will be completing an internal retrospective to understand the incident in more detail. Once that is completed, generally within 14 days, we will publish a Post Incident Review (PIR) to all impacted customers.
  • To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts.
  • For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs.
  • The impact times above represent the full incident duration, so they are not specific to any individual customer. Actual impact to service availability may vary between customers and resources. For guidance on implementing monitoring to understand granular impact, refer to https://aka.ms/AzPIR/Monitoring.
  • Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness.

Posted Apr 01, 2025 - 14:48 CEST

Update

Summary of Impact: Between 08:51 and 10:15 UTC on 01 April 2025, we identified customer impact resulting from a power event in the North Europe region that impacted Virtual Machines, Storage, Azure Cosmos DB, Azure Database for PostgreSQL, Azure ExpressRoute, and Azure NetApp Files. We can confirm that all affected services have now recovered.

A power maintenance event led to temporary power loss in a single datacenter in Physical Availability Zone 2 of the North Europe region, affecting multiple racks and devices. Power has been fully restored and services are seeing full recovery.

An update with additional information will be provided shortly.

Posted Apr 01, 2025 - 13:01 CEST

Update

Summary of Impact: Between 08:51 and 10:15 UTC on 01 April 2025, we identified customer impact resulting from a power event in the North Europe region that impacted Virtual Machines, Storage, Azure Cosmos DB, Azure Database for PostgreSQL, Azure ExpressRoute, and Azure NetApp Files. We can confirm that all affected services have now recovered.

A power maintenance event led to temporary power loss in a single datacenter in Physical Availability Zone 2 of the North Europe region, affecting multiple racks and devices. Power has been fully restored and services are seeing full recovery.

An update with additional information will be provided shortly.

Posted Apr 01, 2025 - 13:00 CEST

Investigating

Impact Statement: Starting at approximately 08:51 UTC on 01 April 2025, we received an alert about an issue impacting multiple Azure services across the North Europe region.

Current Status: All relevant teams are currently looking into this alert and are actively working to identify the workstreams needed to mitigate all customer impact. The next update will be provided within 60 minutes, or as events warrant.

Posted Apr 01, 2025 - 12:12 CEST