Microsoft Incident - Azure Monitor - Mitigated - Azure Monitor experiencing delayed metrics and alerts

Incident Report for Graphisoft

Update

What happened?


Between 03:47 UTC and 04:50 UTC 12 September 2025, a platform issue resulted in an impact to the Azure Monitor service. Impacted customers experienced delays in viewing the most recent platform metrics, delayed or missed alert activations, and incorrect auto-scale rule evaluations based on impacted metrics.




What do we know so far?


We determined that a backend processing delay in the Azure Monitor pipeline caused a lag in metric ingestion and alert evaluation. This delay was triggered by multiple service nodes entering a degraded state. This resulted in publication delays for the subset of metrics being processed by the impacted nodes, which triggered an automated pause in alert evaluation to prevent false alerts.




What do we know so far?


  • 03:47 UTC on 12 September 2025– Customer impact began due to a platform issue affecting Azure Monitor.
  • 04:04 UTC on 12 September 2025– Service monitoring detected delayed metric ingestion and alerting anomalies.
  • 04:45 UTC on 12 September 2025– A mitigation workstream was initiated. The root cause was traced to multiple service nodes entering a degraded state, which triggered delays in metric publication and paused alert evaluations to prevent false alerts. Alerting-related impact was mitigated, though some customers may have continued to see delayed data.
  • 04:50 UTC on 12 September 2025– The mitigation involved taking affected nodes out of rotation. Service was fully restored. 

What happens next?


  • Our team will be completing an internal retrospective to understand the incident in more detail. Once that is completed, generally within 14 days, we will publish a Post Incident Review (PIR) to all impacted customers.
  • To get notified if a PIR is published, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts
  • For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs
  • The impact times above represent the full incident duration, so are not specific to any individual customer. Actual impact to service availability may vary between customers and resources – for guidance on implementing monitoring to understand granular impact: https://aka.ms/AzPIR/Monitoring
  • Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness

Posted Sep 12, 2025 - 08:38 CEST

Update

Impact Statement: Starting at 03:47 UTC on 12 September 2025, you may have been identified as a customer using Azure Monitor who may experience a delay in seeing most recent data for Azure Monitor platform metrics. You may also observe delayed or missed alert activation and incorrect auto-scale rule evaluation on impacted metrics as a result of this delay.


This impact is now mitigated; more information will be provided shortly.









Posted Sep 12, 2025 - 08:06 CEST

Investigating

Impact Statement: Starting at 03:47 UTC on 12 September 2025, you may have been identified as a customer using Azure Monitor who may experience a delay in seeing most recent data for Azure Monitor platform metrics. You may also observe delayed or missed alert activation and incorrect auto-scale rule evaluation on impacted metrics as a result of this delay.




Current Status: We detected this issue via our automated service monitoring upon identifying an outage impacting Azure Monitor metrics queries. We are aware of this issue and are actively investigating the underlying cause. The next update will be provided within 60 minutes, or as events warrant.

Posted Sep 12, 2025 - 07:34 CEST