Microsoft Incident - App Service - PIR - App Service impacting service

Incident Report for Graphisoft

Update

What happened?

Between 03:00 UTC on 19 August 2025 and 10:00 UTC on 9 September 2025, customers using Azure App Service in Public Azure may have experienced intermittent service disruptions caused by a platform issue. These disruptions appeared as HTTP 502 or 503 status codes, or as elevated latency, affecting application workloads.

What went wrong, and why?

A recent Azure App Service release introduced a bug in an application responsible for customer app startup. The bug prevented automatic retries of certain requests needed to retrieve site startup content. It remained undetected until an Azure OS rollout left some infrastructure VMs only partially provisioned. Startup requests directed to these affected VMs then failed, and the application did not retry them. Other applications retried these requests automatically and avoided impact, which made the cause harder to isolate and delayed its identification.

How did we respond?

  • 03:00 UTC on 19 August 2025 - Initial customer impact began as a short-lived spike lasting less than 10 minutes.
  • 21:00 UTC on 8 September 2025 - The Azure OS rollout reached its final set of regions, and the partial VM provisioning failures described above caused significantly greater impact.
  • 01:30 UTC on 9 September 2025 - Automated monitoring tools detected degraded availability affecting the partner team.
  • 03:00 UTC on 9 September 2025 - Mitigation efforts began for VMs that failed to provision. While most were successfully resolved through automation, a subset of permanently stuck VMs required manual intervention.
  • 10:00 UTC on 9 September 2025 - Availability restored to the final affected underlying VM(s), and customer impact mitigated.

What happens next?

  • This Mitigation Statement is the final communication for this incident. For details about which Azure incidents qualify for which Post Incident Reviews (PIRs), refer to https://aka.ms/AzurePIRs
  • The impact times above represent the full incident duration and are not specific to any individual customer. Actual impact to service availability may vary between customers and resources. For guidance on implementing monitoring to understand granular impact, see https://aka.ms/AzPIR/Monitoring
  • To stay informed about future Azure service issues, configure and maintain Azure Service Health alerts, which can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts
  • For broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness

Posted Sep 12, 2025 - 19:46 CEST

Investigating

What happened?

Between 03:00 UTC on 19 August 2025 and 10:00 UTC on 9 September 2025, customers using the Windows Azure App Service in Public Azure may have experienced intermittent service disruptions due to a platform issue. This resulted in occasional 502 or 503 HTTP status codes or elevated latency affecting application workloads.

This issue is now mitigated. An update with more information will be provided shortly.

Posted Sep 12, 2025 - 19:19 CEST