What happened?
Between 10:01 UTC on 28 October 2024 and 18:07 UTC on 01 November 2024, a subset of customers using App Service may have received erroneous 404 failure notifications for the Microsoft.Web/sites/workflows API. These failures surfaced in alerts and were recorded in customers' logs. The issue could potentially have affected all App Service customers, in all Azure regions.
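For customers who want to separate these erroneous entries from genuine failures in exported logs, the following is a minimal sketch rather than an official tool. It assumes each log record is a dictionary with hypothetical keys `time` (ISO 8601), `status`, and `operation`; these names are illustrative only and should be mapped to whatever your log export actually contains. Only the time window and the API name come from this PIR.

```python
from datetime import datetime, timezone

# Impact window stated in this PIR (UTC).
WINDOW_START = datetime(2024, 10, 28, 10, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 11, 1, 18, 7, tzinfo=timezone.utc)
AFFECTED_API = "microsoft.web/sites/workflows"


def is_likely_incident_noise(record: dict) -> bool:
    """Return True if a record matches this incident's signature.

    The keys 'time', 'status', and 'operation' are hypothetical field
    names, not a documented Azure log schema.
    """
    when = datetime.fromisoformat(record["time"])
    if when.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        when = when.replace(tzinfo=timezone.utc)
    return (
        record["status"] == 404
        and AFFECTED_API in record["operation"].lower()
        and WINDOW_START <= when <= WINDOW_END
    )


# Made-up sample records for illustration only.
sample = [
    {"time": "2024-10-29T12:00:00+00:00", "status": 404,
     "operation": "Microsoft.Web/sites/workflows/read"},
    {"time": "2024-10-29T12:00:00+00:00", "status": 500,
     "operation": "Microsoft.Web/sites/workflows/read"},
    {"time": "2024-11-05T09:00:00+00:00", "status": 404,
     "operation": "Microsoft.Web/sites/workflows/read"},
]

matches = [r for r in sample if is_likely_incident_noise(r)]
```

A match only means the entry fits the pattern described here (status 404, the workflows API, inside the impact window); it does not prove that the entry was caused by this incident.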
What went wrong and why?
The impact was on the 'App Service (Web Apps)' service and was caused by a change in a back-end service, 'Microsoft Defender for Cloud' (MDC). MDC secures App Service resources as part of its protection suite by periodically scanning customers' environments using Azure APIs to detect potential issues. Previously, MDC only scanned the Microsoft.Web/sites API endpoints; the change extended its scans to include internal endpoints. This change caused 404 responses for Web Apps, which in turn appeared in customers' environment logs.
How did we respond?
This incident was reported by our customers, and the App Service team began investigating the matter to pinpoint the cause. Once MDC was identified as the cause, the MDC team was brought in to support mitigation. The teams decided to throttle and block MDC’s requests to minimize impact on the Microsoft.Web/sites/workflows API endpoint. This prevented customers from experiencing further 404 errors, mitigating the problem. The issue was fully resolved only after the MDC team identified the exact cause and rolled back the change, stopping the endpoint scan altogether.
How are we making incidents like this less likely or less impactful?
How can customers make incidents like this less impactful?
How can we make our incident communications more useful?
You can rate this PIR and provide any feedback using our quick 3-question survey.
What happened?
Between 10:01 UTC on 28 October 2024 and 18:07 UTC on 01 November 2024, a subset of customers using App Service may have received erroneous 404 failure notifications for the Microsoft.Web/sites/workflows API. These failures surfaced in alerts and were recorded in customers' logs.
What do we know so far?
We identified a previous change to a backend service that caused backend operations to be called against apps incorrectly.
How did we respond?
We were alerted to this issue by customer reports and began investigating. We applied steps to limit the erroneous failures, reducing further erroneous alerts and log entries. Additionally, we’ve taken steps to revert the previous change.
What happens next?
Impact Statement: Starting at 10:01 UTC on 28 October 2024, a subset of customers using App Service may have received erroneous 404 failure notifications for the Microsoft.Web/sites/workflows API. These failures surfaced in alerts and were recorded in customers' logs.
Current Status: We have applied steps to limit the erroneous failures, reducing further erroneous alerts and log entries. Additionally, we’re taking steps to revert a previous change that caused backend operations to be called against apps incorrectly. We’ll provide an update within the next 2 hours or as events warrant.