Prerequisites
- What is your product version? v9.3.0, upgrading from v6.9.1
- What is your product type (Angular or MVC)? Angular
- What is the product framework type (.NET Framework or .NET Core)? .NET Core
If issue related with ABP Framework
- What is ABP Framework version? v5.14.0
Since April 2021, we have been upgrading our product from .NET Core 2.2 to 3.1 (it has taken this long because of other feature development work, not because of ANZ). Since August 2021, we have been running the /develop build of our product on ANZ v9.3.0 in our dev environment, and it has run great. In December 2021, we promoted the /develop build to our UAT environment, and again it ran great. Last weekend, we tried promoting to our PROD environment, and the system completely tanked.
We deploy on Azure App Services and use Terraform to govern all platform infrastructure configuration, so we know UAT and PROD were configured the same. We also deploy using Docker containers, so we know the code was the same. The only identifiable difference is the data, and we are looking for anomalies in PROD data that could have triggered this.
Our Azure resource monitoring showed healthy memory and CPU utilization on our App Services, as well as healthy CPU and data I/O on our SQL resources. On the App Service side, we did not see high request queuing or any other metric available in Azure that would have identified a bottleneck or problem.
The issue we observed was extreme latency. For example, the "AbpUserConfiguration/GetAll" endpoint took minutes to execute, whereas the current version of our product completes the same request in under 200 ms in PROD. The extreme latency seemed to affect every API endpoint.
Since I didn't see any performance issues on our SQL server, I suspected a caching issue in SettingManager, but I confirmed that we were not configured to use Redis or any other distributed cache, so all caching would have been held in memory.
My other thought was that there could be a bottleneck somewhere in the request pipeline, perhaps in JWT token validation. However, we observed the same latency when hitting anonymous endpoints from Postman, or from other browsers in incognito mode, where no Authorization header would have been present in the request.
Lastly, I scanned our code to confirm that we aren't using synchronization locks or any other type of locking.
I realize that this is a broad post and that the ANZ team isn't responsible for our Azure infrastructure.
Has anyone else experienced anything like this? If so, I'd like to hear what your experience was and what else we should examine or troubleshoot.
Thanks to the ANZ community!
-Brian