I had a production failure of Hangfire, and the server remained halted for 1.5 days until a customer reported the issue.
I am using Sentry to report issues immediately whenever something goes wrong. But how can I catch this Hangfire issue, where the Hangfire servers stopped randomly in production? If it happened once, it can happen again!
4 Answer(s)
-
0
Hi @ajayak
remained halted for 1.5 days
What happens when you make a request to your web app while the server is halted? Maybe you can call an endpoint periodically to check your website's health.
-
0
Hi @ismcagdas,
The web app was running fine. The only issue was that the Hangfire server had shut down: 0 servers and 1100+ queued background jobs.
-
1
Hi @ajayak
It seems to be related to Hangfire. What I can think of is a dummy job that periodically makes a request to an external endpoint to report that it is healthy. If those requests stop arriving, you know the Hangfire server is no longer processing jobs. Hangfire may already have such a feature; you can check its documentation.
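The receiving side of that dummy job has to live outside the app, since a stopped Hangfire server cannot report its own silence. A minimal sketch of that side, assuming the job pings every few minutes and that something else (a cron task, a small service) calls `is_stale` on a schedule. The `HeartbeatMonitor` class and the 5-minute window are hypothetical names and values for illustration, not part of Hangfire or ANZ:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical helper: remembers the last ping and flags silence."""

    def __init__(self, max_silence: timedelta) -> None:
        # Longest gap between pings that is still considered healthy.
        self.max_silence = max_silence
        self.last_ping: Optional[datetime] = None

    def record_ping(self, now: Optional[datetime] = None) -> None:
        # Called whenever the dummy background job reaches the endpoint.
        self.last_ping = now or datetime.now(timezone.utc)

    def is_stale(self, now: Optional[datetime] = None) -> bool:
        # True when no ping was ever received or the last one is too old.
        now = now or datetime.now(timezone.utc)
        if self.last_ping is None:
            return True
        return now - self.last_ping > self.max_silence


# Usage: alert (for example through Sentry) once the job has been silent too long.
monitor = HeartbeatMonitor(max_silence=timedelta(minutes=5))
monitor.record_ping()
if monitor.is_stale():
    print("No heartbeat from the background job; check the Hangfire servers")
```

The window should be a few multiples of the job's interval, so that one delayed run does not trigger a false alarm.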
-
1
You can use the new health check features in ANZ 7.1.
The health check supports Hangfire,
e.g. maximum failed job count and minimum available servers.
See more options at https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks/blob/master/src/HealthChecks.Hangfire/HangfireHealthCheck.cs