With physical servers and VMs scattered all over the world and across different providers, we need some way to monitor them and make sure they're behaving as they should. We also want to be notified when they aren't, and to see a historical overview of what was going on around that time so we can troubleshoot as necessary. To that end, a combination of Prometheus and Grafana will fit perfectly. Prometheus will act as the backend, collecting and aggregating metrics from applications, VMs, and physical hosts, while Grafana will take that data and present it in a useful manner through a highly configurable and extensible dashboard. Grafana also has a built-in, granular alert system, which covers the notification side.

The above was written a while ago. I've since decided that I don't really need a full-blown enterprise-grade monitoring stack. My services are small and I'm one person, so all I realistically need to know is whether a given service is up or down. I've been using uptime-kuma for quite a while now and it's been very pleasant to work with; I've administered these services long enough that I can quickly identify the issue and bring the service back up within a reasonable period of time. I highly recommend it :)
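
To make the "up or down" idea concrete, here is a minimal sketch of that kind of check in Python. It is only an illustration, not how uptime-kuma works internally: it assumes a service counts as "up" if a TCP connection to it succeeds, and the names `is_up` and `check_all`, along with the example hosts and ports, are hypothetical.

```python
import socket


def is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_all(targets: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to its current up/down state."""
    return {name: is_up(host, port) for name, (host, port) in targets.items()}


if __name__ == "__main__":
    # Hypothetical targets; replace with real hosts and ports.
    targets = {"web": ("example.com", 443), "ssh": ("example.com", 22)}
    for name, up in check_all(targets).items():
        print(f"{name}: {'up' if up else 'DOWN'}")
```

A successful TCP connection only shows that something is listening on the port, not that the application behind it is healthy, so a real monitor would usually add protocol-level checks (an HTTP status code, for example) plus notifications and history.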