Posted on March 15th, 2012
Sometimes Overseer customers will Email me and tell me that Overseer notifies them that their server is down, yet they login and everything is fine. This is something that will occur from time to time– Overseer detects a failure, but this may be caused by a network hiccup(particularly over WAN connections), or occasionally an OS issue. The first thing to do, is obviously investigate by looking at event logs, do some basic network tests(ping [host] -t to look for packet loss), etc.
However, even if something is found, it’s possible that there’s nothing you can do about it right away– but you really don’t want to be bothered by Overseer. Thankfully, there’s a feature in Overseer created just for this purpose. Simply edit the schedule for the resources in question and edit the ‘After resource has been down’ setting, as shown in this image:
Now, Overseer will wait 15 minutes before sending the first notification. So, when Overseer is checking resources using this schedule, if it fails it will see it’s been down 0 minutes, and wait. 5 minutes later, it will check again– if it’s still down, it will be down 5 minutes– which is also less than 15. It will wait until it hits that 15 minute mark(as configured above), and once it does, it will notify the administrators.