From Reactive to Proactive: How Monitoring by MAGNAPing May Improve Reputation

Outages affect not only the consumers of IT services but also their providers. Every outage leaves a bad taste and erodes the provider’s reputation to some degree. The damage adds up, and it grows with the delay between the moment an outage is discovered and the moment business-as-usual (BAU) is restored.

Over the past decade or so, since the emergence of monitoring and alerting software suites, I have observed time and again that many organizations either do not use them at all or use them with insufficient coverage. One of the reasons is the deployment model of many such suites. Many are installed in one central location, sometimes outside of the system being monitored; as a result, their monitoring coverage does not match the map of information flows in the IT infrastructure.

By contrast, MAGNAPing (TM) can be installed right where it matters: at each endpoint of every connection, at the source and destination of every communication channel, and at every data consumer and producer. The end result? Increased awareness and transparency, and shorter times to response and resolution.

Example Scenarios

1. RESTful Web Service

A RESTful web service available over HTTPS is consumed by a service application. There is a firewall between the two hosts. MAGNAPing, installed on the consumer service’s host, monitors the RESTful web service at regular intervals, which can be set as short as 1 minute. Within one such interval of the service going down, support is notified of the incident and can proceed to investigate and restore BAU. Root causes may include, for example, firewall misconfiguration, physical network failure, server freeze, or VM crash. This scenario consumes one MAGNAPing license.
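To make the idea of interval-based probing concrete, here is a minimal sketch in Python. It is not MAGNAPing's implementation or configuration; the names probe_http and watch, the notify callback, and the message wording are all hypothetical, chosen only to illustrate how a check running on the consumer's host might detect an outage within one interval.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_http(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (is_up, detail) for a single HTTP(S) GET against url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: DNS failure, refused connection, timeout, TLS error.
        return False, f"unreachable: {exc}"


def watch(url: str, notify: Callable[[str], None],
          checks: int = 3, interval_seconds: float = 60.0) -> None:
    """Probe url a fixed number of times and notify on every state change."""
    was_up = True  # assumption: the service is up when monitoring starts
    for attempt in range(checks):
        is_up, detail = probe_http(url)
        if was_up and not is_up:
            notify(f"{url} is DOWN ({detail})")
        elif is_up and not was_up:
            notify(f"{url} recovered ({detail})")
        was_up = is_up
        if attempt < checks - 1:
            time.sleep(interval_seconds)
```

Because the probe runs from the consumer's side of the firewall, a refused or timed-out connection is reported in the same way whether the cause is the firewall, the network, or the server itself; a call such as watch(url, print, checks=10) would print at most one message per state change.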

2. Database Server

The RESTful web service from #1 publishes information from a database server that runs on another host. There is also a firewall between the two. Another instance of MAGNAPing is installed on the RESTful web service’s host, and it monitors the database server. Within minutes of a database connection failure, support is notified. Possible root causes include DB server misconfiguration or crash, firewall misconfiguration, and other conditions such as a full disk. Detecting a full disk may require another MAGNAPing instance. It may be installed on the database server itself, or anywhere else from which the DB server’s drive space can be monitored. This flexibility lets IT administrators design the deployment in whatever way is most convenient. This scenario consumes one or two MAGNAPing licenses.
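The two kinds of check in this scenario, reachability of the database port and free drive space, can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not MAGNAPing's own mechanism: probe_tcp and disk_free_fraction are hypothetical names, and a successful TCP connection only shows that the port accepts connections, not that a database login would succeed.

```python
import shutil
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (is_reachable, detail) for a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"{host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False, f"{host}:{port} unreachable: {exc}"


def disk_free_fraction(path: str) -> float:
    """Return the free share (0.0 to 1.0) of the drive that holds path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

The first function would run on the web service's host, so that it sees the database through the same firewall as the service does. The second needs a path that resolves to the database server's drive, which is why it may run on the database server itself or on any host that can reach that drive, for example through a mounted share.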

3. Service Application

The service application from #1 may itself be affected by conditions on its own host, such as drive space, CPU load, and memory utilization. A MAGNAPing instance installed on that host can monitor all of these parameters, as well as the status of the service itself. This consumes one more MAGNAPing license.
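Host-level monitoring of this kind comes down to collecting a few readings and comparing them with limits. The sketch below shows that pattern in Python; it is an assumption-laden illustration rather than a description of MAGNAPing. The names collect_readings and find_breaches, the metric keys, and the limit values are hypothetical. Memory utilization is left out of the collector because the standard library has no portable call for it; a reading obtained by other means can be added to the same dictionary.

```python
import os
import shutil


def collect_readings(path: str = ".") -> dict[str, float]:
    """Gather host metrics, normalized so that 1.0 means fully used."""
    usage = shutil.disk_usage(path)
    readings = {"disk_used": usage.used / usage.total}
    if hasattr(os, "getloadavg"):  # not available on Windows
        try:
            one_minute_load = os.getloadavg()[0]
            readings["cpu_load"] = one_minute_load / (os.cpu_count() or 1)
        except OSError:
            pass  # load average could not be obtained on this host
    return readings


def find_breaches(readings: dict[str, float],
                  limits: dict[str, float]) -> list[str]:
    """Return one message per reading that exceeds its configured limit."""
    return [
        f"{name}={value:.2f} exceeds limit {limits[name]:.2f}"
        for name, value in sorted(readings.items())
        if name in limits and value > limits[name]
    ]
```

For example, find_breaches(collect_readings(), {"disk_used": 0.90, "cpu_load": 0.80}) returns an empty list on a healthy host and one message per exceeded limit otherwise.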

Conclusion

As the scenarios above show, three or four highly configurable instances of MAGNAPing cover the entire infrastructure and create transparency about its state at all times, from the perspective of each of its constituent components. With its low cost and tiny footprint, MAGNAPing proactively alerts the service owner before the service's consumers do. This not only helps to minimize reputational risk but also reduces the cost of investigation and communication in the process of restoring BAU. How? MAGNAPing provides detailed information about the source and properties of every failure, including IP/FQDN addresses, times, drive letters/UNC paths, etc. It takes the guesswork out of the recovery process. Imagine that you no longer have to ask your consumers whether their systems are up and running, because you know precisely which of them are up and which are down!

IT support and operations staff no longer have to log in to many hosts just to see whether they are up and running. They can focus on bringing up the components known to have failed, before consumers even become aware of the outage. For IT organizations, MAGNAPing means reputation saved and resources conserved. At times, the outage may not even be the responsibility of the service provider. MAGNAPing can cover this scenario as well, by alerting the respective third-party providers.

Is MAGNAPing a magic bullet? It is not, just as no IT solution ever is. MAGNAPing is only as reliable as the communication channel that carries its alerts. Most of the time, however, such communication channels are redundant, unlike the IT components being monitored. Overall, MAGNAPing helps its users keep abreast of the state of their infrastructure most of the time. This is a huge win over other types of monitoring solutions and, needless to say, over no monitoring at all.