For organizations with large data centers and many servers hosting numerous applications, monitoring whether every server is up and functional at all times is a major problem. The problem is more acute during late-night shifts, when fewer network and systems engineers are on duty.
Usually, when organizations host applications on their servers on behalf of clients, they sign a service level agreement (SLA) that specifies the allowed downtime for each application. Failing to meet the SLA could result in loss of business, legal action, or both.
It is therefore very important for organizations to know when a server is down or non-functional and to take corrective action immediately. Unfortunately, for some less time-critical applications, it is usually the client who reports a problem with the server after trying to log in to the application. Organizations would much rather learn of these server failures immediately and take corrective action before the client starts complaining.
There is a need for a web-based application that captures all the organization and data center details and remotely checks whether each server is up and running at all times. The monitoring component of the application pings each server at specified intervals and, based on the configured rules and the response received, sends an SMS to a predefined list of specialists whenever there is a failure. The SMS contains information about the failed server and the time at which it failed.
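The monitoring cycle described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual implementation: it substitutes a TCP connection attempt for an ICMP ping (which usually needs elevated privileges), and the names `Server`, `is_reachable`, `failure_message`, `check_once`, and the `send_sms` callback are all hypothetical. A real deployment would pass in a function that calls its SMS gateway.

```python
import socket
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Server:
    """A monitored server (hypothetical record; field names are assumptions)."""
    name: str
    host: str
    port: int  # TCP port used for the reachability check


def is_reachable(server: Server, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds within the timeout.

    This stands in for a ping; it is not an ICMP echo request.
    """
    try:
        with socket.create_connection((server.host, server.port), timeout=timeout):
            return True
    except OSError:
        return False


def failure_message(server: Server, failed_at: datetime) -> str:
    """Build the SMS text: which server failed and when."""
    return (
        f"ALERT: server {server.name} ({server.host}:{server.port}) "
        f"failed at {failed_at:%Y-%m-%d %H:%M:%S} UTC"
    )


def check_once(
    servers: Iterable[Server],
    recipients: Iterable[str],
    send_sms: Callable[[str, str], None],
    probe: Callable[[Server], bool] = is_reachable,
    now: Optional[datetime] = None,
) -> List[Server]:
    """Probe every server once and notify all recipients about each failure.

    Returns the list of servers that failed the probe.
    """
    numbers = list(recipients)
    failed = []
    for server in servers:
        if not probe(server):
            failed_at = now if now is not None else datetime.now(timezone.utc)
            text = failure_message(server, failed_at)
            for number in numbers:
                send_sms(number, text)
            failed.append(server)
    return failed
```

A scheduler or a simple loop with a sleep would call `check_once` at the configured interval. Passing `probe` and `send_sms` as parameters keeps the check testable without touching the network or a real SMS gateway.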
The application covers the following processes:
- Access management process to provide admin privileges to selected personnel
- Organization and server information capture process
- Server monitoring criteria setup process
- Automated server monitoring process
- Server monitoring status and failure logging process
- Failure notification SMS process
- Corrective action completion process
- Monthly management reporting process
- Historical data archiving and cleanup process
The application consists of the following modules:
- Organization Information Management Module: Allows admin users to capture and update organization information related to users, specialists, servers, IP addresses, etc.
- Access Management Module: Allows admin users to grant admin privileges to other users and to manage the user IDs and passwords of all network/systems engineers.
- Automated Server Monitoring Module: Runs continuously to check whether each server is up and running, and logs failures to a database.
- SMS Failure Notification Module: Sends an SMS containing the failed server's information to the specified list of mobile numbers.
- Corrective Action Module: Allows network/systems engineers to record the corrective action they took to rectify a failure.
- Management Reporting Module: Allows admin users to run reports by organization, server, specialist, and corrective action taken.
- Archiving and Cleanup Module: Allows admin users to archive or clean up old data on the system.
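The failure logging, corrective action, reporting, and archiving modules could plausibly share a single failure table. The sketch below shows one way to do that with Python's built-in `sqlite3` module. The schema, the table names `failures` and `failures_archive`, and every function name are assumptions made for illustration; the source does not specify a database design. Timestamps are expected to be timezone-aware and are stored as fixed-format UTC text so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one live table and one archive table with the same columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS failures (
    id INTEGER PRIMARY KEY,
    server_name TEXT NOT NULL,
    failed_at TEXT NOT NULL,
    corrective_action TEXT,
    resolved_at TEXT
);
CREATE TABLE IF NOT EXISTS failures_archive (
    id INTEGER PRIMARY KEY,
    server_name TEXT NOT NULL,
    failed_at TEXT NOT NULL,
    corrective_action TEXT,
    resolved_at TEXT
);
"""


def _ts(moment: datetime) -> str:
    """Format a timezone-aware datetime as sortable UTC text."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_failure(conn: sqlite3.Connection, server_name: str, failed_at: datetime) -> int:
    """Record a failure and return its row id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO failures (server_name, failed_at) VALUES (?, ?)",
            (server_name, _ts(failed_at)),
        )
    return cur.lastrowid


def record_corrective_action(
    conn: sqlite3.Connection, failure_id: int, action: str, resolved_at: datetime
) -> None:
    """Attach the engineer's corrective action to an existing failure."""
    with conn:
        conn.execute(
            "UPDATE failures SET corrective_action = ?, resolved_at = ? WHERE id = ?",
            (action, _ts(resolved_at), failure_id),
        )


def failures_per_server(conn: sqlite3.Connection) -> dict:
    """Simple management report: number of live failure records per server."""
    rows = conn.execute(
        "SELECT server_name, COUNT(*) FROM failures GROUP BY server_name"
    ).fetchall()
    return dict(rows)


def archive_older_than(conn: sqlite3.Connection, cutoff: datetime) -> int:
    """Move failures before the cutoff into the archive table; return how many moved."""
    cutoff_text = _ts(cutoff)
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO failures_archive SELECT * FROM failures WHERE failed_at < ?",
            (cutoff_text,),
        )
        cur = conn.execute("DELETE FROM failures WHERE failed_at < ?", (cutoff_text,))
    return cur.rowcount
```

Copying and deleting inside one transaction means an interrupted archive run leaves the live table unchanged rather than losing records.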