Linda Bissum

Case Study: Remote Data Center

The Business Challenge

The business in this case study had a small Linux server farm at a remote co-location facility, implemented in an ad hoc and disorganized manner. As a result, in spite of a lot of ongoing maintenance, the servers failed often, causing high unscheduled downtime. The Linux system administrator spent about 100 hours a month keeping the farm running, with a major part of that time spent in transit between the co-location facility and the corporate headquarters. Because of the many failures, the business experienced high maintenance costs, and its partners lacked confidence in its service offerings.

The Action

To address these issues, the company assigned me as system infrastructure architect and project leader. The project scope required me to address the poor uptime performance and the high maintenance cost. I was also required to design an infrastructure that could grow with the demands of the business without undue bottlenecks.

My team and I designed a data center infrastructure to meet the following goals:

  • At least 98% uptime

  • Full access to all systems from the corporate office

  • Ability to perform all software operations remotely

  • Only one weekly trip to the co-location facility, primarily to rotate backup tapes in the tape library, but also to perform any other necessary hardware maintenance

  • Monitoring of all equipment with pager alerts for all failures

  • Easily expandable infrastructure
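The monitoring goal above can be illustrated with a short sketch. The case study does not name the monitoring tool that was used, so everything here is an assumption made for illustration: the function names, the TCP reachability check, the host names, and the alert callback standing in for a pager gateway.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# (hostname, TCP port); the targets used below are hypothetical
Target = Tuple[str, int]


def tcp_probe(target: Target, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the target succeeds."""
    host, port = target
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_targets(
    targets: Iterable[Target],
    alert: Callable[[str], None],
    probe: Callable[[Target], bool] = tcp_probe,
) -> List[Target]:
    """Probe every target once, alert on each failure, return the failures."""
    failed = []
    for target in targets:
        if not probe(target):
            failed.append(target)
            # In a real setup this callback would hand the message to a pager gateway.
            alert(f"DOWN: {target[0]}:{target[1]}")
    return failed
```

A scheduler such as cron could call `check_targets` every few minutes. Passing the probe and the alert as parameters keeps the check logic separate from the network and from the paging mechanism, so either can be swapped out.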

The Result

After the project was completed, we saw better than 99% uptime during the next ten months. At that point, we expanded the infrastructure to provide high availability, which gave us better than 99.99% uptime. From project completion onward, we performed all hardware repairs and updates as part of the weekly tape rotation trip. Similarly, we did all software maintenance remotely, including operating system and application installation, all server and application restarts, and all server warm and cold reboots.
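To put the uptime figures in perspective, the following sketch converts an uptime percentage into the downtime it allows. It assumes a 30-day month (720 hours) as the measurement period, which the case study does not specify, and the function name is chosen for illustration. Under that assumption, the 98% target allows 14.4 hours of downtime per month, 99% allows 7.2 hours, and 99.99% allows a little over 4 minutes.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, 720 hours


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime permitted in a period at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Uptime levels mentioned in the case study: the design target and the two results.
for level in (98, 99, 99.99):
    hours = allowed_downtime_hours(level)
    print(f"{level}% uptime: {hours:.2f} h ({hours * 60:.1f} min) per 30-day month")
```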

Because of this project, system uptime increased and the business partners' confidence issues were resolved. The reliable infrastructure and effective monitoring system reduced the work of keeping the systems running to routine maintenance. The system administration staff running the installation were therefore able to take on other duties, resulting in a more effective use of their time.