Modern enterprise data centers are some of the most technically sophisticated business operations on earth. Ironically enough, they are also often bastions of inefficiency, with equipment utilization well below ten percent and 30 percent of the servers in those facilities being comatose (using electricity but performing no useful information services). The operators of these facilities also struggle to keep pace with rapid changes in deployments of computing equipment.
These problems have drawn considerable attention to improving data center management. While almost every enterprise data center has taken steps to improve its operations, virtually all are much less efficient, much more costly, and far less flexible than they could be. Those failings ultimately prevent data centers from delivering maximum business value to the companies that own them.
Well-managed data centers use what I call the three pillars of modern data center operations: tracking, procedures, and physical principles.
Running a data center requires accurate real-time measurements of temperature, humidity, and airflow, as well as detailed inventories of equipment characteristics, vintage, and performance. Most Data Center Infrastructure Management (DCIM) tools deliver this information using sensors spread throughout each facility. DCIM software often requires customization to be most effective in any particular application, but it has become much more sophisticated over time.
The most advanced facilities combine three techniques: radio-frequency identification (RFID) technology to “tag” each piece of IT equipment, physical tags that tie each server to a particular spot on the rack, and “over the network” tracking of equipment status. Whenever equipment is moved, RFID readers help update its location in the central tracking database, and when equipment conditions change, the devices themselves report that new information to the central database over the network.
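The two update paths described above (RFID reads for location, over-the-network reports for status) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and status strings are hypothetical and do not correspond to any particular DCIM product or RFID reader interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class Asset:
    """One piece of IT equipment in the (hypothetical) tracking database."""
    asset_id: str
    rack: str
    slot: int
    status: str = "unknown"
    last_update: Optional[datetime] = None


class TrackingDatabase:
    """Toy central database fed by RFID reads and network status reports."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}

    def register(self, asset: Asset) -> None:
        # Add a newly tagged piece of equipment to the inventory.
        self._assets[asset.asset_id] = asset

    def get(self, asset_id: str) -> Asset:
        return self._assets[asset_id]

    def on_rfid_read(self, asset_id: str, rack: str, slot: int) -> None:
        # An RFID reader saw the tag at a new position: update the location.
        asset = self._assets[asset_id]
        asset.rack, asset.slot = rack, slot
        asset.last_update = datetime.now(timezone.utc)

    def on_status_report(self, asset_id: str, status: str) -> None:
        # The device reported a change in its own condition over the network.
        asset = self._assets[asset_id]
        asset.status = status
        asset.last_update = datetime.now(timezone.utc)

    def locate(self, asset_id: str) -> Tuple[str, int]:
        asset = self._assets[asset_id]
        return asset.rack, asset.slot


if __name__ == "__main__":
    db = TrackingDatabase()
    db.register(Asset("srv-001", rack="A1", slot=12))
    db.on_rfid_read("srv-001", rack="B3", slot=4)   # server was moved
    db.on_status_report("srv-001", "powered_on")    # server reported its state
    print(db.locate("srv-001"), db.get("srv-001").status)
```

Keeping location and status as separate update paths mirrors the text: location changes come from readers observing the tag, while condition changes come from the equipment itself.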
DCIM software is like the dashboard of a car, which gives information on vehicle speed and engine temperature in real time. Many data center managers mistakenly think that once they have a DCIM tool, they have all they need to manage their facilities, but nothing could be further from the truth. Such tools are necessary (because they offer a detailed picture of the current status of the data center) but they are not sufficient.
Because the equipment in data centers is constantly changing, sometimes in unpredictable ways, well-managed facilities need well-defined and empirically grounded procedures for design, deployment, maintenance, and decommissioning of computing and infrastructure equipment. That means procedures based on best practices as defined by Lawrence Berkeley National Laboratory, The Green Grid, Open Compute Project, ITI, the TBM Council, and others.
RFID tracking tools and over-the-network data collection (as described above) make procedures for accurate inventory tracking much easier to implement. DCIM sensor measurements can make it easier to define operational procedures for the data center. In the most interesting case, sensor data can be combined with machine learning algorithms to automate some data center operations, thus simplifying the design of human procedures.
Such procedures are like simple rules that drivers use to maintain safety, like “if you see a pedestrian, slow down” or “turn into the skid.”
The last of the three pillars involves applying knowledge of the physical laws, engineering designs, and technological constraints affecting reliable delivery of power and cooling in the facility. Because data centers are constantly changing, and because of the complexity of air and heat flows, it is essential to apply engineering simulation tools to both data center design and operations. That means taking the information from tracking tools and incorporating it into software that simulates airflow, power distribution, and heat transfer.
The best of these tools rely in part on sophisticated Computational Fluid Dynamics software, extensive libraries of the power and airflow characteristics of thousands of different kinds of IT equipment, and visual analyzers to simply and accurately predict the effects of changes in IT deployments. These computer models need to be calibrated with real measurements from a data center to ensure they accurately characterize the facility’s operations. Once calibrated, they can predict the effects of changes in IT equipment configurations on airflow, temperature, efficiency, reliability, available capacity, and cost without anyone having to actually move or install that equipment.
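The calibrate-then-predict workflow can be illustrated with a model far simpler than Computational Fluid Dynamics. The sketch below is a stand-in, not what commercial simulation tools do: it uses the steady-state heat balance for air passing through a rack (temperature rise = power / (density × specific heat × volumetric airflow)), fits a single correction factor to measured data by least squares, and then answers a "what if" question. The function names, the single-factor calibration, and the air property values are all assumptions chosen for illustration.

```python
# Approximate properties of air near room temperature (assumed constants).
AIR_DENSITY = 1.2            # kg/m^3
AIR_SPECIFIC_HEAT = 1005.0   # J/(kg*K)


def predicted_rise(power_w: float, airflow_m3s: float) -> float:
    """Uncalibrated air temperature rise (K) across a rack from a heat balance."""
    return power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * airflow_m3s)


def calibrate(observations) -> float:
    """Fit one multiplicative correction factor by least squares.

    observations: iterable of (power_w, airflow_m3s, measured_rise_k) tuples
    taken from real sensors in the facility.
    """
    observations = list(observations)
    num = sum(predicted_rise(p, q) * m for p, q, m in observations)
    den = sum(predicted_rise(p, q) ** 2 for p, q, _ in observations)
    return num / den


def what_if_outlet_temp(inlet_c: float, power_w: float,
                        airflow_m3s: float, factor: float) -> float:
    """Predict rack outlet temperature (deg C) for a proposed configuration."""
    return inlet_c + factor * predicted_rise(power_w, airflow_m3s)


if __name__ == "__main__":
    # Hypothetical measurements: (IT power in W, airflow in m^3/s, measured rise in K).
    measured = [(4000.0, 0.40, 9.0), (6000.0, 0.50, 11.2), (8000.0, 0.60, 12.1)]
    k = calibrate(measured)
    # Evaluate a proposed 10 kW rack before installing any equipment.
    print(round(what_if_outlet_temp(22.0, 10000.0, 0.70, k), 1))
```

The point of the sketch is the order of operations: measurements first correct the model, and only then is the model trusted to evaluate a configuration that does not yet exist.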
When properly used, such engineering simulation tools are like the headlights of a car, showing clearly what’s on the road ahead. They show the costs and risks of operational plans and are just as important as careful tracking and appropriate procedures for proper management of data center operations.
The three pillars, taken together, constitute the most reliable means of delivering business value from the data center. No modern data center manager should be without them.