30 April 2024
Sam Bainborough, Sales Director EMEA-Strategic Segment Colocation & Hyperscale at Vertiv, explains how the rise of AI is changing the face of data centre cooling and what areas need to be addressed to cope with growing demand.
There’s no denying that AI, mobile and cloud technologies dominate operations, and hybrid computing is the new norm. The surge in AI applications alone has created an immense demand for computing power, putting significant pressure on data centres to rapidly adapt to evolving needs. As a result, the importance of innovation in data centres has never been more apparent.
For organisations to successfully navigate this dynamic landscape, a holistic approach is required, not least because data centre architects are confronted with multiple challenges such as climate change, surging power requirements and heightened heat generation. Embracing a holistic design philosophy enables data centres not only to meet the burgeoning demands of the AI-driven era but to thrive amidst them. Prioritising sustainability and efficiency is paramount if operators are to lay the groundwork for data centres to lead the charge in a time defined by growth and technological innovation.
Two key areas that need to be addressed are power, to cope with the demands of AI workloads, and thermal management, to ensure that the critical digital infrastructure operates as efficiently as possible.
Power
High Performance Computing is changing with the rise of AI, driving a significant increase in power demands, fuelled by the adoption of the specialised processors essential for managing complex tasks. Typical data centre rack densities are expected to rise from 5 to 7 kW today (roughly equivalent to a small residential backup generator) to 50 kW or more in the not-too-distant future, according to Omdia’s 2022 Data Center Thermal Management Market Analysis report. Addressing this challenge demands that data centres adopt creative solutions capable of delivering ever-increasing power whilst employing energy-efficient methods.
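To make the scale of that shift concrete, here is a minimal back-of-envelope sketch. The per-rack figures follow the Omdia projection cited above; the rack count is a hypothetical, illustrative value, not a figure from the article.

```python
# Back-of-envelope: facility IT load implied by rising rack densities.
# Per-rack figures follow the projection cited in the text (5-7 kW
# per rack today, 50 kW or more ahead); the rack count is illustrative.

def facility_it_load_kw(racks: int, kw_per_rack: float) -> float:
    """Total IT load for a hall of identical racks, in kW."""
    return racks * kw_per_rack

RACKS = 200  # hypothetical mid-sized data hall

today = facility_it_load_kw(RACKS, 7)    # upper end of today's range
future = facility_it_load_kw(RACKS, 50)  # projected AI-era density

print(f"Today:  {today:,.0f} kW")
print(f"Future: {future:,.0f} kW ({future / today:.1f}x increase)")
```

Even this crude arithmetic shows the power and heat-rejection infrastructure of an existing hall being outgrown roughly sevenfold, before any allowance for cooling overhead.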
The strategic emphasis on power efficiency goes beyond immediate operational needs; it aligns with the broader imperative of promoting sustainability in the face of escalating energy consumption. This includes expanding the use of alternative energy, smart grids, hybrid grids and innovative data centre designs to deliver reliable solutions for customers, while lessening the negative impacts on our planet in the process. By taking a forward-thinking stance on power efficiency, data centres can not only meet the challenges posed by burgeoning AI workloads but also contribute to a more environmentally conscious and sustainable future.
Thermal management and cooling
The second critical challenge stems from the heightened heat generated by advancements in processor technology, such as the high-performance Central Processing Units (CPUs) and Graphics Processing Units (GPUs) essential for handling intricate AI workloads. As supercomputers continue to shrink and become more power-dense, the data centre industry is constantly looking at how it can keep them cool whilst concurrently tapping alternative power sources to support the increased energy demand.
Throughout the years, data centre designs have progressed from chilled water systems to indirect adiabatic systems. Today, there are three typical approaches to thermal management:
- Air cooling: Rear-door heat exchangers are used in conjunction with air cooling, displacing heat away from the servers.
- Immersion cooling: This involves submerging servers and other components in a thermally conductive dielectric liquid or fluid.
- Direct-to-chip liquid cooling: Cooling liquid is delivered through cold plates that sit atop the heat sources within the servers, drawing the heat away as the liquid circulates.
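The liquid-cooling approaches above all rest on the same first-order heat-balance relation, Q = ṁ · c_p · ΔT. As a rough illustration, the sketch below sizes the coolant flow a direct-to-chip loop would need for a 50 kW rack; the 10 K temperature rise and the use of water as the coolant are illustrative assumptions, not figures from the article.

```python
# First-order sizing for a direct-to-chip liquid-cooling loop:
# required coolant mass flow from Q = m_dot * c_p * delta_T.
# Assumptions (illustrative): water coolant, 10 K allowable rise.

WATER_CP = 4186.0  # specific heat of water, J/(kg*K)

def coolant_flow_kg_s(heat_load_w: float, delta_t_k: float,
                      cp_j_per_kg_k: float = WATER_CP) -> float:
    """Mass flow (kg/s) needed to absorb heat_load_w with a delta_t_k rise."""
    return heat_load_w / (cp_j_per_kg_k * delta_t_k)

flow = coolant_flow_kg_s(50_000, 10)  # 50 kW rack, 10 K temperature rise
print(f"{flow:.2f} kg/s, roughly {flow * 60:.0f} L/min of water")
```

The same relation explains why liquid cooling scales where air cannot: water carries roughly 3,500 times more heat per unit volume than air at the same temperature rise, so the equivalent airflow for a 50 kW rack would be impractical.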
There is currently a resurgence of interest in chilled water systems, with three distinct options for liquid cooling at the rack level. The first option involves directing liquid to the server itself, using a room-based heat exchanger to reject heat back into the air. This modular system allows seamless integration without substantial changes to existing infrastructure. The second option introduces a Cooling Distribution Unit (CDU), directly circulating liquid from the server or GPU and connecting to a chilled water system. The third option is an interchangeable liquid-to-gas system. This approach incorporates a remote condenser on the roof or outside the building, utilising gas-to-liquid heat exchangers for deployment flexibility.
It is most likely that air-cooled and liquid-cooled solutions will co-exist and data centres will need to tightly orchestrate both to optimise the overall environment within the facility. Even within liquid-cooled servers, elements necessitating air cooling persist, highlighting the nuanced nature of the evolving thermal management landscape.
It’s also important to note that, alongside technical intricacies of cooling systems lies a crucial aspect of sustainability. Implementing efficient thermal management solutions not only enables optimal performance but also contributes to reducing the environmental footprint of data centres. By minimising energy consumption and maximising resource utilisation, innovative cooling practices play a pivotal role in mitigating the ecological impact of data centre operations.
A holistic approach to success
To ensure success in the realm of AI, it’s necessary to take a holistic approach to data centre architecture. It is good practice to involve all stakeholders, recognising the importance of collaboration and communication across diverse disciplines. Engaging not only power and cooling specialists but also those responsible for facility management, storage and technology deployment fosters a comprehensive understanding of the data centre’s complex requirements.
As data centres embrace denser configurations, the holistic approach extends to decision-making timelines. While operators may be inclined to defer decisions to the final stages of design, a balance must be struck to avoid the risks associated with delayed investments and potential loss of market share. Holistic design, therefore, involves streamlining decision-making processes while considering lead times and involving stakeholders at every stage.
In a dialogue with industry experts, the importance of technology interchangeability surfaces as a critical consideration for clients. In some areas we have seen a slowdown in direct deployments by hyperscalers, which may reflect a strategic pause to understand what technology changes and specifications are required. Challenges arise in finding the optimal operating conditions for CPUs and GPUs, with manufacturers defining specifications and clients striving to plan for a diverse technology landscape over the next five to ten years.
In this pursuit of future-ready design principles, clients encounter design pitfalls and challenges. The balance between CPU and GPU environments, coupled with defining optimal operating conditions, requires a meticulous approach to allow adaptability over an extended operational lifespan. As the industry grapples with these complexities, a holistic design ethos remains the compass guiding operators through the dynamic terrain of data centre evolution.
Looking ahead
AI is a fascinating technology that’s poised to change our world, but it’s impossible to predict exactly how it will evolve and what it will do. However, its potential is only as great as the world’s data centres’ capacity to support the computational intelligence it will require. The data centre industry must continue to evolve, providing the dynamic and innovative cooling and power solutions needed to meet emerging challenges and maximise AI’s true potential.
This proactive approach not only enables data centres to meet the burgeoning demands of AI applications but also positions them as catalysts for progress. By prioritising sustainability, data centres mitigate their environmental impact and enhance their resilience in the face of evolving challenges such as climate change and resource scarcity. Moreover, by optimising efficiency through innovative practices and technologies, they can deliver optimal performance while minimising energy consumption and operational costs.