Research — 15 May, 2023
By Perkins Liu
After a decade of continuous investigation and testing, branded server vendors are increasingly incorporating liquid-cooling approaches into their products to meet the growing demand for more efficient and effective cooling in datacenters and high-density servers. Many of these vendors are partnering with liquid-cooling technology providers to develop and deploy more liquid-cooling-enabled server offerings that meet the specific needs of their customers.
The current rack-based datacenter architecture, in which air cooling is dominant, has taken decades to develop and involves the entire ecosystem: chip manufacturers, server vendors, technology vendors, infrastructure suppliers and end users. However, the rapid adoption of artificial intelligence, machine learning and other applications that require higher-density chips and servers means that air cooling may no longer be as suitable or efficient as liquid cooling. As the unit of computing that consumes most of the energy, the server is a key component in the transition from air to liquid cooling, and the server vendors themselves will likely help determine the direction the liquid-cooling market takes and how quickly the technology spreads.
Context
The inside of a server has typically been cooled by fans blowing air across the components. The resulting hot air is then expelled from the server and handled as part of a larger-scale datacenter cooling approach. However, the expanding use of high-powered graphics processing unit (GPU) chips has put pressure on typical cooling systems, since the volume of air that can be blown across these chips may not be enough to cool them. This is increasingly a problem even for regular CPUs as their power consumption rises: a multi-core server can now easily consume 350-500 W, sometimes even approaching 700 W. This is not only a challenge within the server; it also becomes difficult to cool at scale as overall density goes beyond 20-30 kW per rack and reaches 50 kW or more.
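To put those figures in context, a back-of-the-envelope calculation shows how quickly per-rack power climbs. The server counts and wattages below are illustrative assumptions consistent with the ranges cited above, not measurements from any particular product:

```python
# Back-of-the-envelope rack power density. The per-server wattages below are
# illustrative assumptions in line with the ranges cited in the text, not
# measurements from any particular product.

def rack_power_kw(servers: int, watts_per_server: float) -> float:
    """Total IT load for one rack, in kilowatts."""
    return servers * watts_per_server / 1000

# A full 42U rack of 1U servers drawing ~500 W each already sits at the
# 20-30 kW threshold where air cooling starts to struggle:
print(rack_power_kw(42, 500))    # 21.0 kW

# Servers approaching 700 W each push past that threshold:
print(rack_power_kw(42, 700))    # 29.4 kW

# A handful of GPU-accelerated nodes (assumed ~6.5 kW each) reaches the
# 50 kW mark with far fewer servers:
print(rack_power_kw(8, 6500))    # 52.0 kW
```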
Liquid, which is more efficient than air at transferring heat, is starting to be used closer to where the heat is produced to improve thermal management. There are several ways of doing this, such as direct-to-chip liquid cooling (DLC) and full immersion of the IT components in liquid, both of which can require changes to the server hardware itself. There are also rear-door heat exchangers, which bring liquid to the back of the server rack while the inside of the server remains air-cooled; the server design does not require modification, although there could still be thermal challenges if the chips inside the server cannot be cooled by airflow alone.
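The efficiency gap is basic thermodynamics: the heat a coolant stream removes is q = m_dot * c_p * delta_T, and water carries far more heat per unit volume than air. A minimal sketch comparing the volumetric flow each coolant would need for the same load; the 1 kW load, 10 K temperature rise and textbook property values are illustrative assumptions:

```python
# Compare the volumetric flow of air vs. water needed to remove the same
# heat load, using q = rho * V_dot * c_p * delta_T. Property values are
# standard textbook figures near room temperature; the 1 kW load and 10 K
# coolant temperature rise are illustrative assumptions.

Q_WATTS = 1000.0   # heat to remove (W)
DELTA_T = 10.0     # coolant temperature rise across the component (K)

# (density in kg/m^3, specific heat in J/(kg*K))
AIR = (1.2, 1005.0)
WATER = (998.0, 4186.0)

def flow_m3_per_s(q: float, dt: float, coolant: tuple[float, float]) -> float:
    """Volumetric flow rate required to carry away q watts at a dt rise."""
    rho, cp = coolant
    return q / (rho * cp * dt)

air_flow = flow_m3_per_s(Q_WATTS, DELTA_T, AIR)
water_flow = flow_m3_per_s(Q_WATTS, DELTA_T, WATER)

print(f"air:   {air_flow * 1000:.2f} L/s")      # ~82.92 L/s of air
print(f"water: {water_flow * 1000:.4f} L/s")    # ~0.0239 L/s of water
print(f"ratio: {air_flow / water_flow:,.0f}x")  # ~3,464x less volume needed
```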
As the energy use of chips increases and a larger number of servers are required to work together in close proximity, cooling becomes a key concern for chip manufacturers, server manufacturers, and hardware providers in general. Thus, we are seeing branded server manufacturers beginning to test cooling options in order to offer recommendations or even set up partnerships so that cooling does not become a limiting factor for their customers.
Lenovo
One of Lenovo Group Ltd.'s earliest projects involving liquid cooling with customers was SuperMUC-NG in 2018. The SuperMUC-NG project was a collaboration between Lenovo and the Leibniz Supercomputing Centre (LRZ) in Germany, a leading research center for high-performance computing (HPC). The project aimed to develop a liquid-cooling solution for the SuperMUC-NG supercomputer at LRZ, one of the most powerful supercomputers in Europe. The solution was a warm-water cooling approach, which circulated water at a temperature of up to 45 degrees C (113 degrees F) through a closed-loop system to remove heat from the servers. The warm water was then transferred to a heat exchanger, where the heat was dissipated into the surrounding air. The SuperMUC-NG project successfully demonstrated the energy efficiency and performance benefits of liquid cooling for HPC, and it helped to pave the way for Lenovo's continued investment in liquid-cooling approaches for its products and customers.
Since then, Lenovo has developed its Neptune liquid-cooling offering, including direct-to-node (DTN), rear-door heat exchanger (RDHX) and thermal transfer module (TTM) options. DTN is essentially the same as direct-to-chip liquid cooling, using a warm-water cooling system to remove heat from the processors and memory modules. The warm water is circulated through a closed-loop system, absorbing the heat from the components before being cooled and recirculated. RDHX technically brings liquid to only the rear door of a rack, not the individual server, while TTM integrates a hermetically sealed liquid-filled heat pipe inside a traditional heat sink.
DTN is used in Lenovo ThinkSystem SD650 V2 and SD650-N V2. SD650 V2 is Lenovo's two-node dual-socket server offering. For density purposes, this places four CPUs in 1U of rack space on a single tray. The SD650-N V2, instead of being a two-node tray, is a single node with the second node's space occupied by NVIDIA A100 SXM4 GPUs. The liquid coolant is pumped directly into the server and circulated through a closed-loop system, absorbing the heat from the components before being cooled and recirculated.
TTM is used in Lenovo ThinkSystem SD530. This server uses liquid in the heat sink to transfer heat away from the CPU to an area with more space to disperse heat, allowing for higher-wattage CPUs with more cores/computational power (205 W), while keeping fan speeds modest.
Dell
Dell Technologies Inc.'s first trial of liquid cooling was also HPC-related, the Triton project in 2015. The Triton project was a collaboration between Dell and the Texas Advanced Computing Center (TACC), which is a research center focusing on HPC. The project aimed to develop a liquid-cooling solution for the Stampede supercomputer at TACC, one of the fastest in the world. Dell custom-designed a liquid-cooled rack approach to circulate water at a temperature of up to 45 degrees C (113 degrees F) through a closed-loop system to remove heat from the servers.
Since then, Dell has developed several liquid-cooling platforms with DLC technology, as well as rack-level infrastructure products to support these servers. The company works with CoolIT Systems Inc. as its prime partner for the DLC solution, among others. DLC uses the natural thermal conductivity of liquid to provide dense, concentrated cooling. The Dell EMC PowerEdge C6520 and C6525 servers leverage rack DLC to replace typical metal heat sinks with closed-loop water flow, enabling the servers to support higher-wattage processors and higher rack densities for more performance with lower power use.
Dell also offers RDHX solutions through an original equipment manufacturer arrangement with Motivair Corp., and it is working with Green Revolution Cooling Inc. on an all-in-one immersion-cooling server system as well.
Hewlett Packard Enterprise
One of the earliest projects in which Hewlett Packard Enterprise Co. explored liquid-cooling solutions with customers was the Greenlake Liquid Cooling Co-Design Project in 2018. This project was a collaboration between HPE and the National Renewable Energy Laboratory (NREL) in the US, which is a leader in research and development for renewable energy and energy efficiency. The Greenlake project aimed to develop and test a liquid-cooling solution for HPE servers that would improve energy efficiency and reduce the environmental impact of datacenters. The project involved co-designing the liquid-cooling system with NREL, testing it in a lab environment and then deploying it in a production datacenter at NREL's facility in Colorado.
In its Apollo HPC family, the company enables the Apollo DLC System on the HPE Apollo 20 system, the HPE Apollo 2000 Gen10 Plus system and the HPE Cray EX supercomputer, with support for the HPE Apollo 6500 Gen10 Plus under development. HPE's warm-water-cooled Apollo 6000 system circulates water at a temperature of up to 45 degrees C (113 degrees F) through a closed-loop system, which removes the heat from the servers and transfers it to a heat exchanger.
Obstacles and outlook
Liquid cooling has been used in the datacenter space for over a decade but is still largely confined to HPC-related deployments, and it has yet to surpass air cooling in scale. While many liquid-cooling technology vendors are pushing hard to break through, most server equipment and datacenter infrastructure is designed around air cooling, and practitioners such as designers, contractors and technicians have worked with air-cooled systems for decades. As liquid cooling moves from a fringe technology toward the mainstream, the industry will need to address practical obstacles as well as provide education and training.
First, no one type of liquid-cooling technology solves all problems. Some require specialized infrastructure or different skill sets to manage, and a variety of options may be needed to meet diverse local regulatory, environmental and safety requirements. The lack of open standards for server form factors, connection interfaces, rack and tank mechanisms, and facility infrastructure, along with immature regulations, is another barrier to be addressed.
Second, liquid-cooled options are not widely available in server vendors' off-the-shelf portfolios. Most deployments require custom design or adaptation, which can add cost or affect the server warranty. However, server vendors have started to standardize liquid-cooled server options, initially for HPC customers, and these are increasingly available on the broader market.
Third, investment and deliberate decision-making will be required to ease adoption among those accustomed to the dominant air-cooled datacenter architecture, and to realize the total cost of ownership and environmental, social and governance benefits that liquid cooling could bring.
It is not by accident that all three big-name server vendors picked DLC as the option to add to their standard HPC servers: DLC implies the least disruption to the existing rack-based architecture and supporting infrastructure. As power densities increase, this option can be expected to become available beyond HPC servers. While a typical immersion-cooling approach requires a tank to house servers, companies are also developing rack-based chassis options. Both rack-based DLC and immersion cooling could be good choices for retrofitting existing datacenters. Given the large installed base as well as new greenfield builds, and the variety of applications, load densities and infrastructure in each project, choosing a technology is not a simple decision. Liquid cooling and air cooling will coexist for a long time yet.
This article was published by S&P Global Market Intelligence and not by S&P Global Ratings, which is a separately managed division of S&P Global.
451 Research is part of S&P Global Market Intelligence. For more about 451 Research, please contact 451ClientServices@spglobal.com.