Wednesday, 18 December 2024
IBM's world-class Summit supercomputer gooses speed with AI abilities

Making a world-class supercomputer with thousands of processors and the power appetite of a small town isn't as easy as it used to be.

But IBM's latest machine, called Summit, is a big step forward for those who need gargantuan computing power. Unveiled Friday at Oak Ridge National Laboratory in Tennessee, it's a top contender for fastest of the fast, a crown that will be bestowed later this month by organizers of the Top500 supercomputer list. For now, though, Big Blue stops a little short of calling it the world's fastest machine by that measurement.

Instead, Summit is "the most powerful, smartest supercomputer in the world," said Dave Turek, vice president of high-performance computing and cognitive systems at IBM. It'll crunch through something like 200 quadrillion mathematical calculations each second, a speed called 200 petaflops. That's as fast as each of the planet's 7.6 billion people doing 26 million calculations per second on a hand calculator.
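The per-person comparison is simple division. A quick check, sketched here in Python using only the two figures quoted above (the variable names are illustrative), bears it out:

```python
# Figures quoted in the article
summit_flops = 200e15       # 200 petaflops = 200 quadrillion calculations per second
world_population = 7.6e9    # 7.6 billion people

# Calculations each person would have to perform every second to match Summit
per_person = summit_flops / world_population

print(f"{per_person / 1e6:.1f} million calculations per person per second")
```

The result rounds to the 26 million figure cited.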

The system and its twin at Lawrence Livermore National Laboratory in California were funded in 2014 as part of a $325 million Department of Energy program called Coral, but Summit took years to develop. Goosing supercomputer speeds has been tough this decade as increases in processor speed began tailing off.

It's got a high price tag, but Summit shows what it takes to stay at the cutting edge of computing. The US lags China today for both the top supercomputer and the most total supercomputer capacity on the Top500 list, but Summit will deliver a speed boost with new processor designs, fast storage and internal communications, and a design that can use artificial intelligence methods to zero in on the right computing calculations to be running in the first place.

Supercomputing meets AI

"The marketplace is beginning to recognize that AI and high-performance computing are not separate domains but things that need be viewed as integrated," Turek said. "The incorporation of machine learning dramatically reduces the amount of simulation that needs to be done to get to optimal results."

What's Summit good for? At Oak Ridge, it'll be scientific research into subjects like designing chemical formulas, exploring new materials, studying links between cancer and genes on a very large scale, investigating fusion energy, researching the universe through astrophysics and simulating the earth's changing climate.

Lawrence Livermore, a lab funded chiefly by DOE's National Nuclear Security Administration, has a more militaristic mission, including stockpile stewardship research to ensure nuclear weapons reliability.

Summit supercomputer is big

Summit divides work among 4,608 interconnected computer nodes housed in refrigerator-sized cabinets and liquid-cooled by pumping 4,000 gallons of water per minute through the system.

It takes up an eighth of an acre -- the size of two tennis courts. Its peak energy consumption is about 15 megawatts, enough to power more than 7,000 homes.
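The footprint and power comparisons can be sanity-checked with a few lines of Python. This is a rough sketch: the acre size (43,560 square feet) and the standard doubles tennis court (78 by 36 feet) are outside assumptions, not figures from the article, and the per-home number is only the average draw implied by the quoted totals.

```python
# Assumptions not taken from the article: acre size and tennis court dimensions
SQ_FT_PER_ACRE = 43_560
TENNIS_COURT_SQ_FT = 78 * 36            # standard doubles court, in square feet

# Figures quoted in the article
footprint_sq_ft = SQ_FT_PER_ACRE / 8    # an eighth of an acre
peak_power_watts = 15e6                 # about 15 megawatts
homes = 7_000

courts = footprint_sq_ft / TENNIS_COURT_SQ_FT
watts_per_home = peak_power_watts / homes

print(f"Footprint: {footprint_sq_ft:.0f} sq ft, about {courts:.1f} tennis courts")
print(f"Implied average draw per home: {watts_per_home / 1000:.2f} kW")
```

Under those assumptions the footprint comes out just under two courts, and the homes comparison implies a little over 2 kilowatts per household.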

Each node has two IBM Power9 chips running at 3.1GHz, and each of those has 22 processing cores that run in parallel. Connected to each pair of Power9 chips are six Nvidia Tesla V100 graphics chips. Nvidia rose to power selling graphics chips, but it's capitalized on the fact that those chips can be adapted for supercomputing and AI work, too -- especially the computationally intense training phase in which AI systems learn to detect patterns from real-world data.
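Multiplying the per-node figures by the 4,608 nodes gives a sense of the machine's overall scale. The totals below are derived from the numbers in this article rather than quoted from IBM, so treat them as a back-of-the-envelope sketch:

```python
# Per-node figures quoted in the article
nodes = 4_608
cpus_per_node = 2       # IBM Power9 chips
cores_per_cpu = 22
gpus_per_node = 6       # Nvidia Tesla V100 chips

# System-wide totals, derived by multiplication
total_cpus = nodes * cpus_per_node
total_cores = total_cpus * cores_per_cpu
total_gpus = nodes * gpus_per_node

print(f"Power9 chips: {total_cpus:,}")
print(f"Power9 cores: {total_cores:,}")
print(f"V100 GPUs:    {total_gpus:,}")
```

That works out to 9,216 Power9 chips, 202,752 processing cores and 27,648 graphics chips.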

Each node has 1.6 terabytes of memory, too, a design that keeps data handy for fast access when needed by a processor. Data produced by calculations can be saved at a whopping 2.2 terabytes per second on a 250-petabyte storage system.

If those numbers leave you flummoxed, that's 100 times the memory of a high-end laptop and about 1,000 times the storage capacity.

Measuring supercomputer performance

The Top500 list gauges performance with a single benchmark called Linpack; China's Sunway TaihuLight, at the National Supercomputing Center in Wuxi, claims the top Linpack score of 93 petaflops. Linpack isn't a bad test, but for years supercomputer makers have made it clear it's not a full reflection of all the sorts of work a supercomputer needs to perform.

Indeed, Summit was designed for 30 different applications.

"There's got to be more than one figure of merit that characterizes value," Turek said. "You can put a huge number of servers together and conjure up a high Linpack score, but to get them to scale on real applications is a real art."

One candidate for expanding beyond Linpack is the High Performance Conjugate Gradients (HPCG) benchmark.

IBM has been building DOE supercomputers for decades, and Summit isn't the end of the lineage. Instead, it's a step toward a later goal of reaching an exascale system, one that can perform a quintillion calculations per second, about five times Summit's speed.
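The size of the gap to exascale follows directly from the two speeds. A minimal Python check, using Summit's 200-petaflop figure from earlier in the article:

```python
# One exaflop is a quintillion (10**18) calculations per second
exascale_flops = 10**18
# Summit's quoted speed: 200 petaflops, i.e. 200 * 10**15 calculations per second
summit_flops = 200 * 10**15

speedup_needed = exascale_flops / summit_flops
print(f"An exascale system would be {speedup_needed:.0f} times as fast as Summit")
```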

Turek said he wasn't sure such a thing was possible when the Summit design began in 2014. Now he's confident it is, though Summit will be used to figure out exactly how.

"This is a signpost on the way to exascale, and we want to see what works and doesn't work," Turek said. "We are absolutely convinced we can get to exascale."

 

Additional Info

  • Origin: cnet/GhAgent