OLCF teams fine-tune Frontier for science

  • From left, System Acceptance and User Environment Group Leader Verónica Melesse Vergara; Operations Section Head Ashley Barker; User Access, Outreach, and Communication Group Leader Katie Bethea; and User Assistance Group Leader Chris Fuson. Image credit: Carlos Jones, ORNL

  • Operations Section Head Ashley Barker. Image credit: Carlos Jones, ORNL

The world’s first exascale supercomputer rocked the computing world with record speeds last year, but Frontier fully opened for scientific business only in 2023.

The HPE Cray system shot to No. 1 on the TOP500 list of the world’s fastest supercomputers in May 2022 with a record speed of 1.1 exaflops, or more than 1 quintillion calculations per second, capping more than a decade of work to break the exascale barrier. But that announcement didn’t mean the job was done.

Work continued through the second half of the year, led by the Oak Ridge Leadership Computing Facility’s scientific engagement and user acceptance experts, to certify that Frontier met all the standards to enable the world-changing discoveries promised by exascale.

“There have been a lot of people involved, and it’s been a long process, especially for all the users eager to start running their projects,” said Ashley Barker, who oversees the OLCF’s Operations Section. “Everyone wants to know when they can get on Frontier, so our lives became a little more exciting at year’s end than when we started some of this work. But the time we spent testing on the front end will minimize any suffering later. We had to make sure the network would be stable and everything would run smoothly.”

Ensuring Frontier runs as advertised spanned two fronts: the hardware and software, handled mainly by the Operations Section, and the codes developed to run the various simulations, handled mainly by the Scientific Engagement Section. Some codes had already been run on Summit, Frontier’s predecessor, and most had also been run on Crusher, the prototype system for Frontier — equivalent to about one and a half of Frontier’s 74 cabinets, or 192 of its more than 9,400 nodes — by code teams working with users through the OLCF’s Center for Accelerated Application Readiness.

“It’s a big difference from how the changeover was handled with previous systems,” said Verónica Melesse Vergara, who leads the System Acceptance and User Environment Group. “On Summit, we never had any real outside users on the system until it opened. This way we had more time to try everything out and uncover errors we could fix before the final unveiling. Early on is always the time to find those errors, because otherwise we’d be finding them while people are running projects.”

Those projects range from national security challenges to probing renewable energy sources to fundamental questions of physics, chemistry, biology and astronomy. Some rely on new codes made to order, others on codes that date back to the first generation of supercomputers.

“We’re always looking to improve performance,” said Matt Norman, who leads the Advanced Computing for Life Sciences and Engineering Group. “Some codes have been ported across machines for decades and may do fine but need a boost. On some teams, our experts are almost part of the study as we work with the principal investigators to help them understand what kind of scientific approach will get the best performance.”

The codes powering those studies can run to millions of lines of complex equations, intended to simulate scenarios like functioning windmills, nuclear reactions and exploding stars. The shift to Frontier also meant a shift in programming environment, since Frontier’s AMD GPUs are programmed differently than Summit’s NVIDIA GPUs, and that called for careful alterations to the codes.
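As an illustrative sketch (not taken from any of the project codes mentioned here): Summit’s NVIDIA GPUs are programmed with CUDA, while Frontier’s AMD GPUs use the HIP interface from AMD’s ROCm software stack. A ported kernel often looks nearly line for line the same, with runtime calls renamed; the harder work is retuning the code for the new hardware.

```cpp
// Minimal HIP version of a simple GPU kernel (illustrative only; error
// checking omitted for brevity). On Summit, the same code would use CUDA:
// cudaMalloc, cudaMemcpy, and the same triple-chevron launch syntax.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// y[i] = a * x[i] + y[i], one GPU thread per element.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float *dx, *dy;
    hipMalloc(&dx, n * sizeof(float));  // CUDA equivalent: cudaMalloc
    hipMalloc(&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Same launch syntax as CUDA: grid of (n + 255) / 256 blocks of 256 threads.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    // Copying the result back also synchronizes with the kernel.
    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("y[0] = %f (expect 4.0)\n", hy[0]);  // 2 * 1 + 2 = 4

    hipFree(dx);
    hipFree(dy);
    return 0;
}
```

In practice, AMD’s hipify tools automate much of this one-for-one renaming, which is why the teams’ effort centered less on translation and more on restructuring and tuning the codes for Frontier’s hardware.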

“These codes are like information pipelines,” said Dayle Smith, who leads the Advanced Computing for Chemistry and Materials Group. “The equations are like interlocking segments working together, so we couldn’t necessarily just swap one part out for another. There can be a challenge adjusting what worked well on Summit to run even better on Frontier.”

Rewriting an established code from scratch tends to be the least-preferred option.

“It’s an iterative process,” said Tom Beck, who oversees scientific engagement. “Our experts offer guidance to the vendors and users on potential tweaks and trade-offs. This work will be crucial to get their studies up and running, now and in the future.”

As 2022 wound down, the finish line came into sight.

“It’s been a long road, but we were always optimistic,” Vergara said. “There were just so many nodes to test. Now that we’ve finished the tests, we hope the users will find it worth the wait. We’ve cleared the path so they can work at top speed.”