Research Highlight

Preparing Transformer models for Frontier

Deep learning with Transformer language models is increasingly key to biomedical research:

  • Molecular design, such as in drug discovery
  • Cancer phenotyping for personalized medicine

Testing and optimizing on Crusher, an OLCF testbed with Frontier hardware:

  • Standard Transformer libraries: Megatron, DeepSpeed, 🤗 Hugging Face
  • Varied document lengths, from molecular SMILES (short) to clinical text (long); see the tokenization sketch after this list
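
The spread in document length matters because attention cost grows roughly quadratically with sequence length, so short SMILES strings and long clinical notes stress these libraries very differently. The sketch below is only a hypothetical illustration of that spread; the bert-base-uncased tokenizer and the example strings are placeholders, not the project's actual vocabularies or data.

```python
# Illustrative only: contrast of sequence lengths across the two domains.
# Tokenizer and strings are placeholders, not the project's own assets.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder vocabulary

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"                    # aspirin SMILES: a very short "document"
clinical = "Patient presents with chest pain. " * 200  # clinical notes can run to thousands of tokens

print(len(tok.tokenize(smiles)))    # on the order of tens of tokens
print(len(tok.tokenize(clinical)))  # well past a typical 512-token context window

# In practice the two corpora are batched with different sequence lengths:
short_batch = tok([smiles], padding="max_length", max_length=128, truncation=True)
long_batch = tok([clinical], padding="max_length", max_length=512, truncation=True)
```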

Node-level speedup of Crusher versus Summit:

  • Expectation of node-level speedup from compute-bound FP16/BF16 roofline: 2.25x
  • Currently measured speedups: 1.8–1.85x (see the roofline sketch below)
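
To put the measured numbers in context, the short sketch below uses only the figures quoted above to compute what fraction of the compute-bound roofline expectation the measured speedups represent.

```python
# Roofline-style sanity check for the node-level comparison above. For a
# compute-bound FP16/BF16 workload, the expected node-to-node speedup is the
# ratio of peak low-precision throughput per node; the 2.25x expectation and
# the 1.8-1.85x measurements are the figures quoted in this highlight.

def roofline_fraction(measured_speedup: float, expected_speedup: float) -> float:
    """Fraction of the compute-bound roofline expectation actually achieved."""
    return measured_speedup / expected_speedup

expected = 2.25  # FP16/BF16 roofline estimate (Crusher node vs. Summit node)
for measured in (1.80, 1.85):
    frac = roofline_fraction(measured, expected)
    print(f"{measured:.2f}x measured -> {frac:.0%} of the roofline expectation")
# Prints roughly 80% and 82%.
```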

Transformer language models provide state-of-the-art accuracy in learning tasks ranging from natural language processing to non-traditional applications such as molecular design. For domain-specific applications like clinical text or molecular design, a Transformer model must first be 'pre-trained' on a very large, domain-relevant data corpus. However, this pre-training procedure is computationally intensive, requiring significant GPU resources. For instance, the pre-training of a Transformer for molecular design was the subject of our 2021 COVID-19 Gordon Bell finalist submission (see Andrew Blanchard's slide). Consequently, being able to train and scale these models efficiently on Frontier will be crucial. We've been working to port and optimize large Transformer libraries, such as Megatron and DeepSpeed, for Crusher and the Frontier node architecture. We're currently seeing a 1.8-1.85x node-level speedup for a Crusher node versus a Summit node. As a roofline model for FP16 operations predicts a 2.25x speedup, we consider this an encouraging result.
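
For readers unfamiliar with these libraries, a pre-training run of this kind is typically driven through the Hugging Face Trainer with a DeepSpeed configuration supplying the mixed-precision and parallelism settings. The sketch below is a minimal, hypothetical example of that setup; the model name, corpus file, DeepSpeed config path, and hyperparameters are placeholders, not the team's actual pipeline.

```python
# Minimal, illustrative masked-LM pre-training sketch (not the team's pipeline).
# Model name, file paths, and hyperparameters below are placeholders.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder vocabulary
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder corpus; a domain corpus (SMILES strings, clinical notes) would go here.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="pretrain_out",
    per_device_train_batch_size=16,
    bf16=True,                   # BF16 mixed precision on GPUs that support it
    deepspeed="ds_config.json",  # placeholder DeepSpeed config (ZeRO stage, optimizer, etc.)
    max_steps=100_000,
    logging_steps=100,
)

Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```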

Team members: John Gounley (ORNL), Andrew Blanchard (ORNL), Mayanka Chandra Shekar (ORNL), Isaac Lyngaas (ORNL), Xiao Wang (ORNL), Hyunseung Yoo (ANL)