Skip to content

MTP: Multi-Token Prediction

Super and Ultra incorporate MTP layers for improved long-form text generation efficiency and better model quality.

Key Benefits

  • Predicts multiple future tokens simultaneously
  • Enhances long-context coherence and consistency
  • Improves generation quality for extended sequences
  • Reduces autoregressive generation latency