HybridMoE: Nemotron 3 Family Architecture

The Nemotron 3 family of models uses a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture to deliver best-in-class throughput with accuracy on par with or better than standard Transformers.
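
As a rough illustration of what "hybrid" means here, the toy sketch below interleaves recurrent and attention blocks according to a layer pattern. Everything in it is an illustrative assumption: the `HybridBlock` class, the pattern, and the use of a GRU as a stand-in sequence mixer are not Nemotron 3's actual implementation, which uses real Mamba (selective state space) layers.

```python
# Toy sketch of hybrid layer interleaving, assuming PyTorch.
# HybridBlock, the pattern, and the GRU stand-in are illustrative
# assumptions, not Nemotron 3's actual layers.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One residual block whose sequence mixer is either attention or a
    recurrent stand-in for a Mamba-style state space layer."""

    def __init__(self, d_model: int, kind: str):
        super().__init__()
        self.kind = kind
        self.norm = nn.LayerNorm(d_model)
        if kind == "attention":
            self.mixer = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        else:
            # Stand-in: a GRU is also a linear-time recurrent mixer with a
            # fixed-size state; a real implementation would use a Mamba layer.
            self.mixer = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        if self.kind == "attention":
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h, _ = self.mixer(h)
        return x + h  # residual connection

# Mostly recurrent layers with occasional attention: the hybrid idea at toy scale.
pattern = ["mamba", "mamba", "attention", "mamba"]
stack = nn.Sequential(*(HybridBlock(64, kind) for kind in pattern))
out = stack(torch.randn(2, 16, 64))  # (batch, seq, d_model)
```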

Key Benefits

  • Combines Mamba's efficient state space modeling with the Transformer's attention mechanism
  • Mixture of Experts (MoE) routing activates only a small subset of experts per token, as sketched after this list
  • Achieves superior throughput compared to both dense and standard MoE models
  • Maintains or improves accuracy on standard benchmarks
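
To make the per-token routing concrete, here is a minimal sketch of top-k expert routing in PyTorch. The `TopKMoE` class and the `num_experts` and `top_k` values are illustrative assumptions, not Nemotron 3's actual router.

```python
# Minimal sketch of top-k MoE routing, assuming PyTorch.
# TopKMoE, num_experts, and top_k are illustrative assumptions,
# not Nemotron 3's actual router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                  # (n_tokens, d_model)
        scores = self.router(tokens)                        # (n_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                # renormalize over the chosen k
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Select the tokens whose top-k set includes expert e.
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(tokens[rows])
        return out.reshape_as(x)

moe = TopKMoE(d_model=64, d_ff=256)
y = moe(torch.randn(2, 16, 64))
```

Only `top_k` of the `num_experts` expert networks run for each token, so per-token compute stays close to that of a small dense feed-forward layer while total parameter capacity scales with the number of experts; production routers typically add load-balancing losses and expert capacity limits on top of this.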