Welcome to Nemotron 3 Super NVFP4 Deployment Guide

This guide provides comprehensive instructions for deploying and hosting NVIDIA Nemotron 3 Super in NVFP4 (4-bit floating point) format across various platforms and cloud providers.

🤖 Nemotron 3 Super

Nemotron 3 Super is NVIDIA's latest open-source LLM featuring a hybrid Mamba-Transformer architecture with Mixture of Experts (MoE) routing for enhanced agentic reasoning capabilities.

Core Features

Learn More

Official NVIDIA Introduction

🚀 Getting Started

Select a section below to begin:

Inference Engines - Detailed setup for vLLM, SGLang, TensorRT-LLM, and Triton
NeoClouds Providers - Deployment guide for Modal, SimplePod, Verda, and Nebius

📖 About This Guide

This documentation covers:

NVFP4-specific configuration for optimal performance on Blackwell architecture GPUs
Step-by-step deployment commands for each supported platform
Comparative analysis of different inference engines
Cloud provider integration patterns for transient GPU workloads

Welcome to Nemotron 3 Super NVFP4 Deployment Guide ​

🤖 Nemotron 3 Super ​

Core Features ​

Learn More ​