# NeoClouds Provider Overview
The hybridai-NVFP4 TUI application integrates with specialized GPU cloud providers (NeoClouds) via their APIs and SDKs to provision transient compute optimized for NVFP4 workloads on Blackwell-architecture GPUs.
These providers offer GPU-optimized infrastructure with flexible pricing models, making them ideal for the one-off, high-compute tasks the hybridai-NVFP4 system is designed to handle.
## Provider Comparison
| Provider | Website | Primary Integration | Best For | Typical GPU Offerings |
|---|---|---|---|---|
| Modal | modal.com | SDK/API | Serverless, bursty workloads | H100, A100, L40S, custom GPU tiers |
| SimplePod | simplepod.com | Kubernetes pods | Predictable, sustained loads | H100, H200, Blackwell GPUs |
| Verda | verda.com | REST API instances | Low-latency, interactive use | H100, HGX H100, custom configurations |
| Nebius | nebius.com | Cloud platform | Enterprise, scalable deployments | H100, B200, custom GPU clusters |
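As a concrete illustration of the SDK integration style, the sketch below uses Modal (listed above) to run a short GPU smoke test. The app name, image contents, and GPU tier are illustrative choices, not a prescribed configuration.

```python
# Hedged sketch of SDK-based provisioning with Modal: the GPU container exists
# only while the function runs, which is the "transient compute" pattern above.
import modal

app = modal.App("hybridai-nvfp4-demo")                   # app name is illustrative
image = modal.Image.debian_slim().pip_install("torch")   # install the inference stack you need

@app.function(gpu="H100", image=image, timeout=3600)     # GPU tier is illustrative
def smoke_test() -> str:
    import torch
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    # Provisions a GPU container, runs the function, then tears everything down.
    print(smoke_test.remote())
```

Run it with `modal run <file>.py`; nothing stays allocated after the call returns.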
## Core Requirements Across All Platforms
To successfully deploy Nemotron 3 Super in NVFP4 format on any NeoCloud provider, ensure the following:
### Blackwell GPU Architecture Support (sm_100 compute capability)
- Required for native NVFP4 support
- Look for instances mentioning Blackwell, B200, B300, GB200, GB300, or DGX Spark
### CUDA Version 12.9+ or 13.x
- Essential for NVFP4 quantization and inference
- Verify in the provider documentation or instance specifications, or with the check sketched below
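A minimal verification sketch, assuming PyTorch with CUDA support is already installed on the provisioned instance:

```python
# Quick sanity check on a freshly provisioned instance (assumes PyTorch + CUDA).
import torch

def check_nvfp4_prerequisites() -> None:
    assert torch.cuda.is_available(), "No CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    cuda_version = torch.version.cuda  # CUDA version PyTorch was built against
    print(f"GPU: {torch.cuda.get_device_name(0)}, sm_{major}{minor}, CUDA {cuda_version}")
    # Blackwell data-center parts report compute capability 10.x (sm_100+).
    if major < 10:
        raise SystemExit("Blackwell (sm_100+) GPU required for native NVFP4")
    if tuple(int(x) for x in cuda_version.split(".")[:2]) < (12, 9):
        raise SystemExit("CUDA 12.9+ (or 13.x) required for NVFP4")

if __name__ == "__main__":
    check_nvfp4_prerequisites()
```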
### Docker with NVIDIA Container Toolkit
- Required for containerized deployment of inference engines
- Most GPU-focused providers include this by default; a quick verification is sketched below
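A hedged verification sketch; the CUDA base image tag is illustrative and should match the CUDA version on the instance:

```python
# Verify GPU passthrough by running nvidia-smi inside a CUDA base container.
import subprocess

def check_nvidia_container_toolkit(image: str = "nvidia/cuda:12.9.0-base-ubuntu22.04") -> bool:
    # Image tag is an example; pick one matching the instance's CUDA version.
    result = subprocess.run(
        ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"],
        capture_output=True, text=True,
    )
    print(result.stdout or result.stderr)
    return result.returncode == 0

if __name__ == "__main__":
    ok = check_nvidia_container_toolkit()
    print("NVIDIA Container Toolkit OK" if ok else "GPU passthrough not working")
```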
### High-Bandwidth Networking & Storage
- NVMe storage recommended for model caching (~200GB+)
- Sufficient interconnect/network bandwidth for multi-GPU communication if using tensor parallelism
### Flexible Instance Lifecycle Management
- Ability to start/stop instances on demand
- Support for spot/preemptible instances for cost optimization
- API/SDK provisioning for automation (see the sketch below)
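A hedged sketch of what API-driven lifecycle automation can look like. The base URL, endpoints, payload fields, and the `NEOCLOUD_API_KEY` variable are hypothetical placeholders; use the actual API documented on the provider pages linked under Getting Started.

```python
# Hypothetical provisioning sketch: start an instance for a one-off job,
# then tear it down as soon as the work finishes.
import os
import requests

API_BASE = "https://api.example-neocloud.com/v1"                 # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['NEOCLOUD_API_KEY']}"}  # illustrative env var

def start_instance(gpu_type: str = "B200", gpu_count: int = 4, spot: bool = True) -> str:
    """Request a transient GPU instance and return its ID (fields are placeholders)."""
    resp = requests.post(
        f"{API_BASE}/instances",
        headers=HEADERS,
        json={"gpu_type": gpu_type, "gpu_count": gpu_count, "spot": spot},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

def stop_instance(instance_id: str) -> None:
    """Release the instance immediately after the one-off task completes."""
    requests.delete(f"{API_BASE}/instances/{instance_id}", headers=HEADERS, timeout=30).raise_for_status()
```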
## Cost Optimization Strategies
### Spot/Preemptible Instances
All four providers offer discounted instances that can be interrupted; a minimal interruption handler is sketched after this list:
- Modal: Functions can be interrupted but are designed with fault tolerance in mind
- SimplePod: Spot pods with termination grace periods
- Verda: Preemptible instances with warning periods
- Nebius: Preemptible VMs with configurable notice
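A minimal sketch of interruption handling, assuming the provider delivers SIGTERM at the start of the grace period; the workload and checkpoint helpers are placeholders.

```python
# Checkpoint-on-SIGTERM pattern for spot/preemptible instances.
import signal
import sys

INTERRUPTED = False

def _on_sigterm(signum, frame):
    global INTERRUPTED
    INTERRUPTED = True  # finish the current step, then checkpoint and exit

signal.signal(signal.SIGTERM, _on_sigterm)

def run_job(steps: int = 1_000) -> None:
    for step in range(steps):
        do_one_unit_of_work(step)      # placeholder for the real workload
        if INTERRUPTED:
            save_checkpoint(step)      # placeholder: persist state to durable storage
            sys.exit(0)                # exit cleanly before the instance is reclaimed

def do_one_unit_of_work(step: int) -> None: ...
def save_checkpoint(step: int) -> None: ...
```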
### Right-Sizing Recommendations
For Nemotron 3 Super 120B NVFP4 (an example engine launch is sketched after this list):
- Minimum: 2x Blackwell GPUs (e.g., 2x RTX PRO 6000) for tensor parallel size 2
- Recommended: 4x Blackwell GPUs for better performance with tensor parallel size 4
- Maximum: 8x Blackwell GPUs for maximum throughput (tensor parallel size 8)
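A hedged launch sketch, assuming vLLM as the inference engine; the model repository id is a placeholder, and `tensor_parallel_size` should match the GPU count chosen above.

```python
# Minimal vLLM launch sketch for a multi-GPU NVFP4 deployment.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/nemotron-3-super-nvfp4",  # placeholder repo id; use the actual NVFP4 checkpoint
    tensor_parallel_size=4,              # recommended configuration: 4x Blackwell GPUs
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```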
### Storage Considerations
- Model storage: ~200GB for Nemotron 3 Super NVFP4 + working space
- Recommended: 500GB+ NVMe for comfortable operation (a quick free-space check is sketched after this list)
- For evals/benchmarks: 1TB+ NVMe
- For model development: 2TB+ NVMe
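A small sketch for checking free space before pulling the model, assuming the model cache lives under `/data`; adjust the path and threshold to your layout.

```python
# Verify there is enough NVMe headroom before downloading ~200GB of weights.
import shutil

def has_enough_space(path: str = "/data", required_gb: int = 500) -> bool:
    free_gb = shutil.disk_usage(path).free / 1024**3
    print(f"{free_gb:.0f} GB free at {path}")
    return free_gb >= required_gb

if __name__ == "__main__":
    if not has_enough_space():
        raise SystemExit("Provision a larger NVMe volume before downloading the model")
```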
## Getting Started
Select a provider below for detailed deployment instructions:
- Modal - Serverless platform for bursty GPU workloads
- SimplePod - Kubernetes-based GPU provisioning
- Verda - REST API for GPU instance management
- Nebius - Full cloud platform with GPU optimization
## Authentication & Secrets Management
Each provider requires secure handling of credentials (a minimal example follows this list):
- Hugging Face Token: Read-only access for gated models like Nemotron 3 Super
- API Keys: Provider-specific authentication for instance provisioning
- SSH Keys: Optional for secure instance access
- Registry Credentials: For pulling Docker images from private repositories
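A minimal sketch of credential handling, reading tokens from the environment (or whatever the provider's secret store injects) instead of hardcoding them; the variable names are conventions, not requirements.

```python
# Load credentials from the environment rather than embedding them in code.
import os
from huggingface_hub import login

hf_token = os.environ["HF_TOKEN"]                   # read-only token for gated model access
login(token=hf_token)                               # authenticates gated model downloads

provider_api_key = os.environ["NEOCLOUD_API_KEY"]   # provider API key; variable name is illustrative
```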
Best practices:
- Use provider secret management systems when available
- Rotate tokens regularly
- Limit permissions to the minimum required
- Audit access regularly