Multi-Tenancy In AI Clouds

Network Isolation with Netris®

Whitepaper

Abstract

This paper presents an architectural model for implementing hard multi-tenancy in AI factory environments built on NVIDIA® DGX™ systems. It outlines how enterprises and NVIDIA Cloud Partners (NCPs) can establish consistent tenant isolation across heterogeneous connectivity domains, including East–West fabrics based on NVIDIA Spectrum-X™ or InfiniBand™, Ethernet-based North–South fabrics, NVLink® Multi-Node rack-scale fabrics, and optional DPU-augmented domains.

In this model, tenant intent is defined once and applied consistently across all of these fabrics. The paper illustrates how Netris® orchestrates tenant-related network configuration through supported NVIDIA management APIs. It then examines the technical considerations involved: aligning Ethernet VRFs, InfiniBand PKeys, and NVLink partition boundaries; coordinating lifecycle updates across fabrics; and maintaining operational consistency through topology-aware validation and configuration verification.
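To make the "define once, apply everywhere" idea concrete, the following is a minimal sketch, not the Netris data model or any NVIDIA API. It assumes a hypothetical `TenantIntent` record with an operator-assigned `tenant_id`, and derives one identifier per fabric from it: an Ethernet VRF name, an InfiniBand PKey, and an NVLink partition name. The naming scheme (`vrf-…`, `nvl-…`) and the choice to reuse `tenant_id` as the PKey base are illustrative assumptions. The only fabric fact relied on is that an InfiniBand PKey is a 16-bit value whose top bit marks full membership, with 0x7FFF as the default partition base.

```python
from dataclasses import dataclass

# All names below are hypothetical; none of this is a Netris or NVIDIA API.

PKEY_FULL_MEMBER = 0x8000   # top bit of a 16-bit PKey marks full membership
PKEY_DEFAULT_BASE = 0x7FFF  # default partition base, not assigned to tenants here


@dataclass(frozen=True)
class TenantIntent:
    name: str
    tenant_id: int  # operator-assigned index, assumed unique per tenant


@dataclass(frozen=True)
class FabricBindings:
    ethernet_vrf: str
    infiniband_pkey: int
    nvlink_partition: str


def render_bindings(intent: TenantIntent) -> FabricBindings:
    """Derive every per-fabric identifier from the single tenant intent."""
    if not 1 <= intent.tenant_id < PKEY_DEFAULT_BASE:
        raise ValueError(f"tenant_id {intent.tenant_id} does not fit a 15-bit PKey base")
    return FabricBindings(
        ethernet_vrf=f"vrf-{intent.name}",
        infiniband_pkey=PKEY_FULL_MEMBER | intent.tenant_id,
        nvlink_partition=f"nvl-{intent.name}",
    )


def verify_isolation(bindings: list[FabricBindings]) -> list[str]:
    """Report every identifier that is shared by two or more tenants."""
    problems = []
    for field in ("ethernet_vrf", "infiniband_pkey", "nvlink_partition"):
        values = [getattr(b, field) for b in bindings]
        duplicates = sorted({v for v in values if values.count(v) > 1})
        for value in duplicates:
            problems.append(f"{field} {value!r} is shared by multiple tenants")
    return problems


tenants = [TenantIntent("acme", 1), TenantIntent("globex", 2)]
bindings = [render_bindings(t) for t in tenants]
print(hex(bindings[0].infiniband_pkey))
print(verify_isolation(bindings))
```

Deriving all three identifiers from one record means a tenant cannot be present on one fabric and missing on another. The uniqueness check is a deliberately simple stand-in for the configuration verification mentioned above; it does not model topology.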

This guidance is intended for CTOs, AI/Cloud architects, NCP operators, and enterprise infrastructure teams deploying shared DGX-based clusters that require predictable tenant isolation and consistent network behavior.