NVIDIA UFM Integration Plugin for Netris Controller¶
Overview¶
The Netris-UFM plugin provides seamless integration between Netris Controller and NVIDIA UFM (Unified Fabric Manager) for AI infrastructures with hybrid InfiniBand and Ethernet networks. This integration allows infrastructure operators to define compute multi-tenancy in a single place through Netris, significantly simplifying management across both network types.
Key Benefits¶
Unified Management Interface: Define tenant isolation by simply listing servers in a server-cluster object
Automated Provisioning: Automatically configure both Ethernet (via Netris) and InfiniBand (via UFM) networks
Simplified Operations: Eliminate the need to manage SwitchPorts, VLANs, VRFs on Ethernet and GUIDs, PKeys, SHARP groups on InfiniBand separately
Architecture¶
The Netris-UFM plugin acts as the integration layer between Netris Controller and NVIDIA UFM:
Netris Controller: Orchestrates the Ethernet switches and provides the primary user interface
NVIDIA UFM: Manages the InfiniBand switches and provides specialized InfiniBand functionality
Netris-UFM Plugin: Synchronizes configurations between both systems
When you define a server cluster in Netris, the plugin automatically:
Discovers InfiniBand port GUIDs from UFM
Creates and manages appropriate PKeys in UFM
Sets up SHARP reservations for high-performance operations
Prerequisites¶
Before installing the Netris-UFM plugin, ensure:
A functioning Netris Controller environment
A properly configured NVIDIA UFM installation
Network connectivity between both systems
Appropriate access credentials for both platforms
Installation¶
Option 1: Deploy within an existing Netris Controller Kubernetes cluster¶
This option is recommended if you already have a Netris Controller running in a Kubernetes environment.
Download the Kubernetes deployment YAML file:
wget https://get.netris.io/netris-controller-ufm.yaml
Edit the YAML file to update the secret values based on your environment:
apiVersion: v1 kind: Secret metadata: name: netris-controller-nvidia-ufm-agent-envs namespace: netris-controller type: Opaque stringData: NETRIS_CONTROLLER_ADDR: "https://netris.example.com" NETRIS_CONTROLLER_LOGIN: "netris" NETRIS_CONTROLLER_PASSWORD: "newNet0ps" NETRIS_VERIFY_SSL: "true" NETRIS_SITE_NAME: "Site" UFM_ADDR: "https://ufm.example.com" UFM_LOGIN: "admin" UFM_PASSWORD: "123456" UFM_VERIFY_SSL: "false" UFM_ID: "ufm-lab" UFM_PKEY_RANGE: "100-7ffe"
Apply the configuration to your Kubernetes cluster:
kubectl apply -f netris-controller-ufm.yaml
Option 2: Deploy as a standalone Docker container¶
This option is ideal for environments without Kubernetes or when you want to deploy on a separate host.
Create an environment file (e.g.,
env
) with the following content:NETRIS_CONTROLLER_ADDR="https://netris.example.com" NETRIS_CONTROLLER_LOGIN="netris" NETRIS_CONTROLLER_PASSWORD="newNet0ps" NETRIS_VERIFY_SSL="true" NETRIS_SITE_NAME="Site" UFM_ADDR="https://ufm.example.com" UFM_LOGIN="admin" UFM_PASSWORD="123456" UFM_VERIFY_SSL="false" UFM_ID="ufm-lab" UFM_PKEY_RANGE="100-7ffe" LOG_LEVEL="info"
Run the Docker container:
docker run -d \ --env-file=env \ --name=netris-ufm \ --entrypoint "/app/servicebin" \ netrisai/bare-metal-netris-ufm-agent:0.1.1
Configuration Parameters¶
Netris Controller Configuration¶
Parameter |
Description |
Example |
---|---|---|
NETRIS_CONTROLLER_ADDR |
The URL of your Netris Controller |
|
NETRIS_CONTROLLER_LOGIN |
Username for authenticating with Netris Controller |
netris |
NETRIS_CONTROLLER_PASSWORD |
Password for authenticating with Netris Controller |
newNet0ps |
NETRIS_VERIFY_SSL |
Whether to verify SSL certificates when connecting to Netris Controller |
true or false |
NETRIS_SITE_NAME |
The name of the site in Netris Controller to manage |
Datacenter-1 |
NETRIS_VNET_OWNER |
The owner for virtual networks in Netris |
Admin |
NVIDIA UFM Configuration¶
Parameter |
Description |
Example |
---|---|---|
UFM_ADDR |
The URL of your NVIDIA UFM server |
|
UFM_LOGIN |
Username for authenticating with UFM |
admin |
UFM_PASSWORD |
Password for authenticating with UFM |
123456 |
UFM_VERIFY_SSL |
Whether to verify SSL certificates when connecting to UFM |
true or false |
UFM_ID |
Unique identifier for this UFM instance |
ufm-lab |
UFM_PKEY_RANGE |
Range of PKey IDs that can be allocated to clusters, in hexadecimal format |
100-7ffe |
Agent Configuration¶
Parameter |
Description |
Default |
Example |
---|---|---|---|
LOG_LEVEL |
Logging level for the agent |
info |
info or debug |
RECONCILE_INTERVAL |
Interval in seconds between reconciliation operations |
10 |
10 |
Usage Guide¶
After successfully installing and configuring the Netris-UFM agent, follow these steps to set up and use the integration:
1. Server Configuration in Netris¶
The first step is to create servers in the Netris Controller inventory that match exactly with the servers in UFM:
In Netris Controller, navigate to Network → Topology → +Add.
Create servers with identical names as they appear in UFM (this is crucial for proper GUID mapping)
Once created, the Netris-UFM agent will automatically sync the InfiniBand GUIDs from UFM into Netris
Important
Server names must match exactly between UFM and Netris Controller for the integration to work properly.
2. Create a Server Cluster Template¶
Next, create a Server Cluster Template that defines the network configuration:
Navigate to Services → Server Cluster Template.
Click Add to create a new template
Configure the template using JSON with specific sections for different network fabrics
Here’s an example template that configures:
InfiniBand East-West fabric (managed by UFM)
Ethernet North-South fabric (for in-band and storage traffic)
OOB Management network
[
{
"postfix": "East-West",
"type": "netris-ufm",
"ufm": "ufm-lab",
"pkey": "auto"
},
{
"postfix": "North-South-in-band-and-storage",
"type": "l2vpn",
"vlan": "untagged",
"vlanID": "auto",
"serverNics": [
"eth9",
"eth10"
],
"ipv4Gateway": "192.168.7.254/21"
},
{
"postfix": "OOB-Management",
"type": "l2vpn",
"vlan": "untagged",
"vlanID": "auto",
"serverNics": [
"eth11"
],
"ipv4Gateway": "192.168.15.254/21"
}
]
Understanding the Template Structure¶
East-West Fabric (UFM):
"type": "netris-ufm"
- Identifies this as an InfiniBand fabric managed by UFM"ufm": "ufm-lab"
- Specifies the UFM instance identifier"pkey": "auto"
- Automatically assigns an appropriate PKey from the configured range
North-South Fabric (Ethernet):
Standard Netris L2VPN configuration
"serverNics": ["eth9", "eth10"]
- Specifies which NICs (9, 10) connect to this fabric
OOB Management:
Separate network for out-of-band management
"serverNics": ["eth11"]
- Specifies NIC 11 for this network
3. Create Server Clusters¶
After setting up the template, create actual server clusters:
Navigate to Services → Server Cluster
Click Add to create a new cluster
Select Site and Admin
Set VPC to ‘create new’
Select the template you created in the previous step
Add the servers that should be part of this cluster
Submit the configuration
8. Verification¶
Once the server cluster is created:
The Netris-UFM agent will automatically:
Identify the InfiniBand GUIDs associated with the servers in the cluster
Provision appropriate PKeys in UFM
Create necessary SHARP reservations if applicable
Verify the configuration:
Check the Netris Controller UI for successful cluster creation
Examine the UFM UI to confirm PKey assignments
Test connectivity between servers in the cluster via InfiniBand
3. Monitoring Integration Status¶
To monitor the status of the integration:
Check the Netris-UFM agent logs (as described in the Monitoring section)
Verify the synchronization state:
# For Kubernetes kubectl logs -f deployment/netris-controller-nvidia-ufm-agent -n netris-controller # For Docker docker logs -f netris-ufm
Functional Workflow¶
Discovery Phase:
Plugin connects to both Netris Controller and NVIDIA UFM
InfiniBand port GUIDs are discovered from UFM and stored in Netris inventory
Cluster Creation:
When a server cluster is created or modified in Netris Controller
Plugin identifies affected servers and their InfiniBand GUIDs
Appropriate PKeys are automatically provisioned in UFM
SHARP Integration:
For high-performance network operations, SHARP reservations are created
These correspond to the server clusters defined in Netris
Continuous Reconciliation:
Plugin periodically synchronizes between Netris and UFM
Ensures consistency between Ethernet and InfiniBand configurations
Reconciliation interval is configurable (default: 10 seconds)
Monitoring and Troubleshooting¶
Viewing Logs¶
For Kubernetes deployment:
kubectl logs -f deployment/netris-controller-nvidia-ufm-agent -n netris-controller
For Docker container:
docker logs -f netris-ufm
Common Issues and Solutions¶
Connection Issues to Netris Controller or UFM¶
Symptoms:
Log messages indicating connection timeouts or authentication failures
Missing data in Netris inventory
Solutions:
Verify network connectivity between the plugin and both systems:
ping netris.example.com ping ufm.example.com
Check credentials in the configuration:
Verify username/password combinations for both systems
Ensure API permissions are sufficient
Verify SSL certificate settings:
If using self-signed certificates, set NETRIS_VERIFY_SSL/UFM_VERIFY_SSL to “false”
For production, use valid certificates and set verification to “true”
PKey Assignment Issues¶
Symptoms:
Server clusters don’t have proper isolation in InfiniBand
Errors about PKey allocation failures in logs
Solutions:
Ensure the UFM_PKEY_RANGE has sufficient available IDs:
Check current PKey usage in UFM
Adjust the range if needed
Verify server naming consistency:
Server names must match exactly between Netris and UFM
Check for any server name discrepancies
Examine the PKey allocation process in debug logs:
# For Kubernetes kubectl logs -f deployment/netris-controller-nvidia-ufm-agent -n netris-controller | grep "PKey" # For Docker docker logs -f netris-ufm | grep "PKey"
SHARP Reservation Issues¶
Symptoms:
InfiniBand performance is not at expected levels
SHARP reservations are not being created
Solutions:
Verify SHARP is enabled on all relevant switches in UFM
Check UFM configuration for SHARP support
Ensure the plugin has permission to create SHARP reservations
Set LOG_LEVEL to “debug” for more detailed information:
LOG_LEVEL="debug"
Synchronization Delays¶
Symptoms:
Changes in Netris don’t appear quickly in UFM
Inconsistent behavior after making configuration changes
Solutions:
Adjust the RECONCILE_INTERVAL to a shorter time period for faster synchronization
Check for high CPU or memory usage on the plugin host
Verify network latency between the plugin and both systems
Restart the plugin service if synchronization issues persist:
kubectl rollout restart deployment/netris-controller-nvidia-ufm-agent -n netris-controller
or
docker restart netris-ufm
Version Compatibility¶
Netris Controller Version |
NVIDIA UFM Version |
Plugin Version |
---|---|---|
4.4.1+ |
6.15.4+ |
0.1.1+ |
Getting Started Guide¶
Quick Setup Example¶
Install the plugin using the Kubernetes or Docker method above
Verify the plugin is running properly:
# For Kubernetes kubectl get pods -n netris-controller | grep ufm # For Docker docker ps | grep netris-ufm
Create a Server Cluster Template in Netris Controller UI or API
Create Server Cluster with the servers that have InfiniBand connections
Verify PKey assignments in UFM:
Check the UFM UI for PKey assignments
Verify servers in the cluster can communicate via InfiniBand
Additional Resources¶
—
You are welcome to join our Slack channel to get additional support from our engineers and community.