Install zymtrace profiler

The zymtrace profiler is a lightweight, OpenTelemetry-compliant agent that collects performance profiles from your applications and systems with minimal overhead (<1% CPU and ~256MB RAM). It can be deployed in various ways to suit your infrastructure needs. This guide covers installation using Kubernetes manifests, Helm charts, Docker containers, or direct binary installation.

tip

Please review the prerequisites before beginning, particularly if you are working in an airgapped environment.

Installation methods​

Choose the installation method that best suits your environment. Each method provides the same functionality with different deployment characteristics.

Install with Kubectl​

The profiler agent is deployed as a DaemonSet.

  1. Create a namespace

    kubectl create namespace zymtrace

  2. Deploy

    kubectl apply -n zymtrace -f https://helm.zystem.io/k8s-manifests/profiler/deploy.yaml
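Once the manifest is applied, you can verify that the DaemonSet was created and that a profiler pod is running on each node (the label selector below matches the one used in the Management section of this guide):

```shell
# Verify the DaemonSet was created and pods are scheduled on each node.
kubectl get daemonset -n zymtrace

# Check that the profiler pods reach the Running state.
kubectl get pods -n zymtrace -l app=zymtrace,component=profiler
```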

important

If you need GPU profiling capabilities, we recommend the Helm installation method instead, which provides more configuration options for enabling GPU profiling.

tip

Collection Agent Configuration

By default, the collection agent is set to zymtrace-gateway.zymtrace.svc.cluster.local:80.

Remember that the collection agent should point to the zymtrace gateway service. If you're installing the profiling agent on a different cluster than the one hosting the backend services, you'll need to modify this setting. In this case, we recommend downloading the configuration file first.

curl -O https://helm.zystem.io/k8s-manifests/profiler/deploy.yaml
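With the manifest downloaded, update the `--collection-agent` argument to point at a gateway address that is reachable from the profiling cluster. One way to do this is a search-and-replace on the file; `gateway.example.com:443` below is a placeholder, not a real endpoint — substitute the externally reachable address of your zymtrace gateway:

```shell
# Replace the default in-cluster gateway address with an externally
# reachable endpoint (gateway.example.com:443 is a placeholder).
sed -i 's|zymtrace-gateway.zymtrace.svc.cluster.local:80|gateway.example.com:443|' deploy.yaml
```

Then apply the edited manifest with `kubectl apply -n zymtrace -f deploy.yaml` as before.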

Install with Helm

info

Helm Chart Source

The Helm chart source code is available on GitHub: https://github.com/zystem-io/zymtrace-charts/tree/main/charts/

# Add the zystem repository
helm repo add zymtrace https://helm.zystem.io

# List available charts and versions
helm search repo zymtrace --versions

# Install
helm install profiler zymtrace/profiler \
--create-namespace \
--namespace zymtrace \
--set profiler.args[0]="--collection-agent=zymtrace-gateway.zymtrace.svc.cluster.local:80" \
--set profiler.args[1]="--disable-tls" \
--set profiler.args[2]="--project=colossus" \
--set profiler.args[3]="--tags=prod;us-east" \
--set "profiler.env.HTTPS_PROXY=http://username:password@proxy:port"

Enabling GPU Profiling​

important

GPU profiling is only available on AMD64/x86_64 architecture.

To enable GPU profiling with Helm, set profiler.cudaProfiler.enabled=true.

Optionally, include the --enable-gpu-metrics flag to collect GPU metrics as well, as shown below:

# Install with GPU profiling and metrics enabled
helm install profiler zymtrace/profiler \
--create-namespace \
--namespace zymtrace \
--set profiler.cudaProfiler.enabled=true \
--set profiler.args[0]="--collection-agent=zymtrace-gateway.zymtrace.svc.cluster.local:80" \
--set profiler.args[1]="--disable-tls" \
--set profiler.args[2]="--enable-gpu-metrics" \
--set profiler.args[3]="--enable-vllm-metrics" \
--set profiler.args[4]="--nvml-auto-scan" # Optional: use if NVML library path is unknown

This will automatically:

  • Deploy the necessary GPU profiling libraries
  • Configure the volume mounts for sharing libraries between containers
  • Extract the profiling libraries to the shared host path
  • Make GPU profiling available to all containers that mount the shared volume
  • If --enable-gpu-metrics is set, enable GPU metrics collection (power usage, memory utilization, temperature, and performance metrics)

tip

The --enable-gpu-metrics flag is recommended for comprehensive GPU monitoring, but it is optional: you can profile CUDA applications without collecting GPU metrics, or collect only GPU metrics without profiling.

The --enable-vllm-metrics flag enables automatic metrics collection for vLLM-based LLM inference applications. These metrics are correlated with GPU profiles.

NVML Library Path: Use --nvml-auto-scan to automatically detect the NVML library path. If you know the exact path, use --nvml-path=/path/to/libnvidia-ml.so instead. This is only required for GPU metrics collection.
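If you prefer to pass an explicit path, you can first locate the NVML shared library on the host. The directories below are typical locations on common Linux distributions, not guaranteed ones:

```shell
# Search common library directories for the NVML shared object.
# The resulting path can be passed via --nvml-path.
find /usr/lib /usr/lib64 /usr/lib/x86_64-linux-gnu -name 'libnvidia-ml.so*' 2>/dev/null
```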

Using custom-values.yaml​


You can create a custom values file with your configurations:

profiler:
  args:
    - "--collection-agent=zymtrace-gateway.zymtrace.svc.cluster.local:80" # Point to your gateway service
    - "--disable-tls"
    - "--enable-gpu-metrics"
    - "--enable-vllm-metrics"
    - "--nvml-auto-scan" # Optional: use if NVML library path is unknown
    - "--project=my-project-123"
    - "--tags=prod;ha100;fp16"

  # Enable GPU profiling
  cudaProfiler:
    enabled: true
    hostMountPath: "/var/lib/zymtrace/profiler" # Default path

  env:
    HTTPS_PROXY: "http://user:pass@proxy.company.com:8080"

Then install using:

helm install profiler zymtrace/profiler \
--create-namespace \
--namespace zymtrace \
-f custom-values.yaml
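To apply configuration changes later, the same values file can be reused with helm upgrade (a sketch; the release and namespace names match the install command above):

```shell
# Apply updated values to the existing release.
helm upgrade profiler zymtrace/profiler \
  --namespace zymtrace \
  -f custom-values.yaml
```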

Next Step: Hook up your CUDA application​

After enabling the GPU profiling module in the profiler, connect it to your CUDA application by referring to the GPU Profiler documentation.

Management​

These commands help you monitor and maintain your zymtrace profiler installation. Use them to check the agent's status and logs.

Kubectl Management​

# Check agent status
kubectl get pods -n zymtrace -l app=zymtrace,component=profiler

# View agent logs
kubectl logs -f -n zymtrace -l app=zymtrace,component=profiler
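If you need to restart the agents after a configuration change, or remove the installation entirely, the commands below are one way to do it. The DaemonSet name is a placeholder — check `kubectl get daemonset -n zymtrace` for the actual name, and use `helm uninstall` only if you installed via Helm:

```shell
# Restart all profiler pods (e.g. after editing the DaemonSet spec).
kubectl rollout restart daemonset <daemonset-name> -n zymtrace

# Uninstall the Helm release (Helm installs only).
helm uninstall profiler -n zymtrace
```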