zymtrace Changelog

v25.12.3

  • ui: fixed an issue that could prevent navigation from the "Top Entities" section on the Efficiency IQ page
  • profiler: remove special coloring of CUDA launch frames

v25.12.2

  • profiler: fix bug that could cause CUDA launch function frames to appear on the CPU flamegraph
  • ui: adjust truncation of legend entries in charts to use end truncation instead of middle truncation
  • ui: apply sorting to tooltip entries in metric charts
  • ui: add select all and deselect all buttons to language filter dropdown
  • gateway: update Envoy to v1.36.3
    • versions ≥ v1.50.0 support automatically raising ulimits to their hard limit, which matters in Kubernetes clusters using containerd ≥ 2.0, where the default limits are now very conservative
  • storage: don't use EXCHANGE TABLE ClickHouse DDL
    • allows deploying ClickHouse on filesystems without support for atomic file renames (renameat2)
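
As an illustration of what "raising ulimits to their hard limit" in the gateway entry above means at the operating-system level, here is a minimal Python sketch. It is not Envoy's or zymtrace's implementation; the choice of RLIMIT_NOFILE (open file descriptors) and the helper name raise_soft_limit_to_hard are assumptions made for the example.

```python
import resource


def raise_soft_limit_to_hard(limit=resource.RLIMIT_NOFILE):
    """Raise the soft value of one resource limit to its hard value.

    Returns the (soft, hard) pair in effect afterwards.
    RLIMIT_NOFILE is an assumed default; the changelog does not say
    which limits are raised.
    """
    soft, hard = resource.getrlimit(limit)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to,
            # but not beyond, the hard limit.
            resource.setrlimit(limit, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the current limit.
            pass
    return resource.getrlimit(limit)


if __name__ == "__main__":
    print(raise_soft_limit_to_hard())
```

Because only the soft limit is changed, no extra privileges are needed, which is why a process can apply this to itself at startup.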

v25.12.1

  • ui: add light mode support
  • ui: show description of SASS instruction on hover for GPU profiles
  • ui: fixed an issue that could cause chart legend entries to not be truncated correctly in Firefox
  • ui: enable transport compression for flamegraph WASM blob
  • backend: replace gRPC health checks with HTTP health checks
  • symdb: retry auto upload for broken executables after a day
    • interval can be configured via SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER env variable (in seconds)
  • profiler: make auto upload size limits configurable
    • these limits only apply if -dwarf is given
    • ZYMTRACE_MAX_SYMBFILE_SIZE (for the symbol file) and ZYMTRACE_MAX_INPUT_FILE_SIZE (for the size of the binary file that we extract symbols from) can be used to configure this
    • both are in bytes
  • profiler: add support for NVIDIA MIG devices (for metrics)
  • profiler: fix bug that could cause the available compute graph (and the idle time of a machine) to drop significantly if GPU profiling was active and the GPU was heavily used
  • cudaprofiler: fix rare segfaults in CUDA caused by incompatible version of CUPTI already being loaded in some processes
    • PyTorch-based processes would sometimes load an incompatible libcupti.so, which could lead to segfaults
  • cudaprofiler: enable sampling of kernel launches by default
  • cudaprofiler: expose stack trace sampling config via env vars
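
The size limits and the retry interval above are plain integers (bytes and seconds respectively), so it can help to compute them from friendlier units. The following is a small sketch, assuming the variables are simply set in the environment of the respective process; the helper names and the example values are hypothetical and are not the shipped defaults.

```python
import os

MIB = 1024 * 1024  # the profiler limits are expressed in bytes


def profiler_limit_env(max_symbfile_mib, max_input_file_mib):
    """Environment variables for the profiler's auto upload size limits."""
    return {
        "ZYMTRACE_MAX_SYMBFILE_SIZE": str(max_symbfile_mib * MIB),
        "ZYMTRACE_MAX_INPUT_FILE_SIZE": str(max_input_file_mib * MIB),
    }


def symdb_retry_env(retry_after_hours):
    """Environment variable for the symdb retry interval, in seconds."""
    return {
        "SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER": str(retry_after_hours * 3600),
    }


if __name__ == "__main__":
    # Example values only: 256 MiB symbol files, 1 GiB input binaries.
    env = {**os.environ, **profiler_limit_env(256, 1024)}
    print(env["ZYMTRACE_MAX_SYMBFILE_SIZE"])
    print(env["ZYMTRACE_MAX_INPUT_FILE_SIZE"])
    print(symdb_retry_env(12))
```

Note that, per the entry above, the profiler limits only take effect when -dwarf is given.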

v25.11.6

  • mcp: improve time ranges and limit retries if no data found
  • profiler: fix unwinding for LTO/PGO builds of interpreters
    • .cold parts of a split interpreter loop are now supported
  • profiler: add source location mapping for GPU profiling with PC sampling
  • profiler: add arguments to control how NVML is located
    • -nvml-path allows the NVML path to be specified explicitly
    • -nvml-auto-scan allows opting into an automatic scan for the NVML library
  • profiler: fixed an issue where in rare cases the profiler could crash during symbol extraction
  • cudaprofiler: flush PC samples more frequently, and track hardware buffer being full
  • cudaprofiler: support disassembling more SASS instructions
  • web/profiler: allow top GPU stall reasons to be viewed in top functions list
  • ui: show description of stall reason in tooltip for GPU flamegraph
    • makes the stall reasons more descriptive, and offers possible solutions for reducing their impact
  • ui: improve click-to-copy handling for text elements
  • ui: add script/container name support for GPU consumer metrics
  • ui: persist aggregation and language settings in both URL and local storage
    • allows sharing links without overwriting local settings; conflicting settings can be accepted, discarded, or kept diverged

v25.11.5

  • profiler: fix bug that caused GPU implant instrumentation to not work when the profiler runs in a Docker container

v25.11.4

  • profiler: support zing 25.01 with JDK 11 and 17 (21 was already supported) (https://github.com/zystem-io/zymtrace/pull/1441)
  • symblib: fix Rust function name demangling which failed in some cases
  • mcp: suggest using local time instead of UTC
  • profiler: reduce allocations in parseFDE()
    • brings parseFDE down from 60% of all allocs to 0.3%
  • profiler: place uprobes on GPU implant dynamically (https://github.com/zystem-io/zymtrace/pull/1453)
    • implant no longer needs to be mapped into the Kubernetes container (for the profiler)
    • implant is now detected and instrumented regardless of its path
  • ui: fix rendering bug in flamegraph where a child node could be wider than its parent
  • cudaprofiler: several improvements (https://github.com/zystem-io/zymtrace/pull/1448)
    • fix bug with force flushing incomplete kernels
    • improve flushing of CUPTI activity records
    • don't delay process synchronizations during startup for CUDA processes, leading to better CPU stack traces for the first few frames
    • support building with CUDA 13
    • allow bypassing GPU presence checks in the profiler
  • mcp: enable collapse_go_system_frames, collapse_jvm_threads, filter_error_events, filter_unreported to reduce number of tokens in the flamegraph response
  • ui: add new display option collapse_go_system_frames
    • default enabled and inverted in UI: "Show Go system frames"
    • aggregates GC frames into "Garbage Collector"
    • aggregates scheduler frames into "Scheduler"
  • ui: fix frame filter for Java standard library functions
    • it was previously filtering out functions too aggressively
  • profiler: improve zing symbolization
    • adds support for the GC in zing, leading to more debug information being resolved, and thus deeper / longer stack traces
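
To make the collapse_go_system_frames entry above more concrete, here is a rough Python sketch of how runs of Go runtime frames could be folded into single "Garbage Collector" and "Scheduler" frames. The changelog does not document the actual matching rules, so the prefix lists and function names below are assumptions chosen for illustration only.

```python
# Hypothetical classification rules: the real matching logic used by
# zymtrace is not described in the changelog.
GC_PREFIXES = ("runtime.gcBgMarkWorker", "runtime.gcDrain", "runtime.scanobject")
SCHED_PREFIXES = ("runtime.mcall", "runtime.park_m", "runtime.schedule", "runtime.findRunnable")


def label_for(frame):
    """Return the synthetic frame name for a Go system frame, or None."""
    if frame.startswith(GC_PREFIXES):
        return "Garbage Collector"
    if frame.startswith(SCHED_PREFIXES):
        return "Scheduler"
    return None


def collapse_go_system_frames(stack):
    """Collapse consecutive system frames with the same label into one frame.

    `stack` is a root-first list of function names; other frames are kept.
    """
    collapsed = []
    for frame in stack:
        label = label_for(frame)
        if label is None:
            collapsed.append(frame)
        elif not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


if __name__ == "__main__":
    print(collapse_go_system_frames(
        ["runtime.mcall", "runtime.park_m", "runtime.schedule", "runtime.findRunnable"]
    ))
    print(collapse_go_system_frames(["main.main", "main.handleRequest"]))
```

Collapsing in this way shrinks the flamegraph without losing the time attributed to the runtime, which is also why the mcp entry above enables it to reduce response size.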

v25.11.3

  • mcp: change flamegraph culling to root-based culling as in the UI
  • ui: new display option 'filter_error_frames'
    • filters out error frames by default, which allows the profiler to send error frames by default and thereby improves CPU usage accuracy
  • all: add oidc/local auth support along with service tokens

v25.11.2

  • profiler: increase the maximum number of unwound frames from 128 to 256
    • this avoids unwind errors with long stack traces and thus improves CPU attribution
  • profiler: improve log message when falling back from BTF to binary analysis
  • profiler: print out system info if attaching to tracepoints fails
  • mcp: add topentities as tool, resource and resource template

v25.11.0

  • profiler: reworked PID reporting mechanism, significantly reducing CPU usage
    • especially high impact on systems that spawn many short-lived processes
  • profiler: prefer DWARF symbols over Go symbols during automatic symbol upload
    • improves symbol quality for Go executables with DWARF debug info when running the profiler with the -dwarf argument
  • profiler: more efficient stack delta extraction for native executables
    • significantly reduces the peak memory usage of the profiler in the presence of large native executables
  • ui: add new display option collapse_jvm_threads
    • aggregates GC threads into "Garbage Collector" (visible when the flamegraph is grouped by "Thread Name")
    • aggregates all GC frames into a single one called "Garbage Collector"
    • also aggregates JVM JIT frames and threads in the same fashion
    • turned on by default

v25.10.12

  • profiler: optimize performance

v25.10.11

  • ui: switch to relative mode if matches is used on a column incompatible with absolute mode

v25.10.10

  • profiler: fixed an issue that prevented the script name attribute from being set for GPU traces
  • ui: support matches for regex matching in advanced query mode

v25.10.9

  • web: better errors if regex syntax is invalid in matches CEL query
  • backend: add local login and RBAC support with CRUDs
  • profiler: fix version matching logic for zing offsets
  • profiler: add support for more zing versions
    • we now additionally support these versions
      • JDK 1.8.X with zing 24.02.X
      • JDK 1.8.X with zing 24.08.X
      • JDK 1.8.X with zing 25.02.X
      • JDK 11.0.X with zing 23.02.X
      • JDK 11.0.X with zing 23.08.X
      • JDK 11.0.X with zing 24.02.X
      • JDK 11.0.X with zing 24.08.X
      • JDK 11.0.X with zing 25.02.X
      • JDK 17.0.X with zing 24.02.X
      • JDK 21.0.X with zing 24.01.X
      • JDK 21.0.X with zing 24.03.X
      • JDK 21.0.X with zing 24.04.X
      • JDK 21.0.X with zing 24.05.X
      • JDK 21.0.X with zing 24.07.X
      • JDK 21.0.X with zing 24.09.X
      • JDK 21.0.X with zing 24.10.X
      • JDK 21.0.X with zing 24.12.X
      • JDK 21.0.X with zing 25.01.X

v25.10.8

  • ui: fix possible crash during flamegraph rendering

v25.10.7

  • profiler: add support for OpenJDK 25
  • profiler: fix retry logic for fetching container name(s)
  • gpu: support CUkernels properly in addition to CUfuncs
  • ui: add detail pages for namespace, pod and deployment
  • profiler: fix aggregations for metrics (script names were wrongly merged)