StrategyBrain CTO Joins KSUG.AI Toronto to Explore AI Observability and OpenTelemetry at Scale

Liu, Bo | May 21, 2025

Cloud-native infrastructure and AI applications are converging, and groups like KSUG.AI play an important role in shaping how intelligent systems are built. Formerly K8SUG, KSUG.AI is now the KubeSmart & AI User Group, a community focused on Kubernetes, cloud-native architectures, multi-cloud strategies, and the growth of AI and ML workloads.

This week, our CTO Kyle attended the 20th KSUG.AI meetup at AWS Toronto’s YYZ18 office, where he exchanged ideas with the engineers, developers, and architects who are shaping the future of distributed AI infrastructure.

A Community of Innovators at the Intersection of Cloud and AI

The event featured expert talks from AWS and Randoli, including:

  • “Observability on EKS” by Preetam Rebello from AWS
  • “OpenTelemetry 101” by Rajith from Randoli, an introduction that gives developers a solid base for using OTel in Kubernetes

These sessions covered real-world use cases and monitoring tools for microservices, with a focus on how they connect with AI platforms and developer-friendly observability tools.

StrategyBrain’s Engagement

At StrategyBrain, we create AI Agent HR products that automate recruitment and HR workflows for global teams. Our systems run on complex infrastructure that depends on Kubernetes, observability, and fault-tolerant event streaming, so OpenTelemetry and EKS observability are vital to us.

During the meetup, Kyle engaged with AWS engineers and Kubernetes peers. He explained how StrategyBrain uses OpenTelemetry in multi-agent AI pipelines. In these systems, microservices talk to each other asynchronously. This setup helps maintain real-time responsiveness.

Technical Questions Raised by Kyle

In the Q&A, Kyle asked two important questions that highlight our engineering challenges:

First, in AI agent setups that span several EKS services and message queues, how can trace context be passed across asynchronous queues?

Here are some best practices for event-driven observability with the OpenTelemetry SDKs:

  • Use OpenTelemetry’s context propagation features.
  • Attach the trace context to message headers when publishing.
  • Ensure each service extracts the incoming context and injects it into outgoing messages.
  • Monitor all queues for trace context continuity.
  • Use consistent naming for spans and attributes.

These steps help maintain visibility across your systems.
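As a minimal sketch of the inject and extract steps, the Python example below carries trace context through a queue in a W3C Trace Context `traceparent` header, the default propagation format in OpenTelemetry. It deliberately uses only the standard library to expose the mechanics; in a real service the OpenTelemetry SDK's propagators would do this work. The in-memory `queue.Queue`, the message shape, and the helper names (`inject`, `extract`, `produce`, `consume`) are assumptions made for illustration, not StrategyBrain's implementation.

```python
import queue
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 32 hex characters


def new_span_id() -> str:
    return secrets.token_hex(8)  # 16 hex characters


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Producer side: write the current trace context into message headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Consumer side: return (trace_id, parent_span_id, sampled), or None if
    the header is missing or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(parent_span_id) == {"0"}:
        return None
    return trace_id, parent_span_id, bool(int(flags, 16) & 0x01)


def produce(q: queue.Queue, body: dict):
    """Hypothetical producer: start a trace and publish a message with context."""
    trace_id, span_id = new_trace_id(), new_span_id()
    headers = {}
    inject(headers, trace_id, span_id)
    q.put({"headers": headers, "body": body})
    return trace_id, span_id


def consume(q: queue.Queue) -> dict:
    """Hypothetical consumer: continue the producer's trace if context arrived,
    otherwise start a new trace so the break in continuity is visible."""
    message = q.get()
    context = extract(message["headers"])
    if context is None:
        trace_id, parent_span_id = new_trace_id(), None
    else:
        trace_id, parent_span_id, _sampled = context
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "span_id": new_span_id(),
        "body": message["body"],
    }
```

Calling `produce` and then `consume` on the same queue yields a consumer span that shares the producer's trace ID and records the producer's span ID as its parent, which is what lets a tracing backend stitch the two sides of the queue into one trace.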

The second question concerned how to balance sampling against complete tracing of telemetry data in multi-agent AI workloads. AWS suggests finding a balance between the two:

  • Use sampling to reduce overhead and improve performance.
  • Implement complete tracing where precise data is crucial.
  • Adjust your approach based on workload needs and system capabilities.

This way, you can optimize data collection while maintaining system efficiency. The balance is especially important for human-in-the-loop systems and for regulatory audits.
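The sampling trade-off can be sketched in a few lines of Python. The example makes a deterministic decision from the trace ID, so every service that sees the same trace reaches the same verdict, and it lets selected flows bypass sampling entirely. OpenTelemetry SDKs ship their own ratio-based and parent-based samplers; this standard-library version only illustrates the idea. The `force_full` flag (for example, for audited or human-in-the-loop flows) and the function name `should_sample` are assumptions, not part of any SDK.

```python
TRACE_ID_LIMIT = 1 << 64  # the decision uses the low 64 bits of the trace ID


def should_sample(trace_id_hex: str, ratio: float, force_full: bool = False) -> bool:
    """Return True if the trace should be recorded.

    ratio: fraction of traces to keep, between 0.0 and 1.0.
    force_full: hypothetical override for flows that need complete tracing.
    """
    if force_full:
        return True
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * TRACE_ID_LIMIT)
    low_bits = int(trace_id_hex, 16) & (TRACE_ID_LIMIT - 1)
    # Trace IDs are random, so roughly `ratio` of them fall below the bound.
    return low_bits < bound
```

With a setup like this, routine agent traffic might be sampled at a low ratio while flows that need a full audit trail pass `force_full=True`. The right ratio depends on traffic volume and storage budget, which the source does not specify.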

These questions sparked valuable discussions with AWS specialists and other participants.

Building in Public, Connecting with the Community

StrategyBrain is proud to be part of Toronto’s active technical community. Events like the KSUG.AI meetups are a great place to share lessons, discuss best practices, and shape the future of AI observability.

We can’t wait to join more KSUG.AI sessions. We’re excited to share our journey of scaling AI agent systems together.

©Strategybrain Technology Ltd.
info@strategybrain.ca