Kubex is seeking a Head of AI Optimization Engineering to lead the technical direction and hands-on development of our AI infrastructure optimization capabilities. This is a senior, hands-on technical leadership role reporting directly to the CTO.
You will act as a principal-level architect and engineer, owning the design and evolution of Kubex’s optimization solutions for Kubernetes-based environments running AI workloads, with a strong emphasis on GPU-accelerated inference. This role carries broad technical ownership and organizational influence, and we are looking for candidates interested in a position that offers both hands-on and people-leadership opportunities.
This role is ideal for someone who combines deep, practical experience with GPU infrastructure and Kubernetes with the ability to reason about system-level trade-offs, optimization strategies, and real-world customer environments, and who remains excited to write and ship production code.
- Own the technical vision and architecture for Kubex’s AI infrastructure optimization capabilities, with a focus on Kubernetes-based environments running GPU-accelerated workloads.
- Lead the design of systems that automate the optimization of resource configurations and allocations across containers, nodes, GPUs, and autoscaling groups.
- Serve as a senior technical authority within the organization, guiding architectural decisions and influencing broader engineering strategy.
- Contribute directly to production code, remaining deeply hands-on in the design, implementation, and evolution of core platform components.
- Collaborate closely with other senior engineers to coordinate and execute complex software development initiatives.
- Prototype, validate, and productionize new technical approaches related to AI workload optimization.
- Apply deep expertise in NVIDIA GPU ecosystems, including:
  - CUDA and GPU programming models
  - Tensor vs. non-tensor core trade-offs
  - Multi-Instance GPU (MIG) configurations and advanced GPU sharing strategies
  - Device plugins, telemetry, and instrumentation required to support optimization algorithms
- Understand how customers deploy and operate AI workloads in production, from container configuration through node-level and cluster-level design.
- Work with Kubernetes autoscaling technologies (e.g., native autoscaling, Karpenter, …) and understand their interaction with GPU-backed nodes.
- Work with Kubex’s existing optimization frameworks and patented technologies, quickly building fluency and contributing to their evolution.
- Collaborate with internal experts on optimization algorithms while bringing strong systems intuition and real-world constraints into solution design.
- Identify opportunities to extend Kubex’s value beyond inference workloads, including potential future optimizations for training or hybrid workloads.
- Partner with Product Management to translate customer needs and market opportunities into actionable technical solutions.
- Engage directly with customers on architecture and design discussions.
- Represent Kubex externally through technical discussions, thought leadership, and industry engagement as appropriate.
- Champion high standards for engineering quality, correctness, observability, and operational excellence.
- Embrace and promote the use of AI-assisted development tools and workflows to accelerate software delivery and improve developer effectiveness.
- 10+ years of professional software engineering experience, including significant experience building complex production systems.
- Deep, hands-on experience with GPU-accelerated infrastructure, particularly NVIDIA-based environments.
- Strong knowledge of Kubernetes, including how GPU-backed workloads are scheduled, scaled, and operated in real-world clusters.
- Practical experience with CUDA, GPU telemetry, and performance considerations for AI workloads.
- Proven ability to design and build systems that balance performance, cost efficiency, and operational reliability.
- Strong coding skills and a demonstrated commitment to remaining hands-on with production code.
- Excellent communication skills, with the ability to explain complex technical concepts to both internal and external audiences.
- Experience optimizing or operating large-scale AI inference platforms.
- Familiarity with advanced GPU sharing strategies, including MIG, and their implications for scheduling and performance.
- Exposure to optimization-based systems, scheduling, bin-packing, or resource allocation problems.
- Experience working with autoscaling frameworks such as Kubernetes HPA/VPA or Karpenter.
- Background in high-performance computing, large-scale distributed systems, or AI platforms at scale.
- Experience mentoring or leading senior engineers, with interest in future people leadership.
- Play a key role in shaping the future of AI infrastructure optimization.
- Work on technically challenging problems at the intersection of Kubernetes, GPUs, and AI workloads.
- Collaborate with a highly experienced, deeply technical team.
- Influence product direction, architecture, and external technical positioning.
- Flexible, remote-first culture focused on impact and innovation.
- Competitive compensation, equity, and benefits.