AI Platform Engineer

Posted:
5/13/2026, 3:42:41 AM

Location(s):
Bengaluru, Karnataka, India ⋅ Karnataka, India

Experience Level(s):
Junior ⋅ Mid Level ⋅ Senior

Field(s):
AI & Machine Learning ⋅ DevOps & Infrastructure ⋅ Software Engineering

Workplace Type:
Hybrid

At eBay, we’re more than a global ecommerce leader: we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.

Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.

Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.

About eBay AI Platform

At eBay, we are building a next-generation AI platform to power intelligent, AI-driven experiences across our global marketplace. Our platform runs on large-scale Kubernetes-based compute infrastructure spanning on-premises GPU clusters, high-performance training environments, and hybrid cloud bursting to deliver GPU capacity at scale.

We focus on building resilient, high-performance Kubernetes-native infrastructure—spanning Custom Resource Definitions and operators, a custom AI-aware GPU scheduler, multi-NIC RDMA networking, GPU pool management, and distributed Ray compute via KubeRay—deployed across multiple availability zones with a global dispatcher for unified workload placement across training and inference pools.

About the Role

We are looking for an experienced Software Engineer specializing in Kubernetes and GPU Infrastructure to design and operate the foundational systems that power eBay's AI platform. You will own critical layers of our infrastructure—from Kubernetes CRD-based automation and a custom AI-aware GPU scheduler to RDMA-optimized multi-NIC GPU clusters and large-scale training environments—enabling our ML teams to train and serve AI models at eBay scale.

You will work on Kubernetes operator development, Gateway API and hybrid cloud networking, multi-NIC RDMA fabric design, a global GPU scheduler with cross-availability-zone dispatch, GPU pool management with provisioned throughput integration, topology-aware workload placement, and KubeRay infrastructure—partnering closely with ML Platform, AI Research, and Networking teams.

Key Responsibilities

  • Design and build Kubernetes Custom Resource Definitions (CRDs) and operators for ML workloads, GPU node pools, and RayService resources.
  • Architect and operate Kubernetes networking layers including Gateway API, Service Load Balancers, and API gateways for hybrid cloud connectivity.
  • Design, deploy, and operate multi-NIC Kubernetes clusters with RDMA-enabled networking.
  • Implement and optimize RDMA networking using GPUDirect RDMA, RoCE, and InfiniBand for distributed GPU workloads.
  • Build and operate a custom AI-aware GPU scheduler with topology-aware placement, preemption, and GPU defragmentation.
  • Design and operate GPU pool management systems across on-premises and cloud-burst GPU environments.
  • Deploy and operate KubeRay infrastructure for distributed Ray clusters supporting training and inference workloads.
  • Implement cloud bursting and spot or preemptible GPU scheduling to improve utilization.
  • Automate GPU asset provisioning, node configuration, and cluster lifecycle management using Infrastructure-as-Code and GitOps.
  • Implement networking policies, multi-tenant isolation, RBAC, and security controls across the Kubernetes environment.
  • Build observability for GPU utilization, NCCL communication, scheduler decisions, and network throughput.
  • Collaborate with ML Platform, AI Research, and Networking teams to optimize infrastructure for training and online inference.

What We’re Looking For

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience building distributed systems or infrastructure platforms with deep Kubernetes expertise.
  • Strong programming skills in Go and/or Python.
  • Familiarity with Kubernetes controller development frameworks such as Kubebuilder, Operator SDK, or controller-runtime.
  • Deep understanding of Kubernetes internals including the API server, etcd, scheduler, controller manager, and kubelet.
  • Hands-on experience designing Kubernetes networking including Gateway API, CNI plugins, service load balancing, and hybrid cloud architectures.
  • Experience designing and operating multi-NIC Kubernetes clusters using NVIDIA Network Operator, SR-IOV device plugins, or equivalent tooling.
  • Strong understanding of RDMA networking protocols and NCCL configuration for distributed GPU workloads.
  • Experience with Kubernetes GPU scheduler frameworks and GPU pool management, including MIG partitioning and preemption policies.
  • Hands-on experience deploying and operating KubeRay for distributed Ray workloads.
  • Experience with GPU asset lifecycle management, bare-metal provisioning automation, and GitOps-based CD tooling such as Argo CD.
  • Familiarity with GPU technologies including NVIDIA CUDA, NVLink, NVSwitch, device plugins, and the GPU Operator ecosystem.
  • Experience with observability tooling such as Prometheus, Grafana, and OpenTelemetry.
  • Strong debugging and performance optimization skills across GPU driver stacks, RDMA networking, and distributed Kubernetes infrastructure.

Additional Details

eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us at [email protected]. We will make every effort to respond to your request for accommodation as soon as possible. View our accessibility statement to learn more about eBay’s commitment to ensuring digital accessibility for people with disabilities.

We use cookies to enhance your experience and may use AI tools for administrative tasks in the hiring process. To learn how we handle your personal data and use AI responsibly, please visit our Talent Privacy Notice, Privacy Center, and AI Hiring Guidelines.