ML Infrastructure Engineer, AI Research Team

Posted:
12/9/2024, 1:36:12 AM

Location(s):
California, United States ⋅ Mountain View, California, United States

Experience Level(s):
Mid Level

Field(s):
AI & Machine Learning ⋅ Software Engineering

Workplace Type:
Hybrid

Who we are

Gatik, the leader in autonomous middle-mile logistics, is revolutionizing the B2B supply chain with its autonomous transportation-as-a-service (ATaaS) solution, which prioritizes safe, consistent deliveries and streamlines freight movement by reducing congestion. The company focuses on short-haul, B2B logistics for Fortune 500 retailers and in 2021 launched the world’s first fully driverless commercial transportation service with Walmart. Gatik's Class 3-7 autonomous trucks are commercially deployed across major markets, including Texas, Arkansas, and Ontario, Canada, driving innovation in freight transportation.

The company's proprietary Level 4 autonomous technology, Gatik Carrier™, is custom-built to transport freight safely and efficiently between pick-up and drop-off locations on the middle mile. With robust capabilities in both highway and urban environments, Gatik Carrier™ serves as an all-encompassing solution that integrates advanced software and hardware powering the fleet, facilitating effortless integration into customers' logistics operations. 

About the role

We're currently looking for a motivated ML infrastructure engineer to build, maintain, and improve scalable distributed ML training and inference. In this pivotal role, you'll be instrumental in designing and refining the data and ML pipelines for scaled distributed training and validation of ML models. You will collaborate with a team of experts in AI, robotics, and software engineering to push the boundaries of what's possible in autonomous trucking.

What you'll do

  • Own and lead the exploration of the latest distributed (multi-node, multi-GPU) training and inference optimization technologies
  • Build scalable and robust distributed ML training and inference pipelines
  • Develop model benchmarking processes and tools
  • Adjust frameworks and interfaces to accelerate machine learning development and maximize the utilization of hardware capabilities
  • Develop the infrastructure for data augmentation pipelines and synthetic data generation
  • Collaborate closely with the AI Research team and DevOps team on preparing required assets and tools
  • Integrate state-of-the-art open-source AV models into the distributed training and inference pipelines

What we're looking for

  • 3+ years of production or research experience in ML infrastructure, distributed training, model inference, or GPU programming
  • Ability to understand deep learning algorithms, e.g. in computer vision, natural language processing, behavior planning, mapping
  • Familiarity with Azure/AWS/GCP cloud products for MLOps pipelines 
  • Proficiency with Kubernetes clusters and distributed compute assets
  • Experience with distributed data parallel (DDP) training and model parallelization techniques
  • Strong foundation in data structures, algorithm design, and complexity analysis
  • Expertise in the programming languages and tools critical for high-performance computing and machine learning, including Python/C++ and deep learning frameworks such as TensorFlow/PyTorch/JAX
  • Strong communication and teamwork skills
  • Readiness to explore and promote cutting-edge technologies in the ML infrastructure domain and beyond
  • You are passionate about Autonomous Driving!

Bonus Points

  • Experience with Azure AML and related products
  • Experience with CUDA, cuBLAS, cuDNN, or any other NVIDIA SDKs
  • Experience with model quantization or pruning
  • Experience with compilers, esp. ML compilers (e.g. TensorRT, Triton, XLA, Clang)
  • Experience with co-design of AI algorithms and hardware (e.g. Depthwise Conv, Flash Attention, Sparse and Deformable Attention)
  • Experience with distributed training speedup (e.g. FSDP, DeepSpeed, Horovod)

More about Gatik

Founded in 2017 by experts in autonomous vehicle technology, Gatik has rapidly expanded its presence to Mountain View, Dallas-Fort Worth, Arkansas, and Toronto. As the first and only company to achieve fully driverless middle-mile commercial deliveries, Gatik holds a unique and defensible position in the AV industry, with a clear trajectory toward sustainable growth and profitability.

We have delivered complete, proprietary AV technology, an integration of software and hardware, to enable earlier successes for our clients in constrained Level 4 autonomy. By choosing the middle mile, with its defined point-to-point delivery, we have simplified some of the more complex AV challenges, enabling us to achieve full autonomy ahead of competitors. Given extensive knowledge of Gatik’s well-defined, fixed-route operational design domains (ODDs) and hybrid architecture, we are able to hyper-optimize our models with exponentially less data, establish gate-keeping mechanisms to maintain explainability, and ensure continued safety of the system for unmanned operations.

Visit us at Gatik for more company information and Careers at Gatik for more open roles.

Taking care of our team

At Gatik, we connect people of extraordinary talent and experience to an opportunity to create a more resilient supply chain and contribute to our environment’s sustainability. We are diverse in our backgrounds and perspectives yet united by a bold vision and shared commitment to our values. Our culture emphasizes the importance of collaboration, respect and agility.

We at Gatik strive to create a diverse and inclusive environment where everyone feels they have opportunities to succeed and grow, because we know that together we can do great things. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status or any legally protected status.