GPU Kernel Dev & Perf Analysis Architect

Posted:
3/11/2025, 5:00:00 PM

Location(s):
Shanghai, China

Experience Level(s):
Mid Level

Field(s):
AI & Machine Learning ⋅ Software Engineering

NVIDIA is developing processor and system architectures that accelerate machine learning, automotive, and high-performance computing (HPC) applications. We are seeking a strong candidate to develop GEMM kernels and analyze their performance on NVIDIA's new architectures. Your work will play a critical role in shaping the future of deep learning hardware and software, ensuring optimal performance for next-generation AI applications. This position offers the opportunity to make a meaningful impact in a fast-moving, technology-focused company.

What you'll be doing:

  • Design, develop, and optimize GEMM (General Matrix Multiply) kernels for NVIDIA's new architectures.

  • Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.

  • Conduct in-depth performance analysis of GPU kernels, including GEMM and other critical operations.

  • Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.

  • Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.

  • Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.

  • Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

What we need to see:

  • 4+ years of industry experience in GPU programming or performance optimization for deep learning applications.

  • Hands-on experience developing and optimizing GEMM kernels.

  • Demonstrated experience in analyzing and improving the performance of GPU kernels, with measurable results (e.g., performance improvements, efficiency gains).

  • Expertise in CUDA programming for GPU acceleration.

  • Experience with performance profiling tools (e.g., NVIDIA Nsight).

  • Excellent communication skills, both written and verbal.

  • Strong organizational and time management abilities, with the ability to prioritize tasks effectively.

NVIDIA

Website: https://www.nvidia.com/

Headquarters Location: Santa Clara, California, United States

Employee Count: 10001+

Year Founded: 1993

IPO Status: Public

Last Funding Type: Grant

Industries: Artificial Intelligence (AI) ⋅ GPU ⋅ Hardware ⋅ Software ⋅ Virtual Reality