Deep Learning Solution Architect

Posted:
4/3/2026, 5:05:08 AM

Location(s):
Beijing, China ⋅ Shanghai, China

Experience Level(s):
Senior

Field(s):
AI & Machine Learning ⋅ Software Engineering

NVIDIA is seeking dynamic Solution Architects with specialized expertise in training Large Language Models (LLMs), implementing RAG workflows, and building agentic inference systems. You will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers. With competitive salaries and a generous benefits package, we are widely considered one of the world’s most desirable employers. Our teams include some of the most forward-thinking and hardworking people in the world, and thanks to outstanding growth, our best-in-class engineering teams are expanding rapidly. If you're a creative and autonomous person with a real passion for technology, we want to hear from you.

What You Will Be Doing:

  • Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.

  • Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.

  • Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.

  • Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.

  • Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).

What We Need to See:

  • Master’s or Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.

  • 4+ years of hands-on experience in AI, with a focus on open-source LLM training, fine-tuning, and production inference optimization.

  • Deep understanding of mainstream LLM architectures and proficiency in LLM customization with PyTorch and Hugging Face Transformers.

  • Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.

  • Competency in agentic inference design and using AI agents to solve business challenges.

  • Strong communication skills, with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.

Ways to Stand Out from the Crowd:

  • Hands-on experience with NVIDIA’s generative AI ecosystem (TensorRT-LLM, Megatron-LM, NVIDIA NeMo).

  • Advanced skills in LLM optimization (quantization, KV Cache tuning, memory footprint reduction).

  • Experience with Docker and Kubernetes for on-prem deployment of containerized LLM and agent workflows.

  • In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.

#deeplearning

NVIDIA

Website: https://www.nvidia.com/

Headquarter Location: Santa Clara, California, United States

Employee Count: 10001+

Year Founded: 1993

IPO Status: Public

Last Funding Type: Grant

Industries: Artificial Intelligence (AI) ⋅ GPU ⋅ Hardware ⋅ Software ⋅ Virtual Reality