NVIDIA is looking for a versatile Performance Research and Analysis student to join our Performance group. The ideal candidate will take part in pioneering ETH AI technologies and workloads, with work spanning performance testing, profiling, analysis, research, and optimization focused on collective communication, networking, and congestion control algorithms.
What you'll be doing:
- Work on NVIDIA's cutting-edge technologies and solutions for large-scale AI fabrics with multiple GPUs, SuperNICs, and switches.
- Define performance test plans for various solutions, set performance expectations, and work to reach these speed-of-light targets.
- Test performance of various benchmarks and new features and analyze the results.
- Develop automation tools and frameworks for better, more efficient performance research, data collection, and analysis.
- Carry out performance analysis and diagnostics across SW, HW, system, and networking to root-cause performance bottlenecks.
- Work closely with different HW and SW groups to propose and drive performance optimizations
What we need to see:
- Current Computer Science or Computer Engineering student.
- Programming languages: Python, Bash, and C
- Linux knowledge: a general understanding of Linux operating system concepts
- Networking protocols knowledge and experience
- AI training and inference knowledge
- Computer architecture knowledge
- Quick learning ability and strong analytical skills with attention to detail
- Strong problem-solving and debugging skills
- Independent worker who drives their tasks forward and takes ownership of them
- Great teammate with good communication and social skills
- Clear verbal and written communication
Ways to stand out from the crowd:
- In-depth knowledge of networking protocols
- Knowledge of deep learning domains such as neural networks and LLMs
- Full-system knowledge and understanding (CPU, GPU, memory, PCI)
- Knowledge of performance analysis methodologies and SW tools
- Strong Python programming and scripting skills