Sustainable Talent is partnering with NVIDIA, a global leader that has been transforming computer graphics, PC gaming, and accelerated computing for over 25 years. We are looking for a Data Center Engineer to support our client's on-premises private cloud infrastructure team. This is a W-2 full-time contract based in Santa Clara, CA. We offer competitive pay of $80-$90/hr, based on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!
In this role, you will take on the challenge of providing and maintaining a compute farm of Builders, Packagers, and Testers that serves as a test bed where our developers worldwide test various NVIDIA hardware and software prior to release. The environment is huge, the scale massive, and the ask enormous! We need YOU to help US maintain and drive our world-class DCs/Labs to produce timely, deterministic results for our Engineers and expectant Users worldwide!
What You'll Do:
- Collaborate closely with engineering teams (system architects, hardware/software engineers, QA, and more) to design, develop, debug, and release next-generation products.
- Manage and maintain a high-performing Compute Farm of builders, packagers, testers, and core infrastructure.
- Ensure availability targets are consistently met and lead system recovery efforts.
- Deploy and qualify systems while supporting exciting new technology bring-ups.
- Oversee inventory and lifecycle management for NVIDIA's assets across data centers and labs.
- Gather critical metrics and create Standard Operating Procedure (SOP) documentation.
- Maintain a world-class, safe, and well-organized environment in our data centers and labs.
- Troubleshoot Linux/Windows, hardware, and infrastructure issues alongside engineers and platform operations teams.
- Plan, deploy, and maintain on-premises private cloud infrastructure, collaborating with datacenter and network engineering teams.
- Implement efficiency improvements to maximize availability, throughput, and test accuracy while meeting SLAs and KPIs.
- Represent the team in meetings with internal stakeholders and contribute to global operations.
What We Need to See:
- Associate’s or Bachelor’s degree in an engineering or technical major (or equivalent experience).
- 5+ years of experience in data centers or large engineering labs.
- Familiarity with SCM tools such as Git and Perforce.
- Proficiency with DCIM tools (e.g., Nautobot) and with scripting and automation (shell, Python, Ansible).
- Working knowledge of protocols and services such as TCP/IP, DNS, NFS, and SSL.
- Experience with Windows, Linux, and macOS.
- Hands-on experience with PCBs, GPUs, and system deployments.
- Exceptional communication skills, both written and verbal.
- Ability to explain technical concepts to non-technical audiences.
- Strong problem-solving skills and a collaborative spirit.
What Makes You Stand Out:
- Experience managing HPC clusters using tools like BCM and Slurm.
- Hands-on knowledge of OpenStack.
- Relevant certifications such as CCNA or equivalent.
- Strong background in Windows and Linux administration, with an understanding of dense datacenter design, including compute, storage, and networking.
- Experience with hypervisors and VM applications.
- Knowledge of DC infrastructure with an emphasis on liquid cooling.
- A track record of technical curiosity and innovation.
- Mechanically inclined and comfortable with tools and physical tasks.
- Energy, enthusiasm, and an understanding of what it takes to get the team to the finish line.
- Willing to go the extra mile to get the job done!
This is an onsite contract position and will require local travel to DCs within Santa Clara.
Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.