Senior DevOps Engineer, Kubernetes – Datacenter Power

Posted:
10/13/2024, 2:45:57 PM

Location(s):
Texas, United States ⋅ California, United States

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Workplace Type:
Remote

NVIDIA is looking for outstanding software engineers to help us expand our enterprise GPU management and monitoring tools. In this role, you will work closely with the broader NVIDIA team to design and build Linux-based management agents, CLI tools, and end-to-end integration solutions that combine GPUs with the rest of the data center software management ecosystem. You will also contribute to improving the development and release infrastructure of the team. We are focused on supporting NVIDIA products across HPC, cloud, and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands. Your contributions will ensure reliable and secure delivery of our data-center monitoring products. You will achieve those goals by maintaining and improving CI/CD pipelines on Jenkins, GitLab, and GitHub to build our tools and ensure their quality. You will also improve the quality of NVIDIA's offerings by maintaining and growing the list of static analysis tools used to test our products.

To succeed, you must have a strong Linux background, familiarity with state-of-the-art CI/CD, Docker, Shell/Python scripting, a proven work ethic, and strong attention to detail. You will be expected to jump in quickly and provide valuable contributions from day one. This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot enterprise, cloud, and data-center trends. Come join us as we craft the future of accelerated computing and AI.

What you'll be doing:

  • Create and maintain Helm charts for custom software deployment.

  • Create and maintain development environments that use technologies such as k3d, kind, tilt, helmfile, etc.

  • Utilize and implement best practices for software delivery in Kubernetes environments.

  • Create and maintain CI/CD pipelines on Jenkins, GitLab, and/or GitHub.

  • Improve and maintain integrations with static-analysis tools such as Coverity to ensure the quality of our products.

  • Improve the reliability of CI/CD pipelines by handling platform issues.

  • Interface with internal NVIDIA tooling to enable the signing and publishing of our products.

  • Configure CI/CD runners and integrations with version control systems.

  • Create and manage Infrastructure-as-Code configurations with tools like Terraform or Ansible to provision and manage infrastructure.

  • Collaborate and communicate with the development team to understand requirements and implement efficient DevOps practices.

  • Collaborate and communicate with system owners to understand deployment environments and requirements.

What we need to see:

  • BS or higher in Computer Science or equivalent experience.

  • 5+ years of meaningful industry experience with a strong DevOps background.

  • Experience maintaining and debugging CI/CD pipelines on Jenkins/GitLab/GitHub.

  • Experience with containerized environments (Docker, cri-o, podman).

  • Business-level English. Outstanding written, verbal, and interpersonal communication skills.

  • Strong motivation and commitment to learn new skills.

  • Ability to execute all aspects of the software development lifecycle.

  • Ability to manage time in a fast-paced, heavily multitasked environment.

  • Experience with container orchestration platforms like Kubernetes, including availability and scaling solutions.

Ways to stand out from the crowd:

  • Development experience with Python, Go, C, C++, and/or Rust.

  • Fluency in Bash scripting.

  • Background with containers and common orchestration frameworks.

  • Experience with generating and using static-analysis reports.

  • Familiarity with dynamic analysis tools and fuzzing techniques.

  • Familiarity with C/C++ build environments and dependency management.

  • Familiarity with Go build environments and dependency management.

  • Experience with Kubernetes and running Jenkins on Kubernetes.

  • Knowledge of Docker and runc internals.

  • Knowledge of logging and monitoring solutions in Kubernetes.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you are creative and autonomous, we want to hear from you!

The base salary range is 148,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA

Website: https://www.nvidia.com/

Headquarters Location: Santa Clara, California, United States

Employee Count: 10001+

Year Founded: 1993

IPO Status: Public

Last Funding Type: Grant

Industries: Artificial Intelligence (AI) ⋅ GPU ⋅ Hardware ⋅ Software ⋅ Virtual Reality