Senior DevOps Engineer

Posted:
8/10/2024, 5:00:00 PM

Location(s):
Raanana, Center District, Israel ⋅ Center District, Israel

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

NVIDIA is searching for a highly motivated DevOps engineer for the NVIDIA NMX team, which is building a next-generation network management and telemetry system for cloud and on-premises deployments, using modern design principles at internet scale. NVIDIA NMX is a highly scalable, modern network operations toolset that provides real-time visibility, troubleshooting, validation, and telemetry for NVLink/NVSwitch, InfiniBand, and Ethernet fabrics. NMX uses telemetry to deliver actionable insights about the health of a data center network, integrating the fabric into the DevOps ecosystem.

What you'll be doing:  

  • Joining the NVIDIA NMX team, which is building the SaaS platform and the on-premises solution for network management and telemetry.

  • Owning the DevOps, infrastructure, and Site Reliability Engineering (SRE) requirements for NMX.

  • Focusing on efficiency by automating repetitive workflows.

  • Working on a microservices-based architecture.

  • Deploying and troubleshooting non-disruptive cloud operations with an emphasis on secure production infrastructure.

  • Continuously evaluating existing systems and driving improvements.

  • Managing deployments and upgrades of operating systems, Kubernetes (k8s) clusters, and/or other orchestration tools.

  • Providing day-to-day support for engineering activities with CI/CD tools like Git and Jenkins.

  • Multitasking across different tracks to address evolving priorities efficiently.

What we need to see:  

  • 5+ years of experience in complex microservices-based architectures

  • Highly skilled in Kubernetes and Docker

  • A good programming background in a high-level language such as Go or Python, or equivalent experience

  • Strong knowledge of NoSQL databases (e.g. MongoDB) and Kafka/Kafka Streams

  • Experience with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts

  • Infrastructure as code (IaC) skills with tools like Ansible and Terraform

  • Expert in AWS

  • Knowledge of the best practices and discipline of managing and monitoring a highly available and secure production infrastructure

  

Ways to stand out from the crowd:  

  • Skills in Linux/Unix Administration 

  • Experience with Prometheus/Grafana

  • Experience with APM tools like Dynatrace, Datadog, AppDynamics, New Relic, etc.

  • Experience implementing highly scalable log aggregation systems using the ELK stack or similar

  • Experience implementing robust metrics collection and alerting infrastructure

  

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, passionate and self-motivated, we want to hear from you! NVIDIA is leading the way in ground-breaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.

NVIDIA

Website: https://www.nvidia.com/

Headquarters Location: Santa Clara, California, United States

Employee Count: 10001+

Year Founded: 1993

IPO Status: Public

Last Funding Type: Grant

Industries: Artificial Intelligence (AI) ⋅ GPU ⋅ Hardware ⋅ Software ⋅ Virtual Reality