Site Reliability Engineer, AVP

Posted:
7/14/2025, 5:00:00 PM

Location(s):
Gurugram, Haryana, India ⋅ Karnataka, India ⋅ Haryana, India

Experience Level(s):
Expert or higher ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Join us as a Site Reliability Engineer

  • You’ll manage the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ)
  • We’ll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of CCJ across applications
  • This is a great chance to work in a supportive environment with opportunities to advance your personal and career development
  • We're offering this role at associate vice president level

What you'll do

As a Site Reliability Engineer, you’ll collaborate with feature teams to understand application changes, participate in delivery activities, and address production issues, helping to deliver change that does not negatively affect the customer experience. You'll contribute to site reliability operations, including production support, incident response, the on-call rota, toil reduction, and application performance. You'll also proactively lead improvements to the quality of releases into production and provide highly available, high-performing, and secure production systems.

Other responsibilities will include:

  • Delivering automation solutions to minimise and eliminate manual tasks associated with maintaining and supporting the applications
  • Ensuring an in-depth understanding of the full tech stack on which the application resides and depends
  • Identifying alerting and monitoring requirements for an application, based on sound understanding of customer journeys
  • Evaluating the resilience of the end-to-end tech stack on which the applications depend, and addressing weaknesses
  • Seeking to reduce frequency of hand-offs in the end-to-end resolution of customer-impacting incidents

The skills you'll need

To succeed in this role, you’ll need experience of supporting live production services serving customer journeys, with demonstrable knowledge of ITIL processes and IT Security principles, along with tools and techniques to prevent compliance breaches. You'll have hands-on experience with Azure Cloud and full-stack observability using tools such as Log Analytics, Application Insights, and Grafana.

You’ll also need:

  • Deep understanding of SRE concepts, including SLIs, SLOs, SLAs, error budgets, and reliability engineering best practices.
  • Expertise in observability tools such as Prometheus, Grafana, and CloudWatch.
  • Strong hands-on experience with at least one such monitoring tool, with a proven ability to set up and manage monitoring and alerting systems.
  • Proficiency in cloud platforms.
  • Strong scripting and automation skills, with proficiency in Python and Bash.
  • Hands-on experience with infrastructure operations and observability.
  • Significant experience with Kubernetes, including running, managing, and troubleshooting containerised workloads.
  • Experience working with version control platforms such as GitHub and implementing CI/CD pipelines is a plus.

Hours

45

Job Posting Closing Date:

28/07/2025