Site Reliability Engineer

Posted:
8/20/2025, 10:15:03 PM

Location(s):
Haryana, India ⋅ Tamil Nadu, India ⋅ Chennai, Tamil Nadu, India ⋅ Gurugram, Haryana, India

Experience Level(s):
Mid Level ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Join us as a Site Reliability Engineer

  • In this key role, you’ll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services
  • You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way
  • This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
  • We’re offering this role at senior analyst level

What you'll do

As a Site Reliability Engineer, you’ll be supporting colleagues and feature team members to meet defined service level objectives and continually improve systems and environments. You’ll also be proactively contributing new ideas and innovations to meet short-term and longer-term goals while balancing and managing risk.

We’ll look to you to ensure the availability, performance, and scalability of the services, as well as monitoring systems and applications to proactively identify and resolve issues before they impact end-users. You’ll also be responding to incidents promptly and effectively, using ITIL principles to manage escalations, root cause analysis, and resolution.

A typical day will involve:

  • Documenting incidents thoroughly for future reference and improvement
  • Implementing and enhancing monitoring, logging, and alerting systems to provide full visibility into the health and performance of applications
  • Automating repetitive, manual tasks and processes to reduce toil and improve operational efficiency
  • Working closely with development and operations teams to understand application flow and provide support in troubleshooting complex technical issues
  • Participating in post-incident reviews and proposing actionable improvements based on findings

The skills you'll need

We’re looking for someone with at least four years of experience as a Site Reliability Engineer or in a similar role, ideally in the banking domain, with a solid understanding of production support. You’ll need basic proficiency in SQL for database interactions, as well as an understanding of application flow and architecture, for example in Java microservices, to troubleshoot effectively.

You’ll bring incident, problem and change management experience, paired with production support experience. You’ll also need knowledge of Cloud Services, preferably AWS, as well as experience of monitoring and observability tools such as Splunk, DX-APM or similar technologies.

Additionally, you'll need:

  • Familiarity with ITIL frameworks, particularly in incident and problem management
  • Knowledge of scripting languages, such as Python or Bash, to automate repetitive tasks and improve operational functions
  • The ability to provide on-call support on a rotation basis
  • Excellent problem-solving abilities, strong communication skills, and a collaborative mindset to work effectively within teams
  • Experience of root cause analysis of incidents, as well as coordinating with development, infrastructure and platform teams
  • Experience of non-production and production environment deployments, and CI/CD support

Hours

45

Job Posting Closing Date:

28/08/2025