Senior Site Reliability Engineer (SRE) – DBaaS Platform (Automation)

Posted: 2/26/2026, 7:24:35 PM
Location(s): Karnataka, India ⋅ Bengaluru, Karnataka, India
Experience Level(s): Senior
Field(s): DevOps & Infrastructure ⋅ Software Engineering

Job Title: Senior Site Reliability Engineer (SRE) – DBaaS Platform (Automation)
Location: Bangalore
Department: Customer Success
Reports To: VP Customer Success

Role Overview
We are seeking a highly skilled Senior SRE to lead reliability engineering for our cloud-native Database-as-a-Service (DBaaS) platform. This role will drive automation-first operations, SRE agent architecture, AI-enabled incident acceleration, and SLO-driven reliability governance across AWS, Azure, and GCP environments.
You will operate at the intersection of platform engineering, cloud infrastructure, database reliability, and automation, building self-healing, scalable, and cost-efficient systems.

Key Responsibilities
1. SRE Agent Architecture & Technical Ownership
• Design and own SRE automation agents for proactive monitoring, remediation, and performance optimization.
• Build event-driven reliability frameworks integrated with observability platforms.
• Define extensible architectures for auto-detection, auto-healing, and intelligent alert reduction.
2. Automation Roadmap Leadership
• Own the automation strategy across the DBaaS lifecycle (provisioning, scaling, patching, backup, and disaster recovery (DR)).
• Drive infrastructure and operational automation maturity.
• Eliminate toil through scripting, tooling, and CI/CD integration.
3. Engineering-Driven Reliability & SLO Governance
• Define and manage SLIs, SLOs, and error budgets.
• Implement reliability scorecards and availability governance.
• Partner with Product and Engineering to embed SRE practices into platform design.
4. AI-Enabled Operational Acceleration
• Integrate AI/ML-based anomaly detection and predictive scaling.
• Enable automated root cause analysis (RCA) enrichment using log analytics and telemetry intelligence.
• Drive AI-assisted runbooks and decision frameworks.
5. Strong Programming Expertise
• Develop automation frameworks using Python and/or Go.
• Build scalable microservices for reliability orchestration.
• Contribute to platform APIs and reliability tooling.
6. Infrastructure as Code (IaC) Mastery
• Architect and manage infrastructure using Terraform.
• Implement policy-as-code and compliance automation.
• Ensure consistent multi-cloud deployments.
7. Multi-Cloud Expertise
• Deep hands-on experience with AWS, Azure, and GCP.
• Design high-availability, multi-region architectures.
• Implement secure, scalable network and storage solutions across clouds.
8. Containerization & Orchestration
• Strong hands-on experience with Docker and Kubernetes.
• Build and manage stateful workloads in Kubernetes.
• Implement scaling, failover, and resilience patterns.
9. Cloud Networking & Security
• Strong understanding of VPC/VNet, peering, routing, firewalls, IAM, and encryption.
• Implement Zero-Trust and least-privilege access models.
• Embed security into reliability workflows.
10. Database Reliability & High Availability
• Experience managing HA architectures for relational and NoSQL databases.
• Strong knowledge of replication, failover, backup, DR, and point-in-time recovery (PITR).
• Performance tuning and capacity planning expertise.
11. Incident Leadership & RCA Excellence
• Lead critical incident response (P1/P2).
• Conduct structured RCA and preventive action planning.
• Build post-incident automation improvements.
12. Cost Optimization & Operational Efficiency
• Implement FinOps practices for DBaaS workloads.
• Optimize compute, storage, and licensing costs.
• Drive performance-per-dollar improvements.
13. Cross-Team Technical Leadership
• Mentor junior SREs and platform engineers.
• Collaborate with Product, DBA, Security, and Dev teams.
• Influence architecture decisions with a reliability-first mindset.

Required Qualifications
• 8+ years in SRE / DevOps / Platform Engineering roles.
• 3+ years in multi-cloud production environments.
• Strong programming expertise in Python and/or Go.
• Deep experience with Terraform and infrastructure automation.
• Hands-on Kubernetes production experience.
• Experience managing large-scale database platforms.
• Strong understanding of observability (metrics, logs, traces).

Preferred Qualifications
• Experience in DBaaS or SaaS platform companies.
• Experience with AI-driven monitoring/operations.
• Knowledge of distributed systems internals.
• Experience implementing SRE best practices at scale.

Key Competencies
• Systems thinking
• Automation-first mindset
• Bias for engineering over manual ops
• Data-driven decision making
• Strong ownership and accountability
• Executive-level communication during incidents