Posted:
3/2/2026, 6:36:48 PM
Location(s):
Karnataka, India ⋅ Bengaluru, Karnataka, India
Experience Level(s):
Senior
Field(s):
DevOps & Infrastructure ⋅ Software Engineering
Job Title: SRE Lead – DBaaS Platform
Role Overview
We are seeking an experienced Site Reliability Engineering (SRE) Lead to own
production reliability for our Database-as-a-Service (DBaaS) platform. This role
will bring hyperscaler-grade (RDS-level) operational expertise to drive deep product
debugging, reliability engineering, and collaboration with Development teams across
cloud-native database services.
The SRE Lead will own platform stability, availability, performance, and incident excellence
across Azure/AWS/GCP-hosted database workloads.
Location: Hyderabad
Department: Customer Success
Reporting: Senior Director, Customer Success/SRE
Key Responsibilities
1. Production Reliability Ownership
Own end-to-end reliability, availability, and performance of the DBaaS platform.
Define and enforce SLIs, SLOs, and SLAs across all supported database engines.
Lead production incident response (P1/P2), RCAs, and long-term resilience
improvements.
Drive error budget governance with Engineering and Product teams.
2. Hyperscaler-Level Operational Excellence
Bring RDS/Cloud SQL/Azure SQL Managed Instance operational patterns into the
platform.
Implement automation-first operations (self-healing, auto-remediation, failover
orchestration).
Standardize HA/DR architectures across multi-region deployments.
Improve backup reliability, replication integrity, and failover predictability.
3. Deep Product Debugging & Dev Collaboration
Partner with Product Engineering for deep database engine-level debugging.
Troubleshoot complex performance bottlenecks (IO, CPU, locking, replication lag).
Support root cause analysis involving cloud infrastructure, storage, networking, and
database internals.
Influence platform architecture for operability and reliability.
4. Observability & Reliability Engineering
Build unified observability across DBaaS (metrics, logs, traces).
Define golden signals for database reliability.
Improve proactive anomaly detection and capacity forecasting.
Drive chaos testing and resilience validation practices.
5. Automation & Platform Hardening
Lead reliability automation (runbooks → code).
Improve provisioning, patching, upgrade, and scaling reliability.
Standardize configuration management and drift detection.
Enhance security posture aligned to enterprise compliance needs.
6. DevOps & Platform Governance
Champion SRE best practices across engineering teams.
Establish production readiness review frameworks.
Define release reliability gates for DBaaS components.
Mentor junior SREs and build a reliability-first culture.
Technical Requirements
Cloud Platforms (Mandatory – Multi-Cloud Preferred)
Deep hands-on experience with AWS RDS / Aurora, Azure SQL MI / Azure Database Services, and GCP Cloud SQL / AlloyDB.
Strong understanding of cloud networking, storage, IAM, and HA architectures.
Database Expertise
Strong operational knowledge of Oracle, PostgreSQL, MySQL, and SQL Server.
Experience handling large-scale production databases (TB+ workloads).
Performance tuning, replication troubleshooting, and backup recovery validation.
SRE & Platform Skills
Strong scripting: Python / Bash / Go.
Infrastructure as Code (Terraform / ARM / CloudFormation).
CI/CD pipelines and release automation.
Observability stack (Prometheus, Grafana, ELK, Datadog, etc.).
Kubernetes exposure preferred.
Leadership Expectations
10+ years of overall experience, including 5+ years in SRE/Platform roles.
Prior experience in hyperscaler environments or cloud-native SaaS products.
Strong incident leadership and executive communication skills.
Ability to influence cross-functional stakeholders.
Experience building and leading SRE teams preferred.
Success Metrics (First 12 Months)
Reduction in P1/P2 incidents by X%.
Improved MTTR by X%.
Defined SLO framework implemented across all DBaaS services.
Automation coverage of more than 70% of recurring operational tasks.
Zero critical audit non-compliance findings.
Why Join Us
Opportunity to build hyperscaler-grade DBaaS reliability.
Direct impact on mission-critical enterprise workloads.
Multi-cloud platform engineering exposure.
High-visibility role working with Product, Engineering, and Leadership.
Website: https://www.tessell.com/
Headquarters Location: San Ramon, California, United States
Employee Count: 51-100
Year Founded: 2021
IPO Status: Private
Last Funding Type: Series A
Industries: Database ⋅ PaaS ⋅ SaaS ⋅ Software