Site Reliability Engineer -SRE-SMTS

Posted:
2/22/2026, 11:09:20 PM

Location(s):
Hyderabad, Telangana, India ⋅ Telangana, India

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn’t a buzzword — it’s a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.

Ready to level-up your career at the company leading workforce transformation in the agentic era? You’re in the right place! Agentforce is the future of AI, and you are the future of Salesforce.

Senior Site Reliability Engineer Job Description

Job Category

Products and Technology

Job Details

Location: Hyderabad

Salesforce is seeking an engineering candidate to join the Site Reliability organisation in Hyderabad. Working closely with counterparts in the Infrastructure and R&D organisations, this organisation provides a global team of engineers who monitor cloud service availability and stand ready to swiftly repair any service-impacting issues. Seven days a week, 24 hours a day, in a follow-the-sun model, the Site Reliability team keeps the Salesforce cloud and our customers protected.

As an SRE, you will be a key member of a team driving Salesforce’s operational resilience by engineering solutions that blend automation, observability, and AI-powered platforms. You will not only respond to incidents but also proactively design systems that prevent them, applying software engineering principles to operations to reduce toil and improve reliability at scale. By leveraging cutting-edge DevOps practices within the SRE function and AI-driven insights, you will help transform how services are built, monitored, and operated, ensuring that Salesforce delivers always-on, high-performance experiences to customers worldwide.

Mission

Build and run reliable, scalable, and efficient systems by applying software engineering principles to operations. Our mission is to ensure services are highly available, performant, and resilient — while continuously improving the balance between operational work and engineering innovation.

What We Do

Reliability as the Priority: Ensure that systems meet defined Service Level Indicators (SLIs) and Service Level Objectives (SLOs), using error budgets to guide engineering and release decisions.
Engineering for Operations: Apply software engineering practices — automation, monitoring, self-healing systems — to eliminate toil and improve operational efficiency.
Incident Management: Lead the coordinated response to incidents, drive fast recovery (low Time to Restore, TTR), and ensure lasting improvements through blameless postmortems.
Continuous Improvement: Identify and remove sources of toil, enhance observability, and optimize systems to reduce Time to Detect (TTD) and TTR.
Collaboration with Development: Partner with product and engineering teams early in the lifecycle to design, build, and operate systems that are reliable by default.
Long-Term Focus: Leverage AI-driven automation to eliminate manual workflows, enabling the team to focus on complex problem-solving and strategic innovation while reducing operational overhead to less than 20% of capacity.

Role Description:

Lead incident detection, response, and resolution—driving root cause analysis, postmortems, and proactive measures to ensure high uptime, rapid recovery, and prevention of future issues.
Drive the design and implementation of automation and self-healing systems to minimize manual intervention and eliminate recurring production issues.
Drive the development and enhancement of observability by improving monitoring, logging, and tracing with tools such as Grafana, ELK, and Datadog to enable proactive detection and resolution of issues.
Drive operational excellence by applying AI-driven automation and prompt engineering techniques to optimize workflows, eliminate toil, and enhance playbooks and runbooks.
Drive optimization of system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
Partner across clouds to align processes and foster a unified Salesforce SRE team.
Drive service resilience by leading post-incident reviews, delivering systemic fixes through CARs, and ensuring customer-facing services maintain peak performance and reliability.
Ensure that work carried out by the Site Reliability team complies with the company’s internal compliance policies and directives.
Identify opportunities and drive the creation of comprehensive technical epics that include well-defined problem statements, detailed project and implementation documentation, and clearly measurable business outcomes aligned with team objectives.
Work with and lead other team members in staying current with key industry innovations and technologies, and assist in the team’s development and growth.
Collaborate with engineering and product teams to define and uphold SLAs/SLOs, driving improvements in service reliability and customer experience.
Apply an understanding of AI/ML concepts to operations (e.g., anomaly detection, predictive analysis).

Basic Requirements:

Proven experience in systems engineering and software engineering for large-scale, internet-facing services.
Hands-on expertise with containerized architectures (Docker, Kubernetes) and orchestration platforms.
Strong knowledge of distributed systems and Linux/Unix internals, with experience tuning performance and troubleshooting at scale.
Familiarity with large-scale internet service architectures (DNS, HTTP, Load Balancing, caching, etc.).
Proficiency in at least one programming language (Python, Go, Java, or C++) and strong skills in scripting/automation (Bash, Python, etc.).
Practical experience with observability stacks (Grafana, Prometheus, ELK, Datadog, or similar) to drive proactive monitoring and alerting.
Solid background in incident management, including on-call participation, root cause analysis, and postmortem practices.
Understanding of DevOps and SRE principles: SLIs/SLOs, error budgets, toil reduction, blameless culture.
Experience with workflow/orchestration tools (e.g., Airflow, Temporal, Argo Workflows, Luigi) for automating operational and data pipelines.
Familiarity with AI/ML in operations (e.g., anomaly detection, predictive analysis) and prompt engineering to optimize AI-driven automation workflows.
Strong communication skills, with the ability to lead during high-pressure incidents and collaborate effectively across teams.
Ability to work in a 24/7 global operations model, managing multiple priorities under time-sensitive conditions.
Growth mindset with curiosity to explore new technologies and drive continuous improvement.

Education:

BS or higher degree in Computer Science or Electrical Engineering plus relevant job-related experience

Preferred Qualifications:

Python certification
Red Hat Certification
AWS/GCP
Prior Chef/Puppet or automated deployment experience

Posting Statement

Salesforce.com and Salesforce.org are Equal Employment Opportunity and Affirmative Action Employers. Qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Salesforce.com and Salesforce.org do not accept unsolicited headhunter and agency resumes. Salesforce.com and Salesforce.org will not pay fees to any third-party agency or company that does not have a signed agreement with Salesforce.com or Salesforce.org.

Unleash Your Potential

When you join Salesforce, you’ll be limitless in all areas of your life. Our benefits and resources support you to find balance and be your best, and our AI agents accelerate your impact so you can do your best. Together, we’ll bring the power of Agentforce to organizations of all sizes and deliver amazing experiences that customers love. Apply today to not only shape the future — but to redefine what’s possible — for yourself, for AI, and the world.

Accommodations

If you require assistance due to a disability when applying for open positions, please submit a request via the Accommodations Request Form.

Posting Statement

Salesforce is an equal opportunity employer and maintains a policy of non-discrimination with all employees and applicants for employment. What does that mean exactly? It means that at Salesforce, we believe in equality for all. And we believe we can lead the path to equality in part by creating a workplace that’s inclusive, and free from discrimination. Know your rights: workplace discrimination is illegal. Any employee or potential employee will be assessed on the basis of merit, competence and qualifications – without regard to race, religion, color, national origin, sex, sexual orientation, gender expression or identity, transgender status, age, disability, veteran or marital status, political viewpoint, or other classifications protected by law. This policy applies to current and prospective employees, no matter where they are in their Salesforce employment journey. It also applies to recruiting, hiring, job assignment, compensation, promotion, benefits, training, assessment of job performance, discipline, termination, and everything in between. Recruiting, hiring, and promotion decisions at Salesforce are fair and based on merit. The same goes for compensation, benefits, promotions, transfers, reduction in workforce, recall, training, and education.

Salesforce

Website: https://www.salesforce.com/

Headquarters Location: San Francisco, California, United States

Employee Count: 10001+

Year Founded: 1999

IPO Status: Public

Last Funding Type: Post-IPO Equity

Industries: Apps ⋅ Cloud Computing ⋅ CRM ⋅ Enterprise Software ⋅ Information Technology ⋅ iOS ⋅ Mobile Apps ⋅ SaaS ⋅ Sales Enablement ⋅ Software
