Lead Site Reliability Engineer - Hyderabad

Posted:
6/25/2024, 5:00:00 PM

Location(s):
Telangana, India ⋅ Hyderabad, Telangana, India

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too, by driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you’ve come to the right place.

Your Impact

Your ideas will help shape the direction of our organization. Picture yourself working on groundbreaking technologies, solving thought-provoking technical problems, and driving customer success. For this role, the Salesforce Site Reliability Engineering team is looking for help supporting our Marketing Cloud Growth and Account Engagement products.

You will be responsible for developing and maintaining our observability products, building tools and automation to reduce manual toil, directly responding to customer-impacting system failures and complex outages, and proactively addressing and improving performance and availability. You will focus on influencing and developing an availability and service ownership mindset at Salesforce, helping develop service level objectives, participating in system design, and improving the overall reliability of our product.

Success in this role will entail a strong focus on technical influence, an engineering mindset, demonstrated experience with automation, and experience in large-scale distributed production engineering. We blend automation and engineering best practices to remove operational toil and drive self-healing and resilience initiatives, including game day exercises. Come help us build and grow the trust and reliability of Salesforce’s software infrastructure at one of the world’s largest enterprise cloud computing companies!

Responsibilities

  • Take complete ownership of services, from influencing product architecture to operating them seamlessly in production
  • Analyze and remediate production incidents for the Core Application Server and asynchronous processing platform
  • Develop deeper insights into platform incidents and influence the engineering backlog to address repeat incidents and proactively prevent new ones
  • Leverage the AIOps platform to continuously improve anomaly detection, automate runbooks, and drive our MTTD and MTTR goals
  • Understand customer use cases leveraging our platform and services and collaborate with the rest of the engineering organization to identify opportunities to achieve our availability goals
  • Engage with engineers developing features on our platform and provide consultative support and onboarding guidance
  • Collaborate with the Systems Engineering team on activities such as providing input for OS patching, JDK upgrades, and software configuration
  • Collaborate with technical writers to create, update and review documentation for users and operators
  • Participate in the team’s 24x7 on-call rotation to address complex problems in real-time and keep services operational and highly available
  • Continuously raise standards of engineering excellence by implementing DevOps best practices
  • Champion a culture and work environment that promotes diversity and inclusion
  • Lead, collaborate, communicate, and mentor

Required Skills

  • Knowledge of object-oriented programming concepts and experience coding in Java, C++, or Python
  • Ability to debug complex distributed systems and understand their design, with an eye for performance and scalability bottlenecks, and to recommend code optimizations
  • In-depth, hands-on experience with Linux, networking, server, and cloud architectures
  • Exposure to container-related technologies such as Kubernetes and Docker
  • Proficiency with source control, continuous integration, and testing pipelines

Preferred Skills

  • 10+ years of overall experience, including 5+ years in a production engineering, DevOps, SRE, or similar role working on high-scale distributed systems
  • Strong background in open source software
  • Experience analyzing heap dumps
  • Experience instrumenting code and profiling applications
  • Experience evaluating and interpreting large volumes of production data to understand efficiency, latency, memory, and CPU utilization
  • Experience with messaging platforms
  • Experience with AWS or another cloud PaaS provider
  • Experience in configuration management technologies such as Chef, Puppet or Ansible
  • Strong problem-solving, troubleshooting and analytical skills clearly demonstrated in past projects
  • Solid understanding of configuration, deployment, management, and maintenance of large cloud-hosted systems, including auto-scaling, monitoring, performance tuning, troubleshooting, and disaster recovery
  • Understanding of Java Virtual Machine technology and ability to tune and debug issues related to compilers and garbage collectors

Accommodations

If you require assistance applying for open positions due to a disability, please submit a request via this Accommodations Request Form.

Posting Statement

At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at www.equality.com and explore our company benefits at www.salesforcebenefits.com.

Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce does not accept unsolicited headhunter and agency resumes. Salesforce will not pay any third-party agency or company that does not have a signed agreement with Salesforce.

Salesforce welcomes all.

Salesforce

Website: https://www.salesforce.com/

Headquarters Location: San Francisco, California, United States

Employee Count: 10001+

Year Founded: 1999

IPO Status: Public

Last Funding Type: Post-IPO Equity

Industries: Apps ⋅ Cloud Computing ⋅ CRM ⋅ Enterprise Software ⋅ Information Technology ⋅ iOS ⋅ Mobile Apps ⋅ SaaS ⋅ Sales Enablement ⋅ Software