SRE

Posted:
3/11/2025, 9:55:04 PM

Location(s):
Arizona, United States ⋅ Scottsdale, Arizona, United States

Experience Level(s):
Mid Level ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Important Information

Experience: More than 4 years
Job Mode: Full-time
Work Mode: Hybrid

Job Summary

Site Reliability Engineering (SRE) is a discipline that blends software engineering with infrastructure and operations, aimed at building scalable and highly reliable software systems.
This role focuses on application monitoring, emergency response, and change management to keep systems reliable and efficient.
You will collaborate with development teams throughout the software lifecycle to solve system-related issues and automate routine tasks.
You will also enhance system reliability, scalability, and performance by leveraging modern tools and processes.

Responsibilities and Duties

Application Monitoring: Utilize tools and automation for continuous application monitoring and reliability.
Emergency Response: Respond promptly to emergency incidents, perform root cause analysis, and resolve ongoing production issues.
Change Management: Manage and streamline release and change management processes to improve system performance.
Collaboration: Partner with development teams to solve system issues, automate routine tasks, and eliminate toil.
Reliability and Scalability: Ensure systems are highly reliable, scalable, and efficient to meet performance standards.

Qualifications and Skills

Strong understanding of monitoring tools such as Azure Monitor, Application Insights, Prometheus, and Grafana.
Experience with Infrastructure as Code tools like Terraform, ARM/Bicep, or Pulumi.
Proficiency in release management tooling such as ArgoCD, Harness, and Octopus.
Familiarity with incident alert tools like PagerDuty or Opsgenie.
Expertise in container orchestration tools like Kubernetes and AKS.
Proficiency in at least one scripting language (C#, Python, Bash, or PowerShell); one of them is mandatory.
Strong collaboration and problem-solving abilities to resolve system issues effectively.
Knowledge of project tracking and version control tools like Jira, SVN, and GitHub.

Role-specific Requirements

Proven experience in application monitoring and automated reliability processes.
Strong background in managing system reliability and performing root cause analysis during emergency responses.
Hands-on experience in change management processes and production environment releases.
Advanced knowledge of tools and practices for infrastructure automation and incident handling.
Familiarity with scalable system architecture principles and best practices.

Technologies

Monitoring Tools: Azure Monitor, Application Insights, Prometheus, Grafana
Infrastructure as Code: Terraform, ARM/Bicep, Pulumi
Release Management Tools: ArgoCD, Harness, Octopus
Incident Alert Tools: PagerDuty, Opsgenie
Container Orchestration: Kubernetes, AKS
Project Tracking and Version Control: Jira, SVN, GitHub
Scripting: C#, Python, Bash, PowerShell

Skillset Competencies

Advanced monitoring and incident management techniques.
Infrastructure as Code and automation of routine workflows.
Expertise in release and change management processes.
Strong knowledge of container orchestration and scalable system design.
Excellent communication, collaboration, and problem-solving skills.
Ability to work effectively in cross-functional and virtual teams.

About Encora

Encora is a trusted partner for digital engineering and modernization, working with some of the world’s leading enterprises and digital-native companies. With over 9,000 experts in 47+ offices worldwide, Encora offers expertise in areas such as Product Engineering, Cloud Services, Data & Analytics, AI & LLM Engineering, and more. At Encora, hiring is based on skills and qualifications, embracing diversity and inclusion regardless of age, gender, nationality, or background.

Encora Digital Inc

Website: https://encora.com/

Headquarters Location: Scottsdale, Arizona, United States

Employee Count: 10001+

Year Founded: 2003

IPO Status: Private

Last Funding Type: Private Equity

Industries: Big Data ⋅ Cloud Computing ⋅ Software
