Senior Principal Site Reliability Engineer (SRE) (Remote/Flexible)

Posted:
10/29/2024, 2:41:38 PM

Location(s):
Massachusetts, United States

Experience Level(s):
Expert or higher ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Workplace Type:
Hybrid

Insulet started in 2000 with an idea and a mission to enable our customers to enjoy simplicity, freedom and healthier lives through the use of our Omnipod® product platform. In the last two decades we have improved the lives of hundreds of thousands of patients by using innovative technology that is wearable, waterproof, and lifestyle accommodating.

We are looking for highly motivated, performance-driven individuals to be a part of our expanding team. We do this by hiring amazing people guided by shared values who exceed customer expectations. Our continued success depends on it!

Job Profile Title: Engineering Advisor

Business Title: Senior Principal Site Reliability Engineer (SRE)

Department: 8140 - G&A - Global Technology Ops & Security

FLSA Status: Exempt


Position Overview:

Insulet is a mission-driven company that develops extraordinary, innovative products that directly impact people’s lives and health. We are developing connected consumer medical solutions that combine hardware, software, mobile, cloud, and wearables for people living with diabetes and the people who support them. Our mission is to both simplify people’s lives and improve their outcomes.

We are seeking a Senior Principal Site Reliability Engineer (SRE) with a strong software engineering background. This role is pivotal in ensuring the reliability, scalability, and performance of our critical systems and services. The ideal candidate will have a deep understanding of SRE principles, a passion for automation, and a proven track record of leading technical teams.

In this role, you will work closely with cross-functional teams to design, build, and operate robust systems that meet customer needs and business objectives.

Responsibilities

  • Lead the adoption and implementation of SRE practices across the organization, promoting a culture of reliability and continuous improvement.
  • Develop and implement automation tools and frameworks to enhance system reliability and operational efficiency.
  • Design and maintain comprehensive monitoring and alerting systems to ensure the health and performance of applications and infrastructure.
  • Lead the response to high-severity incidents, conduct root cause analysis, and implement corrective actions to prevent recurrence.
  • Analyze system performance and reliability data to identify areas for improvement and implement optimization strategies.
  • Work closely with development, operations, and product teams to ensure seamless integration of SRE practices and to drive reliability improvements.
  • Mentor and train junior engineers in SRE best practices, fostering a culture of knowledge sharing and continuous learning.
  • Conduct capacity planning and demand forecasting to ensure systems can handle future growth and demand spikes.
  • Maintain detailed documentation of SRE processes, tools, and best practices to ensure knowledge continuity and operational excellence.

Key Decision Rights

  • Authority to define and implement the technical strategy for SRE practices, including tooling, automation, and monitoring solutions.
  • Lead and make final decisions during high-severity incident responses, including root cause analysis and remediation actions.
  • Decide on the allocation of resources, including team assignments and budget for SRE initiatives.
  • Set and enforce performance standards and service level objectives (SLOs) for systems and applications.
  • Identify and implement process improvements to enhance system reliability and operational efficiency.
  • Evaluate and select third-party tools and services that support SRE practices and objectives.
  • Develop and approve training programs for the SRE team to ensure continuous skill development and knowledge sharing.

Required Leadership/Interpersonal Skills & Behaviors

  • Setting a clear vision and inspiring the team to achieve long-term goals.
  • Making informed, timely decisions, especially during high-pressure situations.
  • Guiding and developing junior engineers, fostering a culture of continuous learning and improvement.
  • Planning and executing strategies that align with organizational goals and drive reliability improvements.
  • Clearly articulating ideas, actively listening, and translating complex technical problems into understandable terms for non-technical stakeholders.
  • Working seamlessly with cross-functional teams, understanding different personalities and leveraging diverse skills to achieve common goals.
  • Navigating and resolving conflicts constructively, maintaining a positive team dynamic.
  • Proactively identifying issues and developing innovative solutions to complex problems.
  • Taking responsibility for the team’s performance and outcomes, ensuring high standards are maintained.

Required Skills and Competencies

  • Experience with observability tools such as Datadog, Prometheus, Dynatrace, Grafana, ELK Stack, or similar.
  • Proficiency in programming languages such as Python, Go, or Java.
  • Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Docker, Kubernetes).
  • In-depth knowledge of AWS services, including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route 53.
  • Experience with infrastructure as code tools such as Terraform, Ansible, or similar.
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and leadership skills, with the ability to collaborate effectively with cross-functional teams.
  • Experience leading and mentoring engineering teams is highly desirable.
  • Knowledge of security best practices and compliance requirements, with experience implementing security controls and measures.
  • Experience with chaos engineering and resilience testing.
  • Familiarity with AI/ML applications in operational processes.

Education and Experience

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 14 years of experience in the field, including 6+ years in Site Reliability Engineering, DevOps, or a similar role.
  • Proven experience architecting and managing highly available, scalable, and fault-tolerant systems.

Additional Information

  • This position is eligible for 100% remote working arrangements (may work from home/virtually 100%; may also work hybrid on-site/virtual as desired).
  • Travel is estimated at 10% but may vary depending on business needs.

The US base salary range for this full-time position is $163,700.00 - $246,050.00. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position in the primary work location in the US. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your Talent Acquisition Specialist can share more about the specific salary range for your preferred location during the hiring process. Please note that the compensation details listed in US role postings reflect the base salary only and do not include bonus, equity, or benefits.

At Insulet Corporation all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.