Director Reliability Engineering Squad

Posted:
12/18/2024, 4:25:22 AM

Location(s):
Connecticut, United States ⋅ North Carolina, United States ⋅ Hartford, Connecticut, United States ⋅ Charlotte, North Carolina, United States

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ IT & Security ⋅ Software Engineering

Director, Reliability Engineering - IE06IE

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Hartford is seeking an experienced and highly motivated Technology Leader to lead a “(Site) Reliability Engineering (RE) Squad” within the Developer Ecosystem space. This leader will be accountable for building, adopting, and maturing the RE tools, practices, and automation of DevSecOps, Quality, and Release Engineering. The leader and their team will be responsible for building, optimizing, and maintaining the cloud automation capabilities that enable infrastructure provisioning, application availability, testing, quality, application deployment, resiliency, recovery, and efficiency of IT applications.

Successful candidates will have experience driving cloud transformation initiatives and sustaining top-quartile operating standards by leveraging leading market practices. They will have proven delivery experience in Agile-based, cloud-centric operating models across large enterprises. In addition to the operating model experience, preferred candidates will possess advanced AWS cloud engineering and governance skills. As a strategic leader, you will drive a culture of problem-solving and innovation throughout the design, development, and maintenance lifecycle of applications and infrastructure in the cloud. Your deep engineering expertise will be instrumental in driving the creation of developer signals that prioritize the security, efficiency, and resiliency of our business operations. Key performance indicators will include the stability of our customers’ services, deployment interval and quality, technical debt reduction, and proactive measures to enhance asset resiliency and mitigate risk. Additionally, you will establish and monitor developer efficiency metrics such as golden signals, error budgets, and patterns for cloud-native services, driving optimal performance and productivity across our engineering teams.

This role will have a Hybrid work arrangement, with the expectation of working in an office location (Hartford, CT; Charlotte, NC) 3 days a week (Tuesday through Thursday). Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

Responsibilities: 

  • Lead the strategic planning and operational management of Site Reliability Engineering (SRE) teams, overseeing the design and implementation of robust and scalable solutions. 
  • Work within a matrix organization of RE Squads across the IT business, ensuring alignment with organizational objectives and priorities. 
  • Collaborate with stakeholders to define and prioritize functional and non-functional requirements, driving the adoption of best practices and methodologies. 
  • Foster a culture of innovation and excellence within RE Squads, promoting collaboration and knowledge sharing across teams. 
  • Utilize advanced engineering expertise to architect and maintain highly available and resilient systems, prioritizing security, efficiency, and reliability. 
  • Implement and monitor key performance indicators (KPIs) and metrics to optimize the performance and productivity of RE Squads. 
  • Lead the establishment of top quartile operating norms, continuously refining SRE practices to meet evolving industry standards. 

Qualifications: 

  • Bachelor's or advanced degree in Computer Science, Engineering, or a related field. 
  • Proven leadership experience in managing production Site Reliability Engineering (SRE) teams, with a focus on strategic planning and operational management. 
  • Expertise in cloud-native technologies, including containerization, microservices architecture, and serverless computing. 
  • Strong familiarity with SRE principles and methodologies, demonstrated through the implementation of proactive monitoring strategies and performance optimization techniques. 
  • Exceptional communication and collaboration skills, enabling effective engagement with stakeholders at all levels of the organization. 
  • Track record of driving innovation and continuous improvement in SRE practices, fostering a culture of excellence and accountability. 
  • Experience in building and managing a matrix organization of RE Squads, promoting collaboration and knowledge sharing across teams.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford’s total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$163,040 - $244,560

Equal Opportunity Employer/Females/Minorities/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age
