Mgr-Site Reliability Engineering

Posted:
10/7/2024, 5:00:00 PM

Location(s):
Florida, United States ⋅ Orlando, Florida, United States

Experience Level(s):
Mid Level ⋅ Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Job Posting Title:

Mgr-Site Reliability Engineering

Req ID:

10101530

Job Description:

Manager-Site Reliability Engineer  

“We Power the Magic!” That’s our motto at Disney Experiences (DX). Our team creates world-class immersive digital experiences for the Company’s premier vacation brands including Disney’s Parks & Resorts worldwide, Disney Cruise Line, Aulani, a Disney Resort & Spa, and Disney Vacation Club.

We are responsible for the end-to-end digital and physical Guest experience for all technology & digital-led initiatives across the Attractions & Entertainment, Food & Beverage, Resorts & Transportation and Merchandise lines of business as well as other initiatives including MyDisneyExperience and Hey, Disney!

This role sits in the Commerce Shared Services organization within Technology & Digital for Disney Experiences. It works closely with Technical Operations and Product Delivery teams. 

The Manager – Site Reliability Engineering will report to the Senior Manager – Site Reliability Engineering.

About The Role & Team:

This is a people leadership role over a strong team of site reliability engineers. You will enable this team by defining, measuring, and improving service levels for the team and for the portfolio of applications it supports. This team needs a strong mentor who can help develop and execute specific reliability plans in line with the business strategy of DX Technology & Digital.

What You'll Do:

  • Oversee finances and budgets in MyPPM, ensure accurate billing processes, and contribute to forecasting and accrual processes to maintain financial integrity and support organizational objectives
  • Work with the vendor management team to maintain the optimal mix of cast members, contractors and managed services to support the required work
  • Manage the work of your team in Jira and maintain documentation in Confluence
  • Lead the evolution of DevOps practices across the broader team, guiding others in using this culture to strengthen observability
  • Manage the SRE team to deliver monitoring and observability for development teams and business users as needed
  • Work with development teams to define and manage mutually agreed service levels for all critical business applications
  • Drive teams to consult on, design, build, and support development pipelines, automate infrastructure and operations, build telemetry for monitoring, engineer for high reliability, and reinforce best practices for securing company data
  • Lead your team in growing its technology engineering skills across Amazon Web Services and Google Cloud Platform for container, virtualization, and serverless workloads
  • Develop and advocate strategic direction for reliability, observability, and recovery, and bring practical knowledge of systems, networking, operational excellence, application stability, security, performance, and capacity management
  • Engage in estimation and planning across the organization, voicing recommendations, feedback, and solutions from a technical perspective and aligning with overall project goals to deliver on time and in scope
  • Proactively track and assess new technologies across the industry to inform strategic decision-making and recommendations

Required Qualifications:

  • Minimum 8 years of related work experience
  • Demonstrated leadership in implementing observability principles across complex systems and environments, fostering a culture of reliability and resilience
  • Extensive experience with modern software delivery tools, including GitHub, GitLab, Harness.io, LaunchDarkly, AWS CodeDeploy, and Azure DevOps, as well as with optimizing workflows and ensuring seamless deployment processes
  • Proficiency in designing and managing highly scalable and resilient infrastructure using configuration management and orchestration tools such as Terraform, CloudFormation, Ansible, and Chef, driving operational excellence and efficiency
  • Preferred, not required: experience leveraging AI for predictive insights to drive measurable, continuous improvement in system reliability
  • Outstanding communication and leadership abilities to ensure the effective growth and development of the team
  • A visionary who motivates teams to excel and fosters creativity, consistently driving excellence in all endeavors
  • An advocate for a diverse and inclusive culture that encourages innovation and ensures every team member feels a sense of belonging

Required Education:

  • Bachelor’s degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or comparable field of study, and/or equivalent work experience

#DISNEYTECH

Job Posting Segment:

Technology & Digital

Job Posting Primary Business:

Commerce

Primary Job Posting Category:

Site/System Reliability Engineer

Employment Type:

Full time

Primary City, State, Region, Postal Code:

Orlando, FL, USA

Alternate City, State, Region, Postal Code:

Date Posted:

2024-10-08