Job Posting Title:
Manager, Systems Reliability Engineering
Req ID:
10108887
Job Description:
The Walt Disney Company is a world-class entertainment and technology leader. Walt’s passion was to continuously envision new ways to move audiences around the world—a passion that remains our touchstone in an enterprise that stretches from theme parks, resorts and a cruise line to sports, news, movies and a variety of other businesses. Uniting each endeavor is a commitment to creating and delivering unforgettable experiences — and we’re constantly looking for new ways to enhance these exciting experiences.
Managers working in Walt Disney Studio Technology must possess a broad, integrated worldview of both Information and Media technologies in order to successfully implement, manage and support the next generation of infrastructure and workflow applications designed to support feature film production, post-production and distribution. The Manager, Systems Reliability Engineering must be technically competent to oversee Systems Reliability Engineers, Media Systems Engineers and Storage Engineers, and must be able to prioritize, delegate and oversee the work of the team of highly talented engineers in Studio Technology.
- Direct management of a team of Media Systems Engineers, Systems Reliability Engineers, and Storage Engineers that support Studio-run production environments, whether on premises or in the public cloud
- Develop operational plans and procedures covering change control, maintenance events, intergroup communications, and downtime response
- Assist and facilitate architecting and engineering of systems solutions that fulfill various business unit requirements
- Responsible for appropriate application of Corporate, Studio, and individual business unit security practices and standards
- Manage all critical stakeholder interfaces, including developers, production, post production, client services, Studio Tech groups, and the wider Enterprise
- Responsible for effective collaboration between all Studio Technology operations and engineering groups, including systems reliability engineering, systems, networking, software, and client services
- Responsible for collaboration and relationships with other proximate Disney groups, including Post Production Services, Enterprise IT, IT outsourcing partners, and the broader Studio Technology groups
- Make sound technical and business decisions autonomously when confronted with competing operational trade-offs
- Oversee the performance and deliverables of third-party technical support suppliers, whether domestic or offshore
- Develop detailed service specifications designed to guide the activities of third-party suppliers
- Ensure readily available and secure third-party communications
- Optimize the business value of Studio infrastructure operations in a 24x7 shop in the most cost-effective way possible
- Ensure reliable operation of infrastructure and component monitoring, instrumentation and management tools in order to predict, quickly diagnose and resolve abnormal systems behavior
- Secure and report on all systems and subsystems
- Manage key vendors and partners, both internal and external
- Serve as an escalation participant for tickets and issues related to supported infrastructure
- Manage budgets, forecasts and 5-year plans for the studio systems initiatives
- Responsible for all equipment/software maintenance contracts and renewals
- Responsible for all third-party service contracts
Knowledge, Experience & Expertise
- Experience working in media production environments
- Hands-on experience building and running Linux and Windows platforms
- Smart and self-driven, with a keen focus on, and a track record of, exceptional delivery of innovative solutions
- Strong written and verbal communication skills
- Skilled in managing Cloud/IaaS Environments (e.g. AWS, Google Cloud Compute)
- Knowledge of configuration management and infrastructure-as-code tools (e.g. Chef, Terraform, Ansible)
- Expertise in software development Continuous Integration (CI) pipelines (e.g. Jenkins, GitLab CI) and Source Control Management (e.g. Git)
- Expertise with Operating Systems, Distributed Systems and Container Platforms (e.g. Kubernetes/GKE, ECS, OpenShift, Fargate)
- Expertise in multiple scripting languages (e.g. Python, Go, Ruby, or Swift), with the ability to build test coverage for all code being developed
- Virtual hosting technologies (e.g. VMware, KVM)
- Data center, network, and application architectures
- Able to evaluate new system and/or infrastructure solutions for technical feasibility against known requirements and standards.
- A seasoned and experienced technical manager, preferably from a dynamic production support environment, able to comfortably oversee highly knowledgeable technical support staff yet also contribute technically to the plan/solution.
- Able to multitask in a highly complex, diverse, systems environment
- Able to quickly make decisions given incomplete and conflicting knowledge
- Highly self-directed, able to manage and (re)prioritize the multiple concurrent and competing challenges, issues, ambiguities, and contradictions that inevitably occur when supporting systems
- Strong analytical problem-solving skills
- A team-building leader with good interpersonal, verbal and relationship building skills
- Ability to construct, manage and oversee a complex budget of annual and 5-year planning elements
- A generalist who can perform many different tasks
- Excellent verbal and written communication skills, with the ability to explain and document systems for diverse audiences.
- Good interpersonal and relationship building skills.
- BS in Computer Science or related field with 8+ years of experience or equivalent
- 4 years of technology management experience
The hiring range for this position in Los Angeles, CA is $167,700.00-$224,900.00 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.
Job Posting Segment:
TWDSTECH
Job Posting Primary Business:
Technology- Tech Ops Media Sys Eng
Primary Job Posting Category:
Site/System Reliability Engineer
Employment Type:
Full time
Primary City, State, Region, Postal Code:
Burbank, CA, USA
Alternate City, State, Region, Postal Code:
Date Posted:
2025-01-09