If you are a current Chamberlain Group employee, please click here to apply through your Workday account.
This role sits within Chamberlain Group's Engineering function. A successful incumbent is expected to (i) define, manage, and execute operational tests and processes to confirm system health or status using custom tools and 3rd party monitoring applications; (ii) manage existing and develop new monitoring frameworks, monitoring dashboards, and monitoring data history archives in support of service level agreements and quality of service reporting; and (iii) work in support of the delivery of “three nines” uptime by forming collaboration teams aligned to the SLA/SLO goal. The role requires previous experience in a system operations, technical operations, or DevOps role for websites and/or mobile applications. The platform operates using public cloud data center resources and task automation. We are a 24x7x365 service organization and look for individuals with experience working in support of consumer-facing products and mobile phone applications.
Job Responsibilities:
- Manage existing and develop new monitoring frameworks, monitoring dashboards, and monitoring data history archives in support of Service Level Agreements and Quality of Service reporting; deliver the “three nines” uptime by forming collaboration teams aligned to the SLA/SLO goal
- Define, manage, and execute operational tests and processes to confirm system health or status using custom tools and 3rd party monitoring applications
- Participate in, manage, and lead incident response calls, ensuring immediate mitigation and then ultimate resolution of the root causes of all service interruptions; ensure that the operations team's documentation and training on incident response are kept current. Serve on the rotational schedule of the on-call support team, with a 24x7 on-call assignment every few weeks
- Collaborate, consult, and coordinate task-level work with developers or scrum teams on matters of capacity expansion or new feature deployment; ensure that server and system capacity projections are accurate and prevent any service degradation due to lack of resources
- Coordinate with internal customers to discuss the operational dimensions and requirements of new releases of MyQ/Connected Products; coordinate with internal cross-functional business teams, including production support/analyst roles across call centers, marketing, engineering, product development, security and IT
- Work closely with a tight-knit group of developers; work using agile methodologies; identify and manage risks with the platform; produce high-quality, maintainable, and scalable software; analyze requirements and collaborate with architects and leads to produce thoughtful software designs that ensure principles and standards are maintained across your code; maintain coding standards and participate in peer code reviews; participate in technical assessment, scoping, and management of changes to the code base for new business requirements, product enhancements, and other change requests; collaborate with other Chamberlain domain experts, such as Infrastructure, Database, and Middleware, as the team develops features and platform enhancements; lead and contribute to technical discussions
- Perform daily, weekly, and monthly reporting duties as directed by supervisor; conduct data extractions in support of QOS investigations, RCAs, business opportunities, and user communications. Author necessary Root Cause Analysis (RCA) documents after service breaks; collect RCA documents from other responsible parties for service breaks impacting myQ/Connected Platforms; follow up on corrective action items determined by RCA meetings. Review draft RCA artifacts for completeness and quality.
- Support the Scrum Team by researching defect, task, and story tickets for pre-existing bugs or undesired system behavior. Determine and measure acceptance criteria of tickets.
- Manage and deliver communications materials as instructed when stakeholder and partner teams are notified about planned and unplanned events on the Connected Solutions platform
- Educate technical staff on relevant operational best practices; follow guidance from Development team architects on new functionality/features being deployed; ensure relevant server capacity is available for all scheduled feature launches
- Responsible for complying with the security requirements set forth by the Information Security team and the established ISO 27001 Security Roles, Responsibilities, and Authorities Document found in the ISMS Document Library
- Comply with health and safety guidelines and rules; managers should also ensure compliance across their teams.
- Protect Chamberlain Group’s reputation by keeping information confidential.
- Maintain professional and technical knowledge by attending educational workshops, reading professional publications, establishing personal networks, and participating in professional societies.
- Contribute to the team effort by accomplishing related results and participating on projects as needed.
Job Requirements:
- Bachelor's degree in computer science or related field
- 5 years in a system operations, technical operations, or DevOps role for websites and/or mobile applications
- 2+ years of experience working with a 24x7x365 service organization
- Experience working to support consumer-facing products and mobile phone applications with a strong engineering and/or IT service level component
- 1+ year of operations experience working on public cloud infrastructure
Knowledge, Skills, and Abilities:
- Able to code in C++, Python, and PowerShell, with experience using source code control platforms and processes
- Knowledge of good coding practices and standards, including object-oriented design, code refactoring, and code documentation
- Experience with Source Code Management Tools (e.g. Git, TFS, VSTS, RTC)
- Experience working in Agile Scrum and Kanban SDLC methodologies.
- Communications skills for working closely with Project Managers, Product Owners, Technical Leads, Scrum Masters and individual contributors
- Requires some Public Cloud platform project and resource administration experience (PaaS, SaaS and IaaS).
- Willing to take ownership of service reliability and operations. Must be ready and available to respond 24x7x365 in cases of service incidents and disruptions
- Willing to “roll up the sleeves” and work with the team to continuously improve Quality Of Service
- Working knowledge of Internet protocols, web server software, and communications, including HTTP, TCP, UDP, WebSockets, Windows Server, and IIS.
- Familiarity with security tools (IDS/IPS) and best practices in defending against “bad actors”.
- Fluency with the Atlassian JIRA and Confluence tools for ticket, task, and knowledge management.
- Operator experience with monitoring tools such as Prometheus, Grafana, Monitis, Nagios, Dynatrace, and AppDynamics. Administrator-level experience is desired.
- Able to write SQL queries to extract content from database tables. A plus if familiar with NoSQL and Big Data systems.
- Understands load balancers and the operational issues surrounding traffic management. A demonstrated ability to manage load balancer pools is desired.
- Highly organized, self-motivated, able to multi-task, and able to get the message and needs of operations heard by the technologists, engineering, and marketing/business teams.
- Able to effectively analyze unexpected problems by locating patterns in the data and underlying causes.
- Attention to detail, quality, responsiveness and efficiency.
- Skilled with CI/CD platforms and tools such as Octopus, Jenkins, Puppet, Chef, and Docker.
- Advanced or expert-level PowerShell user, able to take over script libraries with the goal of keeping them current and relevant.
- Fluent in MS Office Suite including Visio.
- Demonstrated success operating a complex system
- Ability to work inside a matrix organization.
Preferred Job Requirements:
- Hybrid cloud data center operations
Knowledge, Skills, and Abilities:
- Ability to effectively analyze unexpected problems by locating patterns in the data and underlying causes.
- Understands AMQP technology and messaging services operations
Chamberlain Group wants all of its employees to succeed and encourages people of all backgrounds to apply. We’re proud to be an Equal Opportunity Employer, and you’ll be considered for this role regardless of race, color, religion, sex, national origin, age, sexual orientation, ancestry, or marital, disability, or veteran status. We’re committed to fostering an environment where people of all lived experiences feel welcome.
Persons with disabilities who anticipate needing accommodations for any part of the application process may contact, in confidence, [email protected].
NOTE: Staffing agencies, headhunters, recruiters, and/or placement agencies, please do not contact our hiring managers directly.