Director of Site Reliability Engineering, NADP

Posted:
7/24/2024, 9:11:13 AM

Location(s):
Lisbon, Portugal ⋅ Karnataka, India ⋅ London, England, United Kingdom ⋅ England, United Kingdom

Experience Level(s):
Senior

Field(s):
DevOps & Infrastructure ⋅ Software Engineering

Who We Are

The name ThousandEyes was born from two big ideas: the power to see things not ordinarily possible and the ability to collect insights from a multitude of vantage points. As the world continues its digital transformation and relies more on cloud services and the Internet, the “network,” which is now both public and private, has become a black box our customers cannot see or understand.  

Our Internet and cloud intelligence platform delivers the only collectively powered real-time view of the Internet and private networks, cloud, and SaaS platforms, helping enterprises and service providers identify problems before they impact revenue, damage brand reputation, or halt employee productivity. 

In August 2020, Cisco Systems completed the acquisition of ThousandEyes, which now forms the ThousandEyes Business Unit within the Cisco Networking Business Group and is the Network Assurance solution for Cisco across the Cisco Networking Cloud and Cisco Security Cloud. ThousandEyes is also a foundational component of Cisco’s growing Full-Stack Observability (“FSO”) business. 

About the role

As the Director of Site Reliability Engineering, Network Assurance Data Platform, you will play a critical role in shaping and executing our cloud, big data, and ML/AI infrastructure strategy, driving operational excellence, and ensuring the highest levels of system reliability and security. You will lead teams of talented engineers and collaborate closely with cross-functional teams, including software development, operations, and security, to design, build, and maintain our infrastructure, cloud platforms, and security practices at multi-region scale.

What You'll Do

  • Lead and inspire a talented team of site reliability engineers, fostering a culture of innovation, collaboration, and excellence in the development and operation of infrastructure platforms
  • Drive the strategic vision for the development, implementation, and management of cloud, data, and ML/AI platforms
  • Collaborate closely with cross-functional teams, including development, product management, and security, to define and implement reliable, secure, and scalable infrastructure platforms
  • Provide oversight and direction in the development and operation of cloud platforms, ensuring high-quality, scalable, and reliable solutions that meet customer needs
  • Drive excellence in operations and security processes
  • Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the site reliability engineering group

Qualifications

  • You have a deep understanding of distributed systems design and cloud technology, including the components, dependencies, and code that define infrastructure
  • You possess a deep understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
  • Extensive hands-on experience building cloud, big data, and/or ML/AI infrastructure (e.g., EMR, Airflow, Comet ML, AWS SageMaker, Spark)
  • Extensive hands-on experience operating mission-critical services in production environments that require high availability and reliability
  • Proven ability to think strategically and align technical initiatives with business objectives
  • Ability to provide a strong technical vision for your teams and ensure consistent delivery of objectives
  • Experience formulating a team's technical strategy and roadmap, and partnering effectively with several other teams to execute shared goals
  • Understanding of how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters
  • Proven site reliability engineering management experience leading multiple teams

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That's why Cisco is expanding the boundaries of discovering top talent by not only focusing on candidates with educational degrees and experience but also placing more emphasis on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you're interested in this work.

Cisco is an Affirmative Action and Equal Opportunity Employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.



ThousandEyes

Website: https://www.thousandeyes.com/

Headquarters Location: San Francisco, California, United States

Employee Count: 501-1000

Year Founded: 2010

IPO Status: Private

Last Funding Type: Series D

Industries: Cloud Computing ⋅ Cloud Infrastructure ⋅ Enterprise Software ⋅ SaaS ⋅ Software