Career Area:
Business Technologies, Digital and Data
Job Description:
Your Work Shapes the World at Caterpillar Inc.
When you join Caterpillar, you're joining a global team who cares not just about the work we do – but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here – we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.
Reliability in highly complex, integrated systems typically spans multiple programming languages, third-party services, and integrations, as well as software and hardware, so an SRE needs to be multi-talented.
As an SRE, you will be a process-, technology-, and results-oriented member of the Operations team, delivering top-notch service, quality, and metrics for the Cat Digital data platform.
You will fit this role if you:
- Think about systems - edge cases, failure modes, behaviours, specific implementations.
- Debug production issues across services and levels of the stack.
- Set up monitoring and alerting that triggers on symptoms rather than only on outages.
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
- Have an urge to collaborate and communicate asynchronously.
- Have an urge to deliver quickly and iterate fast.
Basic Qualifications:
- Bachelor’s degree, preferably in Computer Science, Software Engineering, or any other Engineering field.
- 4+ years of DevOps experience.
Technical Experience:
- Knowledge of, and prior experience with, a CI/CD solution on any platform is a must.
- Expertise in designing, coding, testing, and delivering software in at least one technology stack.
- Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks).
- 4+ years of experience with key AWS services: EC2, S3, VPC, Route 53, RDS, CloudFormation, DynamoDB (NoSQL), Lambda, logging/CloudWatch, IAM, Certificate Manager, ELB, EBS, ECS, CloudFront/WAF, SQS, SNS, SES.
- Knowledge of Azure Cloud is an added advantage.
- Expertise in the ELK stack for open-source IT, network, server, and application monitoring is an added advantage.
- 4+ years of prior experience in DevOps and/or application development teams. Hands-on experience with large-scale software development, preferably in Java, Python, or a scripting language, is a must.
- Understanding of RESTful APIs, Apigee, or any other API gateway is a plus.
- 4+ years' experience with Docker and at least one container orchestration platform (ECS or Kubernetes).
- Understanding of configuration management tools like Ansible/Puppet/Chef/PowerShell/Terraform.
- Understanding of Git, Bitbucket, Jira, Jenkins, Sonar, Splunk, Maven, AIM, and/or continuous delivery tools.
- Excellent problem-solving skills and a strong attention to detail.
- Background in ITIL and/or ITSM process.
- Strong communication skills and ability to collaborate effectively with cross-functional teams.
Responsibilities:
- Meet the SLOs, SLAs, and SLIs defined in the operations model.
- Prioritize tasks and troubleshoot incidents through to closure.
- Participate in the on-call rotation.
- Improve service observability.
- Proactively test the flexibility and resilience of the system.
- Drive adoption of continuous integration/inspection/deployment.
If you have a passion for delivering reliable, high-performance services and thrive in a fast-paced environment, we'd love to hear from you. Apply now to join our team as a Site Reliability Engineer.
Posting Dates:
August 20, 2024 - August 26, 2024
Caterpillar is an Equal Opportunity Employer (EEO).
Not ready to apply? Join our Talent Community.