Senior Systems/Software Engineer

Posted:
3/31/2026, 2:54:03 AM

Location(s):
Karnataka, India ⋅ Bengaluru, Karnataka, India

Experience Level(s):
Senior

Field(s):
Software Engineering

Workplace Type:
On-site

This role has been designed as ‘Hybrid’ with an expectation that you will work on average 2 days per week from an HPE office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

High Performance Computing, AI and Labs is a critical element of HPE. We are focused on delivering innovative solutions that accelerate our customers’ digital transformation, enabling them to tackle their complex, data-intensive workloads. By combining deep expertise with the development of the world’s most cutting-edge, high-performance supercomputers, we are defining the next era of computing and delivering valuable insight and innovation. Join us and redefine what’s next for you.

The HPE Programming Environment team seeks an experienced software engineer. This role requires hands-on software engineering experience in C, C++, and Fortran. The focus will be on sustaining engineering, production software engineering, and development of HPC and AI software stacks. Familiarity with application-level development within a Linux environment on large-scale systems is required. This is a highly visible role that will require working across geographic boundaries. Close collaboration with architects, executive management, and program management is required. The successful candidate will have substantial experience developing production software for large-scale systems in one of the following areas: development tools, k8s, k3s, containerization, virtualization, HPC, or a relevant AI area. This is a software engineering role that requires hands-on development of software.

What you'll do:

Key Responsibilities

Linux OS & Kernel (Primary)

  • Perform advanced debugging of RHEL/SLES systems: kernel panics/oops, soft/hard lockups, memory corruption, NUMA anomalies, PCIe/I/O faults, scheduler/cgroups issues, and systemd failures.
  • Collect and analyze crash data using kdump/kexec, crash, drgn, ftrace, perf, bcc/eBPF, systemtap, dmesg/journalctl; distinguish software vs. hardware failures quickly and accurately.
  • Manage kernel modules and DKMS, handle Secure Boot signing/verification, and maintain compatibility across kernel/OS patch levels.
  • Lead OS lifecycle tasks: release evaluation, kernel transitions, regression triage, rollback planning, and performance investigations.
  • Tune systems for HPC workloads: NUMA placement, hugepages, IRQ affinity, CPU pinning, I/O schedulers, memory policies, and tuned profiles.

GPU Stack (Secondary)

  • Install, validate, and troubleshoot NVIDIA (driver, CUDA toolkit, cuDNN, NCCL, NVML/DCGM) and AMD (AMDGPU, ROCm stack—HIP, rocBLAS, RCCL, ROCm SMI) on RHEL/SLES.
  • Diagnose GPU‑related failures: driver panics, PCIe link retrains, ECC errors, power/thermal throttling, MIG/SR‑IOV issues, and container runtime mismatches.
  • Ensure correct alignment among firmware, kernel, drivers, and SDKs; maintain compatibility matrices and clear upgrade/rollback procedures.

Containers, CI/CD & Tooling

  • Build GPU‑enabled OCI and Apptainer/Singularity images; integrate nvidia‑container‑toolkit and ROCm container hooks; support air‑gapped flows as needed.
  • Automate validation and post‑mortem collection (Bash/Python); add driver/kernel/SDK regression and smoke tests to CI.
  • Own technical documentation: runbooks, RCAs, playbooks, and knowledge‑base articles.

Observability, Reliability & Security

  • Implement system health/telemetry: DCGM, ROCm SMI, Prometheus exporters, Grafana dashboards, alerting rules, and SLOs for reliability.
  • Apply OS/GPU patching, SBOM/image signing, and secure configuration (SELinux/AppArmor, Secure Boot); collaborate with security and platform teams on compliance.

Collaboration

  • Partner with platform, driver, and application teams to land features and fixes; work with vendors on escalations; communicate findings to technical and executive audiences.

What you need to bring:

Education and Experience Required:

  • Bachelor's or Master's degree in Computer Science, Information Systems, or equivalent.
  • Typically 6-10 years of experience.
  • Expert‑level RHEL/SLES administration with strong understanding of Linux internals (process, memory, storage, networking, namespaces/cgroups, systemd).
  • Proven experience with kernel crash analysis and low‑level diagnostics using crash, kdump/kexec, ftrace, perf, eBPF/bcc, systemtap, and kernel logs.
  • Ability to differentiate and clearly document kernel, driver, user‑space, configuration, and hardware fault domains.
  • Hands‑on with NVIDIA and AMD GPU driver stacks and SDKs (CUDA & ROCm) on enterprise Linux.
  • Strong Bash and Python automation; ability to read/debug C/C++ traces and symbols when necessary.
  • Solid grasp of system hardware: PCIe topology, NUMA, memory hierarchy, CPU microarchitecture basics, and storage stacks.
  • Excellent communication skills with a track record of RCA authorship and stakeholder updates.

    Additional Skills:

    Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, Solutions Design, Testing & Automation, User Experience (UX)

    What We Can Offer You:

    Health & Wellbeing

    We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

    Personal & Professional Development

    We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.

    Unconditional Inclusion

    We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

    Let's Stay Connected:

    Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.

    #india

    #highperformancecompute

    Job:

    Engineering

    Job Level:

    TCP_04

    HPE is an Equal Employment Opportunity/ Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together.

    Hewlett Packard Enterprise is EEO Protected Veteran/ Individual with Disabilities.

       

    HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.

       

    No Fees Notice & Recruitment Fraud Disclaimer

     

    It has come to HPE’s attention that there has been an increase in recruitment fraud whereby scammers impersonate HPE or HPE-authorized recruiting agencies and offer fake employment opportunities to candidates. These scammers often seek to obtain personal information or money from candidates.

     

    Please note that Hewlett Packard Enterprise (HPE), its direct and indirect subsidiaries and affiliated companies, and its authorized recruitment agencies/vendors will never charge any candidate a registration fee, hiring fee, or any other fee in connection with its recruitment and hiring process.  The credentials of any hiring agency that claims to be working with HPE for recruitment of talent should be verified by candidates and candidates shall be solely responsible to conduct such verification. Any candidate/individual who relies on the erroneous representations made by fraudulent employment agencies does so at their own risk, and HPE disclaims liability for any damages or claims that may result from any such communication.

    Hewlett Packard Enterprise

    Website: https://www.hpe.com/

    Headquarter Location: Palo Alto, California, United States

    Employee Count: 10001+

    Year Founded: 1939

    IPO Status: Public

    Industries: Analytics ⋅ Computer ⋅ Consumer Software ⋅ Information Technology ⋅ IT Management ⋅ Security ⋅ Software