Specialist, Data Engineering

Posted:
5/17/2026, 4:18:19 AM

Location(s):
Telangana, India ⋅ Hyderabad, Telangana, India

Experience Level(s):
Junior ⋅ Mid Level ⋅ Senior

Field(s):
Data & Analytics ⋅ Software Engineering

Job Description

Specialist, Data Engineer:

The Opportunity

Based in Hyderabad, join a global healthcare biopharma company and be part of a 130-year legacy of success backed by ethical integrity, forward momentum, and an inspiring mission to achieve new milestones in global healthcare.
Be part of an organisation driven by digital technology and data-backed approaches that support a diversified portfolio of prescription medicines, vaccines, and animal health products.
Drive innovation and execution excellence. Be part of a team with a passion for using data, analytics, and insights to drive decision-making, and one that creates custom software to tackle some of the world's greatest health threats.

Our Technology Centers focus on creating a space where teams can come together to deliver business solutions that save and improve lives. An integral part of our company's IT operating model, Tech Centers are globally distributed locations where each IT division has employees to enable our digital transformation journey and drive business outcomes. These locations, in addition to the other sites, are essential to supporting our business and strategy.

A focused group of leaders in each Tech Center helps to ensure we can manage and improve each location, from investing in the growth, success, and well-being of our people, to making sure colleagues from each IT division feel a sense of belonging, to managing critical emergencies. Together, we must leverage the strength of our team to collaborate globally, optimize connections, and share best practices across the Tech Centers.

Role Overview:

We are looking for a data & platform engineer to join the team responsible for the configuration-driven data pipeline solution that powers data ingestion, transformation, and delivery across AWS Glue, Databricks, and Apache Airflow.

In addition to the core data platform, this role owns AI agents that enable users to automate configuration creation and pipeline troubleshooting and provide guided technical support.

This is an individual contributor role spanning the full stack: a Python/PySpark pipeline engine, Scala/Java Spark extensions, GitHub Actions CI/CD workflows, and multi-cloud infrastructure. You will own features and releases end-to-end, support internal teams consuming the framework, and keep the platform secure, scalable, and reliable.

What You Will Do

· Develop and maintain the core engine: build and extend Python/PySpark loaders, transformers, and writers across 23 source connectors and 19 sink connectors.

· Drive CI/CD automation: design, maintain, and improve 28+ GitHub Actions reusable workflows covering dataset build/deploy, framework releases, Docker image promotion, and AWS key rotation.

· Manage multi-cloud infrastructure: provision and maintain AWS resources (ECS, IAM, ECR, S3, Secrets Manager) and Azure resources using Terraform.

· Own and evolve the product Configuration and Support Bot: maintain the Microsoft Teams bot adapter, integrate with the company-approved LLM, and extend features such as file attachment handling, channel thread context, and Microsoft Graph integration.

· Automate credential lifecycle: operate and improve the automated AWS IAM key rotation service that keeps GitHub Actions secrets, Airflow connections, and AWS Secrets Manager in sync.

· Ensure data quality: implement and extend the rule engine for schema validation, null checks, regex patterns, and quarantine/alert actions.

· Support internal consumers: help dataset teams onboard, troubleshoot pipelines, and adopt new framework features; maintain API stability across releases.

· Contribute to release management: own versioning strategy and artifact promotion through JFrog Artifactory for core, orchestration, and tooling packages.

· Provide L3 technical support for end users.

What you should have:

Data Engineering

· 4+ years of professional experience in data engineering, platform engineering, or cloud infrastructure

· Strong experience with Python and PySpark for batch and streaming ETL workloads

· Proficiency in Spark SQL, DataFrame API, and custom Spark extensions

· Experience with data lake patterns: Delta Lake, Parquet, partitioning strategies

· Understanding of data quality, schema validation, and Change Data Capture (CDC)

· Familiarity with data orchestration platforms (e.g. Apache Airflow)

· Experience integrating diverse data sources: relational databases (JDBC), REST APIs with OAuth2, cloud object storage, file transfer protocols (SFTP/SMB), or streaming systems

Cloud Platforms & Infrastructure as Code

· Hands-on experience with AWS (compute, storage, data warehousing, messaging, identity, container services, monitoring) and Azure (compute, app hosting, identity, networking)

· Strong Infrastructure as Code skills — Terraform or equivalent tooling for provisioning, managing, and tearing down cloud resources across environments

· Experience managing multi-environment deployments (dev / test / production) with proper isolation and promotion workflows

· Container management: building, tagging, promoting, and hosting container images in cloud registries

· Familiarity with cloud cost management and resource right-sizing

Cloud Networking & Security

· Understanding of cloud networking: VPC/VNet design, subnet architecture, private endpoints

· Identity and access management across cloud providers: IAM policies, service principals, app registrations, role-based access control

CI/CD & DevOps

· Proficiency in GitHub Actions or comparable CI/CD platforms: reusable workflows, matrix build strategies, environment-scoped secrets

· Docker and containerization best practices for build and deployment pipelines

· Artifact and release management: versioning strategies, artifact promotion, dependency management across multiple packages

· Experience with artifact repositories (e.g. JFrog Artifactory, Nexus, or cloud-native equivalents)

· Automated infrastructure provisioning and teardown as part of CI/CD pipelines

· Experience integrating with LLM-based agents or AI APIs via REST endpoints

· Understanding of conversational context management: multi-turn chat history, context windowing, and truncation strategies

· Awareness of prompt engineering principles: prompt injection defense, context synthesis, input sanitization

· Familiarity with agent orchestration frameworks (e.g. Semantic Kernel, LangChain, or similar)

· Experience building or maintaining bot services that bridge messaging platforms with backend AI systems

· Demonstrated ability to work across Python, JVM languages, and Infrastructure as Code in a single role

· Track record of delivering end-to-end releases: versioning, artifact promotion, multi-environment deployment

· Experience operating production services across both AWS and Azure

Primary Skills:

Python & PySpark – at least 3 years of working experience

Cloud Platforms (AWS & Azure)

Data Lake Patterns

Infrastructure as Code (Terraform)

Ingestion Connectors - JDBC, REST APIs (OAuth2), and SFTP/SMB

JVM Languages (Scala/Java)

CI/CD Automation (GitHub Actions)

SDLC experience

Secondary Skills:

AI & Bot Development & Maintenance

JFrog - Release & Artifact Management

Data Quality, CDC & Governance

Security & Identity Management

Orchestration Platforms - Airflow

Containerization – Docker image building

Our technology teams operate as business partners, proposing ideas and innovative solutions that enable new organizational capabilities. We collaborate internationally to deliver services and solutions that help everyone be more productive and enable innovation.

Who we are

We are known as Merck & Co., Inc., Rahway, New Jersey, USA in the United States and Canada and MSD everywhere else. For more than a century, we have been inventing and bringing forward medicines and vaccines for many of the world's most challenging diseases. Today, our company continues to be at the forefront of research to deliver innovative health solutions and advance the prevention and treatment of diseases that threaten people and animals around the world.

What we look for

Imagine getting up in the morning for a job as important as helping to save and improve lives around the world. Here, you have that opportunity. You can put your empathy, creativity, digital mastery, or scientific genius to work in collaboration with a diverse group of colleagues who pursue and bring hope to countless people who are battling some of the most challenging diseases of our time. Our team is constantly evolving, so if you are among the intellectually curious, join us—and start making your impact today.

#HYDIT2025

Required Skills:

AWS Secrets Manager, Business Intelligence (BI), Cloud Networking, Container Terminal Management, Data Analysis, Database Administration, Data Engineering, Data Lake, Data Management, Data Modeling, Data Visualization, Design Applications, Electrical Transformer, Information Management, Infrastructure As Code (IaC), JFrog Artifactory, Prompt Engineering, Relational Database, Relational Database Management System (RDBMS), Release Management, Software Development, Software Development Life Cycle (SDLC), Spark SQL, System Designs, Terraform on Microsoft Azure

Preferred Skills:

DevOps

Current Employees apply HERE

Current Contingent Workers apply HERE

Search Firm Representatives Please Read Carefully
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.

Employee Status:

Regular

Relocation:

VISA Sponsorship:

Travel Requirements:

Flexible Work Arrangements:

Not Applicable

Shift:

Valid Driving License:

Hazardous Material(s):

Job Posting End Date:

05/25/2026

*A job posting is effective until 11:59:59PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.
