Location(s): Hyderabad, Telangana, India ⋅ Telangana, India
Experience Level(s): Mid Level ⋅ Senior
Field(s): Data & Analytics
Career Category: Information Systems
Job Description
Role Summary
Build and operate large-scale healthcare data pipelines across batch workflows, metadata-driven ingestion, and data service publishing.
Own end-to-end engineering from source ingestion to conformed data products, with a strong focus on reliability, data quality, and operational observability.
Partner with analytics, business, and platform teams to deliver trusted datasets for sales, claims, activity, patient, and rare disease use cases.
Key Responsibilities
Design and maintain PySpark/SQL pipelines in Databricks for landing, unified, unstitched, and published data layers.
Build and support Airflow DAGs for scheduling, dependencies, retries, and production operations (a minimal DAG sketch follows this list).
Implement metadata/config-driven frameworks for ingestion, transformation, and rule-based processing (see the config-driven ingestion sketch after this list).
Develop robust data quality controls, DQ summaries, failure handling, and alerting workflows (see the data quality and audit sketch below).
Manage batch/process audit logs, run status tracking, release flags, and operational reporting.
Integrate multi-source data (files, APIs, cloud storage, and relational systems) into governed Delta/Spark tables.
Optimize pipeline performance using partitioning, parallelization, and query tuning (see the partitioning sketch below).
Collaborate on schema evolution, business-rule onboarding, and production support.
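The sketches that follow illustrate, in hedged form, the kind of work these responsibilities involve; they are minimal examples, not the team's actual code. First, a minimal Airflow DAG with a nightly schedule, retry and failure-alerting defaults, and an explicit downstream dependency. It assumes Airflow 2.4 or later; the dag_id, schedule, and task callables are hypothetical placeholders.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_source(**context):
    # Placeholder for a landing-layer ingestion step.
    pass

def publish_dataset(**context):
    # Placeholder for a published-layer load once upstream steps succeed.
    pass

default_args = {
    "owner": "data-engineering",
    "retries": 3,                          # re-run a failed task before alerting
    "retry_delay": timedelta(minutes=10),
    "email_on_failure": True,              # simple alerting hook; teams often swap in their own
}

with DAG(
    dag_id="claims_batch_pipeline",        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",                  # nightly batch window
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(task_id="ingest_source", python_callable=ingest_source)
    publish = PythonOperator(task_id="publish_dataset", python_callable=publish_dataset)

    ingest >> publish                      # publish runs only after ingestion succeeds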
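Next, a sketch of the metadata/config-driven ingestion pattern: each source is described by a config entry, and one generic loop lands every feed into a governed table, so onboarding a new source means adding metadata rather than writing new pipeline code. The paths, formats, and table names are invented for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical ingestion metadata; in practice this usually lives in a config
# table, YAML, or JSON file rather than inline in the job.
INGESTION_CONFIG = [
    {"source_path": "/mnt/raw/sales/*.parquet", "format": "parquet",
     "target_table": "landing.sales"},
    {"source_path": "/mnt/raw/activity/*.csv", "format": "csv",
     "target_table": "landing.activity", "options": {"header": "true"}},
]

for entry in INGESTION_CONFIG:
    # One generic reader handles every configured source.
    df = (
        spark.read.format(entry["format"])
        .options(**entry.get("options", {}))
        .load(entry["source_path"])
    )
    # Each feed lands in its own table; on Databricks, saveAsTable defaults to Delta.
    df.write.mode("append").saveAsTable(entry["target_table"])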
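Third, a sketch combining the data quality and batch audit responsibilities: rule-based checks on a landing table, an append-only audit record of the run outcome, and publication to the unified layer only when the checks pass. The table names, run identifier, and specific rules are assumptions, not the team's actual DQ framework.

from datetime import datetime, timezone

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

RUN_ID = "claims_20240101"            # hypothetical batch run identifier
SOURCE_TABLE = "landing.claims_raw"   # hypothetical table names
TARGET_TABLE = "unified.claims"
AUDIT_TABLE = "ops.batch_audit_log"

df = spark.table(SOURCE_TABLE)

# Simple rule-based checks: missing keys and duplicate claim ids.
total_rows = df.count()
null_keys = df.filter(F.col("claim_id").isNull()).count()
duplicate_keys = total_rows - df.dropDuplicates(["claim_id"]).count()
passed = null_keys == 0 and duplicate_keys == 0

# Record the run outcome in the batch audit table regardless of pass/fail.
audit_row = spark.createDataFrame(
    [(RUN_ID, SOURCE_TABLE, total_rows, null_keys, duplicate_keys,
      "PASSED" if passed else "FAILED", datetime.now(timezone.utc).isoformat())],
    "run_id string, source string, row_count long, null_keys long, "
    "duplicate_keys long, status string, logged_at string",
)
audit_row.write.mode("append").saveAsTable(AUDIT_TABLE)

if passed:
    # Publish only clean batches to the unified layer.
    df.write.mode("append").saveAsTable(TARGET_TABLE)
else:
    # Failing the job here lets the orchestrator's retry/alerting handle the rest.
    raise ValueError(f"Data quality checks failed for run {RUN_ID}")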
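Finally, a sketch of partition-based performance tuning: writing a published table partitioned by a commonly filtered column so downstream queries can prune files instead of scanning the full table. The table and column names are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.table("unified.claims")                  # hypothetical source table

(
    df.repartition("service_month")                 # reduce small files per partition
    .write.mode("overwrite")
    .partitionBy("service_month")                   # hypothetical partition column
    .saveAsTable("published.claims_by_month")
)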
Required Skills
Bachelor’s degree in Computer Science, Information Technology, or a related field, with 5-9 years of experience.