This is a fully remote position, open to applicants in the United States.

📋 Description

• Design and manage Ceresti’s comprehensive data architecture, including a secure cloud object storage landing zone for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that separates reporting and AI workloads from production.

• Develop ingestion pipelines for current data sources, including partner data files (CSV/JSON/XML/HL7/X12 as needed) and REST API and SFTP integrations, with schema validation, quarantine of error records, and complete lineage from raw bytes to curated rows.

• Establish and maintain the curated layer (data warehouse/lakehouse-lite) to ensure analytics and ML models can access data without impacting the transactional system's performance.

• Select, integrate, and manage the minimal set of necessary tools, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or a similar tool for transformations, and a single validation library (Great Expectations/Pandera/Soda).

• Develop and implement data governance protocols for a HIPAA-regulated environment: PHI/PII classification, encryption in transit and at rest, role-based access control, audit logging, retention and minimum-necessary policies, and de-identification where appropriate.

• Collaborate with backend, ML, product, and clinical stakeholders to establish data contracts with our health plan and ACO partners while maintaining high data quality standards.

• Create and sustain dependable feature data for ML models, which includes embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes initiatives.

• Instrument the data platform for observability: monitor pipeline SLAs, data freshness, schema drift, and quality metrics, and act proactively on what those signals reveal.

• Engage fully in our Agile process, including backlog grooming, sprint planning, demos, and retrospectives.

• Mentor team engineers on SQL, schema design, and the art of developing data systems that prioritize reliability and simplicity.

⛳️ Requirements

• BS/BA degree or higher in Computer Science, Engineering, or a related technical discipline.

• Over 8 years of professional data engineering experience, demonstrating a successful track record of delivering end-to-end production data systems.

• Expertise in PostgreSQL, including schema design, indexing, query optimization, partitioning, logical replication, JSONB, extensions (pg_partman, pg_cron, pgvector, etc.), and managing Postgres at scale.

• Proficient in designing and managing data pipelines, encompassing file-based ingestion (SFTP/object storage drops) and API-based ingestion (REST, webhooks).

• Practical experience with one or more cloud platforms (AWS preferred) and their data primitives, such as object storage (S3) and managed Postgres.

• Skilled in designing data warehouses and/or data lakes with the discernment to determine which solution is appropriate for specific problems.

• Strong proficiency with dbt (or an equivalent SQL-based transformation framework) and familiarity with modern data modeling methodologies (Kimball dimensional, Data Vault, One Big Table), along with a sound sense of when each is appropriate.

• Experience with at least one orchestration framework (Dagster, Prefect, or Airflow) and a clear view of which one to use in a given context.

• Strong Python capabilities for ingestion, validation, and tooling development.

• Familiarity with data validation and data-quality frameworks (Great Expectations, Pandera, Soda, or similar).

• Experience with change data capture from Postgres (logical replication or equivalent).

• Knowledge of data governance practices in a HIPAA-regulated environment or at least an understanding of safeguarding PHI and PII (encryption, least privilege, audits, de-identification, BAA-aware vendor selection); HITRUST or SOC 2 experience is highly desirable.

• Comfortable with infrastructure-as-code and CI/CD practices for data systems.

• Experience supporting ML workloads, including building feature tables, managing training data, and serving features during inference; familiarity with embeddings, vector searches (pgvector or equivalent), and LLM integration patterns (RAG, prompt-grounded analytics) is advantageous.

• Exceptional written and verbal communication skills, with the ability to convey complex schema decisions to business stakeholders and articulate data contracts to partners with equal clarity.

• Proven experience working within Agile/Scrum teams.

🏝️ Benefits

• Competitive salary and benefits package.

• Opportunities for professional growth and development.

• Collaborative and dynamic work environment.

• Flexible work arrangements and options for remote work.

• Access to cutting-edge technologies and tools.

Senior Data Engineer
