
Senior Data Engineer
Posted Jun 20

This is a fully remote position, open to applicants in the United States.
• Design and manage Ceresti’s comprehensive data architecture, including a secure cloud object storage landing zone for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that separates reporting and AI workloads from production.
• Develop ingestion pipelines for current data, incorporating partner data files (CSV/JSON/XML/HL7/X12 as needed) and REST/SFTP API integrations with schema validation, error record quarantine, and complete lineage from raw bytes to curated rows.
• Establish and maintain the curated layer (data warehouse/lakehouse-lite) to ensure analytics and ML models can access data without impacting the transactional system's performance.
• Select, integrate, and manage the minimal set of necessary tools, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or a similar tool for transformations, and a single validation library (Great Expectations/Pandera/Soda).
• Develop and implement data governance protocols for a HIPAA-regulated setting: PHI/PII classification, encryption during transmission and at rest, role-based access control, audit logging, retention and minimum-necessary policies, as well as de-identification where suitable.
• Collaborate with backend, ML, product, and clinical stakeholders to establish data contracts with our health plan and ACO partners while maintaining high data quality standards.
• Create and sustain dependable feature data for ML models, which includes embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes initiatives.
• Equip the data platform with observability: monitor pipeline SLAs, data freshness, schema drift, and quality metrics, and respond proactively to data insights.
• Engage fully in our Agile process, including backlog grooming, sprint planning, demos, and retrospectives.
• Mentor team engineers on SQL, schema design, and the art of developing data systems that prioritize reliability and simplicity.
• BS/BA degree or higher in Computer Science, Engineering, or a related technical discipline.
• Over 8 years of professional data engineering experience, demonstrating a successful track record of delivering end-to-end production data systems.
• Expertise in PostgreSQL, including schema design, indexing, query optimization, partitioning, logical replication, JSONB, extensions (pg_partman, pg_cron, pgvector, etc.), and managing Postgres at scale.
• Proficient in designing and managing data pipelines, encompassing file-based ingestion (SFTP/object storage drops) and API-based ingestion (REST, webhooks).
• Practical experience with one or more cloud platforms (AWS preferred) and their data primitives: object storage (S3) and managed Postgres.
• Skilled in designing data warehouses and/or data lakes with the discernment to determine which solution is appropriate for specific problems.
• Strong proficiency with dbt (or an equivalent SQL-based transformation framework) and familiarity with modern data modeling methodologies (Kimball dimensional, Data Vault, One Big Table) along with insights into their appropriate applications.
• Experience with at least one orchestration framework (Dagster, Prefect, or Airflow) and a clear perspective on which one to utilize based on context.
• Strong Python capabilities for ingestion, validation, and tooling development.
• Familiarity with data validation and data-quality frameworks (Great Expectations, Pandera, Soda, or similar).
• Experience with change data capture from Postgres (logical replication or equivalent).
• Knowledge of data governance practices in a HIPAA-regulated environment or at least an understanding of safeguarding PHI and PII (encryption, least privilege, audits, de-identification, BAA-aware vendor selection); HITRUST or SOC 2 experience is highly desirable.
• Comfortable with infrastructure-as-code and CI/CD practices for data systems.
• Experience supporting ML workloads, including building feature tables, managing training data, and serving features during inference; familiarity with embeddings, vector search (pgvector or equivalent), and LLM integration patterns (RAG, prompt-grounded analytics) is advantageous.
• Exceptional written and verbal communication skills, with the ability to convey complex schema decisions to business stakeholders and articulate data contracts to partners with equal clarity.
• Proven experience working within Agile/Scrum teams.
• Competitive salary and benefits package.
• Opportunities for professional growth and development.
• Collaborative and dynamic work environment.
• Flexible work arrangements and options for remote work.
• Access to cutting-edge technologies and tools.