Remotery

Data Engineer

Posted 22 hours ago

This is a fully remote position, open to applicants in Poland.

📋 Description

• Rebuild a comprehensive descriptive-statistics report end to end so that every figure traces back to its original source, closing the gaps the client has acknowledged (figures they currently cannot validate).

• Analyze and reconcile varying source schemas across acquired organizations: align different field names, types, encodings, and business definitions for the same concept into a unified model.

• Develop tested dbt models for the staging, intermediate, and mart layers, codifying the harmonized definitions as specified by the Data Science Lead.

• Create Great Expectations suites (null, range, uniqueness, and referential checks) and integrate them into the pipeline so that bad data fails loudly instead of silently corrupting the analysis.

• Perform entity and identity resolution (deterministic and fuzzy matching) where the same customer or account has no clean shared key across sources.

• Implement and validate anonymization and pseudonymization techniques (hashing, tokenization, k-anonymity) and demonstrate to the client's IT and compliance teams that re-identification risk is controlled.

• Optimize Spark and Glue jobs processing tens of millions of rows, with a focus on partitioning, file formats (Parquet), incremental loads, and cost control.

• Orchestrate pipelines with Airflow and Step Functions, replacing one-off scripts with repeatable, scheduled runs.

• Prepare clean, documented, feature-ready datasets for the PD (probability of default) and delinquency models.

• Write runbooks that let the offshore team operate the pipelines, so that handover takes days rather than weeks; help scope the onboarding of the remaining sources (Ireland and others).


⛳️ Requirements

• 4+ years of data engineering experience, with a strong focus on AWS and Spark/SQL at scale.

• Proven track record in harmonizing and integrating data across multiple source systems.

• Experience building validated, reproducible pipelines in regulated environments (BFSI, healthcare, government) is a significant advantage.

• Comfortable working in a complex, partially built data landscape and bringing it up to standard.

• Able to operate as the sole or lead data engineer within a small delivery team (3–4 members).


🏝️ Benefits

• Preference for full-time engagement.

People also viewed

Anord Mardix · 9 hours ago

Senior BI Data Engineer

United Kingdom Only · Full-time · Data Engineer

Stefanini Brasil · 9 hours ago

Data Architect, AWS

Brazil Only · Full-time · Data Engineer

InVision Communications · 9 hours ago

Data Engineer

United States Only · Full-time · Data Engineer · $100k – $110k/year

Leega · 9 hours ago

Data Engineer – Senior (GCP)

Brazil Only · Full-time · Data Engineer

Enable Data · 9 hours ago

Lead Data Engineer – Data Architect

India Only · Full-time · Data Engineer

Capco · 9 hours ago

Senior Data Engineer – Microsoft Fabric

Brazil Only · Full-time · Data Engineer
