
Senior Data Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in Brazil.
• You will design and enhance the data lake, the company's data backbone: the core system that supports, in real time, the dynamic pricing engine, machine learning models, and the group's business intelligence.
• This position entails ownership: you will define the multi-tenant Lakehouse architecture, from streaming to the semantic layer, and keep it reliable, governed, and cost-effective.
• Develop and improve the data lake using Apache Iceberg on S3, with well-defined layers, partitioning and compaction, time travel, and DELETE/UPDATE support for compliance with LGPD (the Brazilian data protection law).
• Build real-time ingestion pipelines (Kafka, Flink, CDC with Debezium) with managed schema evolution (Schema Registry) and delivery guarantees.
• Design the transformation layer in dbt and orchestrate batch and data-quality workflows in Airflow, from crawler to backfill.
• Maintain metric definitions in Cube.js, the single source that powers BI and AI agents, ensuring consistency throughout the organization.
• Run federated, low-latency OLAP queries over the lake while maintaining per-tenant cost and access isolation.
• Guarantee data testing, lineage tracking, and cost efficiency, ensuring the platform remains reliable as it scales.
• Proficient in SQL with expertise in query optimization within distributed environments (Minimum 5 years).
• Experience in Python, particularly with PySpark or distributed processing.
• Knowledge of orchestration (Airflow), ELT processes, and dbt implemented at scale (Minimum 4 years).
• Familiarity with streaming technologies (Kafka, Flink) and Lakehouse architectures utilizing Apache Iceberg (Minimum 3 years).
• Strong grasp of data governance, quality assurance, and data modeling practices.
• Comfortable working with AI-assisted development tools (e.g., Claude Code).
• Experience with CDC (Debezium) and low-latency OLAP systems (ClickHouse, Pinot, Trino/Athena).
• Knowledge of semantic layers (Cube.js, dbt) and Data Mesh architectures.
• Familiarity with governance and cataloging tools (OpenMetadata, Lake Formation).
• Experience with vector databases (Qdrant) and data pipelines for machine learning.
• Remote work
• Project duration: 6 months, with the potential for extension or conversion to permanent employment.