This is a fully remote position, open to applicants in Brazil.

• Design, develop, and maintain cloud data pipelines (Azure and/or AWS) that run on a recurring schedule (e.g., every X hours, daily, weekly).

• Implement robust and scalable ETL/ELT processes, managing large volumes of data and complex transformations, primarily using Spark.

• Develop distributed processing solutions with PySpark in Databricks.

• Monitor pipelines to ensure data quality, consistency, and availability by implementing alerts and recovery strategies and executing backfills.

• Manage, configure, and optimize Databricks clusters, focusing on performance, scalability, availability, and cost control.

• Integrate multiple data sources, including APIs, relational databases, non-relational databases, and streaming data.

• Perform data modeling and define table structures for various layers (raw, curated, analytics, feature).

• Develop and maintain ingestion and orchestration pipelines using tools such as Azure Data Factory and Airflow.

• Implement near real-time data ingestion using Kafka and Change Data Capture (CDC) concepts.

• Proficiency in Python for processing and automation;

• Experience with Azure and/or AWS in the context of data engineering;

• Solid understanding of Azure Data Factory (ADF) / Airflow or equivalent tools;

• Strong knowledge of Databricks, including jobs, clusters, and optimization strategies;

• Intermediate knowledge of consuming and integrating REST APIs (Databricks, Azure, and AWS services);

• Experience in building feature pipelines for Machine Learning models (feature engineering and feature stores);

• Understanding of data modeling focused on Machine Learning.

• Position also available for people with disabilities (PcD).

Senior AWS Data Engineer
