This is a fully remote position, open to applicants in Ukraine.

• Create and implement data ingestion pipelines for both batch and streaming data.

• Set up and manage data orchestration workflows (using Airflow and NiFi) and automate CI/CD processes for data operations.

• Design and structure data layers within the Data Lake architecture (including HDFS, Iceberg, and S3).

• Develop and manage secure and governed data environments utilizing Apache Ranger, Atlas, and SDX.

• Write SQL queries and tune their performance for analytical workloads in Hive/Impala.

• Collaborate on data modeling for analytics and business intelligence, ensuring clean schemas and dimensional models.

• Assist in machine learning workflows leveraging Spark MLlib or Cloudera Machine Learning (CML).

• Demonstrated experience in constructing and maintaining large-scale data pipelines for both batch and streaming data.

• In-depth knowledge of data engineering principles: ETL/ELT, data governance, data warehousing, and Medallion architecture.

• Strong SQL skills for serving data in a Data Warehouse environment.

• At least 3 years of experience in Python or Scala for data processing tasks.

• Practical experience with Apache Spark, Kafka, and Airflow, and with optimizing distributed systems.

• Familiarity with Apache Ranger and Atlas for security and metadata management.

• Upper-Intermediate proficiency in English.

• Bachelor’s degree in Computer Science or a related field.

• Options for remote work.

Senior Data Engineer – GovTech, Public Sector
