
Mid-Level Data Engineer
Posted 2 days ago

This is a fully remote position, open to applicants in the United States.
• Collaborate with senior engineers to create new ETL pipelines and data ingestion processes utilizing AWS Glue (Spark-based, PySpark), MWAA (Airflow), Lambda, and SNS.
• Incorporate the agency's ETL Common Library into Glue jobs to standardize orchestration, handle errors, record metadata, and send SNS notifications for both successful and failed job events.
• Ingest structured and semi-structured datasets (CSV, XML, JSON, Avro, pipe-delimited) into S3 landing, raw, and curated zones using Apache Iceberg tables.
• Set up static ETL metadata in the centralized PostgreSQL metadata store; ensure that dynamic metadata captures job status and timestamps for all crucial execution steps.
• Oversee assigned production jobs and engage in operations support rotations.
• Ensure that ETL Load Reports are updated in real time and ETL Gap Reports are refreshed weekly.
• Create and sustain materialized views and semantic layer objects in Trino and Athena to enhance query performance and maintain consistent business logic.
• Create and maintain the required documentation for each assigned dataset: Business Requirements, ETL Design Documents, Data Models, Data Dictionaries, Mapping Documents, Deployment Documents, O&M Guides, and ETL Test Plans.
• Develop unit and integration tests to meet a minimum code coverage threshold of 90%; conduct security scans at least once per sprint.
• Deploy ETL resources using CloudFormation templates via the agency's CI/CD pipeline.
• Assist in the transition of ETL jobs from other agency teams and participate in disaster recovery exercises.
• US Citizenship is mandatory.
• A Bachelor's Degree is required.
• 3-5 years of relevant experience is required.
• Practical experience with Python (PEP 8), PySpark, and SQL for ETL pipeline development.
• Familiarity with AWS services, including Glue, S3, MWAA (Airflow), Lambda, SNS, and SQS.
• Knowledge of Apache Iceberg, Parquet, and ORC file formats, as well as S3 data lake zone concepts.
• Experience with PostgreSQL and basic knowledge of Redshift or Oracle.
• Understanding of Trino or Athena for query and semantic layer development.
• Experience with CloudFormation, GitHub branching workflows, and CI/CD-integrated deployments.
• Ability to create comprehensive ETL documentation, including data models (in Mermaid format) and data dictionaries.
• Understanding of ETL metadata concepts, including static and dynamic metadata, load reports, and gap reports.
• Experience in agile development settings with sprint-based delivery.
• Familiarity with IV&V and/or User Acceptance Testing (UAT) processes in a federal or technical program environment.
• Experience with automated testing frameworks; capability to write unit and integration tests that meet defined code coverage thresholds.
• Knowledge of FISMA, NIST 800-53, and OWASP ASVS Level 2 is a plus.
• Availability to work from 8 am to 5 pm Eastern Time, regardless of home location.
• An active federal public trust suitability determination or the ability to obtain one is required.
• Flexible work arrangements.
• Continuous learning opportunities.
• Professional development support.
• Special incentives for team members residing in qualified HUBZones.