
Data Engineer – DataOps, Infrastructure Focus
Posted May 2

• Design, automate, and maintain production-ready data infrastructure on AWS (e.g., S3, EMR, Glue, Lambda, Redshift) using Terraform or CDK, prioritizing high availability, security, and consistent environments throughout the SDLC (a minimal CDK sketch follows this list).
• Incorporate Claude Code and other LLM-based agents into the engineering workflow to speed up infrastructure provisioning, refactoring, and technical documentation, making AI part of everyday development practice.
• Design, build, and refine CI/CD pipelines that test, deploy, and monitor dbt models and AWS Glue/Spark jobs, ensuring dependable, repeatable delivery of governed data assets.
• Run agentic DataOps: configure AI agents to triage and root-cause pipeline failures, surface cost-optimization signals, and proactively detect schema drift or data quality regressions.
• Develop scalable, well-governed data pipelines and tables using Apache Iceberg, Airflow (MWAA), and Redshift, focusing on simplicity, reusability, and clear ownership of data products (see the orchestration sketch after this list).
• Operationalize security and compliance best practices in a regulated insurance environment, including IAM automation, encryption, audit-ready logging, and adherence to enterprise RBAC/MFA standards.
• Collaborate with Product Strategy, PDO, and data science teams to ensure that data platforms and features can effectively support AI-intensive products such as the Agentic AI Platform, Claim Summary, and Underwriting Assistant at scale.
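
For illustration, a minimal AWS CDK (Python) sketch of the kind of infrastructure-as-code work described above. The stack, bucket, and job names and the IAM role ARN are placeholders, not details from this posting:

    from aws_cdk import App, Stack, aws_s3 as s3, aws_glue as glue
    from constructs import Construct

    class DataLakeStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)

            # Encrypted, versioned landing bucket with public access blocked
            raw = s3.Bucket(
                self, "RawBucket",
                encryption=s3.BucketEncryption.S3_MANAGED,
                versioned=True,
                block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            )

            # Glue Spark job whose script lives in the bucket (role ARN is a placeholder)
            glue.CfnJob(
                self, "EtlJob",
                name="raw-to-curated",
                role="arn:aws:iam::123456789012:role/glue-etl",
                command=glue.CfnJob.CommandProperty(
                    name="glueetl",
                    script_location=f"s3://{raw.bucket_name}/scripts/etl.py",
                ),
                glue_version="4.0",
            )

    app = App()
    DataLakeStack(app, "DataLakeStack")
    app.synth()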
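
Likewise, a small Airflow 2.x (MWAA-style) sketch of how a Glue ingest job might be chained with a dbt build; the DAG id, Glue job name, and dbt project path are assumptions for illustration only:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    with DAG(
        dag_id="daily_curated_load",        # illustrative DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        # Trigger the Glue Spark job that lands data into curated Iceberg tables
        ingest = GlueJobOperator(
            task_id="glue_ingest",
            job_name="raw-to-curated",      # assumed Glue job name
        )

        # Build and test dbt models on top of the freshly loaded tables
        dbt_build = BashOperator(
            task_id="dbt_build",
            bash_command="dbt build --project-dir /usr/local/airflow/dags/dbt",
        )

        ingest >> dbt_build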
Requirements
• 5+ years of experience in Data Engineering, Data Operations, or Platform Engineering, with a focus on building and managing cloud data infrastructure.
• Extensive expertise with AWS (e.g., S3, EMR, Glue, Lambda, Redshift) and infrastructure-as-code (Terraform preferred; CDK is a plus).
• Significant experience with dbt in production (modeling, testing, documentation, deployment).
• Advanced SQL capabilities (performance tuning, complex joins, and window functions; illustrated in the PySpark sketch after this list).
• Strong Python proficiency for automation, orchestration, and data engineering tasks.
• Practical experience with Apache Spark for large-scale batch or streaming workloads, preferably on AWS EMR or Glue.
• Demonstrated success in building or maintaining CI/CD pipelines (Git-based workflows, automated testing, deployment, and monitoring) for data and analytics workloads.
• Strong systems thinking and data modeling abilities (e.g., Kimball, Data Vault).
• Effective, collaborative communication style with the capacity to work with product, security, and business stakeholders in a distributed setting.
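
As a purely illustrative instance of the SQL and Spark skills above, a PySpark window-function sketch that keeps the latest record per key; the columns and data are invented:

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("window-demo").getOrCreate()

    # Toy data: one premium reading per policy per day (invented columns)
    df = spark.createDataFrame(
        [("A", "2024-01-01", 100), ("A", "2024-01-02", 120), ("B", "2024-01-01", 90)],
        ["policy_id", "as_of_date", "premium"],
    )

    # Rank rows within each policy by recency, then keep only the latest
    w = Window.partitionBy("policy_id").orderBy(F.col("as_of_date").desc())
    latest = (
        df.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn")
    )
    latest.show()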
Benefits
• Flexible work environment
• Health and Wellness benefits
• Paid time off programs including volunteer time off
• Market-competitive pay and incentive programs
• Continual development and internal career growth opportunities