
Staff Data Engineer
Posted Jun 21

This is a fully remote position, open to applicants in North Carolina.
• Establish the technical architecture and platform standards for our lakehouse on AWS, including distributed cloud architecture, schema conventions, multi-tenant isolation, and integration design.
• Oversee the design and implementation of production pipelines that aggregate performance and product data, and own data modeling for complex entities (time-series, hierarchical, multi-source) so that the models effectively support products, analytics, and machine learning.
• Implement just enough data governance, ownership, and stewardship to raise our data maturity and lay the foundation for a catalog and semantic layer that analytics, machine learning, and AI agents can use.
• Create and maintain the Data Platform playbook (reusable patterns, ADRs, runbooks, and Terraform modules) with data quality and reliability built in, so that product teams can access new datasets and integrations independently.
• Own delivery end to end: requirements and planning, workstream coordination, and progress updates to senior leadership and non-technical stakeholders.
• Guide engineers at various levels, elevate standards through design reviews and on-call responsibilities, and serve as the engineering authority influencing the platform roadmap.
• At least 10 years in data engineering or a related field, with strong proficiency in Python for pipelines, transformations, and platform tooling.
• Extensive experience designing, operating, and setting direction for lakehouse platforms (such as Delta Lake, Iceberg, or Hudi) and modern processing engines (such as Spark, Databricks, Trino, or Snowflake) at production scale, including making difficult trade-offs and resolving issues.
• Proficient in AWS and distributed cloud architecture (S3, IAM, Glue, EMR/Lambda, networking), with a strong command of Terraform and best practices for implementing those designs.
• In-depth knowledge of data modeling and schema design for complex entities (time-series, hierarchical, multi-source) in multi-tenant environments, gained across systems you have built (warehouses, lakehouses, relational databases), plus experience establishing cross-team integration standards (event-driven, API, batch).
• Proven track record of building or significantly improving a data platform from ambiguous objectives, including the organizational work of aligning leaders and teams and communicating decisions to senior and non-technical stakeholders through RFCs and ADRs.
• Understanding of how to implement data governance, ownership, and stewardship programs, with the discernment to apply just enough to improve data maturity without unnecessary complexity.
• Equity options available.
• Bonus opportunities offered.