
Senior Autonomy Data Engineer
Posted 2 days ago

This is a fully remote position, open to applicants in Virginia.
• Take charge of the design and organization of the program’s data lake, encompassing schema definitions, partitioning strategies, and metadata indexing.
• Design and maintain end-to-end pipelines that reliably ingest high-bandwidth sensor logs from vehicles into cloud storage and tolerate ad-hoc and intermittent connectivity.
• Create data validation and integrity checks capable of identifying corrupted information, missing sensors, and inconsistent calibration before the data is processed by downstream systems.
• Enforce retention, tiering, and lifecycle policies for data to balance storage costs with developmental value.
• Develop tools to query raw logs and generate curated training and evaluation datasets.
• Automate cost-effective pseudo-labeling workflows that run at scale alongside data ingestion.
• Implement data quality and model performance metrics to prioritize labeling efforts on the highest-value examples.
• Deploy and maintain data visualization tools to assist with log review, annotation quality assurance, and autonomy debugging workflows.
• Establish integrations between visualization tools and the data lake, enabling engineers to move from a dataset entry or model failure directly to the originating log data.
• Collaborate with autonomy engineers to define and display custom visualization panels and implement metrics for analyzing unstructured operating environments.
• Create dashboards that offer autonomy engineers visibility into data coverage by terrain type, operating environment, and geographic region.
• Develop and document data contracts between data services and model training consumers.
• Collaborate with perception, planning, and embedded engineers throughout the data lifecycle, from shaping logging schemas and collection triggers to defining dataset interfaces for model training and evaluation.
• Establish data engineering standards, best practices, and tooling selections for an innovative and fast-paced team.
• Contribute to the data roadmap and provide insights to technical leadership on investment priorities.
• Mentor junior engineers to enhance the team's capabilities in data infrastructure scalability and operational hygiene.
• Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, Electrical Engineering, or a related field with 6+ years of data engineering experience, or a Master’s degree with 4+ years.
• Strong command of Python and SQL, with a proven ability to develop production-quality data pipelines.
• Extensive experience with cloud data infrastructure (preferably AWS: S3, Glue, Athena, Redshift, or equivalent) and infrastructure-as-code tools (Terraform, CloudFormation).
• Solid understanding of data partitioning strategies and columnar storage formats (Parquet, ORC, etc.).
• Experience in building and operating data pipelines that handle time-series and binary data.
• Demonstrated ability to evaluate and integrate open-source tools when suitable, as opposed to building from the ground up.
• Strong instincts for data quality, demonstrated through robust monitoring, validation, and lineage tracking.
• A competitive compensation package that includes a bonus component and stock options.
• 100% paid medical, dental, and vision premiums for full-time employees.
• 401(k) plan with a 6% employer match.
• Flexibility in schedule and generous paid vacation (available immediately after the start date).
• Company-wide holiday office closures.
• AD&D and life insurance.