This is a fully remote position, open to applicants in Virginia.

📋 Description

• Take charge of the design and organization of the program’s data lake, encompassing schema definitions, partitioning strategies, and metadata indexing.

• Design and maintain end-to-end pipelines that reliably ingest high-bandwidth sensor logs from vehicles into cloud storage and tolerate ad-hoc and intermittent connectivity.

• Create data validation and integrity checks capable of identifying corrupted information, missing sensors, and inconsistent calibration before the data is processed by downstream systems.

• Enforce retention, tiering, and lifecycle policies for data to balance storage costs with developmental value.

• Develop tools to query raw logs and generate curated training and evaluation datasets.

• Automate cost-effective pseudo-labeling workflows that run at scale as part of data ingestion.

• Implement data quality and model performance metrics to prioritize labeling efforts on the highest-value examples.

• Deploy and maintain data visualization tools to assist with log review, annotation quality assurance, and autonomy debugging workflows.

• Establish integrations between visualization tools and the data lake, enabling engineers to move from a dataset entry or model failure directly to the originating log data.

• Collaborate with autonomy engineers to define and display custom visualization panels and implement metrics for analyzing unstructured operating environments.

• Create dashboards that offer autonomy engineers visibility into data coverage by terrain type, operating environment, and geographic region.

• Develop and document data contracts between data services and model training consumers.

• Collaborate with perception, planning, and embedded engineers throughout the data lifecycle, from shaping logging schemas and collection triggers to defining dataset interfaces for model training and evaluation.

• Establish data engineering standards, best practices, and tooling selections for an innovative and fast-paced team.

• Contribute to the data roadmap and provide insights to technical leadership on investment priorities.

• Mentor junior engineers to enhance the team's capabilities in data infrastructure scalability and operational hygiene.

⛳️ Requirements

• Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, Electrical Engineering, or a related field with 6+ years of data engineering experience, or a Master’s degree with 4+ years.

• Strong command of Python and SQL, with a proven ability to develop production-quality data pipelines.

• Extensive experience with cloud data infrastructure (preferably AWS: S3, Glue, Athena, Redshift, or equivalent) and infrastructure-as-code tools (Terraform, CloudFormation).

• Solid understanding of data partitioning strategies and columnar storage formats (Parquet, ORC, etc.).

• Experience in building and operating data pipelines that handle time-series and binary data.

• Demonstrated ability to evaluate and integrate open-source tools when suitable, as opposed to building from the ground up.

• Strong instincts for data quality, demonstrated through well-built monitoring, validation, and lineage tracking.

🏝️ Benefits

• A competitive compensation package that includes a bonus component and stock options.

• 100% paid medical, dental, and vision premiums for full-time employees.

• 401(k) plan with a 6% employer match.

• Flexibility in schedule and generous paid vacation (available immediately after the start date).

• Company-wide holiday office closures.

• AD&D and Life Insurance.

Senior Autonomy Data Engineer
