
Market Data Engineer
Posted Jun 12

This is a fully remote position, open to applicants in Spain.
Responsibilities
• Capture & Ingestion. Own the complete capture pipeline from wire to data lake: decode and normalize raw exchange feeds (pcap, multicast UDP, ITCH, FIX) and vendor sources (OneTick, Refinitiv, Bloomberg, ICE) into a single canonical model with nanosecond timestamps. Build batch and streaming pipelines (Airflow, Spark, dbt) for tick and reference data. Handle L2/L3 order-book reconstruction, including gap handling. Provide Python and Rust producer SDKs for internal feed handlers.
• Storage & Modeling (Apache Iceberg). Own the Iceberg-on-S3 lakehouse: design partitioning, sort orders, and row-group layout for fast scans; handle schema evolution, snapshots, time travel, compaction, and TTL management. Maintain reference data as slowly changing tables that guarantee point-in-time correctness for backtests. Reduce storage costs through compaction, tiering, and snapshot expiration.
• Tooling & Libraries. Develop libraries for schema management, data contracts, validation, and lineage on top of the Iceberg catalog. Build shared access services (Spark + Polars) so that Research, backtesting, and trading all consume a single normalized data layer, including gap detection and reconciliation between raw pcap captures and the lake.
• Reliability & Observability. Build monitoring, alerting, SLAs/SLOs, and CI/CD into the capture and pipeline layers running on Kubernetes (EKS). Own data-quality dashboards and incident runbooks for the capture fleet.
• Collaboration. Collaborate with Quant Research, Data Science, Backend, and DevOps to translate requirements into platform capabilities and promote best practices in market-data engineering.
Requirements
• 5+ years of experience building production-grade data systems, with a proven track record of designing and launching data lakes/lakehouses from scratch.
• Hands-on experience with Apache Iceberg (or similar table formats such as Delta Lake or Hudi): partitioning, schema evolution, snapshots, compaction, and catalog operations; familiarity with Apache Arrow for zero-copy, columnar in-memory data interchange.
• Background in market data and/or network packet capture: decoding pcap, exchange feed protocols (ITCH, FIX/FAST, multicast UDP), order-book reconstruction, and large-scale time series (a strong plus; without it, a willingness to learn is required).
• Experience normalizing market data from multiple vendors (for example OneTick, Refinitiv/Reuters, Bloomberg, ICE) into a unified schema and symbology (strong plus).
• Proficient in Python (including Polars and/or PySpark); Rust is a strong advantage (relevant for high-performance capture/decoding).
• Familiarity with modern orchestration (Airflow) and distributed processing (Apache Spark).
• Advanced SQL skills: complex aggregations, window functions, query optimization, partition pruning.
• Solid foundation in Linux, containerization (Docker, Kubernetes / EKS), and cloud object storage (AWS S3).
• Knowledge of DevOps & observability: CI/CD, infrastructure-as-code (Terraform), GitOps (ArgoCD), and metrics/dashboards/alerting (Grafana, Prometheus).
• Strong understanding of structured and unstructured/binary data, and of storage optimization: partitioning, compression, and cost management.
• Proficient in English for documentation and collaboration within an international team.
Benefits
• Compensation for health insurance, sports, professional development, and more.