This is a fully remote position, open to applicants in Spain.

📋 Description

• Capture & Ingestion. Take ownership of the complete capture pipeline from wire to data lake: decode and standardize raw exchange feeds (pcap, multicast UDP / ITCH / FIX) as well as vendor sources (OneTick, Refinitiv, Bloomberg, ICE) into a cohesive canonical model featuring nanosecond timestamps. Create batch and stream pipelines (Airflow, Spark, dbt) for tick and reference data. Manage L2/L3 order-book reconstruction with gap handling. Provide Python and Rust producer SDKs for internal feed handlers.

• Storage & Modeling (Apache Iceberg). Oversee the Iceberg-over-S3 lakehouse: design partitioning, sort orders, and row-group layout to facilitate rapid scans; handle schema evolution, snapshots, time travel, compaction, and TTL management. Maintain reference data as slowly changing tables that ensure point-in-time accuracy for backtests. Drive storage cost optimization through compaction, tiering, and snapshot expiration.

• Tooling & Libraries. Develop libraries for schema management, data contracts, validation, and lineage on top of the Iceberg catalog. Create shared access services (Spark + Polars) to enable Research, backtesting, and trading to utilize a single normalized data layer, including gap detection and reconciliation between pcap and lake.

• Reliability & Observability. Integrate monitoring, alerting, SLAs/SLOs, and CI/CD throughout capture and pipeline layers on Kubernetes (EKS). Own data-quality dashboards and incident runbooks for the capture fleet.

• Collaboration. Collaborate with Quant Research, Data Science, Backend, and DevOps to translate requirements into platform capabilities and promote best practices in market-data engineering.

⛳️ Requirements

• Over 5 years of experience in building production-grade data systems, with a demonstrated track record of architecting and launching data lakes/lakehouses from the ground up.

• Practical experience with Apache Iceberg (or similar table formats such as Delta/Hudi): partitioning, schema evolution, snapshots, compaction, and catalog operations; familiarity with Apache Arrow for zero-copy, columnar in-memory interchange.

• Background in market data and/or network packet capture — decoding pcap, exchange feed protocols (ITCH, FIX/FAST, multicast UDP), order-book reconstruction, and large-scale time-series (strong plus; eagerness to learn is required).

• Experience in normalizing market data from various vendors — for instance, OneTick, Refinitiv/Reuters, Bloomberg, ICE — into a unified schema and symbology (strong plus).

• Proficient in Python (including Polars and/or PySpark); Rust is a strong advantage (relevant for high-performance capture/decoding).

• Familiarity with modern orchestration (Airflow) and distributed processing (Apache Spark).

• Advanced SQL skills: complex aggregations, window functions, query optimization, partition pruning.

• Solid foundation in Linux, containerization (Docker, Kubernetes / EKS), and cloud object storage (AWS S3).

• Knowledge of DevOps & observability: CI/CD, infrastructure-as-code (Terraform), GitOps (ArgoCD), and metrics/dashboards/alerting (Grafana, Prometheus).

• Strong understanding of structured and unstructured/binary data, as well as storage optimization: partitioning, compression, and cost management.

• Proficient in English for documentation and collaboration within an international team.

🏝️ Benefits

• Compensation for health insurance, sports, professional development, and more.

Market Data Engineer
