
Senior Data Platform Engineer
Posted May 20

This is a fully remote position, open to applicants in Portugal.
• Design and enhance Sword’s streaming lakehouse — the essential infrastructure relied upon by every data consumer within the organization.
• Create and manage distributed streaming pipelines that ensure data is transferred with low latency and high reliability.
• Take ownership of the resilient workflows that orchestrate complex data movements across various systems.
• Define the platform’s API surface: the interface producers and consumers use to interact with the platform without having to touch the underlying infrastructure.
• Lead assessments and integrations with vendor data platforms, engaging with architectural trade-offs rather than merely consuming the results.
• Contribute to the self-service and agentic layer: interfaces crafted for use by humans, systems, and AI agents alike.
• Collaborate with data engineers and analysts on contracts, governance, and data lineage.
• Develop and sustain AI-ready data infrastructure that powers machine learning and AI-driven products across Sword.
• Utilize AI coding assistants and LLMs to expedite development, automate documentation, and enhance code quality.
• Operate within a regulated environment where audit, compliance, and governance considerations are integral to every design.
• Demonstrated experience in designing and managing data platforms at scale — including warehouse, data lake, or lakehouse architectures in a production setting.
• Practical experience with a modern lakehouse table format (Iceberg strongly preferred; Delta Lake or Hudi also acceptable). You understand how the format works internally: metadata layout, snapshots, manifests, compaction, and copy-on-write vs. merge-on-read.
• A clear mental model of catalogs (REST, Polaris, Glue, Unity, Hive) and their trade-offs, plus an understanding of how compute stays separate from storage.
• Familiarity with at least one vendor lakehouse or query platform — such as Snowflake, Starburst, or Databricks — at a level that enables you to reason about its architecture, not just utilize its user interface.
• Strong experience with a distributed processing engine — Flink is strongly preferred; Spark is also acceptable. You can analyze its internals, fine-tune a running job, and troubleshoot a pipeline that is degrading silently.
• Knowledge of durable execution — such as Temporal, Restate, or similar — or, at a minimum, a solid understanding of what durable execution entails and its significance for data workflows.
• Practical experience in building and operating APIs (REST or gRPC) at scale — possessing good instincts regarding contracts, versioning, retries, rate limiting, and observability.
• Comprehensive understanding of Kafka and event-driven architectures (producers/consumers, partitioning, delivery semantics).
• Comfort working in regulated environments (healthcare, fintech, government) where audit, compliance, and data governance are essential components of every design.
• A platform-oriented mindset: you design for self-service, prioritize API-first approaches, and recognize systems and agents — not just humans — as valid consumers.
• Health, dental, and vision insurance
• Meal allowance
• Equity shares
• Remote work allowance
• Flexible working hours
• Work from home
• Discretionary vacation
• Snacks and beverages