This is a fully remote position, open to applicants in Portugal.

📋 Description

• Design and evolve Sword’s streaming lakehouse, the core infrastructure that every data consumer in the organization depends on.

• Build and operate distributed streaming pipelines that move data with low latency and high reliability.

• Own the resilient workflows that orchestrate complex data movement across systems.

• Define the platform’s API surface: the interface producers and consumers use to interact with the platform without touching the underlying infrastructure.

• Lead evaluations of, and integrations with, vendor data platforms, engaging with their architectural trade-offs rather than simply consuming their output.

• Contribute to the self-service and agentic layer: interfaces designed to be used by humans, systems, and AI agents alike.

• Collaborate with data engineers and analysts on contracts, governance, and data lineage.

• Develop and sustain AI-ready data infrastructure that powers machine learning and AI-driven products across Sword.

• Use AI coding assistants and LLMs to speed up development, automate documentation, and improve code quality.

• Operate within a regulated environment where audit, compliance, and governance considerations are integral to every design.

⛳️ Requirements

• Proven experience designing and operating data platforms at scale, including warehouse, data lake, or lakehouse architectures in production.

• Hands-on experience with a modern lakehouse table format (Iceberg strongly preferred; Delta Lake or Hudi also acceptable). You understand how the format works under the hood: metadata layout, snapshots, manifests, compaction, and copy-on-write vs. merge-on-read.

• A clear mental model of catalogs (REST, Polaris, Glue, Unity, Hive) and their trade-offs, and an understanding of how compute stays decoupled from storage.

• Familiarity with at least one vendor lakehouse or query platform, such as Snowflake, Starburst, or Databricks, deep enough to reason about its architecture rather than just use its UI.

• Strong experience with a distributed processing engine (Flink strongly preferred; Spark also acceptable). You can reason about its internals, tune a running job, and debug a pipeline that is silently degrading.

• Experience with durable execution frameworks such as Temporal or Restate, or at minimum a solid understanding of what durable execution is and why it matters for data workflows.

• Hands-on experience building and operating APIs (REST or gRPC) at scale, with good instincts about contracts, versioning, retries, rate limiting, and observability.

• Comprehensive understanding of Kafka and event-driven architectures (producers/consumers, partitioning, delivery semantics).

• Comfort working in regulated environments (healthcare, fintech, government) where audit, compliance, and data governance are essential components of every design.

• A platform mindset: you design for self-service, think API-first, and treat systems and agents, not just humans, as first-class consumers.

🏝️ Benefits

• Health, dental, and vision insurance

• Meal allowance

• Equity shares

• Remote work allowance

• Flexible working hours

• Work from home

• Discretionary vacation

• Snacks and beverages

Senior Data Platform Engineer

People also viewed

Integration Platform Engineer

Power Platform Developer

Data Platform Engineer

Senior Platform Engineer

AWS Platform Engineer

Platform Engineer