Remotery

Senior Database Reliability Engineer, Architect

Posted May 20

This is a fully remote position, open to applicants in Poland.

📋 Description

• DBaaS Architecture: Create and deploy a self-service platform utilizing Terraform and Ansible, facilitating the establishment of HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) across diverse environments (Bare Metal, OpenNebula, Kubernetes, Public Clouds). You will transform infrastructure into a marketable product.

• Scaling ClickHouse: Oversee the management of rapidly expanding analytics clusters (12+ clusters, several terabytes of data). You will address sharding, optimize table engines (ReplicatedMergeTree), and develop dependable S3 backup pipelines under significant load.

• Data Platform & Analytics Support: Sustain and enhance the infrastructure for Apache Airflow and Redash. You will guarantee the reliability of ETL pipelines and visualization tools, connecting the raw infrastructure with the data analytics team.

• Reliability as Code: Apply SRE principles to data management. Move from manual incident response to automated self-healing processes. Define and implement SLOs/SLIs for all databases.

• Stack Modernization: Spearhead the transition from outdated solutions to contemporary cloud architectures. Engage in decision-making regarding the implementation of Kubernetes operators for stateful workloads.

• Expertise & Mentorship: Act as the technical expert for product teams, assisting them in optimizing data schemas and SQL queries for high-load environments.


⛳️ Requirements

• AI-Augmented Engineering: You perceive AI not as a substitute for solid technical fundamentals, but as a powerful tool. We actively utilize AI agents (Claude, Codex, Gemini, etc.) to automate routine tasks, analyze intricate logs, and expedite research. We expect you to embrace modern workflows and incorporate AI into your daily tasks, allowing you to concentrate on genuine architectural challenges.

• Deep PostgreSQL Expertise (5+ years): You understand MVCC internals and locking mechanics, can configure Patroni and PgBouncer effortlessly, and have experience with seamless major version upgrades under load.

• ClickHouse Mastery: Proven experience managing large clusters, with an understanding of ZooKeeper/ClickHouse Keeper, sharding, replication internals, and the capability to diagnose performance issues at the data-part level.

• Engineering Mindset (SRE/DevOps): You dislike performing repetitive tasks manually. Experience in developing complex Terraform modules and Ansible roles is essential. Programming skills in Python or Go for automation are highly advantageous.

• Hybrid Environment Experience: You recognize the distinctions between operating databases on Bare Metal, Kubernetes, and Cloud, and know how to optimize TCO and disk subsystem performance (NVMe, Network Storage).

• Systems Approach: You have a comprehensive view, from network packets to application business logic. You understand the significance of security (FIPS, audit logs) and Disaster Recovery.

Nice to Have:

• Experience in building an Internal Developer Platform (IDP).

• Experience in managing databases within Kubernetes (CloudNativePG, Altinity Operator).

• Experience working with Cloud and Hosting providers on similar services.


🏝️ Benefits

• A strong emphasis on professional growth.

• Engaging and challenging projects.

• Fully remote work with flexible hours, enabling you to organize your schedule and work from anywhere in the world.

• 24 days of paid vacation annually, 10 public holidays, and unlimited sick leave.

• Coverage for private medical insurance.

• Reimbursement for co-working and gym/sports expenses.

• Educational budget.

• The chance to earn a reward for the most innovative idea that the company can patent.
