This is a fully remote position, open to applicants in Poland.

📋 Description

• Take charge of PostgreSQL reliability in production, including HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat management, query optimization, locks, indexes, capacity management, backups, PITR, and restore validation.

• Enhance disaster recovery and operational documentation by ensuring tested restores, well-documented recovery paths, measurable RTO/RPO targets, runbooks, and secure maintenance plans.

• Provide support for the broader database ecosystem, including ClickHouse, MongoDB, and Redis. You will respond to incidents, review access and data-safety modifications, enhance monitoring, and familiarize yourself with existing production ClickHouse patterns.

• Streamline DBA workflows using Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for tasks such as provisioning, grants, backups, restores, health checks, and ownership metadata management.

• Help build DBaaS-style self-service capabilities so that engineering teams can request databases, access permissions, credentials, and operational checks with minimal DBA intervention.

• Enhance observability and incident response capabilities through the use of Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and effective communication during production incidents.

⛳️ Requirements

• Extensive hands-on experience with PostgreSQL in critical production environments, typically 5+ years or an equivalent depth of knowledge.

• In-depth understanding of PostgreSQL internals and operations, including MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.

• Demonstrated experience with highly available databases, with the ability to reason about quorum, split-brain risks, failover strategies, rollbacks, and recovery processes.

• Strong foundation in Linux and infrastructure, including systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause analysis.

• Proficiency in automation using Ansible and scripting. Familiarity with Terraform/OpenTofu, GitLab CI/CD, and merge-request based delivery is a significant advantage.

• Capability to support multiple database engines. While you do not need to be a ClickHouse expert on your first day, you must be eager to learn it swiftly and assume responsibility for it.

• Practical experience using AI engineering assistants such as Claude and Codex. We expect you to leverage these tools to enhance speed and quality while personally validating generated SQL, commands, scripts, and operational insights.

• Proficient written English for asynchronous communication in tools such as Jira, Slack, GitLab, and Slite, and in runbooks.

🏝️ Benefits

• Emphasis on professional growth and development.

• Engaging and challenging projects.

• Fully remote position with flexible working hours, allowing you to organize your day and work from any location around the globe.

• Paid vacation of 24 days per year, along with 10 days of national holidays and unlimited sick leave.

• Coverage for private medical insurance.

• Reimbursement for co-working spaces and gym/sports activities.

• Educational budget available.

• Opportunity to earn a reward for the most innovative idea that the company can patent.

Senior Database Reliability Engineer – Worldwide Remote
