
DBRE – Database Reliability Engineer
Posted May 22

This is a fully remote position, open to applicants in Brazil.
• Oversee, manage, and enhance database environments in both production and non-production settings utilizing Infrastructure as Code (IaC).
• Ensure high availability, performance, scalability, and reliability of databases.
• Plan and implement backup, restore, replication, and disaster recovery strategies.
• Conduct tuning, capacity planning, and troubleshooting for databases and data integrations.
• Address incidents, perform Root Cause Analysis (RCA), and apply preventive measures.
• Create and maintain runbooks and standards documentation.
• Ensure effective, secure, and resilient connectivity between applications and databases hosted on AWS.
• Plan, manage, and optimize costs for AWS network resources (VPCs, Subnets, Route Tables, NAT Gateway, VPC Peering, VPC Endpoints, Transit Gateway, VPN, etc.).
• Actively engage in diagnosing latency, packet loss, timeouts, and connection issues.
• Work with IAM and resource access policies.
• Define and track SLIs and SLOs for databases and establish standards for monitoring and observability.
• Utilize tools such as Datadog, Prometheus, Grafana, and CloudWatch.
• Support reliability-focused observability strategies.
• Prior experience as a DBRE in high-stakes production environments.
• Strong understanding of implementation, scaling, tuning, and disaster recovery for at least two of the following databases: PostgreSQL, Elasticsearch, Solr, MongoDB, Oracle, and Snowflake.
• Extensive experience with AWS, including networking, connectivity, monitoring, and observability services.
• Practical knowledge of Terraform (infrastructure as code for databases and networks).
• In-depth understanding of networking concepts: TCP/IP, DNS, latency, throughput, and timeouts.
• Scripting/automation proficiency in Shell, Python, or similar languages.
• Experience in database monitoring, metrics, and alerting.
• Background in SRE/DevOps environments with incident response experience in production.
• Strong analytical skills for complex troubleshooting involving databases, networks, and applications.
• Familiarity with EKS/Kubernetes for integrating applications and databases.
• Meal and Food Allowance.
• Gympass/TotalPass.
• Home-office allowance.
• Health Insurance and Dental Plan (dental optional).
• Childcare assistance (up to the child’s 6th birthday).
• Extended Maternity, Paternity, and Adoptive Leave (#allfamiliesmatter).
• Life Insurance.
• Birthday Day Off (one day off to be taken on the birthday or during the birthday month).
• Family Day (one day off for parents to be taken between May and August).
• Mental Break (one continuous week off in December to rest and recharge).