
Senior Database Site Reliability Engineer
Posted Jun 20

This is a fully remote position, open to applicants in United States.
• Design, implement, and maintain critical database systems, specifically PostgreSQL, that are highly available, high-throughput, and data- and compute-intensive, supporting a growing 24x7 SaaS platform.
• Define and improve the reliability of database services through monitoring and alerting, SLO-oriented metrics, and operational-readiness practices.
• Participate in and assist with incident response, root cause analysis, and post-incident corrective actions for database-related production incidents.
• Collaborate with other technical leaders to ensure that all newly launched systems are supportable and maintainable by both development and operations teams.
• Provide escalated technical support and guidance to various technology teams throughout the organization.
• Offer on-call coverage for production support and fulfill additional duties as required.
• Adhere to HIPAA security policies within the database platform.
• Ensure that all solutions and operational tasks comply with the organization’s established security and operational policies.
• Own and continuously improve our database observability in Datadog by creating actionable dashboards, alerts, and service-level views; comparable experience with another observability stack (e.g., Prometheus, Grafana, New Relic, or similar) also applies. Familiarity with pganalyze or Percona is a plus.
• Automate system maintenance tasks using Bash, PowerShell, Python, or Ansible, and manage infrastructure as code (IaC) by writing Ansible playbooks. Experience with Terraform is a plus.
• Experience designing and writing ETL pipelines in Python is a plus.
• Experience understanding and maintaining components of the PostgreSQL ecosystem, such as PgBouncer, pgBackRest, HAProxy, and repmgr, is a plus.
• Demonstrate excellent communication and interpersonal skills.
• Bachelor’s degree in Information Systems, Engineering, or equivalent experience.
• 7-10+ years of engineering experience, focused on Database Engineering, Systems Engineering, DevOps, and/or SRE.
• Familiarity with cloud-based compute, storage, and containerization solutions, specifically Azure and Kubernetes, is preferred.
• Proficiency in operating PostgreSQL within a Linux environment is advantageous.
• Expertise with an observability/monitoring platform (e.g., Prometheus/Grafana, New Relic, Datadog, or equivalent); experience with Datadog is a plus.
• Experience working in Agile/DevOps environments and managing production services in alignment with ITSM practices where applicable.
• Employer-sponsored health, dental, vision, life, and disability insurance.
• Retirement plan with company contributions.
• Annual company profit-sharing.
• Personal development and training budget.
• Open and collaborative work environment.
• Extensive two-week onboarding plan.
• Comprehensive mentorship program.