This is a fully remote position, open to applicants in Poland.

📋 Description

• Ensure the dependability, scalability, and efficiency of Dropbox's infrastructure and services.

• Collaborate with interdisciplinary teams to establish and uphold best practices for monitoring, logging, and incident response.

• Build, deploy, and maintain automation and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions, along with custom code platforms.

• Use container orchestration platforms, including Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale.

• Operate and improve monitoring and logging pipelines using tools such as Datadog and Cribl LogStream.

• Drive initiatives to improve service health and visibility for our stakeholders, who range from developers to business service owners to C-level executives.

• Create and maintain custom tools and automation scripts using Bash, Python, and other scripting languages.

⛳️ Requirements

• 5+ years of experience in site reliability engineering or similar engineering roles with practical coding experience.

• Strong understanding of AWS services, including EC2, S3, RDS, Route 53, and Lambda.

• In-depth knowledge of Linux administration, including internals, filesystems, and volume management, on distributions such as Ubuntu and RHEL, as well as DNS and DHCP.

• Familiarity with monitoring and logging tools, including Datadog and logging pipeline tools like Vector or Cribl LogStream.

• Proven experience in leading one or more transformational programs related to metrics and observability.

• Proficiency in scripting with a higher-level language (Python preferred).

• Experience in developing automation to address infrastructure-related tasks using tools such as Chef, Ansible, or Terraform.

• Knowledge of log analysis and the creation of metrics, alerts, and visuals from log data.

• Strong expertise in infrastructure-as-code tools, particularly Terraform.

• Solid proficiency in configuration management tools, specifically Ansible Automation Platform and Chef.

• Familiarity with containerization technologies, such as Docker, and container orchestration platforms like Kubernetes or Amazon ECS.

• Understanding of LDAP, REST APIs, and current authentication methods.

• Familiarity with GitHub and Git-based workflows.

• Knowledge of RDS databases and network security technologies, including WAF.

• Excellent problem-solving abilities and the capacity to thrive in a fast-paced, collaborative environment.

• Outstanding written and verbal communication skills.

🏝️ Benefits

• Health insurance.

• Retirement plans.

• Paid time off.

• Flexible work arrangements.

• Professional development.

Site Reliability Engineer
