
Site Reliability Engineer
Posted May 9

This is a fully remote position, open to applicants in Brazil.
• Take responsibility for the reliability of our global bare metal fleet, including monitoring, alerting, incident response, and conducting post-mortems.
• Develop and maintain internal tools, such as NetBox (the infrastructure source of truth), and services written in Python/Go.
• Lead automation efforts for the hardware lifecycle, encompassing provisioning, decommissioning, firmware updates, and network modifications.
• Work closely with platform engineers on the provisioning stack.
• Participate in an on-call rotation.
• A minimum of 4 years of experience in SRE or infrastructure engineering.
• Strong foundation in Linux, including kernel, networking, and hardware layers.
• Experience in network automation (BGP, VLANs, IPAM) is a notable advantage.
• Proficiency in Python or Go for developing internal tooling.
• Familiarity with NetBox, Prometheus, or comparable tools.
• A passion for owning projects from start to finish.
• Nice to have: experience with bare metal operations, Tinkerbell/PXE systems, and a background in network engineering (CCNA/CCNP or equivalent).
• Comprehensive health insurance packages.
• Flexible work hours and remote work options.
• Opportunities for professional development and growth.
• Access to cutting-edge technology and tools.
Innovative Solutions