
Intermediate Site Reliability Engineer
Posted May 25

This is a fully remote position, open to applicants in Chile.
• Assist with the daily operations of a mobile point-of-sale system.
• Offer first-line operational support, monitor systems, and address production incidents.
• Diagnose cloud systems and integrations, implementing corrective measures.
• Handle escalations and work collaboratively on bug fixes and hotfixes.
• Manage MDM solutions and aid in remote software deployments.
• Establish automated monitoring and alerting to enhance incident response.
• Document procedures, update knowledge bases, and create incident runbooks.
• Engage in on-call rotation to provide 24/7 support for critical incidents.
• Contribute to post-incident analyses to refine monitoring, response, and resolution processes.
• Develop Node/TypeScript utilities to streamline workflows, parse logs/JSON, and validate API payloads.
• Investigate REST/GraphQL integrations and evaluate request/response traces.
• Oversee third-party API integrations and collaborate with teams to enhance error handling.
• Examine system and application logs and telemetry to troubleshoot issues.
• Administer system access.
• Bachelor’s degree in Computer Science, Engineering, or a related discipline.
• Minimum of 3 years of experience supporting production systems, with an emphasis on incident response and resolution.
• Extensive background in operational support or SRE roles within cloud environments.
• Proficient in Node.js, including debugging, error handling, and performance optimization.
• Familiarity with AWS, Azure, or GCP, particularly in monitoring and troubleshooting cloud-native applications.
• Experience with APIs and integrations.
• Knowledge of logging and monitoring tools (Winston, Bunyan, Datadog, ELK Stack, CloudWatch).
• Strong problem-solving abilities in high-pressure and time-sensitive environments.
• Experience with CI/CD pipelines and automated deployments (Jenkins, GitLab CI, AWS CodePipeline).
• Excellent communication skills, including clear and structured incident reporting and documentation.
• Ability to collaborate effectively across development, DevOps, and product teams.
• Upper-Intermediate+ level of English.
**Desirable:**
• Experience with containerization technologies (Docker, Kubernetes).
• Understanding of REST APIs, WebSockets, and microservices architectures.
• Familiarity with incident management frameworks (ITIL, SRE practices).
• Knowledge of cloud security best practices.
• Experience with mobile POS platforms or mobile application environments.
• Familiarity with mobile device management (MDM) solutions.
• Enjoy 30 paid days off each year for vacations, holidays, or personal time.
• Receive 5 paid sick days, up to 60 days of medical leave, and up to 6 paid days off annually for significant family events such as weddings, funerals, or childbirth.
• Benefit from partially covered health insurance after probation, along with a wellness bonus for gym memberships, sports nutrition, and similar needs after 6 months.
• We pay in U.S. dollars and compensate all approved overtime.
• Participate in English lessons and Dev.Pro University programs, and engage in enjoyable online activities and team-building events.