
Senior Site Reliability Engineer
Posted 1 day ago

This is a fully remote position, open to applicants in Poland.
• Ensure the availability, reliability, and performance of high-traffic Java applications in a distributed environment.
• Diagnose and resolve intricate issues in both production and non-production environments.
• Run pre- and post-deployment performance testing and monitoring to continually improve application performance.
• Design, develop, and manage agentic AI workflows that automate operational tasks such as alert triage and root cause analysis.
• Bachelor’s degree in Computer Science or a related field, or equivalent professional experience.
• Over 5 years of experience in SRE, DevOps, or similar infrastructure roles, with a background in managing large-scale, high-availability production systems.
• At least 3 years of hands-on experience in managing production Kubernetes clusters, with a comprehensive understanding of architecture, networking, storage, and security.
• Advanced proficiency with the Grafana observability stack, including dashboards, alerting, visualization, and Grafana Alloy for telemetry collection.
• Strong scripting skills in Python, Bash, or Go, with a background in building CI/CD pipelines and deployment automation.
• Minimum of 1 year of practical experience in developing or operating AI/LLM-powered tools, agents, or workflows.
• Fully remote position.
• Opportunity to work with cutting-edge AI tools.
• Collaborative team environment.