
Senior Site Reliability Engineer
Posted May 6

This is a fully remote position, open to applicants in Illinois.
• Design and enhance both new and existing systems to improve performance, reliability, and scalability.
• Develop, implement, and iterate on CI/CD pipelines.
• Support the management, development, design, and deployment of microservice-based and containerized applications.
• Establish robust security measures in distributed systems and agents.
• Collaborate with engineers and developers to automate deployments and configurations across different platforms.
• Reduce the complexity of implementing Observability by building scalable automation.
• Spot opportunities for improvement in observability and related processes.
• Standardize and develop alerts, notifications, and responses for monitoring tools.
• Partner with application teams to integrate Observability into daily operations.
• Take part in post-mortem analyses, provide root-cause insights, and implement the resulting action items.
• Advocate for DevOps best practices within the team.
• Engage in and promote Agile/Scrum methodologies.
• Contribute to the hybrid cloud production containerization service offerings.
• Design and establish standards, policies, and procedures for automation and integrations.
• Collaborate with application subject matter experts to learn our toolsets and recommend/implement new features to optimize operations.
• Bachelor’s Degree with 7 years of experience; Master’s Degree with 6 years of experience; PhD with 2 years of experience.
• Approach best practices for security as a necessity, not an afterthought.
• Proficient in Cloud Platform administration (AWS, GCP, Azure).
• Familiarity with the pillars of Observability.
• Experience working in high-scale environments with a strong understanding of distributed architectures.
• Knowledge of Agile and DevOps methodologies.
• Familiarity with CI/CD tools (GitHub Actions, Bamboo, Jenkins, Azure DevOps).
• Experience managing Docker workloads using orchestration tools (Kubernetes / Amazon ECS).
• Ability to work both independently and collaboratively on daily tasks.
• Eagerness to learn new concepts and processes swiftly, adapting to a dynamic environment.
• Comfortable administering Linux and Windows environments.
• Preferred: Experience with SPIRE/SPIFFE.
• Direct experience with Terraform and Crossplane.
• Proficient in development tools and scripting languages (Git / Mercurial / Subversion; Python / Elixir / Go).
• Capable of integrating MCP Servers with authorization controls.
• Knowledge of database management systems (NoSQL, Relational Databases, and relevant query languages).
• AWS Cloud Practitioner or Azure AZ-900 Certification is desirable.
• Extensive experience in the design and implementation of serverless architecture solutions.
• Proven experience in deploying containerized applications (Kubernetes, etc.).
• Familiarity with data management and pipeline technologies (Apache Storm, Kafka, Flink, Spark, Hadoop, etc.).
• Previous experience working in an Agile team.
• Strong understanding of observability solutions using OpenTelemetry, Prometheus/Grafana, or similar applications.
• Excellent understanding of distributed system architectures and telemetry.
• Significant experience in deploying and managing large Kubernetes distributed platforms.
• Proficient in GitOps practices and Infrastructure as Code systems (such as Terraform, ArgoCD, Helm).
• Paid time off (vacation, holidays, sick leave).
• Medical, dental, and vision insurance.
• 401(k) plan for eligible employees.
• Short-term incentive programs.