
Senior Software Engineer, Infrastructure

at Epic Kids | United States | Full-time | Senior | $160k – $200k/year

Posted 8 hours ago

This is a fully remote position, open to applicants in the United States.

📋 Description

• Ensure the stability and reliability of Epic's GCP infrastructure by establishing and monitoring SLOs/SLIs, minimizing toil, and eliminating recurring instability sources.

• Design and manage Epic's GCP infrastructure to achieve high availability, scalability, and cost-effectiveness.

• Oversee and enhance our Docker and GKE container platform, focusing on workload scheduling, autoscaling, networking, and seamless failure management.

• Sustain and optimize CI/CD pipelines that facilitate rapid, secure, and low-risk delivery across engineering teams.

• Take ownership of the observability stack—metrics, logs, traces, dashboards, and alerts—ensuring that signals are actionable, noise is minimized, and on-call personnel have the necessary context to resolve issues swiftly.

• Write and maintain Terraform configurations to codify infrastructure throughout the organization, prioritizing consistency, change safety, and reproducibility.

• Engage in capacity planning, cost optimization, and architectural reviews with a strong emphasis on reliability.

• Advocate for platform security best practices, encompassing secrets management, IAM policies, and network segmentation.

• Assist in compliance-oriented infrastructure practices—vulnerability management, access reviews, audit-evidence flows, and incident-response readiness—as we advance our SOC 2 and student-data compliance initiatives.

• Collaborate with data engineering to oversee the orchestration platform and its supporting infrastructure—deployment, scaling, reliability, and observability.

• Work closely with backend and data engineers to diagnose service and platform issues.

• Set an example by participating in a regular on-call rotation; lead incident response, conduct blameless post-mortems, and ensure follow-through that transforms one-time outages into lasting reliability enhancements.

• Offer guidance to developers on infrastructure-related concerns and best practices.


⛳️ Requirements

• A Bachelor's degree or higher in Computer Science, Software Engineering, or a related discipline.

• Over 5 years of experience in infrastructure, platform, DevOps, or a similar engineering role.

• Practical experience with GCP (GCE, GCS, VPC, IAM, Cloud Monitoring, and associated services).

• Familiarity with Docker and Kubernetes (GKE)—including containerizing workloads, deploying to GKE, Helm, and cluster fundamentals.

• Experience with CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, or equivalent).

• Proficient in using an observability platform like New Relic (metrics, logging, alerting, dashboards).

• Expertise in Terraform for managing infrastructure as code.

• Scripting/programming capabilities in Python, Bash, or similar languages.

• Willingness to participate in a regular production on-call rotation.

• Proven track record of significantly enhancing the reliability of production systems—e.g., establishing SLOs, decreasing incident frequency or MTTR, and eliminating recurrent failure modes.

• Strong problem-solving abilities, a sense of ownership, and the capacity to operate effectively within dynamic systems.

• Proficient in English for daily collaboration and technical documentation.

• Proficient in Mandarin Chinese to facilitate effective collaboration with global engineering and business teams.


🏝️ Benefits

• Competitive salary and performance-based bonuses.

• Comprehensive health, dental, and vision insurance.

• Generous paid time off and flexible work arrangements.

• Opportunities for professional development and continuous learning.

• Supportive work environment fostering collaboration and innovation.
