Remotery

Senior DevOps – Platform Reliability Engineer

Posted Jun 20

This is a fully remote position, open to applicants in New York.

📋 Description

• Take ownership and enhance CI/CD pipelines utilizing GitHub Actions and OIDC-based authentication for microservices and agentic workloads, ensuring safe, rapid, and reversible deployments.

• Automate the provisioning of infrastructure through Infrastructure as Code (IaC) tools like Terraform and CloudFormation.

• Manage and scale our Kubernetes platform (EKS + Argo CD), which includes autoscaling, ingress, external-dns, cert-manager, External Secrets Operator, backups, runtime guardrails, and multi-tenant isolation for enterprise clients.

• Oversee the edge and network perimeter, which encompasses Cloudflare (CDN, WAF, Bot Management, DDoS protection, Zero Trust / Access), CloudFront, API Gateway, ALB/NLB, Route 53, and network security measures.

• Handle the data and event tier, including Aurora MySQL, ElastiCache/Redis, S3, and MSK (Kafka), with accountability for backups, point-in-time recovery (PITR), and multi-AZ disaster recovery aligned with defined RTO/RPO targets.

• Develop and maintain Lambda workloads where event-driven or serverless architectures are applicable.

• Create observability as a product using Prometheus, Grafana, and OpenTelemetry, including telemetry for LLM and agentic systems such as token costs, tool-call latency, evaluation signals, and prompt/version tracking.

• Enhance our security and compliance posture for SOC 2 and HIPAA, incorporating least-privilege IAM, SCPs, secrets management, SAST/DAST, dependency and container scanning, image signing, AWS Config, Security Hub, GuardDuty, Inspector, and evidence automation.

• Lead FinOps initiatives, including tagging standards, Savings Plans and Reserved Instances, cost attribution per tenant and workload, and LLM cost management.

• Develop and advance our AI-native DevOps capabilities.


⛳️ Requirements

• Over 5 years of experience in DevOps, SRE, or Platform Engineering managing production systems on AWS.

• Extensive experience with CI/CD pipelines and tools such as GitHub Actions, GitLab CI, Jenkins, or CircleCI.

• Practical experience operating production EKS environments, covering autoscaling, ingress, secrets management, and cluster upgrades.

• Strong AWS networking knowledge, including multi-account VPC design, subnets, routing, security groups, NACLs, Route 53, ACM, and load balancers.

• In-depth experience with Terraform and GitHub Actions, preferably using OIDC-based cloud authentication.

• Familiarity with Aurora/RDS MySQL, Redis (ElastiCache), and S3, including backups, PITR, migrations, and lifecycle management.

• Solid observability experience with Prometheus, Grafana, and OpenTelemetry.

• Experience operating Argo CD at scale.

• Proficiency with Infrastructure as Code tools such as Terraform, CloudFormation, or Ansible.

• Experience managing Cloudflare services, including WAF, Bot Management, Rate Limiting, and Zero Trust / Access, along with CloudFront.

• Experience operating Kafka/MSK at scale, including topics, consumer groups, and schema registries.

• Familiarity with Lambda and event-driven architectures.

• Proficiency in Python, Bash, and Linux systems.

• Strong grasp of security best practices across IAM, KMS, secrets management, networking, and software supply chain security.

• Knowledge of vulnerability scanning and compliance tools.


🏝️ Benefits

• Competitive compensation packages

• Comprehensive health benefits: 100% of employee premiums covered, and 75%–80% of dependent premiums covered for most health, dental, and vision plans

• 401(k) plans to assist with retirement planning (no employer matching currently)

• Paid parental leave

• Unlimited PTO

• Flexible remote work from any location

• Up to $200/month co-working reimbursement

• Home office stipend: up to $500 for home office setup, plus $100/month for internet, phone, and related expenses
