Remotery

Senior DevOps Engineer – AWS, AI Infrastructure

Posted 6 days ago

This is a fully remote position, open to applicants in Argentina.

📋 Description

• Provision and set up a dedicated VPC along with a segmented cloud environment on AWS.

• Establish the foundational CI/CD pipeline and oversee its maintenance and evolution throughout all delivery stages.

• Configure and manage the vector store infrastructure (OpenSearch/Pinecone on AWS).

• Implement and manage the observability stack, including CloudWatch, X-Ray, alerting thresholds, and LLM-specific monitoring.

• Implement infrastructure-as-code for all environments (development, staging, production) using Terraform or CDK.

• Oversee secrets management, KMS encryption key configuration, and tenant-scoped access controls.

• Set up connectivity with LLM providers (OpenAI / Anthropic / Amazon Bedrock enterprise tier, zero-data-retention).

• Develop and execute an environment promotion strategy in line with the 2-week sprint cadence.

• Assist with the infrastructure requirements for the incremental ingestion pipeline and manage nightly scheduling.


⛳️ Requirements

• Over 6 years of experience in DevOps or cloud infrastructure engineering, with a strong emphasis on AWS.

• Proficiency in infrastructure-as-code tools: Terraform, CloudFormation, or AWS CDK.

• Familiarity with CI/CD tools: GitHub Actions, AWS CodePipeline, or similar.

• Knowledge of core AWS services: VPC, ECS, Lambda, S3, DynamoDB, API Gateway, Cognito, CloudWatch, X-Ray.

• Proven experience in designing and managing multi-tenant cloud environments with tenant-level data isolation.

• AI experience is required.

• Experience in configuring and managing vector store infrastructure (OpenSearch, Pinecone, Weaviate, or equivalent) in a production setting.

• Understanding of LLM provider APIs (OpenAI, Anthropic, or Amazon Bedrock) in a production or enterprise configuration, including zero-data-retention tier setup.

• Understanding of AI-specific observability concerns: token usage monitoring, latency profiling for LLM calls, and model response logging.


🏝️ Benefits

• An excellent work environment certified by Great Place To Work.

• Opportunities for professional development.
