Remotery

Senior Platform Engineer – AI Agent Infrastructure

Posted May 19

This is a fully remote position, open to applicants in Argentina.

📋 Description

• Designing event-driven communication patterns between services.

• Enhancing the reliability of streaming services.

• Developing observability tools for the platform.

• Leading architectural decision-making.

• Managing cloud infrastructure and automation using Infrastructure as Code (IaC).

• Establishing systems for monitoring, tracing, and alerting.


⛳️ Requirements

• Expertise in event-driven architecture and messaging systems: you’ve designed solutions built on message brokers (Kafka, NATS, RabbitMQ, or similar). You have a strong understanding of at-least-once delivery, consumer groups, dead-letter queues, and backpressure, and ideally you have migrated a system from synchronous to asynchronous messaging.

• Proficiency in AWS: extensive experience with EC2, VPC, IAM, S3, and RDS. You have a solid grasp of networking fundamentals, since inter-service communication runs over an internal VPC.

• Database skills: strong knowledge of both SQL (PostgreSQL) and NoSQL (MongoDB, Redis). You know when to use each, and you understand indexing strategies, replication, and performance tuning.

• Docker: familiarity with container lifecycle management, resource limits, health checks, bind mounts, and multi-stage builds.

• Experience debugging distributed systems: you’ve troubleshot asynchronous flows and cascading failures across production services, and you can explain what went wrong and how you fixed it.

• Infrastructure as Code: proficiency with Terraform or Pulumi. You believe infrastructure changes belong in reviewed pull requests, not in console clicks.

• Observability expertise: fluency in Datadog or equivalent tools (dashboards, monitors, APM, log pipelines, distributed tracing).

• Familiarity with a tech stack including Go, AWS (EC2, S3, VPC, RDS PostgreSQL), Docker, PostgreSQL, MongoDB, Redis, and Datadog.

• Experience with AI/MLOps infrastructure: running AI workloads in production (model serving, LLM inference, GPU and resource management, agent evaluation) with tools such as Langfuse, LangSmith, Braintrust, or MLflow.

• Knowledge of multi-tenant container platforms: experience with services that run customer or user workloads in containers (Replit, Railway, Fly.io, or internal PaaS systems).

• Kubernetes: you have migrated at least one system from "Docker on bare EC2" to Kubernetes and know the pitfalls that can arise during that transition.

• Experience with data pipelines and orchestration: familiarity with tools such as Airflow, Prefect, or similar. Knowledge of data warehouses (Databricks, Snowflake, BigQuery) is a plus.


🏝️ Benefits

• Competitive Compensation.

• Remote Work – You can work from anywhere!

• Home Office Bonus – A one-time allowance to help you set up your ideal home office.

• Provision of Work Equipment.

• Stock Options.

• Comprehensive Health Plan available wherever you are.

• Flexible Days Off.

• Language, Professional, and Personal Growth Courses.
