Remotery

Senior Data Engineer – AWS, RAG Pipelines

Posted 2 hours ago

This is a fully remote position, open to applicants in Colombia.

📋 Description

• Design and manage the cloud data infrastructure that supports AI projects.

• Create production-grade data lakes on AWS.

• Develop real-time data ingestion and monitoring pipelines.

• Take ownership of the vector search and embedding layers that support RAG systems and autonomous agents.


⛳️ Requirements

• Overall Experience: 7+ years in Data Engineering, Distributed Systems, or Data Architecture.

• AWS & Infrastructure: 4+ years designing production-scale data lakes, storage tiers, and event streaming.

• AI/LLM Pipelines: 2+ years building RAG systems, managing embeddings, and orchestrating foundation models.

• Proficiency in AWS Data Lake Architecture & Storage.

• Proficiency in Real-Time Observability & Log Analytics.

• Proficiency in Elasticsearch & OpenSearch Optimization, Vectorization, and Embeddings.

• Proficiency in Amazon Bedrock & Generative AI Pipelines.

• Proficiency in Software Engineering & API Ingestion.

• Production-level proficiency in one or more of the following: C# (.NET Core), Java, Python, or Node.js.

• Familiarity with AWS S3 partitioning strategies, lifecycle policies, and columnar formats (Parquet, Iceberg).

• Experience with AWS Glue Data Catalog and Lake Formation for fine-grained, multi-tenant access control.

• Expertise in query optimization over petabyte-scale datasets using Amazon Athena and Redshift Spectrum.

• Experience configuring distributed OpenTelemetry (OTel) Collectors to capture logs, traces, and metrics and route them into S3.

• Experience streaming high volumes of system logs, Datadog captures, and raw server events into S3.

• Real-time Change Data Capture (CDC) from PostgreSQL using Debezium or AWS DMS.

• Experience managing Amazon OpenSearch clusters that support combined lexical and high-dimensional vector search.

• Knowledge of OpenSearch index lifecycle management, sharding strategies, and dynamic mappings at scale.

• Familiarity with Amazon Bedrock foundation model APIs (Claude, Titan) for tasks such as data enrichment, classification, and semantic parsing.

• Understanding of Knowledge Bases for Amazon Bedrock, including automatic chunking, metadata extraction, and syncing vector indexes from S3.

• Experience with ETL/ELT pipelines for ingesting unstructured event data from SaaS APIs (e.g., Pendo, Hotjar, Google Analytics).

• Experience developing Model Context Protocol (MCP) servers that provide data lake context and utilities to AI agents.


🏝️ Benefits

• Flexible remote work options.

• 13 floating holidays.

• 15 vacation days per year upon completion.

• Positive working environment.

People also viewed

Future Processing · 1 hour ago

Senior Cloud Data Engineer, GCP

Poland only · Full-time · Data Engineer

Codvo.ai · 1 hour ago

Senior Data Engineer – Full Stack

United States only · Full-time · Data Engineer

Guild Mortgage · 2 hours ago

Senior DataOps Engineer

United States only · Full-time · Data Engineer · $94.9k – $136.1k/year

Persona · 2 hours ago

Software Engineer, Data Products

California only · Full-time · Data Engineer · $130k – $220k/year

Compass · 2 hours ago

Data Engineer, Databricks – Senior

Brazil only · Full-time · Data Engineer

Dutch Bros Coffee · 2 hours ago

Lead Engineer, Data Platforms

Alabama, +3 more states · Full-time · Data Engineer