
Senior Data Engineer – AWS, RAG Pipelines
Posted 2 hours ago

This is a fully remote position, open to applicants in Colombia.
• Design and manage the cloud data infrastructure that supports AI projects.
• Create production-grade data lakes on AWS.
• Develop real-time data ingestion and monitoring pipelines.
• Take ownership of the vector search and embedding layers that support RAG systems and autonomous agents.
• Overall Experience: 7+ years in Data Engineering, Distributed Systems, or Data Architecture.
• AWS & Infrastructure: 4+ years of experience designing production-scale data lakes, storage tiers, and event streaming.
• AI/LLM Pipelines: 2+ years of experience building RAG systems, managing embeddings, and orchestrating foundation models.
• Proficiency in AWS Data Lake Architecture & Storage.
• Proficiency in Real-Time Observability & Log Analytics.
• Proficiency in Elasticsearch & OpenSearch Optimization, Vectorization, and Embeddings.
• Proficiency in Amazon Bedrock & Generative AI Pipelines.
• Proficiency in Software Engineering & API Ingestion.
• Production-level proficiency in one or more of the following: C# (.NET Core), Java, Python, or Node.js.
• Familiarity with AWS S3 partitioning strategies, lifecycle policies, and columnar file and table formats (Parquet, Iceberg).
• Experience with AWS Glue Data Catalog and Lake Formation for fine-grained, multi-tenant access control.
• Expertise in query optimization over petabyte-scale datasets using Amazon Athena and Redshift Spectrum.
• Configuration of distributed OpenTelemetry (OTel) collectors to capture logs, traces, and metrics and route them into S3.
• High-volume streaming of system logs, Datadog captures, and raw server events into S3.
• Real-time Change Data Capture (CDC) from PostgreSQL using Debezium or AWS DMS.
• Management of Amazon OpenSearch clusters that support both lexical and high-dimensional vector search.
• Knowledge of OpenSearch index lifecycle management, sharding strategies, and dynamic mappings at scale.
• Familiarity with Amazon Bedrock foundation model APIs (Claude, Titan) for tasks such as data enrichment, classification, and semantic parsing.
• Understanding of Knowledge Bases for Amazon Bedrock for automatic chunking, metadata extraction, and syncing vector indexes from S3.
• Experience with ETL/ELT pipelines for ingesting unstructured event data from SaaS APIs (e.g., Pendo, Hotjar, Google Analytics).
• Development of Model Context Protocol (MCP) servers that provide AI agents with data lake context and utilities.
• Flexible remote work options.
• 13 floating holidays.
• 15 vacation days per year upon completion.
• Positive working environment.
Future Processing
Codvo.ai
Guild Mortgage
Persona