This is a fully remote position, open to applicants in Brazil.

Responsibilities:

• Develop batch and streaming data ingestion pipelines

• Design and organize data lakes and data warehouses

• Create datasets that are optimized for machine learning (ML)

• Implement embedding pipelines

• Build vector indexes for Retrieval-Augmented Generation (RAG)

• Ensure data quality, governance, and security

• Optimize storage and processing costs

• Collaborate with AI Engineers to design feature stores

Requirements:

• Proficiency in Python

• Advanced knowledge of SQL

• Familiarity with Scala (optional)

• Experience with Apache Airflow

• Knowledge of dbt

• Experience with Prefect

• Proficiency in Spark

• Experience with Pandas

• Knowledge of PySpark

• Experience with data lakes (S3, GCS, Azure Blob)

• Experience with data warehouses (BigQuery, Snowflake, Redshift)

• Familiarity with NoSQL databases

• Knowledge of vector databases and search libraries (Pinecone, Weaviate, FAISS)

• Experience with Kafka

• Familiarity with Pub/Sub

• Proficiency in Docker

• Knowledge of Kubernetes

• Experience with cloud platforms (AWS, GCP, or Azure)

What we offer:

• Access to a training and development program

• Opportunities to take part in community and social initiatives that foster development

• Commitment to Diversity, Respect, and Ethics

Data Engineer
