
Mid-Level Data Engineer
Posted 5 days ago

This is a fully remote position, open to applicants anywhere in the world.
• Pipeline Development: Design, build, test, and maintain scalable data pipelines (both batch and streaming) along with ETL/ELT processes.
• AI Infrastructure: Build and maintain data pipelines that support the Machine Learning lifecycle, handling both structured and unstructured data.
• Quality and Governance: Maintain data quality, integrity, and security by implementing governance and data curation practices for use in predictive models and large language models (LLMs).
• Performance Optimization: Monitor data flow performance and optimize complex queries to reduce costs and processing time.
• Collaboration with AI Teams: Work closely with Data Scientists and Machine Learning Engineers to understand their requirements and enable data use at scale.
• 2–4 years of demonstrated experience as a Data Engineer.
• Proficiency in SQL (modeling, optimization, and processing) and Python (data manipulation using Pandas, PySpark, etc.).
• Practical experience with cloud platforms (AWS, GCP, or Azure) and Data Warehouse services (BigQuery, Redshift, or Snowflake).
• Hands-on experience structuring unstructured data (text, PDFs, images) and integrating with vector databases (such as Pinecone, Milvus, Chroma, pgvector, or Weaviate) to support semantic search and RAG (Retrieval-Augmented Generation) systems.
• Familiarity with workflow orchestrators (preferably Apache Airflow).
• Understanding of relational and NoSQL databases.
• Experience working with APIs and integrating various systems.
• Knowledge of natural language processing (NLP) concepts and embeddings.
• Assertive Communication: Capable of interacting with both business and technical teams, clearly explaining technological limitations and opportunities to non-technical stakeholders.
• Critical Thinking and Business Awareness: Focused on identifying root causes of structural issues and prioritizing tasks that provide the highest value and cost efficiency for the company.
• Proactivity/Autonomy and Ownership: Take responsibility for pipelines, anticipate failures, proactively suggest enhancements, and document architectural decisions.
• Collaborative Spirit: Empathetic towards the needs of data consumers and willing to share knowledge with the team.
• Adaptability: Resilient in managing scope changes, new data sources, or technology advancements while maintaining a focus on delivery.
• Care for your health: Medical plan, Dental plan, Telemedicine, and Life Insurance.
• Customizable multi-benefit program (Flash).
• Rest is essential: Paid time off.
• Celebrate your day: Day off on your birthday!
• We offer Gympass to promote a healthy routine.
• Autonomy and flexibility.
• Workplace exercise and Quality of Life initiatives.
• Training and development program, Academia X.
• Start your self-awareness journey: Profiler and behavioral mapping.
Persona
NVIDIA