Remotery

Senior Data Engineer – Generative AI, Data and MDM

Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.

📋 Description

• Develop, enhance, and maintain scalable data pipelines utilizing Databricks, Spark, and PySpark.

• Engage in the integration, transformation, cleansing, and provisioning of extensive master/customer data.

• Build and maintain cloud-based Big Data products, ensuring they are scalable, high-performing, and reliable, and that they deliver high-quality data.

• Collaborate within the customer master data domain, advancing initiatives related to Golden Record, data quality, governance, and MDM.

• Design and execute data solutions that empower intelligent systems based on Generative AI, agents, and multi-agent systems.

• Prepare, structure, and provision data for use by Machine Learning models, LLMs, autonomous agents, and intelligent workflows.

• Aid in the development of agents and multi-agent systems capable of analyzing customer master data on a large scale.

• Assist in identifying patterns, inconsistencies, gaps, duplicates, anomalies, and opportunities for enhancing master/customer data.

• Create data mechanisms that generate alerts, provide explainable recommendations, and support decision-making.

• Utilize Analytics and Machine Learning techniques for anomaly detection, classification, clustering, and scoring of master/customer data.

• Support strategies for data qualification, enrichment, prioritization, and governance.

• Leverage Generative AI to assist in the generation, evolution, validation, and explanation of business rules.

• Explore and facilitate the implementation of architectures such as RAG, autonomous agents, multi-agent systems, and intelligent workflows.

• Work with near real-time data processing utilizing Kafka.

• Assist in the productionization of Machine Learning models on Databricks, applying MLOps principles.

• Develop, maintain, and improve CI/CD pipelines using GitHub and GitHub Actions.

• Ensure adherence to development best practices, guaranteeing that solutions meet quality, efficiency, maintainability, and governance standards.

• Optimize the utilization of available data to maximize its business and technological value.

• Collaborate with MDM, Data, Technology, and Business teams to ensure that developed solutions are measurable, governable, and applicable.

• Support technical and functional refinements, ensuring clarity, feasibility, and alignment of stories with project requirements.

• Assist in creating clear, well-defined, and technically feasible user stories.

• Participate in AS-IS and TO-BE design, documenting current and future processes, and identifying technical debt, risks, and opportunities for enhancement.

• Implement refined and approved stories, ensuring quality, efficiency, and compliance with technical standards.

• Identify technical debt and suggest continuous improvements in architecture, processes, data, and solutions.

• Collaborate with data and business focal points to ensure alignment between technical solutions, best practices, and strategic objectives.


⛳️ Requirements

• Extensive experience as a Data Engineer, Senior Data Engineer, or in a similar capacity.

• Strong proficiency in Python for data engineering, automation, data analysis, and supporting AI initiatives.

• Solid understanding of SQL and experience with structured data exploration.

• Experience with Databricks, PySpark, and Spark.

• Proven track record in building, optimizing, and maintaining scalable data pipelines.

• Familiarity with cloud environments, preferably Azure Databricks and GCP.

• Knowledge or experience with Gemini to support GenAI solutions.

• Understanding of Generative AI, LLMs, and data-driven intelligent applications.

• Familiarity with agent architectures, multi-agent systems, and autonomous systems.

• Experience with RAG, intelligent workflows, data-driven recommendations, and cognitive automation.

• Proficiency in applied Machine Learning techniques, including anomaly detection, classification, clustering, and scoring.

• Experience with near real-time data processing, preferably using Kafka.

• Background in productionizing Machine Learning models on Databricks and in applying MLOps practices.

• Experience with CI/CD pipelines, particularly GitHub and GitHub Actions.

• Mastery of data engineering best practices, including versioning, testing, code review, and data quality assurance.

• Knowledge of programming logic, application development, and performance optimization.

• Ability to translate business challenges into data-driven, analytical, and intelligent solutions.

• Experience working in agile squads and familiarity with agile methodologies.


🏝️ Benefits

• Health and dental insurance;

• Meal and grocery allowances;

• Childcare assistance;

• Extended parental leave;

• Partnership with gyms and health & wellness professionals via Wellhub (Gympass) and TotalPass;

• Profit Sharing (PLR);

• Life insurance;

• Continuous learning platform (CI&T University);

• Discount club;

• Free online platform dedicated to physical and mental health and wellbeing;

• Pregnancy and responsible parenting course;

• Partnerships with online course platforms;

• Language learning platform;

• And many others.

