
Senior Data Engineer – Generative AI, Data and MDM
Posted 6 days ago

This is a fully remote position, open to applicants in Brazil.
Responsibilities
• Develop, enhance, and maintain scalable data pipelines utilizing Databricks, Spark, and PySpark.
• Engage in the integration, transformation, cleansing, and provisioning of extensive master/customer data.
• Construct and uphold cloud-based Big Data products, ensuring they are scalable, high-performing, reliable, and of high data quality.
• Collaborate within the customer master data domain, advancing initiatives related to Golden Record, data quality, governance, and MDM.
• Design and execute data solutions that empower intelligent systems based on Generative AI, agents, and multi-agent systems.
• Prepare, structure, and provision data for use by Machine Learning models, LLMs, autonomous agents, and intelligent workflows.
• Aid in the development of agents and multi-agent systems capable of analyzing customer master data on a large scale.
• Assist in identifying patterns, inconsistencies, gaps, duplicates, anomalies, and opportunities for enhancing master/customer data.
• Create data mechanisms that generate alerts, provide explainable recommendations, and support decision-making.
• Utilize Analytics and Machine Learning techniques for anomaly detection, classification, clustering, and scoring of master/customer data.
• Support strategies for data qualification, enrichment, prioritization, and governance.
• Leverage Generative AI to assist in the generation, evolution, validation, and explanation of business rules.
• Explore and facilitate the implementation of architectures such as RAG, autonomous agents, multi-agent systems, and intelligent workflows.
• Work with near real-time data processing utilizing Kafka.
• Assist in the productionization of Machine Learning models on Databricks, applying MLOps principles.
• Develop, maintain, and improve CI/CD pipelines using GitHub and GitHub Actions.
• Ensure adherence to development best practices, guaranteeing that solutions meet quality, efficiency, maintainability, and governance standards.
• Optimize the utilization of available data to maximize its business and technological value.
• Collaborate with MDM, Data, Technology, and Business teams to ensure that developed solutions are measurable, governable, and applicable.
• Support technical and functional refinements, ensuring clarity, feasibility, and alignment of stories with project requirements.
• Assist in creating clear, well-defined, and technically feasible user stories.
• Participate in AS-IS and TO-BE design, documenting current and future processes, and identifying technical debt, risks, and opportunities for enhancement.
• Implement refined and approved stories, ensuring quality, efficiency, and compliance with technical standards.
• Identify technical debt and suggest continuous improvements in architecture, processes, data, and solutions.
• Collaborate with data and business focal points to ensure alignment between technical solutions, best practices, and strategic objectives.
Requirements
• Extensive experience as a Data Engineer, Senior Data Engineer, or in a similar capacity.
• Strong proficiency in Python for data engineering, automation, data analysis, and supporting AI initiatives.
• Solid understanding of SQL and experience with structured data exploration.
• Experience with Databricks, PySpark, and Spark.
• Proven track record in building, optimizing, and maintaining scalable data pipelines.
• Familiarity with cloud environments, preferably Azure Databricks and GCP.
• Knowledge or experience with Gemini to support GenAI solutions.
• Understanding of Generative AI, LLMs, and data-driven intelligent applications.
• Familiarity with agent architectures, multi-agent systems, and autonomous systems.
• Experience with RAG, intelligent workflows, data-driven recommendations, and cognitive automation.
• Proficiency in applied Machine Learning techniques, including anomaly detection, classification, clustering, and scoring.
• Experience with near real-time data processing, preferably using Kafka.
• Background in productionizing Machine Learning models on Databricks and in applying MLOps practices.
• Experience with CI/CD pipelines, particularly GitHub and GitHub Actions.
• Mastery of data engineering best practices, including versioning, testing, code review, and data quality assurance.
• Knowledge of programming logic, application development, and performance optimization.
• Ability to translate business challenges into data-driven, analytical, and intelligent solutions.
• Experience working in agile squads and familiarity with agile methodologies.
Benefits
• Health and dental insurance;
• Meal and grocery allowances;
• Childcare assistance;
• Extended parental leave;
• Partnerships with gyms and health & wellness professionals via Wellhub (Gympass) and TotalPass;
• Profit Sharing (PLR);
• Life insurance;
• Continuous learning platform (CI&T University);
• Discount club;
• Free online platform dedicated to physical and mental health and wellbeing;
• Pregnancy and responsible parenting course;
• Partnerships with online course platforms;
• Language learning platform;
• And many others.