
Data Engineer, Databricks
Posted May 9

This is a fully remote position, open to applicants in Mexico.
• You will play a crucial role in designing and implementing high-quality modern data architectures, driving analytical solutions based on Big Data technologies.
• You will design, maintain, and optimize parallel processing systems, applying best practices for storage and management in data warehouses, data lakes, and lakehouses.
• You will collect, process, clean, and orchestrate large volumes of data, understanding both structured and semi-structured models.
• You will define the optimal strategy according to business objectives and technical requirements, transforming complex problems into achievable solutions that help our clients make data-driven decisions.
• You will integrate into the project, participating in its sprints and executing development activities while consistently applying the best data practices and the technologies we implement.
• You will identify requirements and define the scope, participating in sprint planning and engineering sessions with a consultant mindset that adds extra value.
• You will classify and estimate activities using agile methodologies (epics, features, technical/user stories) and provide daily follow-up to maintain sprint momentum.
• You will meet committed delivery dates and manage risks by communicating deviations in a timely manner.
Experience in:
Query and Programming Languages:
• T-SQL / Spark SQL: DDL and DML, intermediate and advanced queries (subqueries, CTEs, multiple joins with business rules), grouping and aggregation (GROUP BY, window functions, business metrics), stored procedures for ETL/ELT, index optimization, statistics, and execution plans for massive processes.
• Python (PySpark): Object-oriented programming (classes, modules), management of structures and data types (variables, lists, tuples, dictionaries), control flow through conditionals and loops, ingestion of structured and semi-structured data, development of DataFrames and UDFs, temporal windows and partitioning for optimization, best coding practices (PEP8, modularity).
Databricks:
• Apache Spark & DataFrame API: Design of pipelines that leverage the DataFrame API for massive transformations; use of declarative functions and vectorized expressions.
• Delta Lake: Management of Delta tables with ACID transactions, time travel for auditing, and partition pruning for efficient reads within the medallion architecture.
• Auto Loader & Data Ingestion: Configuration of incremental ingestion in OneLake or ADLS Gen2 with Auto Loader, capturing schema changes (schema evolution) and checkpointing to ensure exactly-once delivery without additional code.
• Structured Streaming: Orchestration of real-time streams with event-time and processing-time triggers, watermarking, and stateful operations for low latency and fault tolerance.
• Delta Live Tables (DLT): Declaration of ETL/ELT pipelines in SQL or Python with integrated data quality (Expectations), automatic dependency management, and continuous monitoring.
• Performance Optimization: Caching techniques, broadcast joins, shuffle optimizations, and use of columnar formats (Parquet/Delta) with Z-Ordering and OPTIMIZE to reduce processing times.
• Lakehouse Federation: Unified querying over external sources through Unity Catalog.
• Jobs & Workflows: Creation of multi-stage pipelines with dependencies, automatic retries, and triggers based on a schedule or data arrival; integration with Azure Data Factory if required.
• Repos & CI/CD: Versioning of notebooks and scripts in GitHub/Azure DevOps, configuration of validation pipelines (unit and schema) and automated deployment in dev-test-prod environments.
• Monitoring and Observability: Alerts through job notifications in Workflows for events such as failures, plus automated generation of proactive alerts.
• General knowledge of Azure Data Factory would be an advantage.
• WELLNESS: We will promote your overall well-being through personal, professional, and financial balance. Our legal and additional benefits will assist you in achieving this.
• LET'S RELEASE YOUR POWER: You will have the opportunity to specialize comprehensively in different areas and technologies, thus achieving interdisciplinary development.
• WE CREATE NEW THINGS: We like to think outside the box. You will have the space, trust, and freedom to create, along with the necessary training to achieve it.
• WE GROW TOGETHER: You will participate in cutting-edge technological projects, multinational collaborations, and work with foreign teams.