This is a fully remote position, open to applicants in Mexico.

📋 Description

• You will play a crucial role in designing and implementing high-quality modern data architectures, driving analytical solutions based on Big Data technologies.

• You will design, maintain, and optimize parallel processing systems, applying best practices for storage and management in data warehouses, data lakes, and lakehouses.

• You will collect, process, clean, and orchestrate large volumes of data, understanding both structured and semi-structured models.

• You will define the optimal strategy according to business objectives and technical requirements, transforming complex problems into achievable solutions that help our clients make data-driven decisions.

• You will join the project team, participating in its sprints and carrying out development activities while consistently applying data best practices and the technologies we implement.

• You will identify requirements and define the scope, participating in sprint planning and engineering sessions with a consultant mindset that adds extra value.

• You will classify and estimate activities using agile methodologies (epics, features, technical/user stories) and provide daily follow-up to maintain sprint momentum.

• You will meet committed delivery dates and manage risks by communicating deviations in a timely manner.

⛳️ Requirements

Experience in:

Query and Programming Languages:

• T-SQL / Spark SQL: DDL and DML, intermediate and advanced queries (subqueries, CTEs, multiple joins with business rules), grouping and aggregation (GROUP BY, window functions, business metrics), stored procedures for ETL/ELT, index optimization, statistics, and execution plans for massive processes.

• Python (PySpark): Object-oriented programming (classes, modules), handling of data structures and types (variables, lists, tuples, dictionaries), control flow with conditionals and loops, ingestion of structured and semi-structured data, development of DataFrames and UDFs, time windows and partitioning for optimization, coding best practices (PEP 8, modularity).

Databricks:

• Apache Spark & DataFrame API: Design of pipelines that leverage the DataFrame API for massive transformations; use of declarative functions and vectorized expressions.

• Delta Lake: Management of Delta tables with ACID transactions, time travel for auditing, and partition pruning for efficient reads within the medallion architecture.

• Auto Loader & Data Ingestion: Configuration of incremental ingestion from OneLake or ADLS Gen2 with Auto Loader, capturing schema changes (schema evolution) and using checkpointing to ensure exactly-once delivery without additional code.

• Structured Streaming: Orchestration of real-time streams with event-time windows, processing-time triggers, watermarking, and stateful operations for low latency and fault tolerance.

• Delta Live Tables (DLT): Declaration of ETL/ELT pipelines in SQL or Python with integrated data quality (Expectations), automatic dependency management, and continuous monitoring.

• Performance Optimization: Caching techniques, broadcast joins, shuffle optimizations, and use of columnar formats (Parquet/Delta) with Z-Ordering and OPTIMIZE to reduce processing times.

• Lakehouse Federation: Unified querying over external sources through Unity Catalog.

• Jobs & Workflows: Creation of multi-stage pipelines with dependencies, automatic retries, and triggering on a schedule or on data arrival; integration with Azure Data Factory if required.

• Repos & CI/CD: Versioning of notebooks and scripts in GitHub/Azure DevOps, configuration of validation pipelines (unit and schema) and automated deployment in dev-test-prod environments.

• Monitoring and Observability: Alerting through job notifications in Workflows for events such as failures, plus automated, proactive alert generation.

• General knowledge of Azure Data Factory would be an advantage!

🏝️ Benefits

• WELLNESS: We will promote your overall well-being through personal, professional, and financial balance. Our statutory and additional benefits will help you achieve this.

• LET'S RELEASE YOUR POWER: You will have the opportunity to specialize comprehensively in different areas and technologies, thus achieving interdisciplinary development.

• WE CREATE NEW THINGS: We like to think outside the box. You will have the space, trust, and freedom to create, along with the necessary training to achieve it.

• WE GROW TOGETHER: You will participate in cutting-edge technology projects and multinational collaborations, and work with international teams.

Data Engineer, Databricks
