Remotery

Data Engineer, Databricks

Posted May 9

This is a fully remote position, open to applicants in Mexico.

📋 Description

• You will play a crucial role in designing and implementing high-quality modern data architectures, driving analytical solutions based on Big Data technologies.

• You will design, maintain, and optimize parallel processing systems, applying best practices for storage and management in data warehouses, data lakes, and lakehouses.

• You will collect, process, clean, and orchestrate large volumes of data, understanding both structured and semi-structured models.

• You will define the optimal strategy according to business objectives and technical requirements, transforming complex problems into achievable solutions that help our clients make data-driven decisions.

• You will join the project team, participate in its sprints, and carry out development work while consistently applying data best practices and the technologies we use.

• You will identify requirements and define the scope, participating in sprint planning and engineering sessions with a consultant mindset that adds extra value.

• You will classify and estimate activities using agile methodologies (epics, features, technical/user stories) and provide daily follow-up to maintain sprint momentum.

• You will meet committed delivery dates and manage risks by communicating deviations in a timely manner.


⛳️ Requirements

• Experience in query and programming languages. T-SQL / Spark SQL: DDL and DML, intermediate and advanced queries (subqueries, CTEs, multiple joins with business rules), grouping and aggregation (GROUP BY, window functions, business metrics), stored procedures for ETL/ELT, index optimization, statistics, and execution plans for massive processes.

• Python (PySpark): Object-oriented programming (classes, modules), handling of data structures and types (variables, lists, tuples, dictionaries), control flow with conditionals and loops, ingestion of structured and semi-structured data, development of DataFrames and UDFs, time windows and partitioning for optimization, coding best practices (PEP 8, modularity).

• Databricks (Apache Spark & DataFrame API): Design of pipelines that leverage the DataFrame API for massive transformations; use of declarative functions and vectorized expressions.

• Delta Lake: Management of Delta tables with ACID transactions, time travel for auditing, and partition pruning for efficient reads within the medallion architecture.

• Auto Loader & Data Ingestion: Configuration of incremental ingestion in OneLake or ADLS Gen2 with Auto Loader, handling schema changes (schema evolution) and using checkpointing to ensure exactly-once delivery without additional code.

• Structured Streaming: Orchestration of real-time streams using event-time processing, processing-time triggers, watermarking, and stateful operations for low latency and fault tolerance.

• Delta Live Tables (DLT): Declaration of ETL/ELT pipelines in SQL or Python with integrated data quality (Expectations), automatic dependency management, and continuous monitoring.

• Performance Optimization: Caching techniques, broadcast joins, shuffle optimizations, and use of columnar formats (Parquet/Delta) with Z-Ordering and OPTIMIZE to reduce processing times.

• Lakehouse Federation: Unified querying over external sources through Unity Catalog.

• Jobs & Workflows: Creation of multi-stage pipelines with dependencies, automatic retries, and triggers based on a schedule or data arrival; integration with Azure Data Factory if required.

• Repos & CI/CD: Versioning of notebooks and scripts in GitHub/Azure DevOps, configuration of validation pipelines (unit and schema tests), and automated deployment across dev, test, and prod environments.

• Monitoring and Observability: Alerts through job notifications in Workflows for events such as failures, plus automated generation of proactive alerts.

• General knowledge of Azure Data Factory would be an advantage!


🏝️ Benefits

• WELLNESS: We will promote your overall well-being through personal, professional, and financial balance. Our statutory and additional benefits will help you achieve this.

• LET'S RELEASE YOUR POWER: You will have the opportunity to specialize comprehensively in different areas and technologies, thus achieving interdisciplinary development.

• WE CREATE NEW THINGS: We like to think outside the box. You will have the space, trust, and freedom to create, along with the necessary training to achieve it.

• WE GROW TOGETHER: You will participate in cutting-edge technology projects, take part in multinational collaborations, and work with international teams.
