This is a fully remote position, open to applicants in Brazil.

📋 Description

• Analyze and plan loads and pipelines.

• Evaluate the architecture and requirements of the data warehouse.

• Map out data, transformations, and processes using GCP services (Cloud Storage, BigQuery, Dataproc).

• Establish the data migration strategy (full load, incremental, CDC).

• Create a data architecture plan on GCP.

• Data design and modeling on GCP:

• Develop table schemas in BigQuery, taking into account performance, cost, and scalability.

• Define partitioning and clustering strategies for BigQuery.

• Model data zones in Cloud Storage (Bronze, Silver, Gold).

• Development of ELT/ETL pipelines:

• Construct data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.

• Migrate existing business logic and transformations to GCP.

• Implement mechanisms for data validation and quality assurance.

• Performance and cost optimization:

• Optimize BigQuery queries to minimize costs and enhance performance.

• Fine-tune and optimize Spark jobs on Dataproc.

• Monitor and optimize GCP resource usage to manage costs.

• Data security and governance:

• Enforce data security in transit and at rest.

• Define and implement IAM policies to manage access to data and resources.

• Ensure adherence to data governance policies.

• Monitoring and support:

• Diagnose performance and functionality issues within data pipelines and GCP resources.

• Documentation:

• Document the architecture, data pipelines, data models, and operational procedures.

• Communication:

• Effectively communicate with team members, stakeholders, and other departments within the company.

• Ensure alignment between architectural definitions and software components, supporting the evolution and quality of the team's developments.

• Jira / Agile methodologies:

• Familiarity with agile methodologies and their ceremonies, and proficiency with Jira.

⛳️ Requirements

• At least 3 years of demonstrated experience with DBT.

• Expertise in:

• Models (staging, intermediate, marts)

• ref() and source()

• Macros (Jinja)

• Seeds and snapshots

• Tests (not null, unique, custom)

• Layered organization:

• Staging → Transform → Mart (Data Warehouse)

• Google Cloud Platform (GCP):

• BigQuery: In-depth knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security, and data governance.

• Cloud Storage: Experience in managing buckets, storage classes, lifecycle policies, access control (IAM), and data security.

• Dataproc: Proficiency in provisioning, configuring, and managing Spark/Hadoop clusters, optimizing jobs, and integrating with other GCP services.

• Dataflow/Composer/DBT: Familiarity with orchestration and data-processing tools for ELT/ETL pipelines.

• Cloud IAM (Identity and Access Management): Experience in implementing security policies and fine-grained access control.

• VPC, Networking, and Security: Understanding of networks, subnets, firewall rules, and cloud security best practices.

• Programming languages:

• Python and PySpark: Essential for automation scripts, constructing data pipelines, and integrating with GCP APIs.

• SQL (advanced): Required for BigQuery, DBT, and data transformations.

• Shell scripting: For task automation.

• Version control:

• Git/GitHub/Bitbucket.

🏝️ Benefits

• 🏥 Porto Seguro Health Insurance

• 🦷 Porto Seguro Dental Insurance

• 💰 Profit Sharing (PLR)

• 👶 Childcare Allowance

• 🍽️ Alelo Food and Meal Vouchers

• 💻 Home Office Allowance

• 📚 Partnerships with Educational Institutions

• 🚀 Support for Certifications, including Cloud

• 🎁 Livelo Points

• 🏋️‍♂️ TotalPass

• 🧘‍♂️ Mindself

Mid-level Data Engineer – GCP, DBT
