
Mid-level Data Engineer – GCP, DBT
Posted May 19

This is a fully remote position, open to applicants in Brazil.
• Analyze and plan loads and pipelines.
• Evaluate the architecture and requirements of the data warehouse.
• Map out data, transformations, and processes utilizing GCP services (Cloud Storage, BigQuery, Dataproc).
• Establish the data migration strategy (full load, incremental, CDC).
• Create a data architecture plan on GCP.
• Data design and modeling on GCP:
    • Develop table schemas in BigQuery, taking into account performance, cost, and scalability.
    • Define partitioning and clustering strategies for BigQuery.
    • Model data zones in Cloud Storage (Bronze, Silver, Gold).
• Development of ELT/ETL pipelines:
    • Build data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.
    • Migrate existing business logic and transformations to GCP.
    • Implement mechanisms for data validation and quality assurance.
• Performance and cost optimization:
    • Optimize BigQuery queries to minimize costs and enhance performance.
    • Fine-tune and optimize Spark jobs on Dataproc.
    • Monitor and optimize GCP resource usage to manage costs.
• Data security and governance:
    • Enforce data security in transit and at rest.
    • Define and implement IAM policies to manage access to data and resources.
    • Ensure adherence to data governance policies.
• Monitoring and support:
    • Diagnose performance and functionality issues within data pipelines and GCP resources.
• Documentation:
    • Document the architecture, data pipelines, data models, and operational procedures.
• Communication:
    • Communicate effectively with team members, stakeholders, and other departments within the company.
    • Keep architectural definitions and software components aligned, supporting the evolution and quality of the team's deliverables.
• Jira / Agile methodologies:
    • Familiarity with agile methodologies and their ceremonies, and proficiency with Jira.
• At least 3 years of demonstrated experience with DBT.
• Expertise in:
    • Models (staging, intermediate, marts)
    • ref() and source()
    • Macros (Jinja)
    • Seeds and snapshots
    • Tests (not null, unique, custom)
• Layered organization:
    • Staging → Transform → Mart (Data Warehouse)
• Google Cloud Platform (GCP):
    • BigQuery: In-depth knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security, and data governance.
    • Cloud Storage: Experience in managing buckets, storage classes, lifecycle policies, access control (IAM), and data security.
    • Dataproc: Proficiency in provisioning, configuring, and managing Spark/Hadoop clusters, optimizing jobs, and integrating with other GCP services.
    • Dataflow/Composer/DBT: Familiarity with orchestration and data-processing tools for ELT/ETL pipelines.
    • Cloud IAM (Identity and Access Management): Experience in implementing security policies and fine-grained access control.
    • VPC, Networking, and Security: Understanding of networks, subnets, firewall rules, and cloud security best practices.
• Programming languages:
    • Python and PySpark: Essential for automation scripts, constructing data pipelines, and integrating with GCP APIs.
    • SQL (advanced): Required for BigQuery, DBT, and data transformations.
    • Shell scripting: For task automation.
• Version control:
    • Git/GitHub/Bitbucket.
• 🏥 Porto Seguro Health Insurance
• 🦷 Porto Seguro Dental Insurance
• 💰 Profit Sharing (PLR)
• 👶 Childcare Allowance
• 🍽️ Alelo Food and Meal Vouchers
• 💻 Home Office Allowance
• 📚 Partnerships with Educational Institutions
• 🚀 Support for Certifications, including Cloud
• 🎁 Livelo Points
• 🏋️ TotalPass
• 🧘 Mindself