Senior ETL Developer

Job Description

  • Contractor
  • Anywhere

About the job
Client: Government

Type: Contract

Role: Senior ETL Developer

Term: 1 Year + extension option

Pay Rate: $100.00

Start date: 2-3 weeks

Location: Toronto / Hybrid

Req ID: RQ08659

Requirements: What you’ll need

Skills, Knowledge, Experience, and Qualifications:

General Responsibilities

This role is responsible for designing, developing, maintaining, and optimizing ETL (Extract, Transform, Load) processes in Databricks for data warehousing, data lakes, and analytics. The developer will work closely with data architects and business teams to ensure the efficient transformation and movement of data to meet business needs, including handling Change Data Capture (CDC) and streaming data.

Tools used are:

  • Azure Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
  • Azure Databricks/PySpark (good Python/PySpark knowledge required) to build transformations of raw data into the curated zone in the data lake (sketched below).
  • Azure Databricks/PySpark/SQL (good SQL knowledge required) to develop and/or troubleshoot transformations of curated data into FHIR.
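To give a flavour of the raw-to-curated step in the second bullet, a PySpark transformation might look roughly like the sketch below. This is a minimal illustration only; the storage paths, column names, and cleansing rules are assumptions for the example, not the client's actual schema.

# Minimal PySpark sketch: transform raw data into the curated zone as Delta.
# Paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw_to_curated").getOrCreate()

# Read raw JSON files landed in the raw zone (hypothetical path).
raw_df = spark.read.json("abfss://raw@datalake.dfs.core.windows.net/patients/")

# Basic cleansing: cast types, drop duplicates, stamp the load time.
curated_df = (raw_df
    .withColumn("birth_date", F.to_date("birth_date"))
    .dropDuplicates(["patient_id"])
    .withColumn("load_ts", F.current_timestamp()))

# Persist to the curated zone as a Delta table.
(curated_df.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://curated@datalake.dfs.core.windows.net/patients/"))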

Experience:

  • 7+ years of experience working with SQL Server, T-SQL, Oracle, PL/SQL development, or similar relational databases
  • 2+ years of experience working with Azure Data Factory, Databricks, and Python development
  • Experience building data ingestion and change data capture using Oracle GoldenGate
  • Experience designing, developing, and implementing ETL pipelines using Databricks and related tools to ingest, transform, and store large-scale datasets
  • Experience leveraging Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data
  • Experience building databases and data warehouses and working with delta and full loads (an illustrative delta-load sketch follows this list)
  • Experience with data modeling and tools such as SAP PowerDesigner, Visio, or similar
  • Experience working with SQL Server SSIS or other ETL tools; solid knowledge of and experience with SQL scripting
  • Experience developing in an Agile environment
  • Understanding of data warehouse architecture with a delta lake
  • Ability to analyze, design, develop, test, and document ETL pipelines from detailed and high-level specifications, and assist in troubleshooting
  • Ability to use SQL to perform DDL tasks and complex queries
  • Good knowledge of database performance optimization techniques
  • Ability to assist in requirements analysis and subsequent development
  • Ability to conduct unit testing and assist in test preparations to ensure data integrity
  • Work closely with Designers, Business Analysts, and other Developers
  • Liaise with Project Managers, Quality Assurance Analysts, and Business Intelligence Consultants
  • Design and implement technical enhancements to the Data Warehouse as required
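As a concrete (and purely illustrative) example of the delta loads and GoldenGate-style CDC mentioned above, an incremental load is commonly applied with a Delta Lake MERGE. The table names, key column, and operation flags below are assumptions for the sketch, not part of the actual solution:

# Sketch of applying an incremental (delta) load with a Delta Lake MERGE.
# Table names, keys, and the 'op' flag convention are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Changes captured by a CDC tool (e.g. GoldenGate) and landed in a staging table.
changes_df = spark.table("staging.patient_changes")

target = DeltaTable.forName(spark, "curated.patients")

# Upsert: delete rows flagged 'D', update rows flagged 'U', insert new rows.
(target.alias("t")
    .merge(changes_df.alias("s"), "t.patient_id = s.patient_id")
    .whenMatchedDelete(condition="s.op = 'D'")
    .whenMatchedUpdateAll(condition="s.op = 'U'")
    .whenNotMatchedInsertAll()
    .execute())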

Development, Database and ETL experience (60 points)

  • Experience in developing and managing ETL pipelines, jobs, and workflows in Databricks
  • Deep understanding of Delta Lake for building data lakes and managing ACID transactions, schema evolution, and data versioning
  • Experience automating ETL pipelines using Delta Live Tables, including handling Change Data Capture (CDC) for incremental data loads (illustrated in the sketch after this list)
  • Proficient in structuring data pipelines with the Medallion Architecture to scale data pipelines and ensure data quality
  • Hands-on experience developing streaming tables in Databricks using Structured Streaming and readStream to handle real-time data
  • Expertise in integrating CDC tools such as GoldenGate or Debezium for processing incremental updates and managing real-time data ingestion
  • Experience using Unity Catalog to manage data governance and access control and to ensure compliance
  • Skilled in managing clusters, jobs, autoscaling, monitoring, and performance optimization in Databricks environments
  • Knowledge of Databricks Autoloader for efficient batch and real-time data ingestion
  • Experience with data governance best practices, including implementing security policies, access control, and auditing with Unity Catalog
  • Proficient in creating and managing Databricks Workflows to orchestrate job dependencies and schedule tasks
  • Strong knowledge of Python, PySpark, and SQL for data manipulation and transformation
  • Experience integrating Databricks with cloud storage solutions such as Azure Blob Storage, AWS S3, or Google Cloud Storage
  • Familiarity with external orchestration tools such as Azure Data Factory
  • Experience implementing logical and physical data models
  • Knowledge of FHIR is an asset
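For illustration, several of the bullets above (Delta Live Tables, Autoloader, CDC, and the bronze/silver layers of the Medallion Architecture) could come together roughly as in the following sketch. It is a hedged example, not a reference implementation; the landing path, keys, and event columns are invented for the example:

# Delta Live Tables sketch: bronze ingestion with Auto Loader, then a silver
# table kept in sync via CDC with apply_changes. Names and paths are assumptions.
# Note: 'spark' is provided by the DLT pipeline runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_patients", comment="Raw CDC events ingested with Auto Loader")
def bronze_patients():
    return (
        spark.readStream.format("cloudFiles")              # Databricks Auto Loader
        .option("cloudFiles.format", "json")
        .load("abfss://landing@datalake.dfs.core.windows.net/patients/")
    )

# Silver target; apply_changes handles inserts, updates, and deletes (CDC).
dlt.create_streaming_table("silver_patients")

dlt.apply_changes(
    target="silver_patients",
    source="bronze_patients",
    keys=["patient_id"],
    sequence_by=F.col("event_ts"),        # ordering column for out-of-order events
    apply_as_deletes=F.expr("op = 'D'"),  # treat 'D' events as deletes
)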

Design Documentation and Analysis Skills

Demonstrated experience in creating design documentation such as:

  • Schema definitions
  • Error handling and logging
  • ETL process documentation
  • Job scheduling and dependency management
  • Data quality and validation checks
  • Performance optimization and scalability plans
  • Troubleshooting guides
  • Data lineage
  • Security and access control policies applied within ETL

In addition:

  • Experience in Fit-Gap analysis, system use case reviews, requirements reviews, coding exercises, and reviews
  • Participate in defect fixing, testing support, and development activities for ETL
  • Analyze and document solution complexity and interdependencies, including providing support for data validation (a small validation sketch follows this list)
  • Strong analytical skills for troubleshooting, problem-solving, and ensuring data quality
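As a small example of the data quality and validation support mentioned above, a post-load reconciliation check might look like this; the table names, key column, and rules are hypothetical:

# Sketch of a post-load validation: row-count reconciliation and a null-key check.
# Table names and rules are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source_count = spark.table("staging.patient_changes").count()
target_count = spark.table("curated.patients").count()

# Curated rows with a missing business key indicate a data quality problem.
null_keys = (spark.table("curated.patients")
             .filter(F.col("patient_id").isNull())
             .count())

assert null_keys == 0, f"{null_keys} curated rows have a null patient_id"
print(f"source rows: {source_count}, target rows: {target_count}, null keys: {null_keys}")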

Certifications (nice to have)

Certified in one or more of the following:

  • Databricks Certified Data Engineer Associate
  • Databricks Certified Data Engineer Professional
  • Microsoft Certified: Azure Data Engineer Associate
  • AWS Certified Data Analytics – Specialty
  • Google Cloud Professional Data Engineer

Must Have Skills

  • 7+ years using ETL tools such as Microsoft SSIS, stored procedures, and T-SQL
  • 2+ years with Delta Lake, Databricks, and Azure Databricks pipelines
  • Strong knowledge of Delta Lake for data management and optimization
  • Familiarity with Databricks Workflows for scheduling and orchestrating tasks
  • 2+ years with Python and PySpark
  • Solid understanding of the Medallion Architecture (Bronze, Silver, Gold) and experience implementing it in production environments (see the streaming sketch after this list)
  • Hands-on experience with CDC tools (e.g., GoldenGate) for managing real-time data
  • SQL Server, Oracle
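To ground the Medallion and real-time items above, a bare-bones Structured Streaming job that promotes bronze events to a silver Delta table could look like the following; the table names, checkpoint path, and quality filter are assumptions for the sketch:

# Structured Streaming sketch: read a bronze Delta table as a stream and write
# cleansed records to a silver table. Names and paths are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.readStream.table("bronze.patient_events")

silver = (bronze
          .filter(F.col("patient_id").isNotNull())        # basic quality gate
          .withColumn("processed_ts", F.current_timestamp()))

# Checkpointing makes the stream restartable with exactly-once Delta writes.
(silver.writeStream
    .format("delta")
    .option("checkpointLocation", "abfss://chk@datalake.dfs.core.windows.net/silver_patients/")
    .outputMode("append")
    .toTable("silver.patients"))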

Preferred Skills

ETL SSIS + Python + SQL + CDC Tools + Delta Lake + Databricks