Databricks Architect – GenAI

Job Description

  • Contractor
  • Anywhere

About the job
Databricks Lead / Data Architect (Hands-On)

Greater Toronto Area, Ontario

Contract

12+ yrs data engineering; 4+ yrs Databricks

About the Role

We are hiring a senior, deeply hands-on Databricks Lead / Data Architect to drive the Databricks workstream of a large-scale data and AI modernization program for a major Canadian enterprise retail client. This is a build-and-lead role: you will own the technical direction of Databricks-based solutions end to end — architecture, lakehouse design, data engineering, migration of legacy ETL workloads, and production operations — while remaining personally hands-on in code and design.

You will work side by side with the client’s VP of Data & AI, the AVP of Data Platforms & Integration, and their teams, acting as the senior technical authority who turns strategy into delivered, production-grade outcomes. The immediate focus is modernizing a large on-premises ETL estate (IBM DataStage) to an Azure-native lakehouse on Azure Data Factory and Databricks, and then scaling the platform to power enterprise analytics and AI use cases.

Key Responsibilities

Architect the lakehouse: Design and own scalable, secure Databricks Lakehouse architecture on Azure (Delta Lake, Unity Catalog, medallion bronze/silver/gold, ADLS Gen2) aligned to enterprise standards.
Stay hands-on: Personally build and review PySpark / Spark SQL pipelines, Delta Live Tables, notebooks, and orchestration, setting the engineering bar rather than just directing it.
Lead legacy migration: Drive conversion of complex legacy ETL (DataStage) workloads to Databricks/PySpark and ADF, including patterns, accelerators, and reusable frameworks for code conversion and validation.
Own performance & cost: Optimize cluster configuration, job performance, partitioning, and cost; establish FinOps and right-sizing practices on Databricks.
Embed governance: Implement data governance, lineage, quality, and access control through Unity Catalog and Purview; ensure security, privacy, and compliance by design.
Enable analytics & AI: Design Gold-layer semantic models and feature pipelines that serve BI (Power BI), advanced analytics, and ML/GenAI use cases (MLflow, Azure ML).
Lead the squad: Provide technical leadership and mentoring to data engineers; define best practices, coding standards, CI/CD (Azure DevOps), and review processes.
Partner with the client: Work closely with the client’s VP (Data & AI), AVP (Data Platforms & Integration), platform architects, and business stakeholders to translate requirements into delivery roadmaps and measurable outcomes.

Required Qualifications (Must-Have)

12+ years in data engineering / data platform architecture, with 4+ years of deep, hands-on Databricks delivery.
Expert-level Databricks: Spark (PySpark & Spark SQL), Delta Lake, Delta Live Tables, Unity Catalog, Workflows, performance tuning, and cluster/cost optimization.
Strong Azure data stack: Azure Data Factory, ADLS Gen2, Azure Key Vault, Azure DevOps (CI/CD), and Azure networking/security fundamentals.
Proven migration track record: Led at least one large-scale migration from legacy ETL or data warehouse platforms (e.g., DataStage, Informatica, Teradata) to a cloud lakehouse, including complex transformation logic.
Lakehouse design depth: Medallion architecture, dimensional & semantic modelling, SCD handling, surrogate keys, and data quality / reconciliation frameworks.
Engineering rigor: CI/CD, version control (Git), automated testing/validation, observability, and production support of mission-critical pipelines.
Leadership with hands-on credibility: Demonstrated ability to lead engineers and engage senior client stakeholders while still contributing code and designs directly.

Preferred / Nice-to-Have

Databricks certifications (e.g., Databricks Certified Data Engineer Professional / Solutions Architect) and relevant Azure certifications (DP-203, AZ-305).
Experience in retail, supply chain, merchandising, or financial-services data domains.
Familiarity with IBM DataStage, DB2, Oracle, and legacy on-prem ETL estates.
Exposure to agentic AI / GenAI patterns, MLOps/LLMOps, and AI-assisted code migration tooling.
Experience operating a warm-standby DR and high-availability data platform.

What Success Looks Like in the First 6 Months

A clear, agreed Databricks lakehouse target architecture and migration blueprint in production use.
Complex legacy workloads converted and validated on Databricks/PySpark with automated reconciliation.
Reusable migration accelerators, standards, and CI/CD established and adopted by the engineering squad.
Trusted advisor relationship with the client’s Data & AI leadership, delivering measurable performance, cost, and time-to-value gains.

How to Apply

Submit your CV along with a brief summary of your most significant Databricks lakehouse and legacy-migration projects (scope, your hands-on role, technologies, and measurable outcomes). Shortlisted candidates will complete a technical discussion and a hands-on Databricks/PySpark assessment.

“We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.”