Site Reliability Engineer – Lead

April 16, 2025

Job Description

  • Contractor
  • Anywhere

Title: Site Reliability Engineer – Lead
Type: Onsite (5 days a week in the office)
Location: Toronto, ON

Must Haves:
Proven experience in technical project management within an SRE, DevOps, or infrastructure-focused environment.
Lead and manage SRE projects, ensuring they are delivered on time, within scope, and within budget.
Maintain strong customer engagement while building processes that let all relevant team members engage with the customer.
Work closely with cross-functional teams to plan, design, and implement reliability improvements and automation initiatives.
Collaborate with stakeholders to define, measure, and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
Facilitate post-incident reviews (PIRs), ensuring action items are identified and followed through.
Drive initiatives to automate manual tasks and improve system observability and monitoring.
Facilitate knowledge sharing across teams to ensure best practices are followed and operational knowledge is captured.
Experience with cloud platforms (AWS or GCP) and containerization (Docker, Kubernetes).
Skilled with Linux and Python/Shell scripting.
Proficient in maintaining Kubernetes clusters and in managing and debugging containerized applications (Golang, Java, Python).
Understanding of Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis (ElastiCache), ZooKeeper, Nginx, and AWS S3/Google Cloud Storage.
Relevant knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation).
Experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.).
Experience with monitoring solutions such as CloudWatch, Stackdriver, Prometheus, and Grafana.

 

If you’re interested, kindly send your updated resume by email: gayathri.durvasula@techdoquest.com