Site Reliability Engineer

July 31, 2025

Job Description

  • Permanent
  • Anywhere

Hiring: Site Reliability Engineer – Canada (Local), Remote
🕒 Full-time | Permanent

Job Description:

  • Proven experience in technical project management within an SRE, DevOps, or infrastructure-focused environment. Lead and manage SRE projects, ensuring they are delivered on time, within scope, and on budget.
  • Maintain strong customer engagement while building processes that let all relevant team members engage with the customer.
  • Work closely with cross-functional teams to plan, design, and implement reliability improvements and automation initiatives.
  • Collaborate with stakeholders to define, measure, and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
  • Facilitate post-incident reviews (PIRs), ensuring action items are identified and followed through.
  • Drive initiatives to automate manual tasks and improve system observability and monitoring. Facilitate knowledge sharing across teams so that best practices are followed and operational knowledge is captured.
  • Experience with cloud platforms (AWS or GCP) and containerization (Docker, Kubernetes).
  • Skilled in Linux and Python/shell scripting.
  • Proficient in Kubernetes cluster maintenance and in managing and debugging containerized applications (Golang, Java, Python).
  • Understanding of Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis (ElastiCache), ZooKeeper, Nginx, and AWS S3/Google Cloud Storage.
  • Relevant knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation).
  • Experience with continuous integration practices and tools (Jenkins, Travis CI, CircleCI, etc.).
  • Experience with monitoring solutions such as CloudWatch, Stackdriver, Prometheus, and Grafana.

📧 Send your resume to madhuri.rane@techdoquest.com or comment for the complete job description!