Job Description
Provide 24/7 production support for cloud-hosted applications and services.
Monitor and troubleshoot AWS services such as S3, Redshift, RDS, EC2, Glue, and CloudFormation.
Experience with AWS CDK development using TypeScript or Python.
Collaborate with development, QA, and operations teams to resolve incidents, identify root causes, and implement preventive measures.
Develop and maintain IaC (Infrastructure as Code) using AWS CDK.
Implement logging, monitoring, and alerting solutions using CloudWatch, Prometheus, Grafana, ELK, or Datadog.
Perform routine maintenance tasks such as backups, patching, and resource optimization.
Experience with containerization and orchestration (Docker, Kubernetes, ECS, or EKS).
Proficiency in using monitoring/logging tools and handling real-time incident response.
Participate in incident response, post-mortems, and change management processes.
Write and maintain documentation for systems, processes, and runbooks.
Ensure that compliance requirements and security best practices are followed across all cloud resources.