I'm Vivek, a Lead DevOps/SRE engineer with 8+ years shipping production systems across payments (Paytm, PayU), telecom (Airtel) and SaaS (Sprinklr). I've owned cloud migrations (Alibaba Cloud → AWS, on-prem DC → AWS), shipped Kubernetes Operators that automate toil, and built observability stacks that page humans only when humans are actually needed.
I lead from the keyboard: I write the design doc, ship the first version, then mentor the team that owns it. I've operated platforms at 1000+ cluster / 5000+ engineer scale, run on-call rotations, hardened security and compliance for regulated payment workloads, and driven 30%+ cloud cost reductions through right-sizing and FinOps.
Looking ahead, I'm open to Lead, Principal, Staff, or DevOps/SRE Manager roles where reliability, platform engineering and AI-assisted operations are first-class problems.
1000+ Kubernetes clusters across 14 regions on EKS · GKE · AKS. GitOps-first with ArgoCD ApplicationSets; zero-drift policy enforced in CI.
Prometheus + Thanos + VictoriaMetrics + Grafana + Loki/ELK at PB scale. Per-tenant SLOs, burn-rate alerting, runbook-linked dashboards.
LLM-backed SRE copilot (Langfuse-traced, RAG over runbooks) that triages alerts, proposes remediations, and auto-closes low-risk incidents — 1200+/week.
IAM, KMS, Secrets Manager, GuardDuty, CloudTrail, WAF, Shield, SonarQube, Qualys, Cortex. Shift-left policies in CI/CD; PCI DSS-aware payment workloads.
Lead cross-functional pods of 6–12 engineers, run incident reviews and on-call rotations, mentor SRE-2/SRE-3 ICs, partner with product on platform roadmaps.
30% cloud spend reduction through right-sizing, Spot instances and Savings Plans, and multi-tenant cluster packing. Built a Release Engineering UI that cut deploy toil by 70%.