About me

Hello

I am Ranjib Dey, a software engineer focused on production engineering, infrastructure automation, reliability, incident management, and open source systems. My public work spans infrastructure-as-code, testing discipline for operations, Linux containers, service discovery, ML-assisted capacity management, and hardware automation.

I currently work on production engineering at Uber. Public work from this period includes Uber Engineering Blog posts on ML-driven capacity safety and capacity recommendation, talks on incident taxonomy and incident management, and a co-authored USENIX NSDI 2026 paper on Uber’s failover architecture.

Earlier public work includes Chef and DevOps infrastructure automation at ThoughtWorks and PagerDuty. I was an open source maintainer of Chef, received the Chef Awesome Community Member / MVP award, authored chef-lxc, and wrote and spoke about testing infrastructure code, TDD in operations, CI/CD with Chef and LXC, and Consul.

Outside work, I build and maintain reef-pi, an open source Raspberry Pi-based reef tank controller. reef-pi has grown from a hobby automation project into a public ecosystem with official guides, community support, hardware integrations, Maker Faire recognition, and coverage from Make:, Adafruit, and the Raspberry Pi Foundation.

Public work themes

  • Infrastructure as code and configuration management with Chef.
  • Testing discipline for operations, including ChefSpec, Test Kitchen, Serverspec/InSpec, and CI for infrastructure code.
  • Linux containers and early LXC-based build and deployment workflows.
  • Service discovery and platform engineering with Consul and distributed systems tooling.
  • SRE, incident management, resiliency engineering, and incident taxonomy.
  • Capacity management and ML-assisted operations at hyperscale.
  • Physical computing, IoT, and open source reef automation through reef-pi.

Public speaking, papers, and blogs

  • Culturing resiliency with Data: A taxonomy of incidents. ChaosConf, 2020. InfoQ
  • How Uber does incident management. Major Incident Management Expo, 2020.
  • Using ML for microservice capacity safety. Uber Engineering Blog, 2019. Uber Engineering
  • Uber’s Failover Architecture: Reconciling Reliability and Efficiency in Hyperscale Microservice Infrastructure. USENIX NSDI 2026. USENIX
  • Consul at PagerDuty. HashiCorp User Group SF, 2016. Slides
  • Strategies for adopting Test Driven Development in operations. Agile 2015, Washington, D.C. Slides, experience report
  • Extending CI/CD in operations using Chef and LXC. LinuxCon North America, 2015. Slides
  • Chef at PagerDuty. PagerDuty Engineering Blog. Post
  • Chef-LXC: Building and deploying custom containers. Bay Area Chef Meetup, 2013. Slides
  • A short introduction to LXC. Bay Area Large Scale Production Engineering group, 2014. Slides
  • How to mock a mocking bird: testing dynamic infrastructure. ChefConf, 2014. Slides
  • Attaining Resiliency: Culture, Tools and Practices. DevOpsDays India, 2013. Slides
  • Automated infrastructure testing. vodQA-10, Pune, 2012. Slides
  • System automation using Chef. FUDCon Pune, 2011.
  • DZone blogs on web operations, CI/CD, and Chef.

Profiles and projects