Proven Technologist, Leader & Team Builder


Professional Experience

Jun 2022 - Present
BlueJeans by Verizon - Senior Manager, SRE & DevOps Tools
San Jose, CA

At BlueJeans I was the US Manager of Site Reliability Engineering and Developer Tools. We follow the sun and operate a second team in Bangalore, India. BlueJeans SRE operates Kubernetes infrastructure in a hybrid (DC + Cloud), multi-cloud (AWS + Azure) environment. DevOps Tools operates Developer and IaC tooling.

  • Grew the team from four to 12 full-time Engineers.
  • Composed and defined the BlueJeans sunset plan, currently in-flight.
  • Defined an objective metric for application risk and targeted fixes at the riskiest services.
  • Delineated a Developer Tools lifecycle to ensure gaps were addressed and updates or deprecations occurred in a predictable fashion.
  • Authored and implemented an onboarding process for new services and functionality to ensure frequent and fruitful interaction between Dev and SRE. This delivered SLOs, background, and documentation on generic mitigations.
  • Led the deployment of self-healing automation capable of resolving 90% of current alerts without human intervention.

Sep 2020 - May 2022
NS1 - Engineering Manager, TechOps & Observability
San Jose, CA

At NS1, I managed the TechOps and Observability Teams. The TechOps team is a horizontal team that operates our Managed DNS products, focused on technical width. The Observability team focuses on extracting actionable data from internal metrics to allow data-based decisions at every level.

  • Manager of 12 Engineers across the US and Vietnam.
  • Created an Observability strategy centered on implementing high-quality service level indicators (SLIs) and objectives (SLOs), which led to major signal improvements.
  • Led the reduction of pages from 4,000 per month to under 500 while maintaining our Incident detection rate, cutting alert noise.
  • Oversaw the SD-WAN deployment to stabilize the ChinaNet offerings, eliminating multiple hours of daily toil from the previous workflow.
  • Designed and implemented a BeyondCorp / Zero Trust management plane, reducing complexity by retiring the inflexible team-based VPN solution.
  • Constructed a new Incident Management process, including postmortems and reviews. The process focused on mitigation while customers were in pain, with root-cause analysis and improvements occurring offline.

Oct 2018 - Aug 2020
Google LLC - Site Reliability Engineering Manager
Sunnyvale, CA

At Google I was responsible for the reliability of Google Cloud Storage and its internal counterpart, Blobstore. The SRE team was sharded into Serving and Backend. As Backend SRE Manager I focused primarily on storage health, including internal dependencies, durability and data integrity. In addition, I joined at a time of major team reorganization and had to hire and onboard others while onboarding myself.

  • Participated in the team as a design reviewer, code reviewer and oncaller.
  • Hired and onboarded six Software Engineers.
  • Developed roadmaps with the other SRE shard, Dev partners and Dependency orgs.
  • Created training forum for GCS (30 attendees, twice weekly).
  • Led SLO improvements across GCS, from processes to implementations.
  • Taught Production Storage at SRE EDU, a mandatory week of training for all new hire SREs across the entire company.

Aug 2007 - Sep 2018
eBay Inc
San Jose, CA

Apr 2015 - Sep 2018 - Senior Manager, Infra Arch & Search SRE

At eBay I wore many hats. I was the Infrastructure Architecture Manager, Search SRE Manager, and a member of the Virtual Architecture Team focused on Infrastructure. The Search Infrastructure alone accounted for the majority of eBay’s Data Center space.

  • Participated in the Infra Arch team as an IC, creating my own blueprints and partnering with internal teams to resolve problems without clear next steps.
  • Manager of 12 Engineers and Architects. Served as a Tech Lead within GTO (70 Engineers between San Jose and Shanghai).
  • Manager of the Search SRE Manager in Shanghai, China.
  • Led the Search team from a 54% automation change rate to 96%. This reduced OpEx by $500K annually by removing the need for contractors. It also helped drive the service to 99.999% availability for 2016, 2017 and 2018.
  • Reduced footprint by 200 racks of gear through more efficient hardware, lowering Data Center OpEx.
  • Found and reported a DoS vulnerability in the F5 GTM Appliance (K23022557).
  • Award: eBay Cultural Luminary - 2018

Sep 2014 - Apr 2015 - Head of TechOps, Advertising

  • Manager of 12 professionals, Systems & Network Engineers + DBAs.
  • Worked with Product and Dev teams to create roadmaps for two business lines. This included effort, scoping and deliverables.
  • Developed TechOps Roadmap with my reports. Analyzed all infrastructure and identified our biggest threats to the business.
  • Led Holiday Readiness capacity adds. This included deep dives into 30 customer-facing subsystems, 20 of which required capacity or architecture changes.
  • Reduced OpEx $300K annually by reducing duplication of external network services.
  • Created budget for 2015. YoY savings of $500K annually.

Jan 2014 - Sep 2014 - DevOps Manager, Advertising

  • Manager of six Professionals, DevOps & NetEng.
  • Leadership role in a 150-person BU which generates over $400 million a year.
  • Led the triage and incident management process. This reduced unplanned Ops work to less than 5% of our sprint and ensured we didn’t fail the same way more than once.
  • Reduced duplication of efforts with my Ops counterparts in various cross BU projects.
  • Designed and implemented reliable disaster recovery architecture.
  • Developed Infrastructure for cloud-based data pipelines.

Aug 2007 - Dec 2013 - Lead Systems Engineer, Advertising

  • Led a team of 15 Operations professionals supporting a 24x7 production environment with 2,000 servers. This included 1,700 Linux, 250 Windows and 50 Solaris hosts.
  • Provided architecture and design support on new apps and rewrites for every layer of our infrastructure. This included Frontend, Images, Tracking and Import/Export of partner data.
  • Served as Commerce Lead on a coast-to-coast data center migration. Migrated a merchant ingestion system capable of 200 million SKU/hour and a partner export system that generated custom feeds for over 1,000 partners within a 12-hour window.
  • Built network installer that bootstrapped 1,000 machines in a few hours for the data center migration. This utilized the native OS installer, CDPR, MySQL and PHP.
  • Led a project to migrate our legacy connectivity to the Corporate eBay backbone. This reduced cost by $130K / year and caused no service degradation.
  • Automated infrastructure management, including a GitHub webhook-based DNS auto-update and a syntax-checking API using BIND, Apache and Perl.

Aug 2005 - Aug 2007
LIGO @ Caltech - Lead Systems Engineer
Pasadena, CA
  • Built and administered HPC cluster of 350 nodes with Condor scheduler.
  • Worked with Scientists to profile and optimize apps, resulting in a 33% reduction in power and a 40% reduction in utilization while keeping the same level of processing per day.
  • Built and managed Einstein@Home mirror. This augmented our processing power by allowing anyone with our screensaver application to help us search for gravitational waves.
  • Reworked Anaconda Installer to load a legacy Linux platform on unsupported hardware for ABI/API compatibility with existing cluster.

Jun 2003 - Aug 2005
JPL / NASA - Systems Engineer
Pasadena, CA
  • Created first space-to-web publishing system within NASA for the Mars Rovers website (Spirit and Opportunity).
  • Created and built multi-tenant HA cluster for high-traffic websites - Mars Rovers, Deep Impact, Cassini, etc.
  • Built and administered HPC cluster for TES Instrument on Aura Spacecraft.
  • Wrote a BitTorrent wrapper to push large amounts of data to compute nodes. Data pushes that took seven days with the legacy scripts completed in four hours using BitTorrent.