Director, Site Reliability Engineer

Full Time
Boston, MA, USA
1 month ago

Our Opportunity:

We are looking for a Director, Site Reliability Engineer at our facility in Boston, Massachusetts to establish and manage incident response protocols for SREs, including on-call schedules and post-incident reviews, to minimize downtime and improve system performance.

What You’ll Do: 

  • Develop and execute a comprehensive SRE strategy that aligns with the company's business objectives and growth plans.
  • Recruit, mentor, and develop SRE team members, fostering their professional growth and skill development.
  • Engage cross-functionally with other engineering teams, managing issues as they occur and promoting reliability and resilience practices throughout the organization.
  • Transform business priorities into technical initiatives and ensure the alignment of SRE efforts with the broader organizational goals.
  • Ensure timely and consistent communication to facilitate a clear understanding of ongoing projects and their prioritization within the organization.
  • Establish strong working relationships at all organizational levels and across functional teams.
  • 15% domestic travel required.

What You’ll Need:

  • Bachelor's degree in Computer Science, Computer Systems Engineering, Electrical Engineering, Telecommunication System Management, or a related field, and 10 years of experience.
  • Experience must include 7 years with: engineering management; ServiceNow ITOM and ITSM modules that focus on incident, problem, and change management;
  • Developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.);
  • Docker & Kubernetes or similar container-based architectures;
  • Micro-services architecture, design patterns, and standard methodologies.
  • Experience must also include: performance engineering, observability, resiliency, and chaos engineering of large-scale, latency-sensitive enterprise applications;
  • ITSM processes and tools such as Jira and PagerDuty;
  • Standard DevOps tools;
  • Build automation tools (Jenkins), issue tracking tools, and source control systems (GitHub);
  • AWS offerings such as ECS, EC2, Lambda, Fargate, S3, DynamoDB, and API Gateway; and
  • Telemetry tooling and observability systems such as: Prometheus, Splunk, DataDog, Grafana.
  • The position is eligible for the Employee Referral Program.

Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members. If you have a disability under the Americans with Disabilities Act or similar law, and you need an accommodation during the application process or to perform these job requirements, or if you need a religious accommodation, please contact CAAR@chewy.com.

 

If you have a question regarding your application, please contact HR@chewy.com.

 
