Director, Site Reliability Engineer
Our Opportunity:
We are looking for a Director, Site Reliability Engineer at our facility in Boston, Massachusetts to establish and manage incident response protocols for SREs, including on-call schedules and post-incident reviews, to minimize downtime and improve system performance.
What You’ll Do:
- Develop and execute a comprehensive SRE strategy that aligns with the company's business objectives and growth plans.
- Recruit, mentor, and develop SRE team members, fostering their professional growth and skill development.
- Engage cross-functionally with other engineering teams, manage issues as they arise, and promote reliability and resilience practices throughout the organization.
- Transform business priorities into technical initiatives and ensure the alignment of SRE efforts with the broader organizational goals.
- Ensure timely and consistent communication to facilitate a clear understanding of ongoing projects and their prioritization within the organization.
- Establish strong working relationships at all organizational levels and across functional teams.
- 15% domestic travel required.
What You’ll Need:
- Bachelor's degree in Computer Science, Computer Systems Engineering, Electrical Engineering, Telecommunication System Management, or a related field, and 10 years of experience.
- Experience must include 7 years with: engineering management; ServiceNow ITOM and ITSM modules that focus on incident, problem, and change management;
- Developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.);
- Docker & Kubernetes or similar container-based architectures;
- Micro-services architecture, design patterns, and standard methodologies.
- Experience must also include: performance engineering, observability, resiliency, and chaos engineering of large-scale, latency-sensitive enterprise applications;
- ITSM processes and tools such as Jira and PagerDuty;
- Standard DevOps tools;
- Build automation tools (Jenkins), issue tracking tools and source control systems (GitHub);
- AWS offerings such as ECS, EC2, Lambda, Fargate, S3, DynamoDB, and API Gateway; and
- Telemetry tooling and observability systems such as: Prometheus, Splunk, DataDog, Grafana.
- The position is eligible for the Employee Referral Program.
Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members. If you have a disability under the Americans with Disabilities Act or similar law, and you need an accommodation during the application process or to perform these job requirements, or if you need a religious accommodation, please contact CAAR@chewy.com.
If you have a question regarding your application, please contact HR@chewy.com.
To access Chewy's Customer Privacy Policy, please click here. To access Chewy's California CPRA Job Applicant Privacy Policy, please click here.