Director, Site Reliability Engineer

Chewy

Full Time

Boston, MA, USA



Our Opportunity:

We are looking for a Director, Site Reliability Engineer at our facility in Boston, Massachusetts to establish and manage incident response protocols for SREs, including on-call schedules and post-incident reviews, to minimize downtime and improve system performance.

What You’ll Do:

Develop and execute a comprehensive SRE strategy that aligns with the company's business objectives and growth plans.
Recruit, mentor, and develop SRE team members, fostering their professional growth and skill development.
Engage cross-functionally with other engineering teams, manage issues as they arise, and promote reliability and resilience practices throughout the organization.
Transform business priorities into technical initiatives and ensure the alignment of SRE efforts with the broader organizational goals.
Ensure timely and consistent communication to facilitate a clear understanding of ongoing projects and their prioritization within the organization.
Establish strong working relationships at all organizational levels and across functional teams.
15% domestic travel required.

What You’ll Need:

Bachelor's degree in Computer Science, Computer Systems Engineering, Electrical Engineering, Telecommunication System Management, or a related field, and 10 years of experience.
Experience must include 7 years with: engineering management; ServiceNow ITOM and ITSM modules focused on incident, problem, and change management;
Developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.);
Docker & Kubernetes or similar container-based architectures;
Microservices architecture, design patterns, and standard methodologies.
Experience must also include: performance engineering, observability, resiliency, and chaos engineering of large-scale, latency-sensitive enterprise applications;
ITSM processes and tools such as JIRA and PagerDuty;
Standard DevOps tools, build automation tools (Jenkins), issue tracking tools, and source control systems (GitHub);
AWS offerings such as ECS, EC2, Lambda, Fargate, S3, DynamoDB, and API Gateway; and
Telemetry tooling and observability systems such as Prometheus, Splunk, Datadog, and Grafana.
The position is eligible for the Employee Referral Program.

Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members. If you have a disability under the Americans with Disabilities Act or similar law, and you need an accommodation during the application process or to perform these job requirements, or if you need a religious accommodation, please contact CAAR@chewy.com.

If you have a question regarding your application, please contact HR@chewy.com.

To access Chewy's Customer Privacy Policy, please click here. To access Chewy's California CPRA Job Applicant Privacy Policy, please click here.