Site Reliability Engineer

Full Time
Bengaluru, Karnataka, India
3 months ago

BE PART OF BUILDING THE FUTURE.

What do NASA and emerging space companies have in common with COVID vaccine R&D teams or with Roblox and the Metaverse? 

The answer is data: all fast-moving, fast-growing industries rely on data for a competitive edge. And the most advanced companies are realizing the full data advantage by partnering with Pure Storage. Pure’s vision is to redefine the storage experience and empower innovators by simplifying how people consume and interact with data. With 11,000+ customers including 58% of the Fortune 500, we’ve only scratched the surface of our ambitions.

Pure is blazing trails and setting records:

  • For ten straight years, Gartner has named Pure a leader in the Magic Quadrant 
  • Our customer-first culture and unwavering commitment to innovation have earned us a certified Net Promoter Score in the top 1% of B2B companies globally
  • Industry analysts and press applaud Pure’s leadership across these dimensions
  • And our 5,000+ employees are emboldened to make Pure a faster, stronger, smarter company as we go

If you, like us, say “bring it on” to exciting challenges that change the world, we have endless opportunities where you can make your mark.

MEET THE TEAM

ISS (Infrastructure Shared Service) is an international organisation within Pure, responsible for all of Pure Storage's engineering infrastructure, development environment, and production services. We work with all internal engineering teams to provide reliable services that are used to develop new products and features in many different environments, from our multiple data centers to various public clouds.

As a Reliability Engineer in ISS, you will work to improve the reliability and performance of Pure Storage's critical infrastructure applications. This means setting and owning SLO goals for uptime and latency, as well as helping colleagues leverage the features and workflows available to them, all with a focus on keeping the backend web servers, load balancers, and database servers healthy and running smoothly.

We are looking for engineers who have a mix of software and systems skills, are passionate about reliability, performance, and efficiency, and have experience building tools, services, and automation to manage and improve production services.

Responsibilities

  • Engage in and improve the whole lifecycle of services—from inception and design, through deployment and operation.
  • Design, operate, maintain, and troubleshoot enterprise systems such as databases, message queues, APIs, and distributed applications through the use of data and metrics such as SLOs and error budgets.
  • Establish and practice sustainable incident response and blameless postmortems to prevent problem recurrence.
  • Support services before they go live through activities such as system design, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Scale systems sustainably through mechanisms like scripting and automation; evolve systems by pushing changes that improve their operational management, reliability, and velocity.
  • Work closely with development teams, infrastructure teams, and business stakeholders across multiple time zones to understand requirements and design solutions.
  • Ensure that hardware design meets business and technical requirements, including performance, scalability, and reliability.
  • Ensure that hardware design meets industry standards and best practices for data center infrastructure.
  • Create and maintain detailed documentation on system configurations, procedures, and operational policies.
  • Perform day-to-day server administration (physical and virtual), storage administration, network configuration, application support, and health and performance monitoring, ensuring quick turnaround times as well as performance, availability, and security.
  • Deploy infrastructure both manually and via configuration management and automation platforms.
  • Troubleshoot hardware, software, and network issues, resolve reported problems quickly, and perform root cause analysis to prevent recurrence.

Minimum Qualifications

  • Experience programming in Python or other languages.
  • Experience in designing, analysing, and troubleshooting large-scale distributed systems.
  • Able to work in a 24x7 on-call rotation (approx. 1 week every 2 months).
  • Systematic problem-solving approach, strong communication skills, and a sense of ownership and drive.
  • Working experience with observability platforms such as Elastic or Datadog.
  • Experience deploying and troubleshooting Linux systems (Red Hat, CentOS, Ubuntu) as well as VMware environments (ESXi, NSX, vSAN).
  • Experience working directly with end users to determine deployment and configuration requirements.
  • Ability to lift 15+ kilograms when working with storage equipment.

Preferred Qualifications

  • 7+ years as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer.
  • Understanding of Unix/Linux, and optionally Windows, operating systems.
  • Experience working with Infrastructure as Code and automation tools (Ansible, Terraform, CloudFormation).
  • Well organised, with the ability to prioritise tasks independently, set goals, and follow through to completion.
  • Experience with containers and container orchestration systems such as Docker and/or Kubernetes.
  • Expertise with hybrid cloud environments (bare metal and public cloud; AWS and Azure preferred).
  • Experience with virtualisation technologies such as VMware.
  • Knowledge of storage technologies (SAN / NAS devices).

BE YOU—CORPORATE CLONES NEED NOT APPLY.

Pure is where you ask big questions, think differently, and make an impact. This is not just a job, but a place where you have a voice and can accelerate your career. We value unique thoughts and celebrate individuality, and with ample opportunity to learn, develop yourself, and expand into different roles, joining Pure is an investment in your career journey.

Through our Pure Equality program, which supports a flourishing field of employee resource groups, we nourish the personal and professional lives of our team members. And our Pure Good Foundation gives back to local and global communities through volunteering and grants.

And because we understand the value of bringing your full and best self to work, we offer a variety of perks to manage a healthy balance, including flexible time off, wellness resources, and company-sponsored team events.

PURE IS COMMITTED TO EQUALITY.

Research shows that in order to apply for a job, women feel they need to meet 100% of the criteria while men usually apply after meeting about 60%. Regardless of how you identify, if you believe you can do the job and are a good match, we encourage you to apply.

Pure is proud to be an equal opportunity and affirmative action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or any other characteristic legally protected by the laws of the jurisdiction in which you are being considered for hire. 

If you need assistance or an accommodation due to a disability, you may contact us at TA-Ops@purestorage.com.

APPLICANT & CANDIDATE PERSONAL INFORMATION PRIVACY NOTICE.

If you're wondering how or why Pure collects or uses information you provide, we invite you to check out our Applicant & Candidate Personal Information Protection Notice.