Platform - Site Reliability Engineer
Elastic is a free and open search company that powers enterprise search, observability, and security solutions built on one technology stack that can be deployed anywhere. From finding documents to monitoring infrastructure to hunting for threats, Elastic makes data usable in real-time and at scale. Thousands of organizations worldwide, including Barclays, Cisco, eBay, Fairfax, ING, Goldman Sachs, Microsoft, The Mayo Clinic, NASA, The New York Times, Wikipedia, and Verizon, use Elastic to power mission-critical systems. Founded in 2012, Elastic is a distributed company with Elasticians around the globe. Learn more at elastic.co.
As part of the Platform Engineering department, the SRE team designs, builds, scales, and maintains the multi-cloud platform that hosts internal and external services such as Elastic Cloud Hosted and Serverless. This includes developing new software and tools that support the rest of the infrastructure, so that we can rapidly deploy products from all corners of the organization. We need help on this journey to offer a truly exceptional customer experience. This is where you come in!
What you will be doing:
- Lead technical initiatives aimed at improving the reliability of the global Elastic infrastructure, taking an engineering approach to the prevention, detection, and timely mitigation of issues.
- Contribute to SRE engineering through auto-remediation and systems engineering work that continues to reduce human intervention in processes and operational tasks.
- Develop and maintain software, tooling, and automation to support the ever-growing scaling demands of this global infrastructure.
- Champion an environment focused on collaboration, operational excellence, and uplifting others.
- Respond to major incidents, correcting and improving systems to prevent incidents and grow at scale. Participate in a weekly on-call rotation, using a follow-the-sun model.
What you bring:
- A well-rounded view of and true appreciation for reliability, borne of real-world experience operating production services. You have examples of using software engineering practices and SRE principles to solve operational problems.
- A background in software engineering and the ability to collaborate confidently with engineers to identify and resolve issues, ideally with experience in public cloud and managed Kubernetes services.
- Outstanding interpersonal skills and the ability to build strong relationships through inclusive communication. Experience working in distributed teams or working remotely is desirable.
You don't need to have all of these items, but these represent the types of work you will do as a Site Reliability Engineer at Elastic.
- You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- You have built or managed a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and the vital automation to support it.
- You have written non-trivial programs in Go.
- You have worked with containerized services (such as Docker).
- You have experience in system administration with professional skills in Linux on distributed systems at scale.
- You have designed, implemented or diagnosed and resolved issues with the Elastic Stack.
- You have demonstrable experience leading and improving alerting and major incident management processes, and using metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impact for audiences at varying levels of the organization.
- You are experienced in contributing to a self-organizing and collaborative team environment.
- You have mentored, coached, and grown team members to bring out the best in them.
As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.
We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do.
- Competitive pay based on the work you do here and not your previous salary
- Health coverage for you and your family in many locations
- Ability to craft your calendar with flexible locations and schedules for many roles
- Generous number of vacation days each year
- Double your charitable giving - We match up to $1500 (or local currency equivalent)
- Up to 40 hours each year to use toward volunteer projects you love
- Embracing parenthood with a minimum of 16 weeks of parental leave
Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.
We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.
Applicants have rights under Federal Employment Laws; view the posters linked below: Family and Medical Leave Act (FMLA) Poster; Pay Transparency Nondiscrimination Provision Poster; Employee Polygraph Protection Act (EPPA) Poster; and Know Your Rights (Poster).
Please see here for our Privacy Statement.