Senior Site Reliability Engineer

Full Time
Mountain View, California, United States
11 months ago

About Applied

Autonomy is one of the leading technological advances of this century that will come to impact our lives. The work you’ll do at Applied will meaningfully accelerate the efforts of the top autonomy teams in the world. At Applied, you will have a unique perspective on the development of cutting-edge technology while working with major players across the industry and the globe.

Applied Intuition provides software solutions to safely develop, test, and deploy autonomous vehicles at scale. The company’s suite of simulation, validation, and drive log management software enables development teams to create thousands of scenarios in minutes, run simulations at scale, and verify and validate algorithms for production deployment. Headquartered in Silicon Valley with offices in Detroit, Washington, D.C., Munich, Stockholm, Seoul, and Tokyo, Applied consists of software, robotics, and automotive experts with experience from top global companies. Leading autonomy programs and 17 of the top 20 global OEMs use Applied’s solutions to bring autonomy to market faster.

About the role

Modern autonomous system development relies heavily on realistic, large-scale simulations. Our customers depend on our software solutions 24x7 to test and validate changes to their autonomy stacks. Many of our solutions sit directly in our customers' development flows.

Our global Site Reliability Engineering (SRE) team works around the clock to deploy, scale, monitor and optimize our simulation infrastructure deployments. Our SRE team is responsible for continually improving the reliability, resiliency and efficiency of our services.

Unlike many SRE teams, at Applied you will build relationships and work directly with some of our customers. This is a unique opportunity to gain deeper insight into the autonomy industry whilst working on large-scale infrastructure.

At Applied, you will:

  • Work directly with our autonomy customers to deliver reliable, efficient large-scale simulation infrastructure across cloud providers
  • Build automated infrastructure to minimize operational work
  • Follow industry best practices in pursuit of minimal downtime
  • Document reliability issues detected in production, identify their root causes, and design systemic fixes

We're looking for someone who has:

  • A Bachelor’s degree in a technical field, or equivalent practical experience
  • 5+ years of experience with modern cloud deployments (Kubernetes, Docker, microservices)
  • 5+ years of experience with programming in Python, Go or C++
  • 3+ years of experience designing, analyzing, and troubleshooting distributed systems and working with Linux systems internals and administration
  • Ability to debug, optimize code, and automate routine tasks
  • Excellent communication skills

Nice to have:

  • Experience working with large-scale systems, storage, or networking
  • Experience setting, tracking and reporting on Service Level Objectives
  • Experience with load and scale validation techniques (e.g., load testing, traffic replay, or shadowing)
  • Experience operating services at or above 99.9% availability
  • Experience utilizing open source tooling

The salary range for this position is $65,000 USD to $400,000 USD annually. This salary range is an estimate, and the actual salary may vary based on the Company's compensation practices.

Don’t meet every single requirement? If you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

Applicants will be required to be fully vaccinated against COVID-19 upon commencing employment. Reasonable accommodations will be considered on a case-by-case basis for exemptions to this requirement in accordance with applicable federal and state law. Applicants should be aware that for external-facing roles that involve close contact with Company employees or other third parties on the Company's premises, accommodations that involve remaining unvaccinated against COVID-19 may not be deemed reasonable. The Company will engage in the interactive process on an individualized basis taking into account the particular position.

Applied Intuition is an equal opportunity employer and federal contractor or subcontractor. Consequently, the parties agree that, as applicable, they will abide by the requirements of 41 CFR 60-1.4(a), 41 CFR 60-300.5(a) and 41 CFR 60-741.5(a) and that these laws are incorporated herein by reference. These regulations prohibit discrimination against qualified individuals based on their status as protected veterans or individuals with disabilities, and prohibit discrimination against all individuals based on their race, color, religion, sex, sexual orientation, gender identity or national origin. These regulations require that covered prime contractors and subcontractors take affirmative action to employ and advance in employment individuals without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status or disability. The parties also agree that, as applicable, they will abide by the requirements of Executive Order 13496 (29 CFR Part 471, Appendix A to Subpart A), relating to the notice of employee rights under federal labor laws.