Site Reliability Engineer

Full Time
Montreal, QC, Canada
5 months ago

At Lyft, our mission is to improve people’s lives with the world’s best transportation. To create the best transportation experience for all, we start in our own community by creating an open, inclusive, and diverse organization where all team members are recognized for what they bring. We believe that trip by trip, we’re changing the way our world works. We imagine a world where cities feel small again, where transportation and tech bring people together instead of keeping them apart. We see the future as community-driven.

Started as a donation-based ridesharing network in San Francisco, Lyft today provides millions of rides every day in more than 200 cities. We’re moving to a world where on-demand transportation is a viable alternative to vehicle ownership, and we need people with vision and insight to help get us there. Our team has a history of enabling rich and creative features that set the standard for the ride/bike/scooter-sharing industry. We constantly innovate and incorporate cutting-edge technologies to make the lives of our communities better.

Our Montreal office developed the first automated bike-share system in America, a system that has since been deployed in multiple cities around the world (London, Montreal, New York, and San Francisco, to name a few). These are also some of the biggest bike-share systems in the world!

 

The Transit, Bikes, and Scooters (TBS) infrastructure team at Lyft in Montreal is growing, and we are looking for a Site Reliability Engineer to support our production systems, platforms, and the tools our developers use, while ensuring the reliability of our systems.

Every engineering team at Lyft is responsible for running and operating the software that they build. The Infrastructure team works towards standardizing and supporting all the rapidly evolving teams throughout our organization, assessing their architecture, helping them design scalable services, and fostering excellent operational practices. It's a mission-critical role: ensuring that our systems are always healthy, monitored, automated, and designed to scale.

The nature of the work is interdisciplinary, and our teammates come from varied backgrounds (e.g., Site Reliability Engineer (SRE), Systems Engineer, Software Engineer, DevOps Engineer, Infrastructure Engineer, Production Engineer). We urge you to apply even if you feel uncertain that you have the exact background.

Technical interviews and interactions with the other offices in the company will be mainly in English; however, the working environment in Montreal is bilingual. 

Responsibilities:
  • Help define the team’s roadmap and architecture based on technology and business needs
  • Design and implement effective infrastructure abstractions that increase velocity of our application teams
  • Take responsibility for designing, developing, deploying, monitoring, operating, and maintaining existing and new elements of our systems infrastructure
  • Build holistic visibility into SLIs, SLOs, SLAs, dependency graphs, and the past performance of software, networks, and systems to ensure that we can continue to scale without increasing operational burden or toil
  • Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform
  • Step back to observe patterns and develop innovative tools and automation to minimize toil. Use those learnings to drive the best operational practices.
  • Partner with the broader Lyft organization to build a culture of rigorously learning from incidents
  • Unblock, support, and effectively communicate across teams to achieve results
  • Have a good grasp of the tradeoffs behind decisions, and be able to explain them
  • Share your knowledge by giving brown bags and tech talks, and by evangelizing appropriate tech and engineering best practices
Experience:
  • 5+ years of software engineering/production infrastructure industry experience
  • Experience designing, debugging and running fault-tolerant large-scale distributed systems
  • Experience with high-level programming languages (Python, Go, Java, etc.)
  • Experience working with public cloud platforms (e.g., AWS, Google Cloud Platform, Microsoft Azure)
  • Experience bringing software to production at high scale
  • Experience with common CI tools (Jenkins, Buildkite, CircleCI, TeamCity); proficiency in at least one of these tools is an asset
  • Experience working with relational or NoSQL databases is an asset
  • Experience in Linux system administration, or familiarity with managing a fleet of Linux servers, is an asset
  • Must be fluent in spoken and written English and, at a minimum, willing to learn French if required
Benefits:
  • Great health, dental and vision insurance options, family plans
  • Life insurance and disability benefits
  • Mental health benefits
  • A Healthcare Spending Account
  • Free lunch, coffee, and tea when working in one of our offices
  • In addition to company-observed holidays (12 in 2021), team members get 15 days paid time off, with an additional day for each year of service
  • 4 floating days off per year
  • 10 paid sick days per year
  • 18 weeks of fully paid parental leave. Biological, adoptive, and foster parents are all eligible
  • And other special benefits related to our own services!

Lyft proudly pursues and hires a diverse workforce. Lyft believes that every person has a right to equal employment opportunities without discrimination because of race, ancestry, place of origin, colour, ethnic origin, citizenship, creed, sex, sexual orientation, gender identity, gender expression, age, marital status, family status, disability, pardoned record of offences, or any other basis protected by applicable law or by Company policy.  Lyft also strives for a healthy and safe workplace and strictly prohibits harassment of any kind.  Accommodation for persons with disabilities will be provided upon request in accordance with applicable law during the application and hiring process.  Please contact your recruiter now if you wish to make such a request.

This role will be in-office on a hybrid schedule: team members will be expected to work in the office 3 days per week, on Mondays, Thursdays, and a team-specific third day. Additionally, hybrid roles have the flexibility to work from anywhere for up to 4 weeks per year.