Staff Software Engineer, Observability

Full Time
Toronto, ON, Canada
9 months ago

At Lyft, our mission is to improve people’s lives with the world’s best transportation. To do this, we start with our own community by creating an open, inclusive and diverse organization. Our Infrastructure team is passionate about building software to solve problems at massive scale. We do this often, and when we believe our solution is worth sharing with the community, such as Envoy Proxy, we open source our ideas for the benefit of others.

As an Observability team member, you are responsible for the operation and maintenance of our logging and metrics infrastructure. You ensure all teams at Lyft are aware of the operational health of their products by monitoring system availability and take a holistic view of our platform performance. You build software and platforms to automate infrastructure platform operations and management. By measuring and monitoring our operations you find opportunities to improve our systems in order to push our platform forward. You provide our partners with the support they need to help them build robust large scale distributed systems.We count on the reliability of our infrastructure to empower Lyft teams to provide our customers rich experiences that are highly available with rock solid performance to ensure our transportation platform continues to connect people and places. As we grow our team, we are seeking experienced Infrastructure Engineer to ensure that as our Infrastructure continues to scale, our platform continues to provide an essential and dependable service that transports millions of people every day. Specifically we are searching for someone who brings fresh perspectives, enjoys collaborating with cross-functional teams in order to continually improve our products and services for our customers.

Responsibilities:
  • Provide technical mentorship within the team and lead by example in developing robust, scalable, and efficient observability solutions
  • Drive cross-functional collaboration with engineering teams to advance Lyft's observability capabilities, ensuring they align with business objectives and developers' requirements
  • Play a pivotal role in steering the observability roadmap and strategic direction, utilizing a comprehensive understanding of business context to influence key decisions and initiatives
  • Design, develop, and deploy advanced tooling and systems that enhance the reliability, scalability, and efficiency of our platform
  • Operate and improve our infrastructure using industry best practices and tools, setting standards for excellence
  • Document infrastructure operations processes and insights, identify repeatable actions, and lead the automation of repetitive tasks
  • Participate in our team's on-call rotations, respond to incidents, and provide expert support to other teams in mitigating customer-impacting events
Experience:
  • 8+ years of experience in roles focused on software development, automation, and systems engineering, with a proven track record of technical leadership
  • Bachelor's Degree or equivalent experience in Computer Science or a relevant discipline, with a strong foundation in observability principles
  • Proven expertise in architecting and scaling observability infrastructure to support comprehensive monitoring and analysis in large production environments
  • Advanced proficiency in creating production-ready code in high-level languages, such as Go, Python
  • Extensive experience operating large-scale infrastructure in public cloud environments, such as AWS, and with Managed Services like Amazon OpenSearch Service and Amazon Managed Service for Prometheus
  • Deep experience with Kubernetes and Envoy Proxy, managing multi-cluster environments in large-scale production settings
  • Familiarity with distributed storage technologies such as S3, RDS, DynamoDB, Aurora, and distributed configuration systems such as Zookeeper and etcd
  • Expertise in deploying and managing monitoring, alerting, and logging systems at massive-scale, such as Prometheus, Grafana, Kibana, Telegraph, and M3
Benefits:
  • Extended health and dental coverage options, along with life insurance and disability benefits
  • Mental health benefits
  • Family building benefits
  • Access to a Health Care Savings Account
  • In addition to provincial observed holidays, team members get 15 days paid time off, with an additional day for each year of service 
  • 4 Floating Holidays each calendar year prorated based off of date of hire
  • 10 paid sick days per year regardless of province
  • 18 weeks of paid parental leave. Biological, adoptive, and foster parents are all eligible

Lyft proudly pursues and hires a diverse workforce. Lyft believes that every person has a right to equal employment opportunities without discrimination because of race, ancestry, place of origin, colour, ethnic origin, citizenship, creed, sex, sexual orientation, gender identity, gender expression, age, marital status, family status, disability, pardoned record of offences, or any other basis protected by applicable law or by Company policy.  Lyft also strives for a healthy and safe workplace and strictly prohibits harassment of any kind.  Accommodation for persons with disabilities will be provided upon request in accordance with applicable law during the application and hiring process.  Please contact your recruiter now if you wish to make such a request.

This role will be in-office on a hybrid schedule — Team Members will be expected to work in the office 3 days per week on Mondays, Thursdays and a team-specific third day. Additionally, hybrid roles have the flexibility to work from anywhere for up to 4 weeks per year.