Site Reliability Engineer
Lookout, Inc. is the endpoint-to-cloud security company purpose-built for the intersection of enterprise and personal data. We safeguard data across devices, apps, networks and clouds through our unified, cloud-native security platform, a solution that's as fluid and flexible as the modern digital world. By giving organizations and individuals greater control over their data, we enable them to unleash its value and thrive. Lookout is trusted by enterprises of all sizes, government agencies and millions of consumers to protect sensitive data, enabling them to live, work and connect freely and safely. To learn more about the Lookout Cloud Security Platform, visit www.lookout.com and follow Lookout on our blog, LinkedIn and Twitter.
We are looking for a Site Reliability Engineer to join our team and build software systems and automated solutions for the operational side of our organization.
Responsibilities include monitoring systems and building alerts for the operational issues that cloud services can experience.
Ultimately, you will work with our Platform team to ensure our organization can continue to deliver products and services on the Lookout Cloud Security Platform.
Responsibilities
- Participate in on-call rotations and respond to incidents to ensure system availability and performance.
- Conduct post-incident reviews (PIRs) to analyze incidents, identify root causes, and implement preventive measures.
- Record the actions taken during incident response and produce the required documentation after each incident.
- Apply expertise with monitoring and log analysis tools to detect and diagnose issues.
- Use collaboration and communication tools to coordinate work across teams.
- Work closely with software development teams to ensure the reliability and performance of applications in production.
- Implement and maintain monitoring and alerting systems to identify and resolve issues proactively.
- Apply software development knowledge to build automated solutions for operational tasks.
- Analyze system performance and plan for future capacity needs by working with teams to scale infrastructure resources based on demand.
Skill set
- Hands-on experience with cloud platforms such as AWS or GCP.
- Hands-on experience with observability tools such as Datadog, Splunk, or Elasticsearch is a plus.
- Knowledge of networking principles, protocols, and troubleshooting.
- Understanding of SRE principles and practices, including error budgeting, Service Level Objectives (SLOs), and Service Level Indicators (SLIs).
- Knowledge of Linux/Unix system administration.
- Understanding of system commands, file systems, and processes.
- Familiarity with containerization technologies (Docker) and container orchestration tools (Kubernetes).
- Strong collaboration and communication skills to work effectively with cross-functional teams.
- Flexibility to work rotational shifts as required.