Senior Site Reliability Engineer
Get to know Okta
Okta is The World’s Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth. At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences. Join our team! We’re building a world where Identity belongs to you.
The Engineering Opportunity
We are seeking an exceptional Software Engineer with an SRE/DevOps focus who is experienced in building software systems to manage and deploy reliable, performant infrastructure and product code at scale on cloud infrastructure.
This engineer will join our group responsible for designing, implementing and maintaining services and frameworks that automate actions against production infrastructure. These services enable engineers across product and infrastructure engineering groups to safely, reliably and repeatedly execute runbooks and other actions on test, preview and production environments.
As part of this team, you will also work on new efforts to keep Okta’s infrastructure practices on par with the best industry standards. You will interface with teams involved with deployments, operations, release engineering, product and data to address process bottlenecks with code and automate time-consuming jobs. You will work hands-on with Kubernetes on GCP / AWS to help Okta’s services run seamlessly in both cloud environments. You will also be involved in the maintenance and debugging of team-owned services as part of incident response.
What you’ll be doing
- Design, build, maintain and deploy tools that allow Okta’s engineers to execute infrastructure production changes and deploy code.
- Manage multiple environments spanning a globally distributed infrastructure.
- Improve environment visibility and management in a repeatable and automatable way.
- Collaborate with all engineering and operations teams to improve overall product health and reliability.
- Respond to production incidents and determine how we can prevent them in the future.
- Triage and troubleshoot complex production issues to ensure reliability and performance.
- Design and build scalable and extensible platforms, services and tools in Java, Python and Go with a focus on automation and reliability.
- Work cross-functionally with Operations and Product teams to identify bottlenecks and manual processes. Build solutions that provide scale and reliability to address these issues.
- Leverage industry best practices in infrastructure, automation and orchestration to explore greenfield opportunities that will form the basis of future infrastructure improvements.
- Identify areas for automation that are self-serviceable to reduce manual onboarding. Develop tools and processes to address these areas.
- Improve the security posture of team-owned services and infrastructure. This involves maintaining base images, updating hosts with newer library versions from vendors, and updating services with vulnerability-free libraries when vulnerabilities are identified.
What we are looking for
- 5+ years of software development experience in Java, Go, Python or similar backend languages
- 5+ years of development experience building, maintaining and debugging services, internal tools and frameworks
- 3+ years of experience automating and deploying large-scale production services in AWS, GCP or similar
- 3+ years of hands-on experience working with Kubernetes, with a good understanding of Kubernetes fundamentals
- Working knowledge of database technologies (MySQL, MongoDB, NoSQL databases, etc.)
- Experience using public cloud (GCP, AWS)
- Experience with GitOps, Docker, Kubernetes, Terraform, Helm, Kustomize and CI/CD (Spinnaker, Argo CD, etc.)
- Experience working with stakeholders from different backgrounds, and strong verbal and written communication skills
- General understanding of security and networking concepts
- Knowledge of infrastructure as code tools such as Terraform
- Knowledge of configuration management tools such as Chef, Ansible or Puppet
- Proficiency with Docker and supporting infrastructure, plus strong Linux and networking fundamentals
- Experience with artifact management (Artifactory, AR, ECR) is a plus
What you can look forward to as a Full-Time Okta employee!
- Amazing Benefits
- Making Social Impact
- Fostering Diversity, Equity, Inclusion and Belonging at Okta
Okta cultivates a dynamic work environment, providing the best tools, technology and benefits to empower our employees to work productively in a setting that best and uniquely suits their needs. Each organization is unique in the degree of flexibility and mobility in which it works, so that all employees are enabled to be their most creative and successful versions of themselves, regardless of where they live. Find your place at Okta today! https://www.okta.com/company/careers/
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to participate in the job application or interview process, please use this Form to request an accommodation.
Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Privacy Policy at https://www.okta.com/privacy-policy/.