Staff Production Operations Engineer

Index Exchange

Full Time

Toronto, ON, Canada

11 months ago

We shaped the earliest forms of ad tech, and we’re looking for the technical expertise to help shape its future. Our customers have unique problems that can only be solved at internet scale, and that’s where the technical skills of our team make a real difference.

Our exchange handles over 350 billion requests every day (for comparison, Google serves an estimated 9 billion searches a day), all running in our own global data centers. Every member of our technology team has an enormous amount of autonomy in building and managing our systems to support and enable our growing level of scale. Through the transparency of our technology, dedication to innovation and integrity, and long-standing customer relationships, we lead through change.

What’s it like to work at Index?

We have more than 550 Indexers around the globe dedicated to building a safe and transparent marketplace that provides a trusted experience for consumers.

Index is an exciting and fast-paced place to work. We’re built on our values of change, support, learning and teaching, trust, and intention. We pride ourselves on our independence and openness, not only in our technology, but in our teams, too. Our diverse and inclusive culture celebrates how we can leverage our unique differences to help drive Index forward.

Our culture of success is truly supportive and collaborative. In working together across our teams, we’re continually investing in the people and technology to solve the industry’s most complex problems. As we extend the promise of ad tech to every channel, we’re looking for talented engineers to help move Index, and the industry, forward.

Are you ready to join the programmatic evolution?

Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps to make that happen transparently, safely and efficiently. Handling hundreds of billions of auctions per day within milliseconds requires an intense understanding of the exchange and the ecosystem that we live in.

Our business is growing significantly every year and is poised to grow even faster. Our people and our platforms are the foundation and enabler of that growth. We are significantly expanding our technology teams, and are looking for technologists with a passion for high performance software development, and a drive to deliver software products and platforms that enable and empower industries at a global scale.

About the Team:

The global Production Operations group is integral to ensuring the operational stability and reliability of our worldwide 24x7 on-premises and cloud environments. As the first line of defense, this team owns operations engineering, collaborating closely with IT, SRE, Network, and Data engineering teams, and with key stakeholders across business, product, and software engineering teams. We play a crucial role in maintaining systems health, responding to incidents, and optimizing the performance, efficiency, and stability of complex global systems.

Here's what you'll be doing:

As a Staff Production Operations Engineer, you will lead efforts to ensure our systems and networks operate seamlessly. You will be responsible for overseeing internal metrics, executing effective incident responses, and contributing to system optimizations. This role demands a deep understanding of systems, network, and hardware fundamentals, alongside the ability to quickly adapt and learn complex global systems operations.

Environment Stewardship

Own the monitoring and maintenance of the health, security, and performance of on-premises and hybrid-cloud infrastructure.
Execute timely and effective incident responses, minimizing downtime and ensuring swift resolution.
Build and update disaster recovery plans and security protocols, and drive the maintenance of system backups.
Respond to alerts within our established SLOs and assist in incident triage, ensuring that the right teams are engaged to address issues promptly.

Support, Collaboration, and Reporting

Influence the team’s direction and foster accountability, trust, and focus on goals.
Act as a primary contact for operational issues, providing technical support and ensuring issues are resolved efficiently.
Collaborate with product and software engineering teams to provide operational insights and relay requirements.
Foster a collaborative environment by bringing people together to come up with better designs and approaches to complex problems.

Automation, Tooling & Research

Identify opportunities for system optimization and performance improvements, then engage technical leads and management to drive the change.
Research and implement advancements in technology and industry best practices.
Develop and maintain automation frameworks to streamline processes and reduce manual tasks.
Own risk identification and mitigation to proactively address present or anticipated operational challenges.
Identify and implement catalysts for future optimization, including provisioning techniques, deployment optimization, ancillary services, pipelines, Ansible playbooks, power usage, bandwidth, etc.

Documentation and Knowledge Sharing

Create and ensure maintenance of comprehensive documentation for system configurations, processes, and incident resolution procedures.
Participate in knowledge sharing and provide cross-training to other departments.
Maintain runbooks and technical documentation, ensuring familiarity with internal and external escalation pathways.

24x7x365

You will join a globally distributed team that maintains 24x7 coverage. As a member of this team and broader group, you may occasionally be required to work weekends, holidays, and after hours to respond to high-urgency or emergency events outside of your local time zone.

Here's what you need:

Technical Expertise

In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management.
Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware.
In-depth experience engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes.
Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus.
Experience with observability platforms: Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix.
Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase.
Ability to write code in Go, Python, Bash, or Perl for automation.

Work Experience

7+ years of proven experience in one of the following roles, or in a similar role:
- DevOps
- Linux System Administration
- Site Reliability Engineering
Built or maintained a private-cloud infrastructure running CentOS/Rocky Linux on a mix of bare-metal, virtualization, and containerization.
Managed public cloud environments such as AWS, GCP, and Azure, and their federation into on-premises environments.
Life-cycle management of bare metal servers such as Dell and Supermicro in globally distributed data centers (e.g. break-fix, baseband/firmware updates).
Built or maintained on-premises and cloud Kubernetes clusters: kubeadm, EKS, GKE.
Built or operated automation and orchestration frameworks for deployment and maintenance pipelines (e.g. Kafka, StackStorm, Ansible, Argo CD, Terraform) to push out code or configuration updates and build new infrastructure systems.

Soft Skills

Communication: Clear and effective communication within and across teams. While we place a huge premium on technical skill, we value just as much your ability to work with other people.
Curiosity: Things can (and will) break for different reasons; your curiosity will help drive you to identify and fix the things that go wrong.
Alertness: We can never predict when things will go wrong, so it is your job to be vigilant and prepared to respond when they do; you must be ready to reach out, ask questions, and sound the alarm when necessary.
Analytical Thinking: Monitor and analyze activity, collaborate with other departments to maintain technical defense.
Reliability: Prioritize the reliability of our systems, ensuring our exchange customers can trust in our services 24x7. Adhere to operational procedures, best practices, and security protocols.
Continuous Improvement: Embrace a culture of continuous learning and innovation, always seeking ways to enhance our operational efficiency.
Customer-Centricity: Committed to providing the best possible experience for our customers, both internal and external.
Accountability: Take ownership of your responsibilities and hold yourself accountable for the quality of your work.

Why You’ll Love Working Here:

Comprehensive health, dental, and vision plans at no cost to you
Time off and flexible work schedules
Retirement plan with a 5% company match
Stock options and equity packages
Generous parental leave
Monthly wellness stipend plus fitness discounts and quarterly wellness group activities
Community engagement opportunities and donation-matching program
Annual virtual company retreats and regular community-led team events
One day off per year to volunteer
A workplace that supports a diverse, equitable, and inclusive environment

Notification

Index Exchange is aware that there have been recent scams directed toward candidates regarding job interviews and offers.

Please be vigilant and do not accept interview requests, job offers, or other hiring-related documents from anyone other than our dedicated recruitment team, from the domain of @indexexchange.com. Our interview process consists of several steps, including phone screens and video interviews. We do not conduct interviews via an email questionnaire or request money at any point in the process.

If you do receive these requests, please let us know immediately at: report.scam@indexexchange.com.

We remain dedicated to resolving this matter and we appreciate your support.

Equal employment opportunity

At Index Exchange, we believe that successful products are built by teams just as diverse as the audience who uses them. As such, we are committed to equal employment opportunities. We celebrate diversity of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or expression, and veteran status. Additionally, we realize that diversity is deeper than any status or classification; diversity is the human experience. For those who show grit, passion, and humility, Index will welcome you.

Accessibility for applicants with disabilities

Index Exchange is committed to working with and providing access and reasonable accommodations to applicants with disabilities. Please let us know if you’d like to request a reasonable accommodation.

Index Everywhere, Index Anywhere

Our corporate headquarters are in Toronto, with major offices in New York, Montreal, Kitchener, London, San Francisco, and many other global cities. As a major global advertising exchange, we are committed to operating as a tightly knit global team and embracing and empowering talent wherever our colleagues may be.
