Senior/Tech Lead – DataOps Engineer
On a mission to make video easy for anyone …
It is an exciting time to join Synthesia: we have reached a milestone by becoming a unicorn, having raised $90 million in Series C funding, and are now valued at $1 billion! ✨ 🦄
Synthesia is the world’s #1 AI video generation platform. Well, it’s actually a video production studio — in a browser. As in, no cameras or film crews at all. You simply choose an avatar, enter your script in one of 60 languages, and your video is ready in minutes. In Synthesia, you can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities. 🎬
We believe the future of media is synthetic, and we are on a mission to turn cameras into code and make everyone a creator. To learn more, check out our brand video that explains what we’re doing at Synthesia.
What will you be doing?
In Research at Synthesia, we develop fully photo-real and controllable human avatars that let users create video on demand directly from text. We solve fundamental problems in generative AI to build lifelike representations that reproduce how someone looks, moves and speaks in video. We are building a foundational model for video of people, based on large model training with the latest generative models.
We are expanding our ML Platform team to supercharge large model training on PB-scale datasets, and we are looking for an outstanding Senior DataOps Engineer to join the team and manage our audio and video data platform. You would be responsible for ensuring the efficient architecture, operation, and maintenance of our platform, including data ingestion, processing, storage, and retrieval. You would work closely with cross-functional teams to implement best practices in data management, monitor system performance, troubleshoot issues, and optimize workflows to support data analysis and application development. Additionally, you must have a strong sense of ownership, taking full responsibility for the reliability and performance of the data platforms you manage, as well as managing requirements from multiple stakeholders.
Some tasks you would work on include:
- Data Ops for data management, versioning, usage tracking, logging.
- Setup of a data lake and data transform pipelines for large-scale audio-visual datasets.
- Integration of third-party annotation services for continuous data annotation and active learning.
- Setup of metadata stores and APIs to access datasets on demand for ML training.
- Support for data streaming to train large models.
- Data pipelines - deploy custom ML data transformations, working with our ML team.
- Data access - create transient datasets on demand to support ML model training.
- Data tracking - usage tracking and monitoring across all data sources.
- Establish the workflow for continual data delivery and annotation.
What are we looking for?
- 5+ years of experience in Data Engineering / Data Ops / Data Science.
- Previous experience managing large-scale datasets with continuous data collection.
- Previous experience setting up data ops (ingest / storage / transform / access) end-to-end for multiple teams.
- Good understanding of the specifics of audio/video data at PB scale.
- Experience with Streaming / Batch Data Pipelines (Airflow, Apache Beam, Spark etc.).
- Experience with event-driven systems.
- Experience in handling heterogeneous types of data (e.g. audio / text / video / tabular data).
- Experience with any relational database management system.
- Outstanding communication skills.
💸 You will be compensated well (salary + stock options + bonus)
🏥 You will get Private Health Insurance
🚲 You get a cycle-to-work salary sacrifice scheme for commuting to the office
🏝 You get 25 days of annual leave + public holidays
🥳 You will join an established company culture with regular socials and company retreats
👉 You can participate in a generous referral scheme
🚀 You will have huge opportunities for your career growth