LLM Red Team

Full Time
Mountain View, CA, USA
3 months ago

At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.

 

Snapshot 

 

Our team is responsible for enabling AI systems to reliably work as intended, including identifying potential risks from current and future AI systems, and conducting technical research to mitigate them. On this team, you will discover and evaluate vulnerabilities in our frontier AI systems, enabling other teams to implement approaches that mitigate the risks.

About us 

 

Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

Conducting research into any transformative technology comes with responsibility to build mechanisms for safe and reliable development and deployment at every step. Technical safety research at Google DeepMind investigates questions related to evaluations, reward learning, fairness, interpretability, robustness, and generalization in machine learning systems. Proactive research in these areas is essential to the fulfillment of the long-term goal of Google DeepMind: to build safe and socially beneficial AI systems.

The role

 

This team works at the forefront of technical approaches to designing state-of-the-art LLM attack simulations that identify unknown vulnerabilities and threats capable of bypassing implemented guardrails.

We’re seeking to build a team of creative problem solvers to provide the most complete picture of AI risk. Working with Google DeepMind’s existing AI systems, these experts will analyze, prompt, and optimize internal models to accelerate the future of AGI safely and responsibly.

Key Responsibilities:

  • Expose issues in existing and future GDM-developed LLMs and other generative models across a suite of risk areas such as deceptive content, toxicity, bias, plagiarism, weapons development, cyber attacks, and other criminal activity
  • Develop, qualitatively probe, and quantitatively evaluate attacks on LLMs for a variety of threat models, including adversarial attacks, jailbreaks, and prompt injection attacks
  • Evaluate models for falsehoods, verbal manipulation, and dangerous scientific capability
  • Develop vulnerability testing scripts (Python) to share with ML and software engineering teams for internal development
  • Analyze and visualize results, synthesize and communicate findings, and suggest mitigations for public technology releases
  • Develop new prompt engineering methods and support the development of scalable tooling in collaboration with other GDM technical staff
  • Create training materials for future red teaming efforts and adversarial analysis

About you

 

We seek out individuals who thrive in ambiguity and who are willing to help with whatever moves prototypes forward. We regularly need to invent novel solutions to problems, and we often change course if our ideas don’t work, so the flexibility and adaptability to work on any project are a must.

In order to set you up for success in this role at Google DeepMind, we are looking for the following skills and experience:

  • BSc/BA, MSc or PhD/DPhil degree in computer science, mathematics, applied statistics, machine learning, or equivalent industry experience
  • Experience with novel exploration and control of LLM behavior, e.g. red teaming LLMs to evade safeguards, using frontier LLMs programmatically to develop applications, identifying novel LLM capabilities through directed exploration, or improving LLM capability through iterative behavior exploration and finetuning 
  • Proven knowledge of and experience with Python
  • Knowledge of machine learning
  • Ability to conduct analysis of model performance via statistics and data visualization

In addition, the following would be an advantage: 

  • Academic work in language models, particularly on adversarial robustness
  • Previous disclosures of vulnerabilities or security issues in LLMs
  • Experience in applying experimental ideas to real-world problems
  • Experience with an ML framework such as TensorFlow, PyTorch, or JAX
  • Cross-functional collaboration experience
  • Prior experience collaborating with researchers

 

What we offer

 

At Google DeepMind, we want employees and their families to live happier and healthier lives, both in and out of work, and our benefits reflect that. Some select benefits we offer: enhanced maternity, paternity, adoption, and shared parental leave; private medical and dental insurance for yourself and any dependents; and flexible working options. We strive to continually improve our working environment, and provide you with excellent facilities such as healthy food, an on-site gym, faith rooms, and terraces.

We are also open to relocating candidates to a core GDM location and offer a bespoke service and immigration support to make it as easy as possible (depending on eligibility).

The US base salary range for this full-time position is between $136,000 and $245,000 + bonus + equity + benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.

Application deadline: 12pm GMT Thursday 7th September 16 2024 

Note: In the event your application is successful and an offer of employment is made to you, any offer of employment will be conditional on the results of a background check, performed by a third party acting on our behalf. For more information on how we handle your data, please see our Applicant and Candidate Privacy Policy.