Aligning AI and Robot Behaviors with Human Intents

People must be able to easily specify, model, inspect, and revise AI and robot behaviors. I design methods and tools to enable these interactions.

Specify

Writing specifications for AI systems is critical yet notoriously hard because these systems lack common sense reasoning, making it easy to write specifications that result in unintended and potentially dangerous side effects. I study how experts and non-experts write specifications, and how these specifications should be interpreted.

A project overview image. This shows an icon of a robot and a person. There is an arrow from the person to the robot, with the text 'specify behaviors' above the arrow.

The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications

Trial-and-error reward design is unsanctioned, but the implications of this widespread practice have not been studied. We conduct empirical computational and user study experiments, and we find that trial and error leads to the design of reward functions which are overfit and otherwise misdesigned. Even in a trivial setting, we find that reward function misdesign is rampant. Published at AAAI Conference on Artificial Intelligence, 2023.

Project Webpage

A visual comparing optimal advantage versus partial return. The partial return is equal (0) when the robot makes progress toward or goes away from the goal. The advantage is higher when the robot makes progress toward the goal, as desired.

Learning Optimal Advantage from Preferences and Mistaking it for Reward

Typical approaches to reinforcement learning from human feedback (RLHF) assume that human preferences arise only from the sum of rewards over each trajectory segment (its partial return). In past work, we showed that regret is a better model of human preferences (published at TMLR 2024). In this work, we consider the consequences if preferences instead arise from this better-supported regret preference model. Published at AAAI Conference on Artificial Intelligence, 2024.
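As a rough illustration of the two preference models, the sketch below assumes a logistic (Bradley-Terry-style) choice rule over one number per segment: the partial return in one case, and the sum of optimal advantages (the negated regret) in the other. The function names and all numbers are invented for illustration and mirror the figure description above, where both segments have a partial return of 0. This is not the paper's implementation.

```python
import math


def preference_probability(stat_a, stat_b):
    """Logistic probability that segment A is preferred over segment B,
    given one scalar statistic per segment (assumed choice rule)."""
    return 1.0 / (1.0 + math.exp(-(stat_a - stat_b)))


def partial_return(rewards):
    """Sum of rewards along a trajectory segment."""
    return sum(rewards)


def negated_regret(optimal_advantages):
    """Sum of optimal advantages along a segment.
    Assumption: regret is taken as the negative of this sum."""
    return sum(optimal_advantages)


# Hypothetical segments: every step earns reward 0 in both cases.
toward_goal_rewards = [0.0, 0.0, 0.0]
away_from_goal_rewards = [0.0, 0.0, 0.0]

# Hypothetical optimal advantages: optimal steps score 0,
# steps away from the goal score below 0.
toward_goal_advantages = [0.0, 0.0, 0.0]
away_from_goal_advantages = [-1.0, -1.0, -1.0]

p_partial_return = preference_probability(
    partial_return(toward_goal_rewards),
    partial_return(away_from_goal_rewards),
)
p_regret = preference_probability(
    negated_regret(toward_goal_advantages),
    negated_regret(away_from_goal_advantages),
)

print(f"partial return model: P(toward preferred) = {p_partial_return:.3f}")
print(f"regret model:         P(toward preferred) = {p_regret:.3f}")
```

Under these made-up numbers, the partial return model is indifferent between the two segments, while the regret-based model favors the segment that makes progress toward the goal.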

Project Webpage

Inspect

After writing a specification and using some algorithm to optimize it, how can a person assess whether a robot or an AI has learned the behavior that meets their needs and expectations? Is it aligned to their intent?

A project overview image. It shows a decision surface, with highlighted points corresponding to an adversarial example, a picture of a corgi, a picture of a corgi butt, and a picture of a loaf of bread. The level sets for 50 percent confidence examples (e.g., the corgi butt and the adversarial examples) are highlighted.

Bayes-TrEx: A Bayesian Sampling Approach to Model Transparency by Example

Looking at expressive examples can help us better understand neural network behaviors and design better models. Bayes-TrEx is a tool to find these expressive examples. Published at AAAI Conference on Artificial Intelligence, 2021.

Project Webpage

A project overview image. Above, it shows three controllers in a 2D navigation task: an RRT controller, an IL controller using smoothing and Lidar, and a DS modulation controller. Below, it shows an example 3D reaching task: a robot is positioned in front of a table, and a small red target is present.

RoCUS: Robot Controller Understanding via Sampling (An Extension to Bayes-TrEx)

We sample representative robot behaviors. We show how exposing these representative behaviors can help with the revision of a dynamical system robot controller's specifications. Published at Conference on Robot Learning (CoRL), 2021.

Project Webpage

Six example saliency maps for an image of a crow.

Do Feature Attribution Methods Work?

We design a principled evaluation mechanism for assessing feature attribution methods, and contribute to the growing body of literature suggesting these methods cannot be trusted in the wild. Published at AAAI Conference on Artificial Intelligence, 2022.

Project Webpage

A scene showing how a user might view a logical summary and a system state. The image shows a car with cars to its left, right, and behind. The description says 'I speed up when one or both of: (1) Both of: - a vehicle is not in front of me - the next exit is not 42. (2) All of: - a vehicle is to my right - a vehicle is not in front of me - a vehicle is behind me.'

Communicating Logical Statements Effectively

How should we best present logical sentences to a human? I study whether different logical forms are easier or harder for people to parse. I find that people are more resilient than anticipated! Published at International Joint Conference on AI (IJCAI), 2019.
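The example summary in the figure description above is a disjunction of two conjunctions. A minimal sketch of that rule as a Python predicate follows; the function and parameter names are hypothetical and chosen only for illustration.

```python
def speeds_up(vehicle_in_front, vehicle_to_right, vehicle_behind, next_exit):
    """Return True when the example rule says the car speeds up."""
    # (1) Both of: no vehicle in front, and the next exit is not 42.
    clause_1 = (not vehicle_in_front) and next_exit != 42
    # (2) All of: vehicle to the right, no vehicle in front, vehicle behind.
    clause_2 = vehicle_to_right and (not vehicle_in_front) and vehicle_behind
    # "One or both of" the two clauses.
    return clause_1 or clause_2


# Clear road ahead and the next exit is not 42: clause (1) holds.
print(speeds_up(False, False, False, next_exit=10))
# A vehicle in front blocks both clauses.
print(speeds_up(True, True, True, next_exit=10))
```

Written this way, it is easy to see that "a vehicle is not in front of me" appears in both clauses, which is the kind of structure a different logical form might surface more directly.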

Project Webpage

Model

How do humans come to understand the behavioral patterns encoded in a specification? More generally, how do humans maintain and mitigate uncertainty in their beliefs about AI systems?

A visual overview of the Variation Theory of Learning

Revisiting Human-Robot Teaching and Learning Through the Lens of Human Concept Learning Theory

We look at how cognitive theories of human concept learning should inform human-robot interaction interfaces, especially for teaching and learning tasks. Published at ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2022.

Project Webpage

Above: two side-by-side photos of a robot moving with contrast, in which a robot has two different policies (i and not i) in the same environment. Below: two side-by-side photos of a robot moving with generalization, in which a robot has the same policy (i) in two different environments.

Varying How We Teach: Adding Contrast Helps Humans Learn about Robot Motions

In this preliminary work, we test the consequences of applying some of the insights from the Variation Theory of Learning to assist humans in learning about robot motions. Published at HRI Workshop on Human-Interactive Robot Learning, 2023.

Project Webpage

Sidequests

A project overview image. It shows an example of thematic analysis, where themes are grouped into clusters. The image is zoomed out, so you can't read specific details.

Machine Learning Practice Outside Big Tech: Resource Constraints Challenge Responsible Development

We interviewed industry practitioners from startups, government, and non-tech companies about their use of machine learning in developing products. We analyze these interviews with thematic analysis. Published at AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2021.

Project Webpage

A small TurtleBot robot, kitted out with a cookie delivery box.

Piggybacking Robots: Human-robot Overtrust in University Dormitory Security

My award-winning undergraduate senior thesis set out to answer whether we place too much trust in robotic systems, specifically in the physical security domain. Published at ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2017.

Project Webpage

Media Coverage

A comic depicting a robot with a plate of cookies, trying to enter someone's house.

PhD Comics

A comic depicting a robot trying to get a human to take a cookie in exchange for placing themself in danger.

Soonish by Kelly and Zach Weinersmith


Advocacy

Serena and colleague Willie Boag standing in front of a doorway in Congress.

Science Policy

I served as Vice President of MIT's Science Policy Initiative in 2020-2021 and as President in 2021-2022. I advocate for using science to inform policy, and for using policy to make science just and equitable. Pictured above with colleague Willie Boag. Not an endorsement of Senator Alexander.

Serena's students posing for a photo on a staircase.

Equity and Inclusion

I'm a strong advocate for the inclusion of women and underrepresented minorities in science. In 2019, I served as co-president of MIT's GW6: Graduate Women of Course 6. Pictured above: high school students from an introductory CS class I taught in Puebla, Mexico.

Conferences and Journals

Models of Human Preference for Learning Reward Functions
W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi
Transactions on Machine Learning Research (TMLR) 2024
Learning Optimal Advantage from Preferences and Mistaking it for Reward
W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum
AAAI Conference on Artificial Intelligence 2024
Quality-Diversity Generative Sampling for Learning with Synthetic Data
Allen Chang, Matthew Fontaine, Serena Booth, Maja Matarić, Stefanos Nikolaidis
AAAI Conference on Artificial Intelligence 2024
The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications
AAAI Conference on Artificial Intelligence 2023
Extended Abstract: Graduate Student Descent Considered Harmful? A Proposal for Studying Overfitting in Reward Functions
Multidisciplinary Conference on Reinforcement Learning and Decision Making 2022
Spotlight, Extended Abstract: Partial Return Poorly Explains Human Preferences
W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi
Multidisciplinary Conference on Reinforcement Learning and Decision Making 2022
Revisiting Human-Robot Teaching and Learning Through the Lens of Human Concept Learning Theory
Serena Booth, Sanjana Sharma, Sarah Chung, Julie Shah, Elena L. Glassman
ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2022
Do Feature Attribution Methods Correctly Attribute Features?
AAAI Conference on Artificial Intelligence 2022
Bayes-TrEx: A Bayesian Sampling Approach to Model Transparency by Example
AAAI Conference on Artificial Intelligence 2021
RoCUS: Robot Controller Understanding via Sampling
Conference on Robot Learning (CoRL) 2021
Machine Learning Practice Outside Big Tech: How Resource Constraints Challenge Responsible Development
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) 2021
Evaluating the Interpretability of the Knowledge Compilation Map: Communicating Logical Statements Effectively
International Joint Conference on AI (IJCAI) 2019
Piggybacking Robots: Human-robot Overtrust in University Dormitory Security
Serena Booth, James Tompkin, Hanspeter Pfister, Jim Waldo, Krzysztof Gajos, Radhika Nagpal
ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2017

Workshops

Learning Optimal Advantage from Preferences and Mistaking it for Reward
W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum
2023 ICML Workshop on The Many Facets of Preference-based Learning (MFPL) 2023
Varying How We Teach: Adding Contrast Helps Humans Learn about Robot Motions
HRI Workshop on Human-Interactive Robot Learning 2023
The Irrationality of Neural Rationale Models
2022 NAACL Workshop on Trustworthy Natural Language Processing (TrustNLP) 2022
Do Feature Attribution Methods Correctly Attribute Features?
NeurIPS 2021 XAI4Debugging Workshop 2021
How to Understand Your Robot: A Design Space Informed by Human Concept Learning
Serena Booth, Sanjana Sharma, Sarah Chung, Julie Shah, Elena L. Glassman
ICRA 2021 Workshop on Social Intelligence in Humans and Robots (SIHR) 2021
Sampling Prediction-Matching Examples in Neural Networks: A Probabilistic Programming Approach
AAAI 2020 Workshop on Statistical Relational Artificial Intelligence (StarAI) 2020
Modeling Blackbox Agent Behaviour via Knowledge Compilation
Christian Muise, Salomon Wollenstein Betech, Serena Booth, Julie Shah, Yasaman Khazaeni
AAAI 2020 Workshop on Plan, Activity, and Intent Recognition (PAIR) 2020