Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences.
Erdem Bıyık, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
Keywords: Preferences
Abstract:
We implement our algorithm on two high degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context rich settings -- household chores and grocery store checkout -- and show that users are able to train the robot with just a few feedbacks (taking only a few minutes).