Rémy Degenne
Adaptive Testing: Bandits find correct answers fast
Abstract
Testing is the task of finding out which of several possible actions leads to the best outcome by repeatedly trying actions and observing their random effects. A company may want to find which web page A or B generates the most interaction with its clients. Clinical trials try to determine which drug quantity has the best efficiency-toxicity trade-off.
In the sequential testing framework, an agent repeatedly selects one of the actions and observes a random outcome. The agent wants to find the action with the best mean outcome as quickly as possible and with high certainty. A simple strategy is to try each action in turn until enough information is gathered. Bandit algorithms instead select their future actions based on past observations: they adapt to the data as it comes. This adaptive behavior makes them stop faster.