Motivated by operations research applications, such as inventory control and real-time bidding, we consider undiscounted reinforcement learning in Markov decision processes under model uncertainty …
We study a general problem of allocating limited resources to heterogeneous customers over time under model uncertainty. Each type of customer can be serviced using different actions, each of which…
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applicat…