
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

Neural Information Processing Systems, 2022
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov decision process (MDP) is communicating with a finite, although unknown, diameter.
Shipra Agrawal, Randy Jia
