Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
Neural Information Processing Systems, 2022
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov decision process (MDP) is communicating with a finite, although unknown, diameter.
Shipra Agrawal, Randy Jia

