Some of the following articles may not be open access.
Neural Temporal Difference and Q Learning Provably Converge to Global Optima
Mathematics of Operations Research
Qi Cai, Zhuoran Yang, Jason D Lee