
Learning a Pessimistic Reward Model in RLHF [PDF]

open access: yes | arXiv
This work proposes 'PET', a novel pessimistic reward fine-tuning method, to learn a pessimistic reward model robust against reward hacking in offline reinforcement learning from human feedback (RLHF). Traditional reward modeling techniques in RLHF train an imperfect reward model, on which a KL regularization plays a pivotal role in mitigating reward ...
arxiv  

Reward is enough

open access: yes | Artificial Intelligence, 2021
David Silver   +3 more
semanticscholar   +1 more source

A Data-Driven Game Theoretic Strategy for Developers in Software Crowdsourcing: A Case Study

open access: yes | Applied Sciences, 2019
Crowdsourcing has the advantages of being cost-effective and saving time, which is a typical embodiment of collective wisdom and community workers’ collaborative development. However, this development paradigm of software crowdsourcing has not been
Zhifang Liao   +3 more
doaj   +1 more source

A review of artificial intelligence in brachytherapy

open access: yes | Journal of Applied Clinical Medical Physics, EarlyView.
Abstract: Artificial intelligence (AI) has the potential to revolutionize brachytherapy's clinical workflow. This review comprehensively examines the application of AI, focusing on machine learning and deep learning, in various aspects of brachytherapy.
Jingchu Chen   +4 more
wiley   +1 more source

Positive-Unlabeled Reward Learning [PDF]

open access: yes | arXiv, 2019
Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high reward behaviors that do not correspond to the intended task. These reward
arxiv  

Stereotactic radiosurgery for multiple small brain metastases using gamma knife versus single‐isocenter VMAT: Normal brain dose based on lesion number and size

open access: yes | Journal of Applied Clinical Medical Physics, EarlyView.
Abstract: Purpose: The study evaluates rapid linear accelerator (Linac) single isocenter stereotactic radiosurgery (SRS) with Hyperarc for large target numbers. We compared it to Gamma Knife (GK), which suffers from long treatment times, and investigated causes of differences. Methods: Linac SRS and GK treatment plans for patients receiving 18 Gy to the gross
Abram Abdou   +4 more
wiley   +1 more source

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards [PDF]

open access: yes | arXiv, 2019
While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a
arxiv  

Toward a human‐centric co‐design methodology for AI detection of differences between planned and delivered dose in radiotherapy

open access: yes | Journal of Applied Clinical Medical Physics, EarlyView.
Abstract: Introduction: Many artificial intelligence (AI) solutions have been proposed to enhance the radiotherapy (RT) workflow, but limited applications have been implemented to date, suggesting an implementation gap. One contributing factor to this gap is a misalignment between AI systems and their users.
Luca M. Heising   +11 more
wiley   +1 more source

Transductive Reward Inference on Graph [PDF]

open access: yes | arXiv
In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too ...
arxiv  
