Learning a Pessimistic Reward Model in RLHF [PDF]
This work proposes "PET", a novel pessimistic reward fine-tuning method, to learn a pessimistic reward model robust against reward hacking in offline reinforcement learning from human feedback (RLHF). Traditional reward modeling techniques in RLHF train an imperfect reward model, on which KL regularization plays a pivotal role in mitigating reward ...
arxiv
A Data-Driven Game Theoretic Strategy for Developers in Software Crowdsourcing: A Case Study
Crowdsourcing is cost-effective and saves time, and it typifies collective wisdom and collaborative development by community workers. However, this development paradigm of software crowdsourcing has not been
Zhifang Liao+3 more
doaj +1 more source
A review of artificial intelligence in brachytherapy
Abstract. Artificial intelligence (AI) has the potential to revolutionize brachytherapy's clinical workflow. This review comprehensively examines the application of AI, focusing on machine learning and deep learning, in various aspects of brachytherapy.
Jingchu Chen+4 more
wiley +1 more source
Positive-Unlabeled Reward Learning [PDF]
Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high reward behaviors that do not correspond to the intended task. These reward
arxiv
Abstract. Purpose: The study evaluates rapid linear accelerator (Linac) single-isocenter stereotactic radiosurgery (SRS) with Hyperarc for large numbers of targets. We compared it with Gamma Knife (GK), which suffers from long treatment times, and investigated the causes of the differences. Methods: Linac SRS and GK treatment plans for patients receiving 18 Gy to the gross
Abram Abdou+4 more
wiley +1 more source
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards [PDF]
While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a
arxiv
Abstract. Introduction: Many artificial intelligence (AI) solutions have been proposed to enhance the radiotherapy (RT) workflow, but limited applications have been implemented to date, suggesting an implementation gap. One contributing factor to this gap is a misalignment between AI systems and their users.
Luca M. Heising+11 more
wiley +1 more source
Transductive Reward Inference on Graph [PDF]
In this study, we present a transductive inference approach on a reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too ...
arxiv