Results 271 to 280 of about 233,719 (314)
Some of the following articles may not be open access.
2023
This chapter aims to present a theoretical foundation on hacking, focusing on the perpetrator's profile, his modus operandi, and typologies. First, a conceptualization and characterization of the phenomenon's key terms is presented. Next, the chapter addresses the historical evolution of the perception of the phenomenon, from the moment of its ...
Carolina Roque +3 more
openaire +2 more sources
Reward Shaping to Mitigate Reward Hacking in RLHF
arXiv.org
Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human values. However, RLHF is susceptible to reward hacking, where the agent exploits flaws in the reward function rather than learning ...
Jiayi Fu +5 more
semanticscholar +1 more source
ODIN: Disentangled Reward Mitigates Hacking in RLHF
International Conference on Machine Learning
In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose but less helpful response from the LLMs can often deceive LLMs or even
Lichang Chen +8 more
semanticscholar +1 more source
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
arXiv.org
Reward hacking--where agents exploit flaws in imperfect reward functions rather than performing tasks as intended--poses risks for AI alignment. Reward hacking has been observed in real training runs, with coding agents learning to overwrite or tamper ...
Mia Taylor +4 more
semanticscholar +1 more source
Nursing Standard, 1988
The new book on occupational health for nurses, Nurses At Risk, written by [illegible word] Salvage, former Nursing Standard journalist, and Rosemary Rogers, currently our Clinical News Editor, received widespread press coverage.
openaire +2 more sources
InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
Neural Information Processing Systems
Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge.
Yuchun Miao +5 more
semanticscholar +1 more source
Feedback Loops With Language Models Drive In-Context Reward Hacking
International Conference on Machine Learning
Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents.
Alexander Pan +3 more
semanticscholar +1 more source
Book review: Christopher Hadnagy, Social Engineering: The Science of Human Hacking
European Journal of Research in Applied Sciences
Christopher Hadnagy, Social Engineering: The Science of Human Hacking (2nd ed.). John Wiley & Sons, Inc, 2018, 320 pp., ₹1,645.
Meghna Chukkath
semanticscholar +1 more source
Modeling Malicious Hacking Data Breach Risks
North American Actuarial Journal, 2020
Malicious hacking data breaches cause millions of dollars in financial losses each year, and more companies are seeking cyber insurance coverage. The lack of suitable statistical approaches for scoring breach risks is an obstacle in the insurance ...
Hong Sun, Maochao Xu, Peng Zhao
semanticscholar +1 more source
AI-powered growth hacking: benefits, challenges and pathways
Management Decision
Purpose: This paper aims to (1) unveil how artificial intelligence (AI) can be implemented in growth-hacking strategies; and (2) identify the challenges and enabling factors associated with AI's implementation in these strategies. Design/methodology ...
Gabriele Santoro +3 more
semanticscholar +1 more source

