Jailbroken: How Does LLM Safety Training Fail? [PDF]
Large language models trained for safety and harmlessness remain susceptible to adversarial misuse, as evidenced by the prevalence of "jailbreak" attacks on early releases of ChatGPT that elicit undesired behavior. Going beyond recognition of the issue, we
Alexander Wei +2 more
semanticscholar +1 more source
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! [PDF]
Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also
Xiangyu Qi +6 more
semanticscholar +1 more source
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset [PDF]
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). This dataset uniquely separates annotations of helpfulness and harmlessness for question-answering pairs, thus ...
Jiaming Ji +8 more
semanticscholar +1 more source
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions [PDF]
Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most malicious instructions and readily generate harmful ...
Federico Bianchi +6 more
semanticscholar +1 more source
Refined exposure assessment of ethyl lauroyl arginate based on revised proposed uses as a food additive [PDF]
Following a request from the European Commission, the European Food Safety Authority (EFSA) carried out a refined exposure assessment of ethyl lauroyl arginate (LAE) from its use as a food additive, for children and adults, based on revised proposed uses.
European Food Safety Authority +1 more
doaj +1 more source
2020 Safety Young Investigator Award: Announcement and Interview with the Winner
After an extensive voting period, we are proud to present the winner of the Safety Young Investigator Award: [...]
Safety Editorial Office
doaj +1 more source
Refined exposure assessment of Brown HT (E 155) [PDF]
The European Food Safety Authority (EFSA) carried out an exposure assessment of Brown HT (E 155) taking into account additional information on its use in foods as consumed. In 2010, the EFSA Panel on Food Additives and Nutrient Sources added to Food (ANS)
European Food Safety Authority +1 more
doaj +1 more source
Statement on the validity and robustness of information provided on irradiated iron oxides [PDF]
Following a Rapid Alert System for Food and Feed (RASFF) notification concerning the use of an unauthorised irradiated colouring agent (brown iron oxide) as coatings of food supplements, the European Commission asked EFSA to assess the scientific ...
European Food Safety Authority +1 more
doaj +1 more source
Refined exposure assessment for Brilliant Black BN (E 151) [PDF]
The European Food Safety Authority (EFSA) carried out an exposure assessment of Brilliant Black BN (E 151), taking into account new information on its use as a food additive in foods.
European Food Safety Authority +1 more
doaj +1 more source
Refined exposure assessment for caramel colours (E 150a, c, d) [PDF]
This EFSA statement is a refined exposure assessment of caramel colours (E 150a, E 150c and E 150d) taking into account additional information on their use in foods as consumed.
European Food Safety Authority +1 more
doaj +1 more source

