Fine-tuning lowers safety and disrupts evaluation consistency

From National Research Council Canada

Link: https://aclanthology.org/2025.llmsec-1.10/
Author: 1 (ORCID identifier: https://orcid.org/0000-0002-0752-6705); 1; 1 (ORCID identifier: https://orcid.org/0000-0001-6241-6114); 1 (ORCID identifier: https://orcid.org/0000-0003-2550-3918)
Affiliation
  1. National Research Council Canada. Digital Technologies
Format: Text, Article
Conference: The First Workshop on LLM Security (LLMSEC), August 1, 2025, Vienna, Austria
Publisher: Association for Computational Linguistics
Language: English
Peer reviewed: Yes
Record identifier: 42e2a7cf-eb1e-40b2-8c1f-eda8c7c0704c
Record created: 2025-09-18
Record modified: 2025-09-19