| DOI | https://doi.org/10.1109/PST65910.2025.11268866 |
|---|
| Author | Towhid, Md. Shamim (1); Iqbal, Shahrear (1), ORCID: https://orcid.org/0000-0001-7819-5715; Pinto Neto, Euclides Carlos (1), ORCID: https://orcid.org/0000-0002-1241-6391; Shahriar, Nashid (2); Buffett, Scott (1); Sultana, Madeena (3); Taylor, Adrian (3) |
|---|
| Affiliation | (1) National Research Council Canada. Digital Technologies; (2) University of Regina; (3) Defence Research and Development Canada |
|---|
| Format | Text, Article |
|---|
| Conference | 2025 22nd Annual International Conference on Privacy, Security, and Trust (PST), August 26-28, 2025, Fredericton, New Brunswick, Canada |
|---|
| Subject | reinforcement learning; large language models; knowledge graphs; cybersecurity automation; autonomous cyber defense |
|---|
| Abstract | As cyber threats continue to evolve, there is a need for autonomous cyber defense (ACD) strategies capable of fast and context-aware responses. Reinforcement learning (RL) has shown promise for automating cyber defense by exploring and learning effective countermeasures, yet it often struggles with sparse reward signals and insufficient context to handle diverse attack scenarios. Furthermore, the convergence time taken by an RL agent is often high, which makes it difficult to train the RL agent in online settings. To address these challenges, we propose a large language model (LLM)-enhanced RL method that builds and queries a knowledge graph (KG) derived from agent-environment interactions. We leverage the pre-trained knowledge of an LLM on different cybersecurity frameworks and use the LLM to analyze a part of the KG to generate appropriate actions for the RL agent. We infuse the knowledge extracted from the LLM into the RL agent’s training loop in two ways. First, the state vector of the RL agent is augmented with the most effective action and its corresponding reward, as determined from the KG. Second, the suggested action from the LLM is used as a reference policy. In addition, we introduce a regularization term in the loss function to make the RL policy close to the reference policy. To validate our approach, we develop a custom RL environment guided by the MITRE ATT&CK framework, enabling the agent to generate tailored mitigation strategies for detected cyber attacks. Experimental results show that our proposed approach significantly outperforms the baseline RL by over 75% in terms of taking better mitigation actions. |
|---|
| Date published | 2025-12-03 |
|---|
| Publisher | Institute of Electrical and Electronics Engineers |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | 54ad03ef-8f8d-4509-9e6b-ae2b369343bd |
|---|
| Record created | 2026-04-16 |
|---|
| Record modified | 2026-05-27 |
|---|
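
The abstract describes two ways of infusing LLM-derived knowledge into RL training: augmenting the state vector with the most effective action and its reward from the KG, and regularizing the loss toward a reference policy built from the LLM's suggested action. The following is a minimal sketch of those two ideas only, not the authors' implementation. Every function name, the smoothed one-hot form of the reference policy, the KL divergence (and its direction) as the regularizer, and the `beta` and `eps` values are assumptions made for illustration; the record does not specify them.

```python
import math


def augment_state(state, best_action, best_reward):
    """Append the KG-derived best action and its reward to the state vector.

    Hypothetical encoding: the action is appended as a plain index.
    """
    return list(state) + [float(best_action), float(best_reward)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reference_policy(suggested_action, n_actions, eps=0.1):
    """Smoothed one-hot distribution centred on the LLM-suggested action.

    Assumption: the abstract does not say how the suggestion becomes a policy.
    """
    probs = [eps / n_actions] * n_actions
    probs[suggested_action] += 1.0 - eps
    return probs


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(rl_loss, policy_logits, suggested_action, beta=0.5):
    """Base RL loss plus a penalty for drifting from the reference policy.

    Assumption: the regularizer is beta * KL(reference || policy).
    """
    policy = softmax(policy_logits)
    ref = reference_policy(suggested_action, len(policy_logits))
    return rl_loss + beta * kl_divergence(ref, policy)


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the paper.
    state = augment_state([0.0, 1.0, 0.5], best_action=2, best_reward=0.8)
    print("augmented state:", state)
    print("loss, policy agrees with LLM:",
          regularized_loss(1.0, [0.0, 0.0, 5.0], suggested_action=2))
    print("loss, policy disagrees with LLM:",
          regularized_loss(1.0, [5.0, 0.0, 0.0], suggested_action=2))
```

Under these assumptions, a policy that concentrates probability on the suggested action incurs a smaller penalty than one that favours a different action, which is the sense in which the regularizer keeps the learned policy close to the reference.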