Offensive Security with Large Language Models (2)
Introduction
In the ever-evolving world of offensive security, large language models (LLMs) are breaking new ground. In the first part of our Offensive Security with LLM series, we explored how LLMs are revolutionizing fuzzing techniques, offering automated solutions for vulnerability detection. Now, in this second installment, we’re diving deeper into how LLMs are changing the game for static analysis — especially when source code is available. Additionally, we’ll uncover how web services integrated with LLMs might open up new avenues for potential vulnerabilities.
The first area we’ll explore is how LLMs can overcome the inefficiencies of traditional static analysis tools, elevating the security research process. At Theori, we’ve been pushing the boundaries, researching how LLMs can boost the productivity of security researchers, and we’re excited to share the results from a cutting-edge study on this topic. Let’s dive in.
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
When it comes to discovering security vulnerabilities, two main approaches are commonly used: Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST). SAST, or white-box testing, involves analyzing the source code without executing the application. In contrast, DAST, known as black-box testing, identifies vulnerabilities while the application is running, without access to the source code. While well-known static analysis tools such as CodeQL and others provide valuable insights, they also face several inherent limitations:
Missing taint specifications of third-party library APIs: To perform static analysis with CodeQL, researchers must first manually analyze the APIs related to source, sink, and sanitize used in the third-party library. Missing or incomplete specifications for these libraries can result in inaccurate findings. Additionally, as these third-party libraries are updated, analysts must re-examine the code and apply changes to CodeQL, making this process resource-intensive and error-prone.
Lack of Precise Context-Sensitive and Intuitive Reasoning: Static analysis tools often struggle to understand the broader context of the code, leading to the incorrect identification of sources or sinks.
Due to the limitations in context analysis and reasoning, static analysis tools may incorrectly identify sources or sinks, leading to invalid results. In some cases, vulnerabilities that do not actually pose a risk may appear in the analysis. Additionally, true vulnerabilities could go unnoticed, resulting in false negatives, as the tool fails to fully grasp the context of the code.
For example, in the case of CVE-2021–41269, this vulnerability occurs when an attacker provides an invalid expression value, causing the parse() function to throw an IllegalArgumentException and execute statement 6. However, since the attacker’s input is passed as an argument to the buildConstraintViolationWithTemplate() function, the attacker can achieve code execution through Java Expression Language (Java EL).

To detect this vulnerability using static analysis, one must analyze a significant portion of the code, identifying the control flow and the source/sink between the internally developed code and the third-party library code. Additionally, understanding the context in which exceptions are triggered is necessary, including checking how inputs are sanitized. (For more detailed information, you can refer to this link which explains this in detail.)
The study highlights the limitations of existing static analysis tools for such processes and proposes addressing these issues through LLMs.
Instead of requiring manual analysis of third-party libraries, LLM labels the control flow, source, and sink between internally developed code and third-party library code.
The results of this process are then converted into a format suitable for input into static analysis tools like CodeQL.
After feeding the transformed values into CodeQL, the LLM cross-verifies the output, reducing false positives and delivering the final results to the user.
This series of steps improves the accuracy and efficiency of static analysis. Particularly, the automated code understanding capabilities of LLM are expected to significantly reduce the time required to identify vulnerabilities in complex third-party libraries.
Moving forward, we’ll shift focus to injection vulnerabilities that can emerge in web applications integrated with LLMs. With the rise of generative AI, web services incorporating LLMs — such as chatbots — are becoming increasingly common. These services offer substantial benefits, like improving user accessibility, by interpreting natural language queries and delivering responses based on internal data. However, with the growing prevalence of such services, it’s crucial to address the unique security vulnerabilities that arise when integrating LLMs into web applications. Special precautions must be taken to ensure these systems remain secure and resistant to potential attacks.
From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Simplifying the integration between LLMs and web applications has become easier with middleware solutions like Langchain, which allow developers to incorporate various LLM modules seamlessly. The diagram below provides a visual representation of this integration process.

However, if user inputs are not properly validated, malicious inputs can lead to SQL Injection attacks via the chatbot. The following example demonstrates how an attacker can bypass the mitigation applied to a prompt and instruct the LLM to execute an SQL query that deletes a specific table.

In this scenario, the attacker delivers a malicious query through the chatbot interface, tricking the LLM into generating a dangerous SQL command. Without proper validation, these malicious queries could lead to unauthorized actions on the database, posing a significant security risk.
These vulnerabilities, like SQL injections via LLMs, have been discussed extensively in Langchain’s forums. To mitigate such risks, experts suggest intercepting SQL queries before execution, setting up block lists for certain actions, and restricting database permissions to limit the potential damage from harmful queries. Ensuring robust input validation and database safeguards is essential for protecting LLM-integrated web applications from such attacks.
The paper suggests the following defense techniques:
SQL Query Rewriting
The previously mentioned issues lack adequate defenses against attackers trying to retrieve other users’ data through malicious input. A suitable mitigation involves intercepting and rewriting SQL queries when such attempts are detected, preventing unauthorized data access.
Preloading Data into the LLM Prompt
This approach entails preloading user-specific data into the LLM prompt. While effective in some cases, it can introduce complications around performance, cost, and technical limitations.
Auxiliary LLM Guard
An additional safeguard is implementing an auxiliary LLM guard. This system intercepts database queries generated by the LLM and evaluates whether they are malicious. If the query is flagged as harmful, it is blocked before it can be executed.
These methods highlight the potential risks of SQL injection attacks through prompt-generated queries. If you’re developing or auditing services that involve LLMs, it’s critical to monitor how the LLM interacts with your database. Identifying potential attack vectors early and implementing robust security measures will help mitigate these threats.
Conclusion
In this article, we explored how LLMs enhance fuzzing and static code analysis to detect vulnerabilities, while also shedding light on the potential risks in LLM-integrated web services. At Xint, we remain at the forefront of security research, continuously investigating new attack methodologies. Our goal is to empower clients to “Innovate without Fear,” knowing their security is in good hands.