By Varun Nikhil G. Chopra
As security findings grow in volume and complexity, manual triage becomes the dominant bottleneck in incident response and detection engineering. Large language models offer promising capabilities for automated reasoning but in practice are constrained by data noise, limited context windows, and uncontrolled operational cost.
This paper presents a production-oriented architecture that automates vulnerability triage by combining stateful agentic reasoning, built on LangGraph and OpenAI GPT-4o, with high-signal data retrieval from AWS DynamoDB. By applying filtering at the attribute level within the data layer, implementing recursive pagination to overcome DynamoDB scan limits, and enforcing cost governance through automated IAM circuit breakers, the system enables reliable and scalable research workflows. The result is an autonomous research agent capable of prioritizing high-impact vulnerabilities while remaining observable, deterministic, and financially bounded.
Security researchers increasingly face a signal-to-noise crisis. Public vulnerability sources such as NVD and CISA KEV expose massive JSON payloads that were designed for human consumption and archival completeness, not automated reasoning.
Naively passing this data to an LLM introduces several failure modes: context windows saturate with low-value metadata, accuracy degrades when critical findings are buried mid-context, token costs scale with noise rather than signal, and irrelevant fields invite ungrounded or hallucinated conclusions.
Effective autonomous triage requires treating data selection, context construction, and cost control as first-class engineering problems rather than prompt-tuning exercises.
For the agent's core reasoning engine, I selected OpenAI GPT-4o. In an agentic workflow, the model's ability to consistently generate valid JSON for tool calling is more critical than raw creative output. GPT-4o was chosen for three specific architectural advantages: reliable structured JSON generation for tool calling, native token streaming for real-time observability, and deterministic output when run at temperature 0.
I utilized the ChatOpenAI class from the langchain-openai integration package, configuring the temperature to 0 to ensure deterministic, reproducible security analysis. Additionally, I enabled streaming to surface the agent's internal reasoning tokens in real time, providing critical observability and reducing the time to first insight during long-running research tasks.
Python
from langchain_openai import ChatOpenAI

# Temperature 0 yields deterministic, reproducible analysis;
# streaming surfaces reasoning tokens as they are generated.
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    streaming=True,
)
LangGraph was selected over linear agent chains to support explicit state management, branching, and backtracking. This allows the agent to verify intermediate conclusions and re-query data when confidence is insufficient, a requirement for security-sensitive research.
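As a minimal sketch of this pattern (the state fields, node bodies, and 0.8 confidence threshold are illustrative, not the production implementation), a conditional edge routes the agent back to retrieval whenever its confidence is insufficient:

Python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph

class TriageState(TypedDict):
    findings: List[dict]
    confidence: float

def retrieve(state: TriageState) -> dict:
    # Placeholder: the real node calls the DynamoDB retrieval tool.
    return {"findings": state["findings"]}

def analyze(state: TriageState) -> dict:
    # Placeholder: the real node invokes the LLM and scores its own confidence.
    return {"confidence": 1.0}

def should_requery(state: TriageState) -> str:
    # Backtrack to retrieval when the intermediate conclusion is weak.
    return "retrieve" if state["confidence"] < 0.8 else END

graph = StateGraph(TriageState)
graph.add_node("retrieve", retrieve)
graph.add_node("analyze", analyze)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "analyze")
graph.add_conditional_edges("analyze", should_requery)
app = graph.compile()

Because the backtracking loop is encoded as an explicit edge rather than buried in a prompt, every re-query is visible in the execution trace.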
DynamoDB Projection Expressions were used to remove non-essential fields at the storage layer. By retrieving only product identifiers, severity indicators, and CVE metadata, payload size was reduced by approximately 60 percent before reaching the LLM.
DynamoDB's native scan limit introduces blind spots when applied to large security datasets. A recursive pagination strategy was implemented to ensure complete ingestion of high-severity findings.
Autonomous systems require enforceable guardrails. AWS Budget Actions were used to trigger automated IAM policy revocation if a fixed daily budget threshold was exceeded, creating a hard circuit breaker that halts runaway execution at the infrastructure level rather than in application code.
The system is implemented as a modular, tool-augmented architecture. Rather than relying on a monolithic prompt, data retrieval, state management, and cost governance are intentionally separated into independent layers.
DynamoDB scan operations return at most 1 MB of data per request, which silently truncates results in large partitions unless pagination is handled. The system detects the LastEvaluatedKey in each response and continues scanning until the dataset is fully traversed, ensuring no high-severity vulnerabilities are missed.
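A minimal sketch of this traversal, assuming a hypothetical table name (written as a loop, the idiomatic equivalent of the recursive continuation):

Python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("vulnerability_findings")  # hypothetical table name

def scan_all(**scan_kwargs) -> list:
    # Follow LastEvaluatedKey until the table is fully traversed,
    # so no page of findings is silently dropped.
    items = []
    response = table.scan(**scan_kwargs)
    items.extend(response.get("Items", []))
    while "LastEvaluatedKey" in response:
        response = table.scan(
            ExclusiveStartKey=response["LastEvaluatedKey"], **scan_kwargs
        )
        items.extend(response.get("Items", []))
    return items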
The primary challenge in autonomous triage is the signal-to-noise ratio. Security databases often contain extensive metadata that is useful for storage but irrelevant for high-level reasoning. To address this, I implemented attribute filtering using projection expressions. By explicitly selecting only high-signal attributes such as product name, severity, vulnerability identifiers, and risk summaries, I stripped out approximately 60 percent of the nonessential noise from each record.
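Building on the scan_all helper sketched above (the attribute names are illustrative):

Python
# Only high-signal attributes ever leave the data layer; the aliases
# sidestep DynamoDB's reserved-word list.
findings = scan_all(
    ProjectionExpression="#p, #s, #c, #r",
    ExpressionAttributeNames={
        "#p": "product_name",
        "#s": "severity",
        "#c": "cve_id",
        "#r": "risk_summary",
    },
)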
This optimization serves a critical architectural purpose by mitigating the lost-in-the-middle phenomenon documented in recent language model research. Studies have demonstrated that model performance often follows a U-shaped curve: accuracy degrades when critical information is buried in the middle of a long context window. By aggressively pruning the data at the retrieval layer, I ensured that the most relevant security findings remain within the high-attention primacy and recency zones of the model. This strategic reduction of input volume prevents hallucination and keeps the agent's reasoning grounded in the most impactful data points.
AWS Budget Actions monitor spend in near real time. When the defined threshold is reached, an automated Lambda applies an AWSDenyAll policy to the agent's IAM role, immediately severing access and preventing further execution. This mechanism provides deterministic cost containment without relying on application-level safeguards.
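A minimal sketch of such a handler, assuming a hypothetical role name for the agent's execution identity (AWSDenyAll is the AWS-managed deny-all policy):

Python
import boto3

iam = boto3.client("iam")

AGENT_ROLE_NAME = "vuln-triage-agent-role"  # hypothetical role name
DENY_ALL_POLICY_ARN = "arn:aws:iam::aws:policy/AWSDenyAll"

def lambda_handler(event, context):
    # Triggered by the budget alert: attaching a deny-all policy causes
    # every subsequent AWS API call from the agent's role to be refused.
    iam.attach_role_policy(
        RoleName=AGENT_ROLE_NAME,
        PolicyArn=DENY_ALL_POLICY_ARN,
    )
    return {"status": "circuit breaker engaged"}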
A Streamlit-based dashboard was developed to expose the agent's internal reasoning steps, tool invocations, and decision boundaries. By making the reasoning trace observable, the system enables rapid human validation and iterative improvement without relying on opaque LLM outputs.
Figure 1: Streamlit dashboard interface showing the initial agent state and query input
Figure 2: Complete reasoning trace showing tool invocations, decision boundaries, and final vulnerability analysis
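A minimal sketch of such a dashboard, assuming the compiled app from the LangGraph sketch above (the widget layout and initial state are illustrative):

Python
import streamlit as st

st.title("Autonomous Vulnerability Triage")
query = st.text_input("Research query")

if query:
    # Stream each graph step so tool invocations and decisions are
    # visible as they happen, not only after the run completes. In the
    # real system the query would seed the agent's initial state.
    for step in app.stream({"findings": [], "confidence": 0.0}):
        node_name = next(iter(step))
        with st.expander(f"Step: {node_name}"):
            st.json(step[node_name])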
The following diagram illustrates the system architecture, showing the interaction between the user interface, agentic core, and governance layer:
📘 Deep Dive Available
For a comprehensive breakdown of the AWS infrastructure, IAM policies, and cloud architecture patterns, view the detailed technical implementation guide.
View AWS Architecture Guide →

This architecture demonstrates that autonomous security agents must do more than generate text. By treating the LLM as an orchestrator, the database as a high-signal tool, and governance as enforceable infrastructure, the system enables scalable, observable, and financially bounded vulnerability research.
The result is an agent that does not merely chat about findings, but actively performs research in a way that is compatible with production security environments.