VulnGraph

Graph-Augmented LLMs for Explainable Software Vulnerability Detection

VulnGraph Architecture Screenshot

Overview

VulnGraph is a research project that integrates graph neural networks (GNNs) with large language models (LLMs) to improve vulnerability detection in Java code. The system extracts Abstract Syntax Trees (ASTs) and Control Flow Graphs (CFGs) using PROGEX, learns structural embeddings from them with GNNs, and fuses those embeddings with LLM-based semantic reasoning. This hybrid approach combines structural and semantic views of the code, enabling more accurate detection of subtle security flaws in large codebases.

Pipeline

1. Extracts ASTs and CFGs from Java source code
2. Learns graph embeddings via Graph Neural Networks
3. Generates semantic embeddings with an LLM
4. Combines both through the proposed gated fusion mechanism for classification
5. Produces a vulnerability label, fusion scores, saliency subgraphs, and a natural-language explanation
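
Step 4 can be illustrated with a minimal sketch of one common form of gated fusion. This is not the project's implementation: the class name GatedFusion, the single complementary gate, the random weights, and the 4-dimensional toy embeddings are all assumptions made for illustration, and the actual two-way gate may be structured differently. The sketch uses only the Python standard library so it runs without PyTorch.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class GatedFusion:
    """Hypothetical element-wise gate over two same-sized embeddings.

    gate  = sigmoid(W @ [h_graph; h_text] + b)
    fused = gate * h_graph + (1 - gate) * h_text
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight row per output dimension over the concatenated input.
        # Weights are random here; in a real model they would be learned.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)
        ]
        self.bias = [0.0] * dim

    def gate(self, h_graph: Sequence[float], h_text: Sequence[float]) -> List[float]:
        joint = list(h_graph) + list(h_text)
        return [
            sigmoid(sum(w * x for w, x in zip(row, joint)) + b)
            for row, b in zip(self.weights, self.bias)
        ]

    def fuse(
        self, h_graph: Sequence[float], h_text: Sequence[float]
    ) -> Tuple[List[float], List[float]]:
        g = self.gate(h_graph, h_text)
        fused = [gi * a + (1.0 - gi) * b for gi, a, b in zip(g, h_graph, h_text)]
        return fused, g


# Toy embeddings standing in for the GNN and LLM outputs (assumed values).
h_graph = [0.2, -0.5, 0.9, 0.1]
h_text = [0.7, 0.3, -0.4, 0.0]

fused, gate = GatedFusion(dim=4).fuse(h_graph, h_text)
# Mean gate value: how much the fused vector leans on structure vs. semantics.
structure_share = sum(gate) / len(gate)
print(f"structure share: {structure_share:.2f}, fused: {fused}")
```

In this formulation each gate value lies strictly between 0 and 1, so every fused dimension is a weighted average of the structural and semantic embeddings, and the gate values themselves can be reported as per-dimension fusion scores.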

Experiment Results

• Dataset: 35.6k cleaned Java files curated from open-source repositories and existing datasets
• Proposed Fusion Mechanism: Two-way gating method that dynamically weights structure and semantics
• Accuracy: 93.57% (GNN-only baseline: 85.21%, LLM-only baseline: 75.76%)
• Interpretability: Provides weighted fusion scores, saliency subgraphs, and text explanations
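
The gains implied by the accuracy figures above can be worked out directly. The script below is only arithmetic on the reported numbers. The function names are illustrative, and the relative error reduction is a derived quantity that assumes all three models were evaluated on the same test set, which the page does not state explicitly.

```python
# Accuracies reported above, in percent.
FUSED = 93.57
BASELINES = {"GNN-only": 85.21, "LLM-only": 75.76}


def gain_points(fused: float, baseline: float) -> float:
    """Absolute accuracy gain in percentage points."""
    return round(fused - baseline, 2)


def error_reduction(fused: float, baseline: float) -> float:
    """Fraction of the baseline's error rate removed by the fused model."""
    return (fused - baseline) / (100.0 - baseline)


for name, acc in BASELINES.items():
    print(
        f"{name}: +{gain_points(FUSED, acc):.2f} points, "
        f"{error_reduction(FUSED, acc):.1%} fewer errors"
    )
```

The fused model is 8.36 percentage points ahead of the GNN-only baseline and 17.81 points ahead of the LLM-only baseline, which corresponds to removing roughly 56.5% and 73.5% of each baseline's errors respectively.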

Tech Stack

Python, Java, PROGEX, Graph Neural Networks (GNNs), PyTorch Geometric, LLMs, Hugging Face Transformers

Links

GitHub Repository | Publication Page
