Overview
VulnGraph is a research project that integrates Graph Neural Networks (GNNs) with Large Language Models (LLMs) to improve vulnerability detection in Java code. The system extracts Abstract Syntax Trees (ASTs) and Control Flow Graphs (CFGs) using PROGEX, learns structural embeddings with GNNs, and fuses them with LLM-derived semantic embeddings. Combining structural and semantic signals gives more accurate detection of subtle security flaws in large codebases than either signal alone.
Pipeline
1. Extracts ASTs and CFGs from Java source code
2. Learns graph embeddings via Graph Neural Networks
3. Generates semantic embeddings with an LLM
4. Combines both embeddings through the proposed gated fusion mechanism for classification
5. Produces a vulnerability label, fusion scores, saliency subgraphs, and a natural-language explanation
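As a rough illustration of step 2, the sketch below shows how a graph such as a CFG could be turned into a single embedding vector. It is not the project's actual model: it uses plain PyTorch instead of PyTorch Geometric, and the class names (`MeanAggregationLayer`, `GraphEncoder`), the toy four-node graph, the feature sizes, and the mean-pool readout are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class MeanAggregationLayer(nn.Module):
    """One round of message passing: each node averages its neighbourhood,
    then applies a learned linear map and a ReLU. (Hypothetical layer.)"""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Add self-loops so every node keeps its own features.
        adj_hat = adj + torch.eye(adj.size(0))
        # Row-normalise so each row takes the mean over the neighbourhood.
        adj_hat = adj_hat / adj_hat.sum(dim=1, keepdim=True)
        return torch.relu(self.linear(adj_hat @ x))


class GraphEncoder(nn.Module):
    """Two message-passing rounds followed by a mean-pool readout."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.layer1 = MeanAggregationLayer(in_dim, hidden_dim)
        self.layer2 = MeanAggregationLayer(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.layer1(x, adj)
        h = self.layer2(h, adj)
        # Collapse node embeddings into one graph-level vector.
        return h.mean(dim=0)


# Toy control-flow graph with 4 nodes (assumed, treated as undirected here).
edges = [(0, 1), (1, 2), (1, 3)]
adj = torch.zeros(4, 4)
for src, dst in edges:
    adj[src, dst] = 1.0
    adj[dst, src] = 1.0

torch.manual_seed(0)
node_features = torch.randn(4, 8)  # hypothetical 8-dimensional node features
encoder = GraphEncoder(in_dim=8, hidden_dim=16, out_dim=32)
graph_embedding = encoder(node_features, adj)
print(graph_embedding.shape)  # torch.Size([32])
```

In a PyTorch Geometric implementation, the dense adjacency matrix would typically be replaced by an `edge_index` tensor and the hand-written layer by a library convolution. The overall shape stays the same: node features go in and one vector per graph comes out.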
Experiment Results
• Dataset: 35.6k cleaned Java files curated from open-source repositories and datasets
• Proposed Fusion Mechanism: Two-way gating that dynamically weights structural and semantic features
• Accuracy: 93.57% (GNN-only baseline: 85.21%, LLM-only baseline: 75.76%)
• Interpretability: Provides weighted fusion scores, saliency subgraphs, and text explanations
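The fusion mechanism is only named here, not specified, so the following is one plausible reading of "two-way gating" and not the project's confirmed architecture. In this sketch each modality gets its own sigmoid gate computed from both embeddings, and the mean gate values act as a simple per-sample fusion score. The class name `TwoWayGatedFusion`, all dimensions, and the scoring rule are assumptions.

```python
import torch
import torch.nn as nn


class TwoWayGatedFusion(nn.Module):
    """Hypothetical two-way gate: each modality is scaled by a gate that
    is conditioned on both the graph and the text embedding."""

    def __init__(self, graph_dim: int, text_dim: int, fused_dim: int,
                 num_classes: int = 2):
        super().__init__()
        # Project both modalities into a shared space.
        self.graph_proj = nn.Linear(graph_dim, fused_dim)
        self.text_proj = nn.Linear(text_dim, fused_dim)
        # One gate per modality, each fed the concatenated projections.
        self.graph_gate = nn.Linear(2 * fused_dim, fused_dim)
        self.text_gate = nn.Linear(2 * fused_dim, fused_dim)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, graph_emb: torch.Tensor, text_emb: torch.Tensor):
        h_graph = torch.tanh(self.graph_proj(graph_emb))
        h_text = torch.tanh(self.text_proj(text_emb))
        joint = torch.cat([h_graph, h_text], dim=-1)
        # Element-wise gates in (0, 1) for structure and semantics.
        g_graph = torch.sigmoid(self.graph_gate(joint))
        g_text = torch.sigmoid(self.text_gate(joint))
        fused = g_graph * h_graph + g_text * h_text
        logits = self.classifier(fused)
        # Mean gate activation per sample: column 0 = structure, 1 = semantics.
        scores = torch.stack([g_graph.mean(dim=-1), g_text.mean(dim=-1)], dim=-1)
        return logits, scores


torch.manual_seed(0)
graph_emb = torch.randn(4, 32)   # batch of 4 graph embeddings (assumed size)
text_emb = torch.randn(4, 768)   # batch of 4 LLM embeddings (assumed size)
fusion = TwoWayGatedFusion(graph_dim=32, text_dim=768, fused_dim=64)
logits, scores = fusion(graph_emb, text_emb)
print(logits.shape, scores.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```

Because the two gates are independent, the model can raise or lower both modalities for a given sample. A softmax over the two gates would instead force a trade-off between them. Which variant the project uses is not stated.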
Tech Stack
Python, Java, PROGEX, Graph Neural Networks (GNNs), PyTorch Geometric, LLMs, Hugging Face Transformers