Cross-Language Vulnerability Detection

CrossLangVuln.app - Project Details

Overview

This thesis introduces a unified, cross-language vulnerability lifecycle framework that integrates detection, validation, and remediation. It uses Universal Abstract Syntax Tree (uAST) normalization and hybrid AI reasoning to achieve language-agnostic structural analysis. The system addresses the fragmentation of traditional security tools by combining detection, execution-based validation, and iterative repair in a single, evidence-driven pipeline.
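To make the idea of uAST normalization concrete, here is a minimal sketch. It assumes normalization amounts to relabeling language-specific parser node types with shared labels. The parser-side names follow common Tree-sitter grammar node types, while the universal labels (`FunctionDef`, `Call`, `If`, `Other`), the `UNode` class, and the `normalize` and `shape` helpers are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass, field

# Illustrative mapping from (language, parser node type) to a universal label.
# The universal labels on the right are invented for this sketch.
NODE_TYPE_MAP = {
    ("python", "function_definition"): "FunctionDef",
    ("java", "method_declaration"): "FunctionDef",
    ("cpp", "function_definition"): "FunctionDef",
    ("python", "call"): "Call",
    ("java", "method_invocation"): "Call",
    ("cpp", "call_expression"): "Call",
    ("python", "if_statement"): "If",
    ("java", "if_statement"): "If",
    ("cpp", "if_statement"): "If",
}


@dataclass
class UNode:
    """A node in the (hypothetical) universal tree."""
    kind: str
    children: list = field(default_factory=list)


def normalize(language, tree):
    """Convert a nested (node_type, [children]) tuple into a UNode tree.

    Node types without a mapping fall back to the generic label "Other".
    """
    node_type, children = tree
    kind = NODE_TYPE_MAP.get((language, node_type), "Other")
    return UNode(kind, [normalize(language, child) for child in children])


def shape(node):
    """Return a hashable structural fingerprint of a UNode tree."""
    return (node.kind, tuple(shape(child) for child in node.children))


py_tree = ("function_definition", [("if_statement", [("call", [])])])
java_tree = ("method_declaration", [("if_statement", [("method_invocation", [])])])

# After normalization, both snippets share one language-agnostic structure.
same_structure = shape(normalize("python", py_tree)) == shape(normalize("java", java_tree))
```

In this toy case a Python function and a Java method with the same control flow normalize to identical trees, which is the property a language-agnostic structural encoder would rely on.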

Three-Stage Pipeline

1. Detection: Parses code into a uAST and analyzes it via a hybrid model that fuses GraphSAGE structural encodings with Qwen2.5-Coder-1.5B semantic embeddings. Achieves 89.84–92.02% intra-language accuracy.
2. Validation: An autonomous agent confirms exploitability through sandboxed execution, rejecting 58.72–62.37% of detector false positives.
3. Remediation: An iterative loop applies minimal code patches for confirmed vulnerabilities, succeeding in 81.37–87.27% of cases within 2.3–3.4 iterations.
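
The control flow of the three stages can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the `detect`, `validate`, and `patch` callables, the `Outcome` record, and the `max_iterations` limit are hypothetical, and the sketch assumes the repair loop stops as soon as re-validation can no longer confirm the exploit.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    status: str          # "clean", "rejected", "fixed", or "unfixed"
    code: str            # the final version of the code
    iterations: int = 0  # number of patch attempts made


def run_pipeline(code, detect, validate, patch, max_iterations=5):
    """Run detection, validation, and iterative remediation on one sample.

    detect(code)   -> bool: the detector flags the code as vulnerable
    validate(code) -> bool: the exploit is confirmed by execution
    patch(code)    -> str:  a minimally modified version of the code
    """
    if not detect(code):
        return Outcome("clean", code)
    if not validate(code):
        # The detector flagged it, but execution could not confirm it.
        return Outcome("rejected", code)
    current = code
    for attempt in range(1, max_iterations + 1):
        current = patch(current)
        if not validate(current):
            return Outcome("fixed", current, attempt)
    return Outcome("unfixed", current, max_iterations)
```

With stub callables, for example a `validate` that checks for a marker string and a `patch` that removes one marker per call, the loop reports how many iterations were needed before the exploit stopped reproducing.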

Architecture

The model integrates GraphSAGE-based structural encoders on uAST graphs with semantic embeddings from a Qwen2.5-Coder-1.5B model. A two-way gating mechanism adaptively fuses these modalities per sample. The end-to-end pipeline resolves 69.74% of vulnerabilities with a total failure rate of 12.27%, demonstrating stable multi-stage collaboration.
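
One plausible reading of the two-way gate is a softmax over two logits computed from the concatenated embeddings, which yields one weight per modality for each sample. The sketch below uses plain Python lists instead of PyTorch tensors so it runs without dependencies. It assumes both embeddings have already been projected to the same dimension, and the `gated_fusion` function and its `gate_weights` and `gate_bias` parameters are hypothetical names, not the thesis code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(structural, semantic, gate_weights, gate_bias):
    """Fuse two equal-length embeddings using per-sample gate weights.

    gate_weights holds two rows, each as long as the concatenated input;
    gate_bias holds two floats. Row 0 scores the structural modality and
    row 1 scores the semantic modality.
    """
    concat = structural + semantic
    logits = [
        sum(w * x for w, x in zip(row, concat)) + bias
        for row, bias in zip(gate_weights, gate_bias)
    ]
    g_struct, g_sem = softmax(logits)
    fused = [g_struct * s + g_sem * t for s, t in zip(structural, semantic)]
    return fused, (g_struct, g_sem)


# With all-zero gate parameters the gate is indifferent, so the fused
# vector is the plain average of the two embeddings.
zero_rows = [[0.0] * 4, [0.0] * 4]
fused, gates = gated_fusion([1.0, 3.0], [3.0, 1.0], zero_rows, [0.0, 0.0])
```

Because the gate weights depend on the input, a trained gate can lean on the graph encoding for one sample and on the language-model embedding for another, which is what "adaptively fuses these modalities per sample" suggests.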

Key Capabilities

• Cross-Language Generalization: Zero-shot transfer F1 scores of 74.43–78.18% across Java, Python, and C++.
• Execution-Based Validation: Confirms 66.84–71.49% of genuine vulnerabilities while rejecting over half of false positives.
• Iterative Remediation: Surgical, minimal-diff patches that preserve functionality and eliminate the confirmed vulnerability.
• Privacy-Preserving: The entire framework runs on locally deployed models (~3 GB footprint) with no cloud dependencies.
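
For reference, the F1 metric behind the zero-shot transfer figures can be computed as in the sketch below. It assumes a binary labeling (1 = vulnerable, 0 = safe) and a model that is scored on a language it was not trained on. Whether the thesis reports binary or averaged F1 is not stated here, and the `f1_score` helper and the sample labels are illustrative only.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for samples in an unseen target language.
target_labels = [1, 1, 0, 0, 1]
target_predictions = [1, 0, 0, 1, 1]
transfer_f1 = f1_score(target_labels, target_predictions)
```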

Tech Stack

Python, PyTorch, PyTorch Geometric, Transformers (Hugging Face), Tree-sitter 0.21.0, Docker, LangChain, Qwen2.5-Coder-1.5B, MLX (Apple Silicon).



CrossLangVuln.app | Super Mario Edition | Built by Jugal Gajjar