Cross-Language Vulnerability Detection & Repair Thesis Publication

Abstract

Software vulnerabilities remain critical threats to modern computing systems, yet existing detection tools operate in isolation, producing high false-positive rates and lacking exploitability confirmation. This thesis presents a unified cross-language vulnerability lifecycle framework integrating detection, validation, and remediation through Universal Abstract Syntax Tree (uAST) normalization and hybrid AI reasoning. The framework achieves 89.84–92.02% detection accuracy through structural-semantic fusion, with cross-language transfer F1 scores of 74.43–78.18% demonstrating meaningful zero-shot generalization across Java, Python, and C++. Execution-based validation confirms 66.84–71.49% of genuine vulnerabilities while rejecting 58.72–62.37% of detector false positives. Iterative repair succeeds in 81.37–87.27% of cases, converging in 2.3–3.4 iterations, with end-to-end pipeline integration resolving 69.74% of vulnerabilities at a 12.27% total failure rate. Ablation studies validate architectural necessity: removing the Universal AST degrades cross-language performance by 23.42%, while disabling validation increases unnecessary repairs by 131.7% and reduces end-to-end success by 9.56 percentage points. Operating with lightweight locally-deployed models (~3GB footprint), the framework processes 1,700–2,400 samples daily on consumer-grade GPUs, enabling practical CI/CD integration without cloud dependencies. This demonstrates that scalable multi-language security lifecycle management is achievable through principled integration of structural abstraction, semantic reasoning, and execution-based evidence.

Motivation

Existing vulnerability detection techniques suffer from key limitations that hinder their practical effectiveness: static analysis produces high false positives without confirming exploitability, dynamic analysis is constrained by limited coverage and environmental complexity, and modern machine learning approaches remain largely language-specific while relying on static patterns. These challenges highlight the need for a unified, cross-language framework that integrates detection with execution-based validation, ensuring that identified vulnerabilities are not only detected accurately but also verified as genuinely exploitable, thereby improving reliability, scalability, and real-world applicability in polyglot software systems.

Architecture

The proposed framework follows a three-stage vulnerability lifecycle architecture consisting of detection, validation, and remediation. In the first stage, source code is parsed into a Universal Abstract Syntax Tree (uAST) and analyzed using a hybrid model that combines graph neural networks with lightweight language models to produce a binary vulnerability prediction. The second stage performs execution-based validation, where detected vulnerabilities are tested through autonomous exploit generation and sandboxed execution to confirm real-world exploitability. Finally, the remediation stage applies iterative patch generation, re-running detection after each modification until the vulnerability is resolved or a predefined iteration budget is reached.

The architecture is guided by three key principles. First, lifecycle integration ensures that each stage feeds verified outputs to the next, eliminating fragmentation seen in traditional tools. Second, evidence-driven processing guarantees that remediation only occurs after exploit confirmation, significantly reducing false positives. Third, cross-language generalization enables a single model to operate across Java, Python, and C++ using a unified structural representation. All components are designed to run on locally deployable models (<2B parameters), ensuring scalability, efficiency, and data privacy without reliance on cloud-based systems.
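The control flow of the three-stage lifecycle can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the status labels, the default budget of 5 iterations, and the toy string-matching detector are all assumptions made for illustration. The real detector, validator, and patch generator are learned models and a sandboxed agent, represented here as plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LifecycleResult:
    status: str          # "clean", "unconfirmed", "repaired", or "unresolved" (hypothetical labels)
    code: str            # final version of the code
    iterations: int = 0  # number of repair iterations used


def run_lifecycle(
    code: str,
    detect: Callable[[str], bool],    # stage 1: binary vulnerability prediction
    validate: Callable[[str], bool],  # stage 2: True if an exploit is confirmed
    patch: Callable[[str], str],      # stage 3: returns a candidate patched version
    max_iterations: int = 5,          # assumed budget; the source does not state a value
) -> LifecycleResult:
    # Stage 1: stop early if the detector predicts "not vulnerable".
    if not detect(code):
        return LifecycleResult("clean", code)

    # Stage 2: evidence-driven gate; only confirmed findings reach remediation.
    if not validate(code):
        return LifecycleResult("unconfirmed", code)

    # Stage 3: patch, then re-run detection until resolved or budget exhausted.
    current = code
    for iteration in range(1, max_iterations + 1):
        current = patch(current)
        if not detect(current):
            return LifecycleResult("repaired", current, iteration)
    return LifecycleResult("unresolved", current, max_iterations)


# Toy stand-ins (purely illustrative, not real security checks).
def toy_detect(code: str) -> bool:
    return "os.system(" in code


def toy_validate(code: str) -> bool:
    return True


def toy_patch(code: str) -> str:
    return code.replace("os.system(", "subprocess.run(", 1)


result = run_lifecycle("os.system(cmd)", toy_detect, toy_validate, toy_patch)
print(result.status, result.iterations)
```

Passing the stages in as callables keeps the loop independent of any particular model, which mirrors the principle that each stage consumes only the verified output of the previous one.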

Methodology

The methodology begins with constructing a Universal Abstract Syntax Tree (uAST), which normalizes language-specific syntax into a shared structural representation. Using Tree-sitter parsing, source code from multiple languages is converted into a unified schema consisting of node-level representations, semantic mappings, and hierarchical relationships. This allows the system to capture language-invariant vulnerability patterns, such as taint flows and unsafe API usage, while preserving full syntactic fidelity. The resulting uAST is then represented as a graph and processed alongside raw source code to enable both structural and semantic reasoning.
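As a rough illustration of the normalization step, the sketch below maps language-specific node types onto shared labels and flattens the result into a graph. It makes several assumptions: parse trees are given as plain dictionaries standing in for Tree-sitter output, the grammar node type names are taken to match those used by the Python, Java, and C++ Tree-sitter grammars, and the unified labels (`FunctionDef`, `Call`, `Other`) and the `UNode` schema are hypothetical rather than the thesis's actual uAST schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Illustrative mapping from (language, grammar node type) to a shared label.
# The unified labels are hypothetical; the real schema is not given in the source.
UNIFIED_TYPES: Dict[Tuple[str, str], str] = {
    ("python", "function_definition"): "FunctionDef",
    ("java", "method_declaration"): "FunctionDef",
    ("cpp", "function_definition"): "FunctionDef",
    ("python", "call"): "Call",
    ("java", "method_invocation"): "Call",
    ("cpp", "call_expression"): "Call",
}


@dataclass
class UNode:
    utype: str          # language-independent label
    original_type: str  # grammar-specific type, kept so no syntax detail is lost
    text: str
    children: List["UNode"] = field(default_factory=list)


def normalize(language: str, node: dict) -> UNode:
    """Recursively convert a language-specific parse tree into unified nodes."""
    raw_type = node["type"]
    utype = UNIFIED_TYPES.get((language, raw_type), "Other")
    children = [normalize(language, child) for child in node.get("children", [])]
    return UNode(utype, raw_type, node.get("text", ""), children)


def to_graph(root: UNode) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Flatten a uAST into node labels plus parent-to-child edges (pre-order)."""
    labels: List[str] = []
    edges: List[Tuple[int, int]] = []

    def visit(node: UNode, parent: Optional[int]) -> None:
        index = len(labels)
        labels.append(node.utype)
        if parent is not None:
            edges.append((parent, index))
        for child in node.children:
            visit(child, index)

    visit(root, None)
    return labels, edges


# Hand-written stand-ins for parser output of a command-execution call.
py_tree = {"type": "function_definition",
           "children": [{"type": "call", "text": "os.system(cmd)"}]}
java_tree = {"type": "method_declaration",
             "children": [{"type": "method_invocation",
                           "text": "Runtime.getRuntime().exec(cmd)"}]}

print(to_graph(normalize("python", py_tree)))
print(to_graph(normalize("java", java_tree)))
```

Both toy inputs yield the same labels and edges, which is the property that lets a single graph encoder see the same structural pattern regardless of source language.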

Vulnerability detection is performed through a hybrid fusion model combining a GraphSAGE-based structural encoder and a Qwen2.5-Coder semantic encoder. Structural embeddings capture control/data-flow relationships, while semantic embeddings encode contextual signals such as API usage and developer intent. These representations are fused using a two-way gating mechanism that dynamically weights each modality per sample. Detected vulnerabilities are then validated through a plan-execute-verify loop, where an autonomous agent generates exploit hypotheses, executes them in a sandboxed environment with language-specific instrumentation, and confirms exploitability based on runtime evidence. Finally, confirmed vulnerabilities undergo iterative remediation, where minimal patches are generated, applied, and re-evaluated until the vulnerability is eliminated, ensuring both correctness and functional preservation.
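The per-sample weighting performed by the gating mechanism can be sketched as follows. The source does not give the exact formulation, so this example assumes one common form: a softmax over two scalar scores computed from the concatenated embeddings, with both embeddings already projected to the same dimension. All names and weights are hypothetical, and pure Python lists stand in for the tensors a real model would use.

```python
import math
from typing import List, Sequence, Tuple


def softmax2(a: float, b: float) -> Tuple[float, float]:
    """Numerically stable softmax over two scores."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def gated_fusion(
    h_struct: Sequence[float],  # structural embedding (e.g. from the graph encoder)
    h_sem: Sequence[float],     # semantic embedding (e.g. from the language model)
    w_struct: Sequence[float],  # gate weights scoring the structural modality
    w_sem: Sequence[float],     # gate weights scoring the semantic modality
    b_struct: float = 0.0,
    b_sem: float = 0.0,
) -> Tuple[List[float], Tuple[float, float]]:
    """Fuse two embeddings with per-sample gate weights that sum to 1."""
    if len(h_struct) != len(h_sem):
        raise ValueError("embeddings must share a dimension before fusion")
    joint = list(h_struct) + list(h_sem)
    if len(w_struct) != len(joint) or len(w_sem) != len(joint):
        raise ValueError("gate weights must match the concatenated dimension")

    # One scalar score per modality, computed from the concatenated input.
    score_struct = sum(w * x for w, x in zip(w_struct, joint)) + b_struct
    score_sem = sum(w * x for w, x in zip(w_sem, joint)) + b_sem
    g_struct, g_sem = softmax2(score_struct, score_sem)

    # Convex combination of the two modalities.
    fused = [g_struct * s + g_sem * t for s, t in zip(h_struct, h_sem)]
    return fused, (g_struct, g_sem)


# With all-zero gate weights both modalities receive equal weight.
fused, gates = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0] * 4, [0.0] * 4)
print(fused, gates)  # [0.5, 0.5] (0.5, 0.5)
```

Because the gate is computed from the inputs themselves, a sample with weak structural evidence can lean on the semantic embedding and vice versa; in a trained model the gate weights would be learned jointly with the encoders.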

Results

The proposed cross-language vulnerability framework demonstrates strong performance across detection, validation, and remediation stages. In detection, the hybrid model (uAST + LLM) achieves the best results, reaching 89.84–92.02% accuracy and up to 0.9109 F1 across Java, Python, and C++. In cross-language transfer, the hybrid approach significantly outperforms structural-only and semantic-only baselines, achieving 76–80% accuracy despite zero-shot transfer, highlighting the effectiveness of universal AST normalization for learning language-invariant vulnerability patterns.

Beyond detection, the system demonstrates strong end-to-end effectiveness. Validation confirms 66.84–71.49% of genuine vulnerabilities as exploitable and rejects 58.72–62.37% of detector false positives, improving downstream reliability. The remediation stage achieves 81.37–87.27% repair success rates, converging in an average of 2.3–3.4 iterations, and post-repair verification confirms vulnerability elimination in over 90% of cases. Overall, the full pipeline resolves 69.74% of vulnerabilities with a 12.27% total failure rate, demonstrating robust real-world applicability.
