Conference Papers
Bridging Semantics & Structure for Software Vulnerability Detection using Hybrid Network Models
Authors: Jugal Gajjar, Kaustik Ranaware, Kamalasankari Subramaniakuppusamy
Venue: 14th International Conference on Complex Networks and Their Applications (COMPLEX NETWORKS 2025)
Abstract: Software vulnerabilities remain a persistent risk, yet static and dynamic analyses often overlook structural dependencies that shape insecure behaviors. Viewing programs as heterogeneous graphs, we capture control- and data-flow relations as complex interaction networks. Our hybrid framework combines these graph representations with lightweight (<4B) local LLMs, uniting topological features with semantic reasoning while avoiding the cost and privacy concerns of large cloud models. Evaluated on Java vulnerability detection (binary classification), our method achieves 93.57% accuracy, an 8.36% gain over Graph Attention Network-based embeddings and a 17.81% gain over pretrained LLM baselines such as Qwen2.5 Coder 3B. Beyond accuracy, the approach extracts salient subgraphs and generates natural language explanations, improving interpretability for developers. These results pave the way for scalable, explainable, and locally deployable tools that can shift vulnerability analysis from purely syntactic checks to deeper structural and semantic insights, facilitating broader adoption in real-world secure software development.
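As an illustrative aside, the sketch below shows one way such a hybrid could be wired: a small Graph Attention Network encoder over the program's control-/data-flow graph fused with a semantic embedding produced separately by a local LLM. This is not the paper's implementation; the class name FusedVulnClassifier, the dimensions, and the late-fusion head are assumptions.

```python
# Illustrative sketch (not the paper's code): fuse a GAT-based structural
# embedding of a program graph with a semantic embedding produced elsewhere
# (e.g., by a local <4B LLM) for binary vulnerability classification.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, global_mean_pool

class FusedVulnClassifier(nn.Module):          # hypothetical class name
    def __init__(self, node_dim, llm_dim, hidden=128):
        super().__init__()
        self.gat1 = GATConv(node_dim, hidden, heads=4, concat=False)
        self.gat2 = GATConv(hidden, hidden, heads=1)
        self.head = nn.Sequential(
            nn.Linear(hidden + llm_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),              # vulnerable / not vulnerable
        )

    def forward(self, x, edge_index, batch, llm_emb):
        # Structural branch: two GAT layers over the control/data-flow graph.
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        g = global_mean_pool(h, batch)         # one vector per program graph
        # Late fusion with the LLM's semantic embedding of the same code.
        return self.head(torch.cat([g, llm_emb], dim=-1))
```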
SecureFixAgent: A Hybrid LLM Agent for Automated Python Static Vulnerability Repair
Authors: Jugal Gajjar, Kamalasankari Subramaniakuppusamy, Relsy Puthal, Kaustik Ranaware
Venue: 2025 International Conference on Machine Learning and Applications (ICMLA)
Abstract: Modern software development pipelines face growing challenges in securing large codebases with extensive dependencies. Static analysis tools like Bandit are effective at vulnerability detection but suffer from high false positives and lack repair capabilities. Large Language Models (LLMs), in contrast, can suggest fixes but often hallucinate changes and lack self-validation. We present SecureFixAgent, a hybrid repair framework integrating Bandit with lightweight local LLMs (<8B parameters) in an iterative detect-repair-validate loop. To improve precision, we apply parameter-efficient LoRA-based fine-tuning on a diverse, curated dataset spanning multiple Python project domains, mitigating dataset bias and reducing unnecessary edits. SecureFixAgent uses Bandit for detection, the LLM for candidate fixes with explanations, and Bandit re-validation for verification, all executed locally to preserve privacy and reduce cloud reliance. Experiments show SecureFixAgent reduces false positives by 10.8% over static analysis, improves fix accuracy by 13.51%, and lowers false positives by 5.46% compared to pre-trained LLMs, typically converging within three iterations. Beyond metrics, developer studies rate explanation quality 4.5/5, highlighting its value for human trust and adoption. By combining verifiable security improvements with transparent rationale in a resource-efficient local framework, SecureFixAgent advances trustworthy, automated vulnerability remediation for modern pipelines.
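A minimal sketch of a detect-repair-validate loop of the kind described above, assuming Bandit is installed; the propose_fix function is a hypothetical placeholder for the local LLM, and the three-iteration cap simply mirrors the convergence behavior reported in the abstract. This is not the authors' code.

```python
# Sketch of a Bandit-driven detect-repair-validate loop (not SecureFixAgent's
# actual implementation). `propose_fix` stands in for a local (<8B) LLM call.
import json
import subprocess

def bandit_findings(path: str) -> list[dict]:
    # Bandit's JSON report lists findings under the "results" key.
    out = subprocess.run(
        ["bandit", "-q", "-r", path, "-f", "json"],
        capture_output=True, text=True,
    )
    return json.loads(out.stdout or "{}").get("results", [])

def propose_fix(source: str, finding: dict) -> str:
    """Placeholder: a local LLM returns patched source plus an explanation."""
    raise NotImplementedError

def repair_loop(path: str, max_iters: int = 3) -> list[dict]:
    for _ in range(max_iters):
        findings = bandit_findings(path)       # detect
        if not findings:                       # validated: nothing left to fix
            break
        for f in findings:
            file_ = f["filename"]
            src = open(file_).read()
            patched = propose_fix(src, f)      # repair: LLM candidate fix
            open(file_, "w").write(patched)
    return bandit_findings(path)               # residual findings, if any
```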
MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning
Authors: Jugal Gajjar, Kamalasankari Subramaniakuppusamy, Noha El Kachach
Venue: IEEE 26th International Conference on Information Reuse and Integration (IRI 2025)
Abstract: The growing complexity of cyber threats and the limitations of traditional vulnerability detection tools necessitate novel approaches for securing software systems. We introduce MalCodeAI, a language-agnostic, multi-stage AI pipeline for autonomous code security analysis and remediation. MalCodeAI combines code decomposition and semantic reasoning using fine-tuned Qwen2.5-Coder-3B-Instruct models, optimized through Low-Rank Adaptation (LoRA) within the MLX framework, and delivers scalable, accurate results across 14 programming languages. In Phase 1, the model achieved a validation loss as low as 0.397 for functional decomposition and summarization of code segments after 200 iterations with 6 trainable layers and a learning rate of 2 × 10⁻⁵. In Phase 2, for vulnerability detection and remediation, it achieved a best validation loss of 0.199 using the same number of iterations and trainable layers but with an increased learning rate of 4 × 10⁻⁵, effectively identifying security flaws and suggesting actionable fixes. MalCodeAI supports red-hat-style exploit tracing, CVSS-based risk scoring, and zero-shot generalization to detect complex, zero-day vulnerabilities. In a qualitative evaluation involving 15 developers, the system received high scores in usefulness (mean 8.06/10), interpretability (mean 7.40/10), and readability of outputs (mean 7.53/10), confirming its practical value in real-world development workflows. This work marks a significant advancement toward intelligent, explainable, and developer-centric software security solutions.
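For illustration only, the snippet below sets up parameter-efficient LoRA fine-tuning of Qwen2.5-Coder-3B-Instruct with the learning rates quoted above. The paper trains within the MLX framework; Hugging Face PEFT is used here purely as a stand-in, and the rank, alpha, and target modules are assumptions not taken from the paper.

```python
# Hedged sketch of LoRA fine-tuning for the two phases described above.
# PEFT is a stand-in for the paper's MLX setup; hyperparameters are assumed
# except where the abstract states them (200 iterations, lr 2e-5 / 4e-5).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-Coder-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,      # assumed LoRA settings
    target_modules=["q_proj", "v_proj"],        # assumed; not in the abstract
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()              # only the LoRA adapters train

# Phase 1 (decomposition/summarization): lr = 2e-5, 200 iterations.
# Phase 2 (detection/remediation):       lr = 4e-5, 200 iterations.
```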
Preprints
MLCPD: A Unified Multi-Language Code Parsing Dataset with Universal AST Schema
Authors: Jugal Gajjar, Kamalasankari Subramaniakuppusamy
Platform: arXiv (cs.SE), 2025
Abstract: We introduce the MultiLang Code Parser Dataset (MLCPD), a large-scale, language-agnostic dataset unifying syntactic and structural representations of code across ten major programming languages. MLCPD contains over seven million parsed source files normalized under our proposed universal Abstract Syntax Tree (AST) schema, enabling consistent cross-language reasoning, structural learning, and multilingual software analysis. Unlike existing corpora that focus purely on token-level code or isolated parsers, MLCPD provides both hierarchical tree representations and rich metadata for every file, ensuring lossless syntactic coverage and structural uniformity. Each entry includes a normalized schema, language-level metadata, and abstracted node semantics stored in Parquet format for scalable retrieval. Empirical analyses reveal strong cross-language structural regularities, demonstrating that syntactic graphs from languages as diverse as Python, Java, and Go can be aligned under a shared schema. We release the dataset publicly on Hugging Face and the accompanying codebase on GitHub, which includes complete pipelines for dataset reproduction, grammar compilation, and a visualization tool for exploring the unified AST across languages. Together, these resources establish MLCPD as an open, reproducible foundation for future research in cross-language representation learning and program analysis.
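A hedged sketch of how MLCPD-style Parquet shards might be consumed. The file name and the column names ("language", "ast") are assumptions for illustration, not taken from the paper; the actual schema on the Hugging Face dataset card should be consulted.

```python
# Illustrative only: load a Parquet shard and walk a nested, universal-AST-
# style record. Column names and file name are assumed, not MLCPD's real schema.
import json
from datasets import load_dataset

ds = load_dataset("parquet", data_files="mlcpd-sample.parquet", split="train")

def count_nodes(node: dict) -> int:
    """Recursively count nodes in a nested AST dict with a 'children' list."""
    return 1 + sum(count_nodes(c) for c in node.get("children", []))

for row in ds.select(range(3)):
    ast = json.loads(row["ast"]) if isinstance(row["ast"], str) else row["ast"]
    print(row["language"], count_nodes(ast))   # e.g., "python 412"
```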
Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models
Authors: Jugal Gajjar, Kaustik Ranaware
Platform: arXiv (cs.CL), 2025
Abstract: This project performs multimodal sentiment analysis on the CMU-MOSEI dataset, using transformer-based models with early fusion to integrate text, audio, and visual modalities. We employ BERT-based encoders for each modality, extracting embeddings that are concatenated before classification. The model achieves strong performance, with 97.87% 7-class accuracy and a 0.9682 F1-score on the test set, demonstrating the effectiveness of early fusion in capturing cross-modal interactions. Training used Adam optimization (lr=1e-4), dropout (0.3), and early stopping to ensure generalization and robustness. Results highlight the superiority of transformer architectures in modeling multimodal sentiment, with a low MAE (0.1060) indicating precise sentiment intensity prediction. Future work may compare fusion strategies or enhance interpretability. The approach demonstrates the value of multimodal learning, effectively combining linguistic, acoustic, and visual cues for sentiment analysis.
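A minimal early-fusion sketch in the spirit of the setup above: per-modality embeddings are concatenated and fed to a classifier over the seven CMU-MOSEI sentiment classes. The class name, hidden size, and embedding dimensions are illustrative assumptions, not the project's code.

```python
# Early-fusion sketch (assumed dimensions): concatenate modality embeddings,
# then classify into 7 sentiment classes. Dropout rate mirrors the abstract.
import torch
import torch.nn as nn

class EarlyFusionSentiment(nn.Module):          # hypothetical class name
    def __init__(self, text_dim=768, audio_dim=768, visual_dim=768, n_classes=7):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + audio_dim + visual_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),                    # dropout from the abstract
            nn.Linear(256, n_classes),
        )

    def forward(self, text_emb, audio_emb, visual_emb):
        # Early fusion: concatenate modality embeddings before classification.
        fused = torch.cat([text_emb, audio_emb, visual_emb], dim=-1)
        return self.classifier(fused)

# Training per the abstract used Adam with lr=1e-4 and early stopping, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```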
Journal Publications
Building Trust: The Sentient AI Framework for Emotionally Intelligent AI
Authors: Jugal Gajjar, Sanjana Nathani
Journal: International Journal of Creative Research Thoughts (IJCRT), 2024
Abstract: Artificial Intelligence (AI) is the driving force behind most of the applications we use in our daily lives, and this necessitates advancements in human-AI interaction that go beyond basic functionality. In this article, we propose a novel methodology, the Sentient AI Framework (SAIF), which prioritizes the integration of emotional intelligence into AI systems, enabling them to act in humans' best interest by interpreting and responding with human emotions taken into consideration. By integrating SAIF with intelligent systems such as chatbots, virtual assistants, and robots, user interactions can be made more natural and engaging. The article discusses how SAIF can be developed and deployed to cultivate a sense of connection grounded in emotions such as trust and empathy, paving the way for a future where AI can be readily integrated into society and policy-making processes. It also discusses key ethical aspects to be considered, such as privacy, bias, explainability, and transparency. By developing such emotionally intelligent systems in collaboration with scholars, ethicists, and policymakers, we can ensure the ethical development and utilization of SAIF, thereby enhancing well-being and cultivating a better society.