RSAT.app - Project Details

Overview

RSAT (Structured Attribution for Tables) is a reasoning-centric training framework that enables small language models (1B–8B parameters) to produce faithful, step-by-step reasoning grounded in explicit cell-level citations. It targets attribution failures in table reasoning, where models often reach the correct answer while relying on spurious or hallucinated evidence. RSAT enforces structured outputs and directly optimizes attribution quality through reinforcement learning.

Pipeline

1. Curate ~1K high-quality reasoning traces with step-wise decomposition and cell-level citations
2. Perform Supervised Fine-Tuning (SFT) to teach the output format and reasoning structure
3. Sample multiple candidate outputs per query during the RL phase
4. Score each output with a multi-component reward function
5. Optimize the model with Group Relative Policy Optimization (GRPO)
6. Produce structured reasoning with validated citations and a final answer
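The page does not specify the exact output schema behind step 6. As a sketch, assuming a hypothetical JSON layout in which each step carries a `claim` and a list of `[row, column]` cell citations, a format check that runs before any reward is computed might look like this. The field names `steps`, `claim`, `cells` and `answer`, and the rule that every step must cite at least one cell, are assumptions, not RSAT's documented format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Step:
    claim: str                     # natural-language reasoning step
    cells: list[tuple[int, int]]   # cited cells as (row, column) indices


@dataclass
class Trace:
    steps: list[Step]
    answer: str


def _parse_cell(cell: object) -> tuple[int, int] | None:
    # A cell citation must be a two-element list of integers: [row, column].
    if isinstance(cell, list) and len(cell) == 2 and all(type(i) is int for i in cell):
        return (cell[0], cell[1])
    return None


def parse_trace(text: str) -> Trace | None:
    """Parse one model output in a hypothetical RSAT-style JSON schema.

    Returns None when the output is not valid JSON or breaks the schema,
    which a trainer could treat as a format failure before scoring.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        return None
    raw_steps = data.get("steps")
    if not isinstance(raw_steps, list) or not raw_steps:
        return None
    steps: list[Step] = []
    for raw in raw_steps:
        if not isinstance(raw, dict) or not isinstance(raw.get("claim"), str):
            return None
        raw_cells = raw.get("cells")
        # Assumption: every step must cite at least one cell.
        if not isinstance(raw_cells, list) or not raw_cells:
            return None
        cells = [_parse_cell(c) for c in raw_cells]
        if any(c is None for c in cells):
            return None
        steps.append(Step(claim=raw["claim"], cells=cells))
    return Trace(steps=steps, answer=data["answer"])
```

For example, an output whose single step cites `[[2, 1]]` parses into a `Trace` with `cells == [(2, 1)]`, while a step with an empty citation list, or a malformed citation such as `[1]`, makes `parse_trace` return `None`.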

Training Strategy

RSAT adopts a two-phase SFT + GRPO pipeline. Unlike PPO, GRPO removes the learned critic (value) model and scores each complete output directly, which suits structured reasoning tasks where quality is judged on the full trace. For each input, the model generates a group of candidate reasoning traces, and each trace's advantage is computed relative to the rest of its group: its reward is normalized by the group's mean and standard deviation.
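The group-relative step that replaces PPO's critic can be sketched in a few lines. This is a minimal illustration of the GRPO advantage computation, not RSAT's training code; the function name and the `eps` parameter are illustrative. It uses the population standard deviation, and real implementations differ on that detail. In full GRPO these advantages then weight a clipped policy-gradient objective, typically with a KL penalty toward a reference model.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of sampled outputs.

    Each output's advantage is its reward minus the group mean, divided by the
    group standard deviation, so no learned critic (value model) is needed.
    """
    if len(rewards) < 2:
        # A single sample has nothing to be compared against.
        return [0.0 for _ in rewards]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four candidate traces for the same table question.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
# The best trace gets a positive advantage, the worst a negative one,
# and traces at the group mean get zero.
```

Because the baseline is the group's own mean, a trace is only reinforced when it beats its siblings for the same input. When every trace in a group earns the same reward, all advantages are zero and that group contributes no gradient.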

Reward Design

• Faithfulness (NLI-based): verifies that each reasoning step is entailed by its cited cells
• Citation Validity: ensures that referenced table cells exist and are used correctly
• Parsimony: penalizes redundant or unnecessarily long reasoning chains
• Final reward: a weighted combination of the three components, balancing them during optimization
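The page names the three reward components but not their formulas or weights. The sketch below shows one way they could combine, under loudly stated assumptions. Each step is a hypothetical `(claim, cited_cells)` pair. Citation validity here only checks that cells exist; it does not check that they are "used correctly". Parsimony here penalizes duplicated claims and chains longer than a hypothetical `max_steps` budget. The weights `(0.5, 0.3, 0.2)` are placeholders. Faithfulness takes an `entails(premise, hypothesis)` callable. In RSAT that role belongs to an NLI model the page does not name, so the example uses a toy substring check instead.

```python
from typing import Callable

Cell = tuple[int, int]
Step = tuple[str, list[Cell]]          # (claim, cited cells)
Table = list[list[str]]                # row-major cell text
Entails = Callable[[str, str], bool]   # (premise, hypothesis) -> entailed?


def _exists(table: Table, cell: Cell) -> bool:
    r, c = cell
    return 0 <= r < len(table) and 0 <= c < len(table[r])


def citation_validity(steps: list[Step], table: Table) -> float:
    """Fraction of cited cells that actually exist in the table."""
    cited = [cell for _, cells in steps for cell in cells]
    if not cited:
        return 0.0
    return sum(_exists(table, cell) for cell in cited) / len(cited)


def faithfulness(steps: list[Step], table: Table, entails: Entails) -> float:
    """Fraction of steps whose claim is entailed by the text of its valid cited cells."""
    if not steps:
        return 0.0
    supported = 0
    for claim, cells in steps:
        premise = " | ".join(table[r][c] for r, c in cells if _exists(table, (r, c)))
        if premise and entails(premise, claim):
            supported += 1
    return supported / len(steps)


def parsimony(steps: list[Step], max_steps: int = 6) -> float:
    """Penalize duplicated claims and chains longer than a (hypothetical) step budget."""
    if not steps:
        return 0.0
    unique_ratio = len({claim.strip().lower() for claim, _ in steps}) / len(steps)
    overflow = max(0, len(steps) - max_steps)
    length_factor = max(0.0, 1.0 - overflow / max_steps)
    return unique_ratio * length_factor


def rsat_reward(steps: list[Step], table: Table, entails: Entails,
                weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three components; the weights are placeholders."""
    w_f, w_c, w_p = weights
    return (w_f * faithfulness(steps, table, entails)
            + w_c * citation_validity(steps, table)
            + w_p * parsimony(steps))


# Toy stand-in for an NLI model: "entailed" if the premise text appears in the claim.
def toy_entails(premise: str, hypothesis: str) -> bool:
    return premise in hypothesis


table = [["year", "revenue"], ["2019", "41"], ["2020", "47"]]
steps = [("2020 revenue is 47", [(2, 1)]),
         ("2019 revenue is 41", [(1, 1), (9, 9)])]  # (9, 9) does not exist
score = rsat_reward(steps, table, toy_entails)
# ≈ 0.9: faithfulness 1.0, citation validity 2/3, parsimony 1.0
```

The hallucinated citation `(9, 9)` is dropped from the faithfulness premise but still lowers citation validity. A trace cannot escape that penalty by padding real citations with invented ones.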

Results

• 3.7× improvement in attribution faithfulness over the SFT-only baseline
• Significant reduction in unsupported reasoning steps
• Strong generalization across unseen table schemas
• Maintains competitive answer accuracy while improving interpretability

Tech. Stack

Python, PyTorch, HuggingFace Transformers, TRL (GRPO), NLI Models, JSON-based structured decoding

Links

Publication Page | GitHub Repository

RSAT.app | Super Mario Edition | Built by Jugal Gajjar