# Enough Thinking: Efficient Reasoning via GRPO + SEAL + MCP
## 🏭 Overview
Large Reasoning Models (LRMs) frequently over-generate chain-of-thought, even for simple problems, leading to unnecessary latency and cost. In this project, we study reasoning efficiency as a first-class optimization objective.
## 📊 Contributions
We present a two-stage reinforcement learning framework:
- **Phase-1 (GRPO):** induces structured reasoning behavior.
- **Phase-2 (SEAL):** internalizes recurring reasoning patterns to reduce token usage without sacrificing correctness.
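As a rough illustration of the Phase-1 objective, the sketch below computes GRPO-style group-relative advantages: each sampled completion's reward is normalized against the mean and standard deviation of its sampling group. The `length_penalized_reward` shaping term is a hypothetical stand-in for how Phase-2 could trade tokens against correctness; the function names, `alpha` coefficient, and sample values are illustrative assumptions, not the project's actual implementation.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each completion's reward
    against its sampling group's mean and (population) std."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def length_penalized_reward(correct, n_tokens, alpha=0.001):
    """Hypothetical efficiency shaping: reward correctness,
    subtract a small per-token cost to favor shorter reasoning."""
    return (1.0 if correct else 0.0) - alpha * n_tokens

# One prompt, four sampled completions: (is_correct, token_count).
samples = [(True, 900), (True, 400), (False, 300), (True, 1200)]
rewards = [length_penalized_reward(c, n) for c, n in samples]
advantages = grpo_advantages(rewards)
```

Under this shaping, the short correct completion (400 tokens) receives the largest positive advantage, so the policy update pushes toward concise but correct reasoning.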
## 📈 Impact
- Phase-2 achieves 35–45% token reduction with only minor accuracy degradation
## 🔗 Links
- 💻 Code: GitHub Repository
