# Elastic-Depth Pretraining (EDP)

## 🧪 Overview
Elastic-Depth Pretraining (EDP) is a framework that integrates adaptive depth allocation directly into autoregressive transformer pretraining, addressing the inefficiency of applying uniform computational depth to every token regardless of difficulty.
## 🔬 Methodology
EDP dynamically allocates transformer depth per token using a second-order residual signal (the "acceleration" of a token's hidden state across layers). Tokens whose representations have stabilized (easy tokens) exit early and skip the remaining layers, while tokens whose representations are still changing rapidly (hard tokens) traverse the full depth.
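The halting rule above can be sketched as follows. This is a minimal illustrative reconstruction, not the paper's implementation: the norm-based criterion, the threshold `tau`, and the toy `layers` callables are all assumptions. For clarity the sketch still applies each layer to every token and merely freezes halted ones; a real implementation would gather only the active tokens to realize the compute savings.

```python
import numpy as np

def elastic_depth_forward(h, layers, tau=1e-2):
    """Per-token early exit driven by a second-order residual signal.

    h      : (seq_len, d) array of token hidden states.
    layers : list of callables mapping (seq_len, d) -> (seq_len, d).
    tau    : assumed halting threshold on the acceleration norm.
    Returns the final hidden states and the depth each token used.
    """
    active = np.ones(h.shape[0], dtype=bool)       # tokens still being refined
    depth_used = np.zeros(h.shape[0], dtype=int)
    prev_delta = None
    for layer in layers:
        out = layer(h)
        delta = out - h                            # first-order residual ("velocity")
        if prev_delta is not None:
            # Second-order signal: how much the residual itself changed.
            accel = np.linalg.norm(delta - prev_delta, axis=-1)
            active &= accel > tau                  # halt tokens that have stabilized
        h = np.where(active[:, None], out, h)      # frozen tokens keep their state
        depth_used += active
        prev_delta = delta
        if not active.any():                       # every token has exited
            break
    return h, depth_used
```

A usage example with toy residual layers: `h, depth = elastic_depth_forward(h0, [lambda h: h + 0.1 * np.tanh(h @ W)] * 6)`. The per-token `depth_used` is what yields the compute savings: shallow tokens stop contributing layer FLOPs as soon as their acceleration falls below `tau`.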
## 📊 Results
EDP achieves 42% compute savings while maintaining perplexity comparable to full-depth pretraining.
## 🔗 Links

- 💻 Code: GitHub Repository
