Elastic-Depth Pretraining (EDP)

🧪 Overview

Elastic-Depth Pretraining (EDP) is a framework that integrates adaptive depth allocation directly into auto-regressive transformer pretraining to address the inefficiency of uniform computational depth for all tokens.

🔬 Methodology

EDP allocates transformer depth per token dynamically, using a second-order residual signal (the "acceleration" of a token's hidden state across layers) to decide when further computation is unlikely to help. Easy tokens exit after a few layers; hard tokens traverse the full stack.
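The routing rule above can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: the layer structure, threshold, and all names (`elastic_depth_forward`, `threshold`, `depth_used`) are assumptions made for the sketch. Each token tracks the magnitude of its per-layer update; once the change in that magnitude between consecutive layers (a second-order, acceleration-like signal) falls below a threshold, the token is frozen and skips the remaining layers.

```python
import numpy as np

def elastic_depth_forward(x, layers, threshold=1e-3):
    """Toy sketch of acceleration-gated elastic depth.

    x:       (seq_len, d_model) array of token states
    layers:  list of callables, each mapping a (d_model,) token
             state to an updated (d_model,) state
    A token exits early once the change in its update magnitude
    between consecutive layers drops below `threshold`.
    """
    seq_len = x.shape[0]
    active = np.ones(seq_len, dtype=bool)       # tokens still being refined
    depth_used = np.zeros(seq_len, dtype=int)   # layers applied per token
    prev_delta = None
    for layer in layers:
        if not active.any():
            break                               # everything has exited
        y = np.stack([layer(t) for t in x])
        delta = np.linalg.norm(y - x, axis=-1)  # first-order residual change
        if prev_delta is not None:
            accel = np.abs(delta - prev_delta)  # second-order "acceleration"
            active &= accel > threshold         # freeze converged tokens
        x = np.where(active[:, None], y, x)     # frozen tokens keep their state
        depth_used += active
        prev_delta = delta
    return x, depth_used
```

A small contractive residual map makes the early-exit behavior visible: tokens whose updates settle quickly accumulate fewer layers in `depth_used` than the layer count of the stack.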

📊 Results

EDP reports a 42% reduction in pretraining compute while maintaining perplexity comparable to a fixed-depth baseline.