iOS World Agents – Embodied AI Evaluation Framework

🔍 Overview

Built an embodied simulation framework to evaluate LLM-driven agents performing real iOS actions inside simulator environments, enabling behavioral evaluation beyond static benchmarks.

🚀 Key Contributions

Designed JSON-based task schemas covering 50+ multi-step tasks across Safari, Maps, Calendar, Files, and Settings
Orchestrated GPT-4o, Gemini 1.5, and Grok-2 for controlled, cross-model behavioral comparison
Implemented Reflexion + TextGrad feedback loops, improving task completion by 8–10% without fine-tuning
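
A minimal sketch of how a JSON task spec and a Reflexion-style retry loop could fit together. The schema fields (task_id, app, goal, max_steps, success_state), the functions load_task and run_with_reflexion, and the toy_agent stub are hypothetical and not taken from the project. A real agent would drive the iOS Simulator through an LLM. The TextGrad half (optimizing the prompt itself from textual feedback) is not modeled here.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical task spec; field names are illustrative, not the project's schema.
TASK_JSON = """
{
  "task_id": "maps-001",
  "app": "Maps",
  "goal": "Find a nearby coffee shop and start walking directions",
  "max_steps": 8,
  "success_state": {"screen": "directions", "mode": "walking"}
}
"""


@dataclass
class TaskSpec:
    task_id: str
    app: str
    goal: str
    max_steps: int        # per-trial action budget (not enforced in this sketch)
    success_state: dict   # key/value pairs the final UI state must match


@dataclass
class Attempt:
    final_state: dict
    succeeded: bool
    reflection: str = ""


def load_task(raw: str) -> TaskSpec:
    """Parse one JSON task spec; unknown or missing keys raise TypeError."""
    return TaskSpec(**json.loads(raw))


def run_with_reflexion(
    task: TaskSpec,
    agent: Callable[[TaskSpec, List[str]], dict],
    max_trials: int = 3,
) -> List[Attempt]:
    """Reflexion-style loop: after each failed trial, store a verbal
    self-critique and pass all critiques so far into the next trial.
    No model weights change; only the agent's context grows."""
    memory: List[str] = []
    attempts: List[Attempt] = []
    for trial in range(1, max_trials + 1):
        final_state = agent(task, memory)
        # Success means every key in success_state matches the final state.
        ok = all(final_state.get(k) == v for k, v in task.success_state.items())
        attempt = Attempt(final_state=final_state, succeeded=ok)
        if not ok:
            attempt.reflection = (
                f"Trial {trial}: ended in {final_state}, "
                f"but the goal required {task.success_state}."
            )
            memory.append(attempt.reflection)
        attempts.append(attempt)
        if ok:
            break
    return attempts


def toy_agent(task: TaskSpec, memory: List[str]) -> dict:
    """Stand-in for an LLM policy acting in the simulator: it picks the
    wrong travel mode until a reflection is available in its context."""
    mode = "walking" if memory else "driving"
    return {"screen": "directions", "mode": mode}


if __name__ == "__main__":
    task = load_task(TASK_JSON)
    for a in run_with_reflexion(task, toy_agent):
        print(a.succeeded, a.reflection or "(solved)")
```

With the toy agent, the first trial fails, its reflection is stored, and the second trial succeeds. Swapping in different agent callables over the same task specs is one way such a loop could support cross-model comparison.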

🔗 Links

💻 Code: GitHub Repository

Rishav Aryan
