CV
Summary
Research engineer specializing in post-training and alignment of reasoning models and multimodal agents, implementing reinforcement learning loops (GRPO, PPO) and self-adaptation frameworks to optimize performance. Proven ability to design, reproduce, and scale ML research under extreme compute constraints, including distributed GPU infrastructure and robust evaluation pipelines.
Publications
• Learning to Act Anywhere: Experience-Based Similarity for Universal Interface Agents..First Author, ACL 2026 Submission training-free cross-platform UI agents using FAISS-based Elastic Visual Memory, achieving ~40% higher task success under interface perturbations with 83 ms per-step latency.
• Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs..Co-Author, NLDB 2025 , systematic evaluation of multimodal ED identifying failure modes of generative LLM/VLM approaches and conditions where supervised fusion models outperform.
Independent Research Projects
Enough Thinking - Efficient Reasoning via RL-Driven Self-Adaptation
• Frontier reasoning models (e.g., DeepSeek-R1) exhibit “rumination,” consuming excessive tokens on simple logic and increasing inference costs.
• A base 0.5B LLM achieved a GSM8K accuracy of 0.20 and required 218 tokens per query without Induction of Explicit Verification.
• Engineered a proof-of-concept system to incentivize token efficiency and permanent weight-internalization of reasoning patterns using limited compute (Google Colab).
• Implemented Group Relative Policy Optimization (GRPO) from scratch to induce emergent reflection; designed a SEAL-inspired self-edit loop utilizing LoRA and ReSTEM RL to “cache” logic into parameters.
• Improved accuracy to 0.30 (+50%) while reducing token overhead by 48% (218 → 112 tokens); deployed via Model Context Protocol (MCP) for deterministic tool-use via API.
NetGuard: Autonomous Multi-Agent Framework for NIST-Compliant Threat Detection
• Real-time threat detection lacks automated coordination between analysis and NIST-compliant response.
• Manual log enrichment resulted in high “time-to-remediate” latency and inconsistent classification.
• Led the design of a serverless multi-agent system (Ingestor, Analyzer, Aggregator) for automated traffic analysis.
• Built modular agents using SecureGPT and AWS Lambda; implemented reflexion feedback loops for system self-improvement via API calls.
• Achieved 94.5% accuracy across 6,000+ data points, demonstrating expertise in “model coordination”. Experience
George Mason University Jul 2025 - Present
Research Assistant — Machine Learning & Reinforcement Learning
• Financial time-series data is highly non-stationary; standard RL policies often collapse or “overfit” during sudden market regime shifts.
• Static PPO-based allocation models failed to maintain alpha when market volatility spiked. Baseline performance was limited to a Sharpe ratio of ~0.85 – 0.95 and a Sortino ratio of ~1.10, with an alpha near +0.05 that vanished during regime changes.
• Engineered a regime-aware RL system capable of autonomously distinguishing between market states and deployed as a scalable distributed system to ensure steerable and stable policy learning.
• Developed latent state representations using VAE + GMM and implemented hierarchical PPO-based policies with realistic risk constraints, leveraging probability and statistics to optimize performance.
• Achieved a consistent Sharpe ratio of 1.23 and positive alpha of +0.23, satisfying the engineering rigor required for regulated foundation models for deployment in products.
Wall Street Quants Jul 2024 - Oct 2024
Quantitative Research Intern
• Cryptocurrency markets exhibit extreme non-stationarity and “regime blindness,” where standard momentum models frequently collapse or suffer from excessive drawdown during sudden market reversals.
• Baseline trading models were achieving 18% annualized returns but lacked the volatility filters and calibration necessary to survive high-variance periods across the top-10 crypto assets.
• Engineered and calibrated a suite of adaptive momentum and reversal strategies designed to balance high-yield returns with rigorous, real-time risk management.
• Developed and backtested strategies using Python and software engineering best practices, implementing RSI and moving average calibrations across thousands of simulated scenarios to identify the most stable parameters.
• Developed and backtested algorithmic trading strategies, applying statistics and calculus to boost annualized returns from 18% to 25% (+38% improvement) and improve the Sharpe ratio by 15%.
Foxmula May 2022 - Jul 2022
Machine Learning Intern
• HR analytics at scale often lack predictive depth, leading to reactive decision-making and significant “human-in-the-loop” latency when identifying employee dissatisfaction or promotion eligibility.
• The department relied on manual data triage, costing 10+ hours weekly in analysis and failing to provide a structured pipeline for proactive strategic initiatives.
• Developed and deployed an automated, end-to-end ML pipeline to identify key drivers of dissatisfaction and accurately predict promotion pipelines with high interpretability.
• Implemented ensemble models (XGBoost, Stacking) and engineered specialized tenure and skill-gap features; automated the entire deployment via AWS and TensorFlow.
• Achieved 90% precision (+15% improvement in accuracy) and saved 10+ hours weekly, transforming a manual bottleneck into an automated, data-driven system.
Technologies
• Programming Languages: Python, SQL
• ML & Frontier Reasoning: Transformers, Preference Optimization (RLHF/RLAIF), Reinforcement Learning , LoRA, Natural Language Processing , GMMs, VAEs
• Agentic & Multimodal Systems: Vision-Language Models(VLM), Retrieval-Augmented Generation (RAG), Model Context Protocol (MCP)
• Infrastructure & Scale: vLLM, PyTorch, FAISS, FastAPI, Docker, AWS (Lambda, EC2, S3), Distributed Systems, TensorFlow
• Research Practice: Problem Framing, End-to-End ML Pipelines, Ablation Studies, Robustness Evaluation
• Core Competencies: Software Engineering, Linear Algebra, Calculus, Probability, Statistics
Education
George Mason University — MS, Data Analytics and Engineering (Aug 2023 – May 2025)
Vellore Institute of Technology — B.Tech, ECE (Jun 2019 – Jul 2023)
Achievements
• Technical Peer Reviewer | Expert Systems with Applications (Elsevier):(Invited to perform rigorous technical peer reviews for 3+ manuscripts in a top-tier Q1 journal (Impact Factor: 8.5).)
• Kaggle AIMO-3 Progress Prize Participant:(Developed a Hybrid Reasoning-Execution system with SEAL-inspired self-adaptation for Olympiad-level mathematics.)
• International Keynote Speaker | Parwati Science College (India):(Invited by the Organizing Committee to deliver a technical keynote at the UGC-Sponsored International Seminar 2026. on “ARTIFICIAL INTELLIGENCE AND HUMAN CIVILIZATION: NEW POSSIBILITIES AT THE INTERSECTION OF SCIENCE, SOCIETY AND HUMANITIES”
