I am a Research Assistant at The Hong Kong Polytechnic University (PolyU). My research interests focus on model evaluation and agentic AI: building reliable, steerable AI systems that can reason, plan, and interact with tools in real-world settings. I enjoy working at the intersection of evaluation methodology, agent architecture, and real-world deployment.
Contributed to infi-evalscope, integrating 20+ benchmarks including tau-bench, aider-polyglot, and ARC-AGI into a unified evaluation framework.