Xiaojun Wang
AI Testing & AI Quality Engineering
Focused on
- AI Testing
- LLM Evaluation
- AI Agent Testing
- Intelligent System Quality Engineering
Core Focus
Research areas and engineering disciplines I work in.
AI Testing
Systematic approaches to testing AI-powered systems, from model behavior to pipeline integrity.
AI Quality Engineering
Engineering reliable quality frameworks for AI systems across the development lifecycle.
LLM Testing
Evaluating large language model outputs, reasoning quality, safety, and consistency at scale.
AI Agent Testing
Measuring agentic system performance, tool-use accuracy, and multi-step task completion.
AI Evaluation
Building comprehensive evaluation frameworks — metrics, benchmarks, and methodologies for assessing AI system quality.
AI Quality Platform
Designing and building integrated platforms for AI quality management — from test orchestration to results analysis and reporting.
Intelligent System Testing
Testing methodologies for systems that learn, adapt, and operate under uncertainty.
AI Workflow Quality
Ensuring correctness and reliability of AI-orchestrated workflows and decision pipelines.
AI Reliability
Building robust, reproducible, and trustworthy AI systems for real-world deployment.
Projects
Practice and research in AI testing and quality engineering.
AI Testing Platform
Active
A unified platform for designing, executing, and analyzing AI model tests across different providers and modalities.
Evaluation Engine
Active
A modular evaluation framework supporting custom metrics, comparative analysis, and reproducible AI benchmarking.
Workflow Orchestration
Active
Quality assurance tooling for AI-driven workflow pipelines: testing each node, validating data flow, and monitoring drift.
TestAILabs Practice
Ongoing
Applied AI testing research and methodology development, bridging the gap between AI testing theory and production practice.
Insights
Writing on AI testing, evaluation, and quality engineering.
The State of AI Testing in 2026
A survey of current practices, tools, and challenges in testing AI-powered software systems.
Why LLM Evaluation Is Different
Traditional software testing paradigms fall short when evaluating large language models. Here is why.
Building Reliable AI Agent Pipelines
Engineering patterns for testing and validating multi-step agent workflows in production.
Metrics That Matter in AI Quality
Beyond accuracy: a framework for choosing evaluation metrics that align with real-world requirements.