Xiaojun Wang

AI Testing & AI Quality Engineering

Focused on

  • AI Testing
  • LLM Evaluation
  • AI Agent Testing
  • Intelligent System Quality Engineering

Core Focus

Research areas and engineering disciplines I work in.

AI Testing

Systematic approaches to testing AI-powered systems, from model behavior to pipeline integrity.

AI Quality Engineering

Engineering reliable quality frameworks for AI systems across the development lifecycle.

LLM Testing

Evaluating large language model outputs, reasoning quality, safety, and consistency at scale.

AI Agent Testing

Measuring agentic system performance, tool-use accuracy, and multi-step task completion.

AI Evaluation

Building comprehensive evaluation frameworks: metrics, benchmarks, and methodologies for assessing AI system quality.

AI Quality Platform

Designing and building integrated platforms for AI quality management, from test orchestration to results analysis and reporting.

Intelligent System Testing

Testing methodologies for systems that learn, adapt, and operate under uncertainty.

AI Workflow Quality

Ensuring correctness and reliability of AI-orchestrated workflows and decision pipelines.

AI Reliability

Building robust, reproducible, and trustworthy AI systems for real-world deployment.

Projects

Practice and research in AI testing and quality engineering.

AI Testing Platform

Active

A unified platform for designing, executing, and analyzing AI model tests across different providers and modalities.

Platform, Testing, Evaluation

Evaluation Engine

Active

A modular evaluation framework supporting custom metrics, comparative analysis, and reproducible AI benchmarking.

Evaluation, Benchmarking, Framework

Workflow Orchestration

Active

Quality assurance tooling for AI-driven workflow pipelines: testing each node, validating data flow between nodes, and monitoring drift.

Workflow, Quality, Orchestration

TestAILabs Practice

Ongoing

Applied AI testing research and methodology development, bridging the gap between AI testing theory and production practice.

Research, Practice, Methodology

Insights

Writing on AI testing, evaluation, and quality engineering.