Skip to content

Testing Guide

TeleFuser uses pytest for unit/integration testing and provides a batch regression testing framework for example pipelines.

Unit & Integration Testing

Test Structure

tests/
├── conftest.py              # Shared fixtures and pytest configuration
├── unit/                    # Unit tests by module
│   ├── core/               # Core module tests
│   ├── distributed/        # Distributed communication tests
│   ├── feature_cache/      # Feature cache tests
│   ├── kernel/             # Triton kernel tests
│   ├── models/             # Model architecture tests
│   ├── ops/                # Custom operations tests
│   ├── schedulers/         # Diffusion scheduler tests
│   ├── service/            # API service tests
│   └── utils/              # Utility function tests
└── integration/             # Integration tests

Running Tests

# Run all tests
pytest tests/

# Run specific test file
pytest tests/unit/core/test_config.py

# Run with verbose output
pytest tests/ -v

# Run tests matching a pattern
pytest tests/ -k "attention"

# Run tests in parallel (requires pytest-xdist)
pytest tests/ -n auto

Test Markers

TeleFuser defines custom markers for hardware-dependent tests:

Marker Description Usage
@pytest.mark.gpu Requires GPU Skipped if CUDA unavailable
@pytest.mark.multi_gpu Requires multiple GPUs Skipped if < 2 GPUs
@pytest.mark.slow Long-running tests Use -m "not slow" to skip
@pytest.mark.distributed Requires distributed setup Needs special environment
import pytest

@pytest.mark.gpu
def test_attention_forward():
    """Test that requires a GPU."""
    ...

@pytest.mark.multi_gpu
def test_parallel_inference():
    """Test that requires multiple GPUs."""
    ...

Common Fixtures

Defined in tests/conftest.py:

Hardware Detection

def test_with_device(device):
    """Use the appropriate device (CUDA or CPU)."""
    tensor = torch.randn(1, 3, 512, 512, device=device)

def test_gpu_count(gpu_count):
    """Check number of available GPUs."""
    assert gpu_count >= 0

Sample Data

def test_image_processing(sample_image_pil, sample_image_tensor):
    """Use sample image fixtures."""
    # sample_image_pil: 512x512 RGB PIL Image
    # sample_image_tensor: (1, 3, 512, 512) tensor

CUDA Cleanup

def test_memory_intensive(clear_cuda_cache):
    """Clear CUDA cache after test."""
    # Test code here...
    # CUDA cache automatically cleared after test

Random Seed

def test_reproducible(set_seed):
    """Set fixed random seed for reproducibility."""
    # torch.manual_seed(42) and np.random.seed(42) applied
    # Reset to random state after test

Writing Tests

GPU-Aware Tests

For tests that require GPU, check availability at module level:

import pytest
import torch

# Skip entire module if CUDA unavailable
try:
    import triton
    HAS_TRITON = True
except ImportError:
    HAS_TRITON = False
    pytest.skip("Triton not available", allow_module_level=True)

@pytest.mark.gpu
def test_triton_kernel():
    """Test Triton kernel."""
    ...

Mock Fixtures

Use provided mock fixtures for isolation:

def test_pipeline(mock_model_manager, mock_pipeline_config):
    """Test pipeline with mocked dependencies."""
    pipeline = MyPipeline(config=mock_pipeline_config)
    pipeline.model_manager = mock_model_manager

CI Integration

Tests run in CI with different configurations:

# CPU-only tests (default)
pytest tests/ -m "not gpu and not multi_gpu"

# GPU tests (requires GPU runner)
pytest tests/ -m "gpu"

# Full test suite
pytest tests/

CI Test Script

Located at scripts/run_ci_tests.sh:

#!/bin/bash
# Run full CI test suite
bash scripts/run_ci_tests.sh

Best Practices

  1. Use markers appropriately - Mark GPU-dependent tests to skip in CPU environments
  2. Clean up resources - Use clear_cuda_cache fixture for GPU tests
  3. Set seeds for reproducibility - Use set_seed fixture when randomness is involved
  4. Mock external dependencies - Use mock fixtures for model loading, API calls
  5. Keep tests isolated - Each test should be independent of others
  6. Name tests descriptively - Use test_<function>_<scenario>_<expected> pattern

Example Test

import pytest
import torch

from telefuser.ops.normalization import RMSNorm


class TestRMSNorm:
    """Test RMSNorm operation."""

    @pytest.mark.gpu
    def test_forward_cuda(self, device):
        """Test forward pass on GPU."""
        norm = RMSNorm(hidden_size=64).to(device)
        x = torch.randn(2, 10, 64, device=device)
        out = norm(x)
        assert out.shape == x.shape
        assert not torch.isnan(out).any()

    def test_forward_cpu(self):
        """Test forward pass on CPU."""
        norm = RMSNorm(hidden_size=64)
        x = torch.randn(2, 10, 64)
        out = norm(x)
        assert out.shape == x.shape

    def test_reproducibility(self, set_seed):
        """Test deterministic output."""
        norm = RMSNorm(hidden_size=64)
        x = torch.randn(2, 10, 64)
        out1 = norm(x.clone())
        out2 = norm(x.clone())
        assert torch.allclose(out1, out2)

Regression Testing

TeleFuser provides a batch regression testing framework for running example pipelines, comparing outputs against baselines, and generating reports.

Quick Start

# List all configured pipelines
python examples/run_examples.py --list

# Run a specific pipeline
python examples/run_examples.py --pipeline wan21_1_3b_t2v

# Run all enabled pipelines (sequential, default)
python examples/run_examples.py --all

# Run with real-time log output
python examples/run_examples.py --all --verbose

# Update baselines after successful runs
python examples/run_examples.py --all --update-baseline

# Parallel execution across multiple GPUs
python examples/run_examples.py --all --gpus 0,1,2,3

CLI Reference

python examples/run_examples.py [OPTIONS]

Options:
  --list                 List configured pipelines and exit
  --pipeline NAME        Run a specific pipeline by name
  --all                  Run all enabled pipelines
  --update-baseline      Update baseline outputs after successful runs
  --config PATH          Path to config YAML (default: example_config.yaml)
  --gpus GPU_IDS         GPU devices for parallel execution (e.g., '0,1,2,3')
                         Enables parallel scheduling when specified
  -v, --verbose          Show real-time log output from each pipeline

Execution Modes

Sequential Mode (Default)

Without --gpus, pipelines run sequentially using all visible GPUs:

# Uses all available GPUs, one pipeline at a time
python examples/run_examples.py --all

Parallel Mode

With --gpus, pipelines run in parallel across specified GPUs:

# 2 GPUs: run two 1-gpu pipelines simultaneously
python examples/run_examples.py --all --gpus 0,1

# 4 GPUs: run up to 4 pipelines in parallel (based on gpu_count)
python examples/run_examples.py --all --gpus 0,1,2,3

Scheduling Strategy:

  • Pipelines sorted by gpu_count descending (larger tasks first)
  • Greedy allocation: fill available GPUs optimally
  • Example with 4 GPUs:
  • 2-gpu pipeline → occupies GPUs [0,1]
  • Two 1-gpu pipelines → occupy GPUs [2] and [3]
  • Next 2-gpu pipeline → waits until [0,1] are released

Example Output:

Parallel execution with GPUs: [0, 1, 2, 3]
Pipelines to run: 5
------------------------------------------------------------
  Started: wan21_1_3b_t2v on GPUs [0, 1]
  Started: qwen_t2i on GPUs [2]
  Started: z_image_turbo_t2i on GPUs [3]
  Finished: qwen_t2i -> PASS (45.2s) PSNR=28.5, SSIM=0.92
  Started: qwen_t2i_lora on GPUs [2]
  ...

Configuration

The runner is configured via examples/example_config.yaml:

defaults:
  seed: 42
  timeout_seconds: 1800
  psnr_min: 25.0          # Minimum PSNR for video regression
  ssim_min: 0.85          # Minimum SSIM for video regression
  pixel_diff_max: 0.02    # Max pixel diff for image regression

output_root: work_dirs/example_outputs

pipelines:
  wan21_1_3b_t2v:
    script: wan_video/wan21_1_3b_text_to_video_h100.py
    gpu_count: 1
    output_type: video
    model_root: /path/to/model
    ppl_config_overrides:
      attn_impl: FLASH_ATTN_2

Environment Variables

Before running regression tests, set the TF_MODEL_ZOO_PATH environment variable to specify the model zoo root directory:

# Set model zoo path
export TF_MODEL_ZOO_PATH=/path/to/model_zoo

# Run regression tests
python examples/run_examples.py --all

This environment variable is used by example scripts to locate model files: - vae_path: TF_MODEL_ZOO_PATH/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth - model_root: TF_MODEL_ZOO_PATH/LongCat-Video (for LongCat examples) - vfi_model_path: TF_MODEL_ZOO_PATH/RIFEv4.26_0921/flownet.pkl

If not set, defaults to "model_zoo" (relative to working directory).

Pipeline Config Fields

Field Type Default Description
script str required Path to example script (relative to examples/)
enabled bool true Skip if false
gpu_count int 1 GPUs to allocate
output_type str video video or image
timeout_seconds int 1800 Max execution time
seed int 42 Random seed
model_root str|null null Override model directory
prompt str|null null Override generation prompt
input_image_path str|null null Input image for I2V/edit pipelines
input_video_path str|null null Input video for VSR/continue pipelines
ppl_config_overrides dict {} Override PPL_CONFIG keys
psnr_min float 25.0 Video: minimum PSNR vs baseline
ssim_min float 0.85 Video: minimum SSIM vs baseline
pixel_diff_max float 0.02 Image: max mean pixel difference
max_elapsed_seconds float|null null Performance threshold
max_gpu_memory_mb float|null null GPU memory threshold

Output Structure

work_dirs/example_outputs/
├── 2026-04-02/                                    # Date-based output directory
│   ├── wan_video__wan21_1_3b_t2v_1gpu_480x832.mp4
│   └── qwen_image__qwen_t2i_1gpu_1024x1024.png
├── baseline/                                      # Baseline outputs
│   └── wan_video__wan21_1_3b_t2v_1gpu_480x832.mp4
├── logs/                                          # Log files
│   ├── 20260402_120000_wan_video__wan21_1_3b_t2v_1gpu.log
│   └── 20260402_130000_qwen_image__qwen_t2i_1gpu.log
└── example_report.json                            # Summary report

Output Naming Convention

Output files:

{example_dir}__{example_name}_{gpu_count}gpu_{resolution}.{ext}

Example: wan_video__wan21_1_3b_text_to_video_h100_1gpu_480x832.mp4

Log files:

{timestamp}_{example_dir}__{example_name}_{gpu_count}gpu.log

Example: 20260402_120000_wan_video__wan21_1_3b_text_to_video_h100_1gpu.log

Regression Metrics

The runner compares outputs against baselines using:

  • Video: PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity)
  • Image: Mean pixel difference

Metrics Thresholds

Configure in YAML or per-pipeline:

psnr_min: 25.0      # Higher = stricter
ssim_min: 0.85      # Range [0, 1], higher = stricter
pixel_diff_max: 0.02 # Range [0, 1], lower = stricter

Baseline Management

  • First run: Output automatically saved as baseline
  • Subsequent runs: Compared against baseline
  • Update baseline: --update-baseline flag

Error Classification

Category Description Analysis Hint
MODEL_LOAD_ERROR Failed to load model Check model_root path and file integrity
INFERENCE_ERROR Error during inference Check traceback in log_path
OUTPUT_ERROR Failed to save output Check directory permissions and disk space
OOM_ERROR GPU out of memory Reduce batch_size or resolution
TIMEOUT Execution exceeded time limit Increase timeout_seconds or check for deadlock

Report Structure

example_report.json contains:

{
  "generated_at": "2026-04-02T12:00:00",
  "environment": {
    "pytorch_version": "2.6.0",
    "cuda_version": "12.8",
    "gpu_count": 8
  },
  "summary": {
    "total": 20,
    "pass": 18,
    "fail": 1,
    "error": 1,
    "timeout": 0
  },
  "results": { ... },
  "failed_details": [
    {
      "name": "wan21_1_3b_t2v",
      "status": "ERROR",
      "error_category": "INFERENCE_ERROR",
      "error_message": "...",
      "reproduce_command": "python examples/run_examples.py --pipeline wan21_1_3b_t2v",
      "log_path": "work_dirs/example_outputs/logs/20260402_120000_wan_video__wan21_1_3b_t2v_1gpu.log",
      "last_50_lines_log": "...",
      "analysis_hint": "Check traceback in log_path to locate the specific module"
    }
  ],
  "reproduce_all_failed": "python examples/run_examples.py --pipeline wan21_1_3b_t2v && ..."
}

Features

  • Subprocess isolation: Each pipeline runs in isolated process with pinned GPUs
  • Parallel execution: Run multiple pipelines simultaneously across GPU pool (use --gpus)
  • Intelligent scheduling: Greedy allocation prioritizes larger tasks, maximizes GPU utilization
  • Baseline management: Auto-save first run, update with flag
  • Regression metrics: PSNR/SSIM for video, pixel diff for image
  • GPU memory tracking: Peak VRAM usage per pipeline
  • Output validation: NaN/Inf detection
  • Enhanced reporting: Reproduce commands and analysis hints for failures

Adding New Pipelines

  1. Create example script in appropriate directory under examples/
  2. Add entry to example_config.yaml:
pipelines:
  my_new_pipeline:
    script: my_category/my_script.py
    gpu_count: 1
    output_type: video
    model_root: /path/to/model
  1. Run to generate baseline:
    python examples/run_examples.py --pipeline my_new_pipeline