tschau-sepp/DETECTION_IMPROVEMENT_PLAN.md
2026-05-10 18:41:16 +02:00

Card Object Detection Reliability Improvement Plan

This document outlines the strategy to improve the reliability of the card detection pipeline, moving from a heuristic-based approach to a robust computer vision pipeline.

Current Limitations

  • Localization: Relies on brightness thresholds, which are highly sensitive to lighting conditions and shadows.
  • Geometry: Crops raw bounding boxes without correcting for perspective (table angle), forcing the ML models to handle distortion.
  • Stability: Live detection is susceptible to frame-by-frame jitter (flickering).

Proposed Improvements

1. Robust Localization (The "Where" Problem)

Transition from brightness-based search to shape and edge detection:

  • Edge-based Detection: Implement Canny Edge Detection and Contour Approximation to identify rectangular shapes regardless of absolute brightness.
  • Color Space Shift: Move from RGB to HSV (Hue, Saturation, Value) or LAB color spaces to decouple lighting (Value/Lightness) from color information.
  • End-to-End Detection (Long-term): Evaluate lightweight object detection models (e.g., YOLOv8n, the "nano" variant, or SSD MobileNet) to replace manual region finding.
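
To illustrate the color space point, the sketch below uses Python's standard `colorsys` module to show that dimming a pixel changes only the Value channel while hue and saturation stay the same. The sample pixel and the 50% dimming factor are made-up values, and uniform scaling is an idealized model of a shadow; a real pipeline would more likely convert whole frames with an image library.

```python
import colorsys

def dim(rgb, factor):
    """Scale an RGB triple (floats in [0, 1]) to mimic a darker exposure."""
    return tuple(channel * factor for channel in rgb)

# Hypothetical pixel from a card's printed symbol, in normalized RGB.
bright = (0.8, 0.4, 0.2)
shadowed = dim(bright, 0.5)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*shadowed)

# Hue and saturation should match; only the value channel should differ.
print(f"bright:   h={h1:.3f} s={s1:.3f} v={v1:.3f}")
print(f"shadowed: h={h2:.3f} s={s2:.3f} v={v2:.3f}")
```

A threshold on hue or saturation is therefore less affected by this kind of lighting change than a threshold on raw brightness.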

2. Perspective Correction (The "Geometry" Problem)

Eliminate image skew to provide standardized input to classifiers:

  • Four-Point Transform (Warping): Identify the four corners of the detected card contour and apply a Perspective Transform (Homography) to "flatten" the card into a normalized top-down rectangle.
  • Standardized Input: Ensure the ML models always receive a centered, non-distorted crop, reducing the reliance on massive geometric data augmentation.
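
The four-point transform can be sketched in plain Python as follows. This is a minimal illustration, not the project's implementation: the corner coordinates, the 200x300 output size, and all function names are assumptions, and the corner-ordering heuristic breaks down for cards rotated close to 45 degrees. In practice OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would do the solving and resampling.

```python
def order_corners(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates (y grows downward) and a roughly upright card.
    """
    by_sum = sorted(pts, key=lambda p: p[0] + p[1])
    by_diff = sorted(pts, key=lambda p: p[0] - p[1])
    top_left, bottom_right = by_sum[0], by_sum[-1]
    bottom_left, top_right = by_diff[0], by_diff[-1]
    return [top_left, top_right, bottom_right, bottom_left]

def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def warp_point(h, x, y):
    """Apply the homography to one point (with the projective divide)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Hypothetical corners of a detected card contour, in arbitrary order.
detected = [(450, 520), (120, 80), (90, 500), (410, 60)]
WIDTH, HEIGHT = 200, 300  # assumed size of the normalized crop
target = [(0, 0), (WIDTH, 0), (WIDTH, HEIGHT), (0, HEIGHT)]

corners = order_corners(detected)
H = homography(corners, target)
```

Mapping each ordered corner through `warp_point(H, x, y)` should land on the matching corner of the target rectangle, which is a quick sanity check before warping full images.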

3. Enhanced Classification (The "What" Problem)

Improve the precision of identity recognition:

  • Unified Multi-Head Model: Combine Suit and Value models into a single network with two output heads to reduce latency and exploit shared features.
  • Advanced Data Augmentation: Expand the training set with:
    • Motion Blur: Simulating handheld camera movement.
    • Perspective Distortions: To handle imperfect warping.
    • Lighting Variations: Simulating varied environmental lighting.
  • Confidence Thresholding: Reject predictions whose confidence falls below a minimum threshold to avoid false positives in noisy environments.
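
A minimum confidence threshold over two output heads might look like the sketch below. The label names, the logits, and the 0.8 threshold are placeholders chosen for illustration, and the function names are hypothetical. Raw softmax scores are often overconfident, so the threshold would need tuning on real validation data.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_label(logits, labels, min_confidence):
    """Return (label, confidence), with label None if below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None, probs[best]
    return labels[best], probs[best]

def classify_card(suit_logits, value_logits, suits, values, min_confidence=0.8):
    """Accept a card only if both heads clear the threshold."""
    suit, _ = pick_label(suit_logits, suits, min_confidence)
    value, _ = pick_label(value_logits, values, min_confidence)
    if suit is None or value is None:
        return None
    return (suit, value)

# Placeholder label sets; the real class lists depend on the deck in use.
SUITS = ["suit_0", "suit_1", "suit_2", "suit_3"]
VALUES = ["value_0", "value_1", "value_2"]

confident = classify_card([4.0, 0.5, 0.2, 0.1], [0.1, 5.0, 0.3], SUITS, VALUES)
uncertain = classify_card([1.0, 0.9, 1.1, 1.0], [0.1, 5.0, 0.3], SUITS, VALUES)
```

Here `confident` resolves to a (suit, value) pair, while `uncertain` is rejected because the suit head is nearly uniform across classes.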

4. Temporal Stability (The "Flicker" Problem)

Prevent identity jumping in live mode:

  • Object Tracking: Implement a Centroid Tracker or Kalman Filter to maintain card identity across frames instead of detecting from scratch every time.
  • Temporal Smoothing: Use a "Voting" mechanism where a card's identity is confirmed only once the model has predicted the same label consistently across a sliding window of 5-10 frames.
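
Both ideas can be sketched together: a greedy nearest-centroid tracker that keeps IDs stable between frames, and a per-track voter that confirms a label only after enough agreeing frames. The class names, the 50-pixel matching radius, and the 7-frame window with 5 required votes are assumptions for illustration. The tracker is deliberately minimal and drops a track as soon as it goes unmatched for one frame.

```python
import math
from collections import Counter, deque

class CentroidTracker:
    """Greedy nearest-neighbour matching of card centroids between frames."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance
        self.tracks = {}  # track id -> last known (x, y)
        self.next_id = 0

    def update(self, centroids):
        assigned = {}
        unmatched = dict(self.tracks)
        for c in centroids:
            best = min(unmatched, key=lambda t: math.dist(unmatched[t], c),
                       default=None)
            if best is not None and math.dist(unmatched[best], c) <= self.max_distance:
                del unmatched[best]  # each existing track is matched at most once
                assigned[best] = c
            else:
                assigned[self.next_id] = c  # no close track: start a new one
                self.next_id += 1
        self.tracks = assigned
        return assigned

class LabelVoter:
    """Confirm a label once it wins enough votes in a sliding window."""

    def __init__(self, window=7, min_votes=5):
        self.history = deque(maxlen=window)
        self.min_votes = min_votes
        self.confirmed = None

    def update(self, label):
        self.history.append(label)  # label may be None for a rejected frame
        top, votes = Counter(self.history).most_common(1)[0]
        if top is not None and votes >= self.min_votes:
            self.confirmed = top
        return self.confirmed

tracker = CentroidTracker()
first = tracker.update([(100, 100), (300, 100)])
second = tracker.update([(304, 102), (98, 103)])  # same cards, slightly moved

voter = LabelVoter()
results = [voter.update(label) for label in ["A", "A", "B", "A", "A", "A", "B"]]
```

In this run the two cards keep their IDs even though the detections arrive in a different order, and the single-frame "B" misclassifications never change the confirmed label.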

Implementation Roadmap

| Phase | Focus | Key Change | Expected Impact |
|---|---|---|---|
| Phase 1 | Stability | Edge detection + Temporal smoothing | Reduced flickering and lighting sensitivity. |
| Phase 2 | Geometry | Perspective Warping (Flattening) | Significant boost in classification accuracy. |
| Phase 3 | Intelligence | Unified Model + Expanded Dataset | Higher precision and lower inference latency. |
| Phase 4 | Architecture | Full Object Detection Model (YOLO) | Industry-standard reliability and speed. |