Card Object Detection Reliability Improvement Plan

This document outlines a strategy for making the card detection pipeline more reliable by replacing brightness-threshold heuristics with shape-based localization, perspective correction, and temporal smoothing.

Current Limitations

  • Localization: Relies on brightness thresholds, which are highly sensitive to lighting conditions and shadows.
  • Geometry: Crops raw bounding boxes without correcting for perspective (the camera's angle to the table), forcing the ML models to handle the resulting distortion.
  • Stability: Live detection is susceptible to frame-by-frame jitter (flickering).

Proposed Improvements

1. Robust Localization (The "Where" Problem)

Transition from brightness-based search to shape and edge detection:

  • Edge-based Detection: Implement Canny Edge Detection and Contour Approximation to identify rectangular shapes regardless of absolute brightness.
  • Color Space Shift: Move from RGB to HSV (Hue, Saturation, Value) or LAB color spaces to decouple lighting (Value/Lightness) from color information.
  • End-to-End Detection (Long-term): Evaluate lightweight object detection models (e.g., YOLOv8-nano or SSD MobileNet) to replace manual region finding.
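
The color space point can be illustrated with a small conversion routine. This is a minimal sketch, not the pipeline's actual code: `rgbToHsv` is a hypothetical helper using the standard hexcone formula, and the pixel values are made up. It assumes a shadow roughly scales all three channels by the same factor, in which case only the Value channel changes.

```typescript
// Hypothetical helper, not part of the existing pipeline.
interface Hsv {
  h: number; // hue in degrees, [0, 360)
  s: number; // saturation, [0, 1]
  v: number; // value (brightness), [0, 1]
}

// Convert 8-bit RGB channels to HSV using the standard hexcone formula.
function rgbToHsv(r: number, g: number, b: number): Hsv {
  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const delta = max - min;

  let h = 0; // hue is undefined for greys; 0 is used by convention
  if (delta > 0) {
    if (max === r) {
      h = 60 * (((g - b) / delta + 6) % 6);
    } else if (max === g) {
      h = 60 * ((b - r) / delta + 2);
    } else {
      h = 60 * ((r - g) / delta + 4);
    }
  }
  const s = max === 0 ? 0 : delta / max;
  return { h, s, v: max / 255 };
}

// The same surface color, once fully lit and once at half brightness.
const lit = rgbToHsv(200, 100, 50);
const shadowed = rgbToHsv(100, 50, 25);
// Hue and saturation match between the two; only v differs.
```

Thresholding on hue and saturation rather than raw RGB brightness is therefore less affected by uniform dimming, although colored light sources or specular highlights would still shift all three channels.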

2. Perspective Correction (The "Geometry" Problem)

Eliminate image skew to provide standardized input to classifiers:

  • Four-Point Transform (Warping): Identify the four corners of the detected card contour and apply a Perspective Transform (Homography) to "flatten" the card into a normalized top-down rectangle.
  • Standardized Input: Ensure the ML models always receive a centered, non-distorted crop, reducing the reliance on massive geometric data augmentation.
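
The four-point transform can be sketched in plain TypeScript to show the underlying math. In practice OpenCV.js provides `cv.getPerspectiveTransform` and `cv.warpPerspective` for this step; the helpers below (`orderCorners`, `solveHomography`, `applyHomography`) are hypothetical names for illustration, and the corner coordinates and the 200 x 300 output size are arbitrary assumptions.

```typescript
type Point = { x: number; y: number };

// Order four corners as [top-left, top-right, bottom-right, bottom-left].
// Heuristic: top-left has the smallest x + y and bottom-right the largest;
// top-right has the smallest y - x and bottom-left the largest.
// Assumes image coordinates (y grows downward) and a card rotated by less than 45 degrees.
function orderCorners(pts: Point[]): Point[] {
  if (pts.length !== 4) throw new Error("expected exactly four corners");
  const bySum = pts.slice().sort((a, b) => a.x + a.y - (b.x + b.y));
  const byDiff = pts.slice().sort((a, b) => a.y - a.x - (b.y - b.x));
  return [bySum[0], byDiff[0], bySum[3], byDiff[3]];
}

// Solve for the homography (h8 fixed to 1) that maps each src[i] to dst[i],
// using Gauss-Jordan elimination with partial pivoting on the 8x8 system.
function solveHomography(src: Point[], dst: Point[]): number[] {
  const a: number[][] = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = src[i];
    const { x: u, y: v } = dst[i];
    a.push([x, y, 1, 0, 0, 0, -u * x, -u * y, u]);
    a.push([0, 0, 0, x, y, 1, -v * x, -v * y, v]);
  }
  const n = 8;
  for (let col = 0; col < n; col++) {
    // Move the row with the largest entry in this column into pivot position.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
    }
    if (Math.abs(a[pivot][col]) < 1e-12) throw new Error("degenerate corners");
    const tmp = a[col];
    a[col] = a[pivot];
    a[pivot] = tmp;
    // Eliminate this column from every other row.
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = a[r][col] / a[col][col];
      for (let c = col; c <= n; c++) a[r][c] -= f * a[col][c];
    }
  }
  const h: number[] = [];
  for (let i = 0; i < n; i++) h.push(a[i][n] / a[i][i]);
  h.push(1);
  return h;
}

// Map a point through the homography, including the perspective divide.
function applyHomography(h: number[], p: Point): Point {
  const w = h[6] * p.x + h[7] * p.y + h[8];
  return {
    x: (h[0] * p.x + h[1] * p.y + h[2]) / w,
    y: (h[3] * p.x + h[4] * p.y + h[5]) / w,
  };
}

// Made-up corners of a skewed card, in arbitrary order.
const detectedCorners: Point[] = [
  { x: 310, y: 95 },
  { x: 120, y: 80 },
  { x: 90, y: 360 },
  { x: 330, y: 340 },
];
const orderedCorners = orderCorners(detectedCorners);
// Normalized top-down rectangle (size chosen arbitrarily for this sketch).
const targetCorners: Point[] = [
  { x: 0, y: 0 },
  { x: 200, y: 0 },
  { x: 200, y: 300 },
  { x: 0, y: 300 },
];
const cardHomography = solveHomography(orderedCorners, targetCorners);
```

Consistent corner ordering matters here: if the corners are passed in contour order instead, the warped card can come out mirrored or rotated, which would undo the benefit of a standardized input.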

3. Enhanced Classification (The "What" Problem)

Improve the precision of identity recognition:

  • Unified Multi-Head Model: Combine Suit and Value models into a single network with two output heads to reduce latency and exploit shared features.
  • Advanced Data Augmentation: Expand the training set with:
    • Motion Blur: Simulating handheld camera movement.
    • Perspective Distortions: To handle imperfect warping.
    • Lighting Variations: Simulating varied environmental lighting.
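  • Confidence Thresholding: Reject predictions whose confidence falls below a minimum threshold to avoid false positives in noisy environments. The threshold is only meaningful if the model's scores are reasonably calibrated, so it should be tuned on held-out data.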
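
A minimum-confidence gate over a two-head model could look like the sketch below. Everything here is an assumption for illustration: the `HeadOutput` shape, the placeholder labels, the 0.8 threshold, and the choice to use the weaker of the two head confidences as the combined score.

```typescript
// Output of one classification head: class labels and softmax probabilities.
interface HeadOutput {
  labels: string[];
  probs: number[];
}

interface CardIdentity {
  suit: string;
  value: string;
  confidence: number;
}

// Index of the largest probability.
function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Combine both heads; return null when the weaker head is below the threshold.
// Taking the minimum of the two head confidences is an assumption of this sketch.
function classifyCard(
  suit: HeadOutput,
  value: HeadOutput,
  minConfidence: number
): CardIdentity | null {
  const s = argmax(suit.probs);
  const v = argmax(value.probs);
  const confidence = Math.min(suit.probs[s], value.probs[v]);
  if (confidence < minConfidence) return null;
  return { suit: suit.labels[s], value: value.labels[v], confidence };
}

// Placeholder labels and probabilities, for illustration only.
const confidentResult = classifyCard(
  { labels: ["hearts", "spades"], probs: [0.9, 0.1] },
  { labels: ["7", "ace"], probs: [0.15, 0.85] },
  0.8
);
const uncertainResult = classifyCard(
  { labels: ["hearts", "spades"], probs: [0.55, 0.45] },
  { labels: ["7", "ace"], probs: [0.15, 0.85] },
  0.8
);
// confidentResult is accepted (hearts, ace); uncertainResult is null.
```

Returning `null` instead of a low-confidence guess lets downstream stages, such as temporal smoothing, treat the frame as "no reading" rather than as evidence for a wrong card.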

4. Temporal Stability (The "Flicker" Problem)

Prevent identity jumping in live mode:

  • Object Tracking: Implement a Centroid Tracker or Kalman Filter to maintain card identity across frames instead of detecting from scratch every time.
  • Temporal Smoothing: Use a voting mechanism in which a card's identity is confirmed only once the model's prediction has been consistent over a sliding window of 5-10 frames.
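
The voting idea can be sketched as a small sliding-window class. `IdentityVoter` is a hypothetical name, and the window size of 5 with a minimum of 4 agreeing frames is just one setting within the 5-10 frame range mentioned above. The sketch assumes one voter instance per tracked card.

```typescript
// Sliding-window majority vote over per-frame labels for one tracked card.
class IdentityVoter {
  private labels: string[] = [];
  private windowSize: number;
  private minVotes: number;

  constructor(windowSize: number, minVotes: number) {
    this.windowSize = windowSize;
    this.minVotes = minVotes;
  }

  // Record the latest per-frame label and return the confirmed identity,
  // or null while no label has enough votes inside the window.
  update(label: string): string | null {
    this.labels.push(label);
    if (this.labels.length > this.windowSize) this.labels.shift();

    const counts: Record<string, number> = Object.create(null);
    let best: string | null = null;
    for (const l of this.labels) {
      counts[l] = (counts[l] || 0) + 1;
      if (best === null || counts[l] > counts[best]) best = l;
    }
    return best !== null && counts[best] >= this.minVotes ? best : null;
  }
}

// Feed a sequence with a single-frame flicker ("B") into the voter.
const voter = new IdentityVoter(5, 4);
const votingTrace: Array<string | null> = [];
for (const frameLabel of ["A", "A", "A", "A", "B", "B"]) {
  votingTrace.push(voter.update(frameLabel));
}
// votingTrace: [null, null, null, "A", "A", null]
```

The single "B" frame does not dislodge the confirmed identity, while a second "B" drops the vote below the threshold and the card becomes unconfirmed until a label regains enough votes. Stricter settings trade responsiveness for stability.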

Implementation Roadmap

| Phase | Focus | Key Change | Expected Impact |
| --- | --- | --- | --- |
| Phase 1 | Stability | Edge detection + Temporal smoothing | Reduced flickering and lighting sensitivity. |
| Phase 2 | Geometry | Perspective Warping (Flattening) | Significant boost in classification accuracy. |
| Phase 3 | Intelligence | Unified Model + Expanded Dataset | Higher precision and lower inference latency. |
| Phase 4 | Architecture | Full Object Detection Model (YOLO) | Industry-standard reliability and speed. |

Evaluation & Validation

To measure the impact of these improvements, the following metrics will be tracked:

  • Precision & Recall: Measure the accuracy of card identity (Suit + Value) across diverse lighting environments.
  • Latency: Track the time from frame capture to identity assignment to ensure real-time performance (< 100 ms).
  • Stability Score: Percentage of frames where a card's identity remains constant while stationary.
  • False Positive Rate: Frequency of "ghost" cards detected in empty table areas.
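
The metrics above could be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the stability score is interpreted as the share of frames agreeing with the most frequent label for a stationary card, and the false positive rate is interpreted as the share of empty-table frames that contain at least one detection.

```typescript
interface DetectionCounts {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
}

// Share of reported identities that were correct.
function precision(c: DetectionCounts): number {
  const reported = c.truePositives + c.falsePositives;
  return reported === 0 ? 0 : c.truePositives / reported;
}

// Share of cards actually present that were correctly identified.
function recall(c: DetectionCounts): number {
  const present = c.truePositives + c.falseNegatives;
  return present === 0 ? 0 : c.truePositives / present;
}

// Percentage of frames whose label matches the most frequent label
// (one interpretation of "identity remains constant while stationary").
function stabilityScore(labels: string[]): number {
  if (labels.length === 0) return 0;
  const counts: Record<string, number> = Object.create(null);
  let maxCount = 0;
  for (const label of labels) {
    counts[label] = (counts[label] || 0) + 1;
    if (counts[label] > maxCount) maxCount = counts[label];
  }
  return (100 * maxCount) / labels.length;
}

// Share of empty-table frames in which at least one "ghost" card was detected.
function ghostRate(detectionsPerEmptyFrame: number[]): number {
  if (detectionsPerEmptyFrame.length === 0) return 0;
  const framesWithGhosts = detectionsPerEmptyFrame.filter((n) => n > 0).length;
  return framesWithGhosts / detectionsPerEmptyFrame.length;
}

// Made-up counts, for illustration only.
const sampleCounts: DetectionCounts = { truePositives: 8, falsePositives: 2, falseNegatives: 8 };
const samplePrecision = precision(sampleCounts); // 8 / 10
const sampleRecall = recall(sampleCounts); // 8 / 16
const sampleStability = stabilityScore(["A", "A", "B", "A"]); // 3 of 4 frames agree
const sampleGhostRate = ghostRate([0, 0, 1, 0]); // 1 of 4 empty frames
```

Reporting these per lighting scenario, rather than as a single aggregate, would make it easier to see which phase of the roadmap actually moved which number.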

Technical Infrastructure

Implementation will leverage the following tools:

  • OpenCV.js: For Canny Edge Detection, Contour Approximation, and Perspective Transforms (Homography).
  • TensorFlow.js: For the classification heads and potential YOLO implementation.
  • Synthetic Dataset Generator: A script to generate warped and blurred card images to augment the training set without manual labeling.
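
As one building block of such a generator, motion blur can be approximated with a horizontal box filter. This is a simplified sketch, not the planned script: `horizontalMotionBlur` is a hypothetical helper operating on a greyscale image stored as rows of numbers, and it only models straight horizontal camera movement.

```typescript
// Blur each row of a greyscale image with a horizontal box kernel of odd length.
function horizontalMotionBlur(image: number[][], kernelLength: number): number[][] {
  if (kernelLength < 1 || kernelLength % 2 === 0) {
    throw new Error("kernelLength must be a positive odd integer");
  }
  const half = (kernelLength - 1) / 2;
  return image.map((row) =>
    row.map((_, x) => {
      let sum = 0;
      for (let k = -half; k <= half; k++) {
        // Clamp at the borders so edge pixels are replicated.
        const xi = Math.min(row.length - 1, Math.max(0, x + k));
        sum += row[xi];
      }
      return sum / kernelLength;
    })
  );
}

// A single bright pixel is smeared across three neighbouring pixels.
const sharpRow = [[0, 0, 0, 9, 0, 0, 0]];
const blurredRow = horizontalMotionBlur(sharpRow, 3);
// blurredRow: [[0, 0, 3, 3, 3, 0, 0]]
```

Because each synthetic image is derived from a source card whose suit and value are already known, the label carries over unchanged, which is what removes the need for manual labeling.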

Testing Strategy

  • Baseline Benchmarking: Create a "Golden Set" of 100 static images with known labels to test every architectural change.
  • Environmental Stress Tests: Test under three specific lighting scenarios: Low-light, Direct Overhead Light (shadows), and Natural Side Light.
  • Integration Testing: Verify that Perspective Correction does not add enough per-frame latency to exceed the real-time budget or to stretch the wall-clock duration of the Temporal Smoothing window.
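
The interaction between per-frame latency and the smoothing window comes down to simple arithmetic, sketched below. The stage names and timings are hypothetical placeholders, and the sketch assumes frames are processed back to back, so that a window of N frames takes N times the per-frame latency to fill.

```typescript
// Per-stage timings in milliseconds; stage names and numbers are hypothetical.
const stageTimingsMs: Record<string, number> = {
  localize: 20,
  warp: 8,
  classify: 30,
};

// Total time from frame capture to per-frame identity assignment.
function totalLatencyMs(stages: Record<string, number>): number {
  let total = 0;
  for (const key of Object.keys(stages)) total += stages[key];
  return total;
}

// Wall-clock time needed to fill a smoothing window when frames are
// processed sequentially.
function windowDurationMs(frameLatencyMs: number, windowSize: number): number {
  return frameLatencyMs * windowSize;
}

const FRAME_BUDGET_MS = 100; // real-time target stated in the plan
const frameLatency = totalLatencyMs(stageTimingsMs); // 20 + 8 + 30
const withinBudget = frameLatency <= FRAME_BUDGET_MS;
const worstCaseWindowMs = windowDurationMs(frameLatency, 10); // 10-frame window
```

With these placeholder numbers the pipeline stays inside the 100 ms budget, yet a 10-frame window still takes more than half a second to confirm an identity, which is why the warp stage's cost is worth measuring together with the window size rather than in isolation.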