Card Object Detection Reliability Improvement Plan

This document outlines a strategy for improving the reliability of the card detection pipeline, moving it from hand-tuned heuristics to a more robust computer vision design.

Current Limitations

Localization: Relies on brightness thresholds, which are highly sensitive to lighting conditions and shadows.
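Geometry: Crops raw bounding boxes without correcting for perspective (the camera's viewing angle relative to the table), forcing the ML models to handle the resulting distortion themselves.
Stability: Because each frame is processed from scratch, a card's reported identity can jitter (flicker) between consecutive frames in live detection.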

Proposed Improvements

1. Robust Localization (The "Where" Problem)

Transition from brightness-based search to shape and edge detection:

Edge-based Detection: Implement Canny Edge Detection and Contour Approximation to find four-sided card outlines from intensity gradients, which are far less sensitive to absolute brightness than fixed thresholds.
Color Space Shift: Move from RGB to HSV (Hue, Saturation, Value) or LAB color spaces to decouple lighting (Value/Lightness) from color information.
End-to-End Detection (Long-term): Evaluate lightweight object detection models (e.g., YOLOv8-nano or SSD MobileNet) to replace manual region finding.
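Contour approximation is commonly done with the Ramer-Douglas-Peucker algorithm, which is what OpenCV's `approxPolyDP` implements. The sketch below reimplements it in plain TypeScript for a closed contour so the card-candidate filter can be unit-tested without loading OpenCV.js. The `Point` type, the hypothetical `findCardQuad` helper, the epsilon of 2% of the perimeter, and the minimum area are illustrative assumptions rather than tuned values.

```typescript
type Point = { x: number; y: number };

// Perpendicular distance from p to the infinite line through a and b.
function perpDistance(p: Point, a: Point, b: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Ramer-Douglas-Peucker on an open polyline; both endpoints are always kept.
function rdp(points: Point[], epsilon: number): Point[] {
  if (points.length < 3) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpDistance(points[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  if (maxDist <= epsilon) return [first, last];
  const left = rdp(points.slice(0, index + 1), epsilon);
  const right = rdp(points.slice(index), epsilon);
  return left.slice(0, -1).concat(right); // keep the shared split point only once
}

// Closed contour: split at point 0 and the point farthest from it, then simplify both halves.
function approxClosed(contour: Point[], epsilon: number): Point[] {
  let far = 0;
  let best = -1;
  for (let i = 1; i < contour.length; i++) {
    const d = Math.hypot(contour[i].x - contour[0].x, contour[i].y - contour[0].y);
    if (d > best) {
      best = d;
      far = i;
    }
  }
  const a = rdp(contour.slice(0, far + 1), epsilon);
  const b = rdp(contour.slice(far).concat([contour[0]]), epsilon);
  return a.slice(0, -1).concat(b.slice(0, -1)); // drop the duplicated join points
}

// Closed perimeter (the equivalent of arcLength with closed = true).
function perimeter(poly: Point[]): number {
  let sum = 0;
  for (let i = 0; i < poly.length; i++) {
    const p = poly[i];
    const q = poly[(i + 1) % poly.length];
    sum += Math.hypot(q.x - p.x, q.y - p.y);
  }
  return sum;
}

// Shoelace formula for the area of a simple polygon.
function polygonArea(poly: Point[]): number {
  let s = 0;
  for (let i = 0; i < poly.length; i++) {
    const p = poly[i];
    const q = poly[(i + 1) % poly.length];
    s += p.x * q.y - q.x * p.y;
  }
  return Math.abs(s) / 2;
}

// Convex if every turn (cross product of consecutive edges) has the same sign.
function isConvex(poly: Point[]): boolean {
  let sign = 0;
  for (let i = 0; i < poly.length; i++) {
    const a = poly[i];
    const b = poly[(i + 1) % poly.length];
    const c = poly[(i + 2) % poly.length];
    const cross = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
    if (cross === 0) continue;
    if (sign === 0) sign = Math.sign(cross);
    else if (Math.sign(cross) !== sign) return false;
  }
  return true;
}

// Hypothetical card filter: keep contours that simplify to a large, convex quadrilateral.
function findCardQuad(contour: Point[], minArea = 1000): Point[] | null {
  const approx = approxClosed(contour, 0.02 * perimeter(contour));
  if (approx.length !== 4 || polygonArea(approx) < minArea || !isConvex(approx)) return null;
  return approx;
}
```

In the live pipeline the contour points would come from OpenCV.js (Canny followed by `cv.findContours`); round or irregular blobs simplify to more than four vertices and are rejected.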

2. Perspective Correction (The "Geometry" Problem)

Eliminate image skew to provide standardized input to classifiers:

Four-Point Transform (Warping): Identify the four corners of the detected card contour, put them in a consistent order (top-left, top-right, bottom-right, bottom-left), and apply a Perspective Transform (Homography) to "flatten" the card into a normalized top-down rectangle.
Standardized Input: Ensure the ML models always receive a centered, undistorted crop of fixed size, reducing reliance on heavy geometric data augmentation.
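The four-point transform reduces to two steps: put the corners in a fixed order, then solve for the 3x3 homography that maps them onto an upright rectangle. This is the same matrix OpenCV.js's `cv.getPerspectiveTransform` returns; the plain TypeScript below computes it directly so the math is visible and testable. The sum/difference corner-ordering heuristic, the 200x300 px output size, and all function names are assumptions for illustration.

```typescript
type Point = { x: number; y: number };

// Order corners as [top-left, top-right, bottom-right, bottom-left]; image y grows downward.
// Top-left has the smallest x + y and bottom-right the largest;
// top-right has the smallest y - x and bottom-left the largest.
function orderCorners(pts: Point[]): Point[] {
  const bySum = [...pts].sort((a, b) => a.x + a.y - (b.x + b.y));
  const byDiff = [...pts].sort((a, b) => a.y - a.x - (b.y - b.x));
  return [bySum[0], byDiff[0], bySum[3], byDiff[3]];
}

// Solve A h = b with Gaussian elimination and partial pivoting (A and b are modified).
function solve(A: number[][], b: number[]): number[] {
  const n = b.length;
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
    }
    if (Math.abs(A[pivot][col]) < 1e-12) throw new Error("degenerate corner configuration");
    [A[col], A[pivot]] = [A[pivot], A[col]];
    [b[col], b[pivot]] = [b[pivot], b[col]];
    for (let r = col + 1; r < n; r++) {
      const f = A[r][col] / A[col][col];
      for (let c = col; c < n; c++) A[r][c] -= f * A[col][c];
      b[r] -= f * b[col];
    }
  }
  const h = new Array<number>(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = b[r];
    for (let c = r + 1; c < n; c++) s -= A[r][c] * h[c];
    h[r] = s / A[r][r];
  }
  return h;
}

// Row-major 3x3 homography (h33 fixed to 1) mapping src[i] onto dst[i].
// Each correspondence (x, y) -> (u, v) contributes two linear equations in h11..h32.
function homography(src: Point[], dst: Point[]): number[] {
  const A: number[][] = [];
  const b: number[] = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = src[i];
    const { x: u, y: v } = dst[i];
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
    b.push(u);
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
    b.push(v);
  }
  return [...solve(A, b), 1];
}

function applyHomography(H: number[], p: Point): Point {
  const w = H[6] * p.x + H[7] * p.y + H[8];
  return { x: (H[0] * p.x + H[1] * p.y + H[2]) / w, y: (H[3] * p.x + H[4] * p.y + H[5]) / w };
}

// Map detected corners onto an assumed fixed-size, top-down card crop.
function cardHomography(corners: Point[], width = 200, height = 300): number[] {
  const dst: Point[] = [
    { x: 0, y: 0 },
    { x: width - 1, y: 0 },
    { x: width - 1, y: height - 1 },
    { x: 0, y: height - 1 },
  ];
  return homography(orderCorners(corners), dst);
}
```

The sum/difference ordering is simple but can mislabel corners when a card is rotated close to 45 degrees; sorting corners by angle around their centroid is a more robust alternative. In the live pipeline, the matrix from `cv.getPerspectiveTransform` would be passed to `cv.warpPerspective` to produce the flattened crop.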

3. Enhanced Classification (The "What" Problem)

Improve the precision of identity recognition:

Unified Multi-Head Model: Combine the Suit and Value models into a single network with a shared backbone and two output heads, reducing latency (one forward pass instead of two) and letting both tasks exploit shared features.
Advanced Data Augmentation: Expand the training set with:
- Motion Blur: Simulating handheld camera movement.
- Perspective Distortions: Simulating residual skew left by imperfect warping.
- Lighting Variations: Simulating varied environmental lighting.
Confidence Thresholding: Reject any prediction whose confidence score falls below a minimum threshold, to avoid false positives in noisy environments.
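The motion-blur and lighting augmentations listed above are simple pixel operations. A minimal sketch on a hypothetical single-channel `GrayImage` type (the real pipeline would work on RGBA frames); the horizontal-only blur kernel and the gain/gamma lighting model are simplifying assumptions, not the project's chosen augmentations.

```typescript
// Hypothetical grayscale image: row-major pixel values in [0, 255].
interface GrayImage {
  width: number;
  height: number;
  data: Float32Array;
}

// Horizontal linear motion blur: each pixel becomes the mean of `length` neighbours
// along x, with coordinates clamped at the borders. Use an odd length so the kernel is centred.
function motionBlurX(img: GrayImage, length: number): GrayImage {
  const half = Math.floor(length / 2);
  const out = new Float32Array(img.data.length);
  for (let y = 0; y < img.height; y++) {
    for (let x = 0; x < img.width; x++) {
      let sum = 0;
      for (let k = -half; k <= half; k++) {
        const xx = Math.min(img.width - 1, Math.max(0, x + k));
        sum += img.data[y * img.width + xx];
      }
      out[y * img.width + x] = sum / (2 * half + 1);
    }
  }
  return { width: img.width, height: img.height, data: out };
}

// Lighting variation: apply gamma, then gain, and clamp the result to [0, 255].
function adjustLighting(img: GrayImage, gain: number, gamma: number): GrayImage {
  const out = new Float32Array(img.data.length);
  for (let i = 0; i < img.data.length; i++) {
    const v = 255 * Math.pow(img.data[i] / 255, gamma) * gain;
    out[i] = Math.min(255, Math.max(0, v));
  }
  return { width: img.width, height: img.height, data: out };
}
```

Real handheld blur has an arbitrary direction, so a fuller version would rotate the kernel; sampling a random gain and gamma per training image is one common way to approximate varied lighting.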

4. Temporal Stability (The "Flicker" Problem)

Prevent identity jumping in live mode:

Object Tracking: Implement a Centroid Tracker (optionally backed by a Kalman Filter that predicts each card's next position) so every card keeps a persistent track across frames instead of being detected from scratch each time.
Temporal Smoothing: Use a "Voting" mechanism in which a card's identity is confirmed only when the same label wins a majority of the model's predictions over a sliding window of 5-10 frames.
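Tracking and voting fit together naturally: the tracker decides which detection belongs to which card, and each track keeps its own window of votes. The sketch below uses greedy nearest-centroid association and leaves out the Kalman filter for brevity; it also folds in a minimum-confidence gate so low-confidence predictions move a track without voting. The class name, the `Detection`/`Track` shapes, and every threshold (7-frame window, 5 votes, 0.6 confidence, 50 px radius) are hypothetical starting values.

```typescript
type Point = { x: number; y: number };

// One per-frame detection: where the card is and what the classifier thinks it is.
interface Detection {
  center: Point;
  label: string; // e.g. "AS" for the ace of spades (hypothetical label format)
  confidence: number;
}

interface Track {
  id: number;
  center: Point;
  votes: string[]; // sliding window of recent confident labels
  confirmed: string | null; // identity shown to the user; null until a label wins the vote
  missed: number; // consecutive frames without a matching detection
}

class CentroidVotingTracker {
  // All thresholds are assumed starting values, not tuned ones.
  windowSize = 7; // inside the 5-10 frame range from the plan
  minVotes = 5; // votes a label needs within the window to be confirmed
  minConfidence = 0.6; // predictions below this move the track but do not vote
  maxDistance = 50; // px; largest centroid jump still treated as the same card
  maxMissed = 5; // frames a track may go unmatched before it is dropped
  private tracks: Track[] = [];
  private nextId = 1;

  update(detections: Detection[]): Track[] {
    // Greedy association: the closest track/detection pairs are matched first.
    const pairs: { t: number; d: number; dist: number }[] = [];
    this.tracks.forEach((track, t) => {
      detections.forEach((det, d) => {
        const dist = Math.hypot(track.center.x - det.center.x, track.center.y - det.center.y);
        if (dist <= this.maxDistance) pairs.push({ t, d, dist });
      });
    });
    pairs.sort((a, b) => a.dist - b.dist);
    const usedTracks = new Set<number>();
    const usedDets = new Set<number>();
    for (const { t, d } of pairs) {
      if (usedTracks.has(t) || usedDets.has(d)) continue;
      usedTracks.add(t);
      usedDets.add(d);
      this.observe(this.tracks[t], detections[d]);
    }
    // Unmatched tracks age out; unmatched detections start new tracks.
    this.tracks.forEach((track, t) => {
      if (!usedTracks.has(t)) track.missed++;
    });
    this.tracks = this.tracks.filter((track) => track.missed <= this.maxMissed);
    detections.forEach((det, d) => {
      if (usedDets.has(d)) return;
      const track: Track = { id: this.nextId++, center: det.center, votes: [], confirmed: null, missed: 0 };
      this.observe(track, det);
      this.tracks.push(track);
    });
    return this.tracks;
  }

  private observe(track: Track, det: Detection): void {
    track.center = det.center;
    track.missed = 0;
    if (det.confidence < this.minConfidence) return;
    track.votes.push(det.label);
    if (track.votes.length > this.windowSize) track.votes.shift();
    const counts = new Map<string, number>();
    track.votes.forEach((v) => counts.set(v, (counts.get(v) ?? 0) + 1));
    // Keep the previous identity unless some label reaches the vote threshold.
    counts.forEach((n, label) => {
      if (n >= this.minVotes) track.confirmed = label;
    });
  }
}
```

With a 7-frame window and a 5-vote threshold, two labels can never both qualify, so a confirmed identity only changes when a new label clearly dominates the window.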

Implementation Roadmap

Phase	Focus	Key Change	Expected Impact
Phase 1	Stability	Edge detection + Temporal smoothing	Reduced flickering and lighting sensitivity.
Phase 2	Geometry	Perspective Warping (Flattening)	Higher classification accuracy from undistorted, standardized crops.
Phase 3	Intelligence	Unified Model + Expanded Dataset	Higher precision and lower inference latency.
Phase 4	Architecture	Full Object Detection Model (YOLO)	Learned localization replacing hand-tuned region finding, provided it meets the real-time latency target.

Evaluation & Validation

To measure the impact of these improvements, the following metrics will be tracked:

Precision & Recall: Precision is the share of reported card identities (Suit + Value) that are correct; recall is the share of cards present that are correctly identified. Both are measured across diverse lighting environments.
Latency: Track the time from frame capture to identity assignment to ensure real-time performance (target: under 100 ms).
Stability Score: Percentage of frames in which a stationary card's reported identity is unchanged from the previous frame.
False Positive Rate: Frequency of "ghost" cards detected in empty table areas.
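The accuracy and stability metrics can be computed from logged pipeline outputs paired with ground truth. A minimal sketch, assuming each logged item pairs what the pipeline reported for a region with what is actually there (`null` meaning no card), and interpreting the Stability Score as the share of consecutive frame pairs in which a stationary card's reported identity is unchanged; both the data shape and that interpretation are assumptions.

```typescript
// One evaluation sample: what the pipeline reported for a region and what is actually there.
interface LabeledPrediction {
  predicted: string | null; // reported identity (Suit + Value), null if nothing was reported
  truth: string | null; // ground-truth identity, null for an empty table area
}

function identityMetrics(items: LabeledPrediction[]) {
  let correct = 0;
  let reported = 0;
  let actual = 0;
  let ghosts = 0;
  let emptyAreas = 0;
  for (const { predicted, truth } of items) {
    if (predicted !== null) reported++;
    if (truth !== null) {
      actual++;
    } else {
      emptyAreas++;
    }
    if (predicted !== null && predicted === truth) correct++;
    if (predicted !== null && truth === null) ghosts++; // "ghost" card in an empty area
  }
  return {
    precision: reported > 0 ? correct / reported : 0,
    recall: actual > 0 ? correct / actual : 0,
    falsePositiveRate: emptyAreas > 0 ? ghosts / emptyAreas : 0,
  };
}

// Share of consecutive frame pairs in which a stationary card's reported identity is unchanged.
function stabilityScore(identities: (string | null)[]): number {
  if (identities.length < 2) return 1;
  let unchanged = 0;
  for (let i = 1; i < identities.length; i++) {
    if (identities[i] === identities[i - 1]) unchanged++;
  }
  return unchanged / (identities.length - 1);
}
```

Latency is deliberately left out here, since it comes from timing the live pipeline rather than from labeled outputs.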

Technical Infrastructure

Implementation will leverage the following tools:

OpenCV.js: For Canny Edge Detection, Contour Approximation, and Perspective Transforms (Homography).
TensorFlow.js: For the classification model and, potentially, the Phase 4 YOLO detector.
Synthetic Dataset Generator: A script to generate warped and blurred variants of labeled card images; each variant inherits its source image's label, so the training set grows without manual labeling.
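Because every synthetic image is derived from a source image whose label is already known, the generator only has to sample augmentation parameters and carry the label through unchanged. A minimal sketch, assuming the generator first emits parameter specs that a separate rendering step turns into warped, blurred images; the `mulberry32` seeded PRNG (used so a dataset can be regenerated from its seed), the `SyntheticSpec` fields, and the parameter ranges are all assumptions.

```typescript
type Point = { x: number; y: number };

// Small seeded PRNG (mulberry32) returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Parameters for one synthetic image; a rendering step would apply them to the source card.
interface SyntheticSpec {
  label: string; // inherited unchanged from the source image
  cornerJitter: Point[]; // per-corner offsets (px) defining a random perspective warp
  blurLength: number; // odd motion-blur kernel length; 1 means no blur
  gain: number; // lighting gain
  gamma: number; // lighting gamma
}

function generateSpecs(labels: string[], perLabel: number, seed: number, maxJitter = 20): SyntheticSpec[] {
  const rand = mulberry32(seed);
  const uniform = (lo: number, hi: number) => lo + (hi - lo) * rand();
  const specs: SyntheticSpec[] = [];
  for (const label of labels) {
    for (let i = 0; i < perLabel; i++) {
      const cornerJitter: Point[] = [];
      for (let c = 0; c < 4; c++) {
        cornerJitter.push({ x: uniform(-maxJitter, maxJitter), y: uniform(-maxJitter, maxJitter) });
      }
      specs.push({
        label,
        cornerJitter,
        blurLength: 1 + 2 * Math.floor(uniform(0, 5)), // 1, 3, 5, 7 or 9
        gain: uniform(0.6, 1.4),
        gamma: uniform(0.7, 1.5),
      });
    }
  }
  return specs;
}
```

Jittering the destination corners of a known card and warping with the resulting homography produces the "imperfect warping" samples, while the blur length and lighting values feed the blur and lighting augmentations.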

Testing Strategy

Baseline Benchmarking: Create a "Golden Set" of 100 static images with known labels to test every architectural change.
Environmental Stress Tests: Test under three lighting scenarios: low light, direct overhead light (shadows), and natural side light.
Integration Testing: Verify that Perspective Correction does not add enough per-frame latency to lower the frame rate; because the Temporal Smoothing window is counted in frames, a lower frame rate stretches it over more wall-clock time and delays identity confirmation.
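To compare each architectural change against the Golden Set in a repeatable way, results can be broken down per lighting scenario and checked against the previous baseline. A minimal sketch, assuming each Golden Set image is tagged with one of the three stress-test scenarios; the scenario keys, the `findRegressions` helper, and the 0.02 accuracy tolerance are hypothetical.

```typescript
// One Golden Set result, tagged with the lighting scenario the image was captured under.
interface GoldenResult {
  scenario: string; // e.g. "low-light", "overhead", "side-light" (hypothetical keys)
  truth: string;
  predicted: string | null;
}

// Identity accuracy per lighting scenario.
function accuracyByScenario(results: GoldenResult[]): Record<string, number> {
  const hits: Record<string, number> = {};
  const totals: Record<string, number> = {};
  for (const r of results) {
    totals[r.scenario] = (totals[r.scenario] ?? 0) + 1;
    if (r.predicted === r.truth) hits[r.scenario] = (hits[r.scenario] ?? 0) + 1;
  }
  const accuracy: Record<string, number> = {};
  for (const s of Object.keys(totals)) accuracy[s] = (hits[s] ?? 0) / totals[s];
  return accuracy;
}

// Scenarios where the candidate build is worse than the baseline by more than `tolerance`.
function findRegressions(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
  tolerance = 0.02,
): string[] {
  return Object.keys(baseline).filter((s) => (candidate[s] ?? 0) < baseline[s] - tolerance);
}
```

Reporting per scenario rather than one overall number keeps a gain under overhead light from hiding a regression in low light.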
