# Card Object Detection Reliability Improvement Plan

This document outlines the strategy to improve the reliability of the card detection pipeline, moving from a heuristic-based approach to a robust computer vision pipeline.

## Current Limitations

- **Localization**: Relies on brightness thresholds, which are highly sensitive to lighting conditions and shadows.
- **Geometry**: Crops raw bounding boxes without correcting for perspective (table angle), forcing the ML models to handle distortion.
- **Stability**: Live detection is susceptible to frame-by-frame jitter (flickering).

## Proposed Improvements

### 1. Robust Localization (The "Where" Problem)

Transition from brightness-based search to shape and edge detection:

- **Edge-based Detection**: Implement **Canny Edge Detection** and **Contour Approximation** to identify rectangular shapes regardless of absolute brightness.
- **Color Space Shift**: Move from RGB to **HSV (Hue, Saturation, Value)** or **LAB** color spaces to decouple lighting (Value/Lightness) from color information.
- **End-to-End Detection (Long-term)**: Evaluate lightweight object detection models (e.g., **YOLOv8-nano** or **SSD MobileNet**) to replace manual region finding.

### 2. Perspective Correction (The "Geometry" Problem)

Eliminate image skew to provide standardized input to classifiers:

- **Four-Point Transform (Warping)**: Identify the four corners of the detected card contour and apply a **Perspective Transform (Homography)** to "flatten" the card into a normalized top-down rectangle.
- **Standardized Input**: Ensure the ML models always receive a centered, non-distorted crop, reducing the reliance on massive geometric data augmentation.

### 3. Enhanced Classification (The "What" Problem)

Improve the precision of identity recognition:

- **Unified Multi-Head Model**: Combine Suit and Value models into a single network with two output heads to reduce latency and exploit shared features.
- **Advanced Data Augmentation**: Expand the training set with:
  - **Motion Blur**: Simulating handheld camera movement.
  - **Perspective Distortions**: To handle imperfect warping.
  - **Lighting Variations**: Simulating varied environmental lighting.
- **Confidence Calibration**: Implement a minimum confidence threshold to avoid false positives in noisy environments.

### 4. Temporal Stability (The "Flicker" Problem)

Prevent identity jumping in live mode:

- **Object Tracking**: Implement a **Centroid Tracker** or **Kalman Filter** to maintain card identity across frames instead of detecting from scratch every time.
- **Temporal Smoothing**: Use a "Voting" mechanism where a card's identity is only confirmed if the model is consistent over a sliding window of 5-10 frames.

## Implementation Roadmap

| Phase | Focus | Key Change | Expected Impact |
| :--- | :--- | :--- | :--- |
| **Phase 1** | **Stability** | Edge detection + Temporal smoothing | Reduced flickering and lighting sensitivity. |
| **Phase 2** | **Geometry** | Perspective Warping (Flattening) | Significant boost in classification accuracy. |
| **Phase 3** | **Intelligence** | Unified Model + Expanded Dataset | Higher precision and lower inference latency. |
| **Phase 4** | **Architecture** | Full Object Detection Model (YOLO) | Industry-standard reliability and speed. |
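
### Example Sketch: Temporal Voting

The "Voting" mechanism from the Temporal Stability section, combined with the minimum confidence threshold from the classification section, could look roughly like the sketch below. The class name `IdentityVoter`, its parameters (`window`, `min_votes`, `min_confidence`), and the label strings are hypothetical and not part of the existing pipeline. It assumes the classifier yields one `(label, confidence)` pair per frame for a given card.

```python
from collections import Counter, deque
from typing import Deque, Optional


class IdentityVoter:
    """Hypothetical helper: confirm a label only once it wins enough
    votes inside a sliding window of recent frames."""

    def __init__(self, window: int = 7, min_votes: int = 5,
                 min_confidence: float = 0.6) -> None:
        if not 0 < min_votes <= window:
            raise ValueError("min_votes must be between 1 and window")
        self.min_votes = min_votes
        self.min_confidence = min_confidence
        # deque(maxlen=...) drops the oldest vote automatically.
        self.history: Deque[Optional[str]] = deque(maxlen=window)
        self.confirmed: Optional[str] = None

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Record one frame's prediction and return the confirmed label."""
        # Predictions below the threshold count as abstentions (None).
        vote = label if confidence >= self.min_confidence else None
        self.history.append(vote)

        counts = Counter(v for v in self.history if v is not None)
        if counts:
            best, votes = counts.most_common(1)[0]
            if votes >= self.min_votes:
                self.confirmed = best
        # The last confirmed label is kept until another label wins.
        return self.confirmed


if __name__ == "__main__":
    voter = IdentityVoter(window=5, min_votes=3)
    frames = [("AS", 0.9), ("AS", 0.8), ("KS", 0.95), ("AS", 0.9), ("4S", 0.3)]
    print([voter.update(label, conf) for label, conf in frames])
    # prints [None, None, None, 'AS', 'AS']
```

In this sketch the single misread (`KS`) and the low-confidence frame (`4S`) never reach the output. Keeping the last confirmed label "sticky" is a design choice; a real pipeline would probably also reset the voter when the tracked card leaves the frame.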
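
### Example Sketch: Centroid Tracking

The Object Tracking item in the Temporal Stability section mentions a centroid tracker. A minimal version, assuming each detection has already been reduced to an `(x, y)` centroid in pixels, might look like the following. The class name `CentroidTracker`, the greedy nearest-neighbour matching, and the `max_distance` / `max_missed` parameters are illustrative assumptions, not the project's actual design. A Kalman filter would additionally predict motion, which this sketch does not attempt.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


class CentroidTracker:
    """Hypothetical sketch: keep stable IDs by matching each new
    centroid to the nearest existing track (greedy matching)."""

    def __init__(self, max_distance: float = 50.0, max_missed: int = 5) -> None:
        self.max_distance = max_distance  # max pixel jump between frames
        self.max_missed = max_missed      # frames a track may go unseen
        self.next_id = 0
        self.tracks: Dict[int, Point] = {}
        self.missed: Dict[int, int] = {}

    def update(self, centroids: List[Point]) -> Dict[int, Point]:
        """Match this frame's centroids to tracks; return id -> position."""
        # Every (distance, track_id, detection_index) pair, closest first.
        pairs = sorted(
            (math.dist(pos, c), tid, i)
            for tid, pos in self.tracks.items()
            for i, c in enumerate(centroids)
        )
        used_tracks: Set[int] = set()
        used_dets: Set[int] = set()
        for dist, tid, i in pairs:
            if dist > self.max_distance:
                break  # all remaining pairs are even farther apart
            if tid in used_tracks or i in used_dets:
                continue
            self.tracks[tid] = centroids[i]
            self.missed[tid] = 0
            used_tracks.add(tid)
            used_dets.add(i)

        # Age unmatched tracks and drop those unseen for too long.
        for tid in list(self.tracks):
            if tid not in used_tracks:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid]
                    del self.missed[tid]

        # Unmatched detections start new tracks.
        for i, c in enumerate(centroids):
            if i not in used_dets:
                self.tracks[self.next_id] = c
                self.missed[self.next_id] = 0
                self.next_id += 1
        return dict(self.tracks)
```

Each track ID could then own its own per-card voting state, so that classification results are smoothed per card and not per frame. Greedy matching is simple but can mis-assign IDs when cards cross paths; an optimal assignment step would be a possible refinement.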