Card Object Detection Reliability Improvement Plan
This document outlines the strategy to improve the reliability of the card detection pipeline, moving from a heuristic-based approach to a robust computer vision pipeline.
Current Limitations
- Localization: Relies on brightness thresholds, which are highly sensitive to lighting conditions and shadows.
- Geometry: Crops raw bounding boxes without correcting for perspective (table angle), forcing the ML models to handle distortion.
- Stability: Live detection is susceptible to frame-by-frame jitter (flickering).
Proposed Improvements
1. Robust Localization (The "Where" Problem)
Transition from brightness-based search to shape and edge detection:
- Edge-based Detection: Implement Canny Edge Detection and Contour Approximation to identify rectangular shapes regardless of absolute brightness.
- Color Space Shift: Move from RGB to HSV (Hue, Saturation, Value) or LAB color spaces to decouple lighting (Value/Lightness) from color information.
- End-to-End Detection (Long-term): Evaluate lightweight object detection models (e.g., YOLOv8n, the nano variant of YOLOv8, or SSD MobileNet) to replace manual region finding.
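To illustrate why the color space shift helps, here is a minimal sketch of an RGB to HSV conversion in plain TypeScript. The function name `rgbToHsv` and the sample pixel values are assumptions made for this example; in the actual pipeline the conversion would more likely be delegated to OpenCV.js. The point it shows: when the same surface is lit at half the intensity, hue and saturation stay the same and only the value channel changes.

```typescript
// HSV pixel: h in degrees [0, 360), s and v in [0, 1].
interface Hsv { h: number; s: number; v: number; }

// Convert one 8-bit RGB pixel to HSV (standard hexcone formulation).
function rgbToHsv(r: number, g: number, b: number): Hsv {
  const rn = r / 255, gn = g / 255, bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;

  let h = 0; // hue is undefined for grays; 0 is used by convention
  if (delta > 0) {
    if (max === rn) h = 60 * (((gn - bn) / delta) % 6);
    else if (max === gn) h = 60 * ((bn - rn) / delta + 2);
    else h = 60 * ((rn - gn) / delta + 4);
    if (h < 0) h += 360;
  }
  const s = max === 0 ? 0 : delta / max;
  return { h, s, v: max };
}

// Hypothetical sample: the same red pip under bright and dim light.
const bright = rgbToHsv(200, 40, 40);
const dim = rgbToHsv(100, 20, 20);
// bright and dim share hue and saturation; only v differs.
```

A threshold on `h` and `s` is therefore less sensitive to a uniform change in brightness than a threshold on raw RGB, although colored light and sensor clipping still affect all three channels.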
2. Perspective Correction (The "Geometry" Problem)
Eliminate image skew to provide standardized input to classifiers:
- Four-Point Transform (Warping): Identify the four corners of the detected card contour and apply a Perspective Transform (Homography) to "flatten" the card into a normalized top-down rectangle.
- Standardized Input: Ensure the ML models always receive a centered, undistorted crop, reducing the need for heavy geometric data augmentation.
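The four-point transform can be sketched without any library, which makes the underlying math explicit. The code below is an illustration under stated assumptions, not the project's implementation: the names `orderCorners`, `homography`, and `applyHomography`, the sample corner coordinates, and the 200 x 300 output size are all hypothetical. It orders the corners, then solves the 8 x 8 linear system that defines the homography with its last entry fixed to 1. In OpenCV.js, `cv.getPerspectiveTransform` and `cv.warpPerspective` would likely cover the same step.

```typescript
type Point = { x: number; y: number };

// Order corners as top-left, top-right, bottom-right, bottom-left.
// Heuristic: TL has the smallest x+y, BR the largest; TR has the smallest y-x, BL the largest.
function orderCorners(pts: Point[]): Point[] {
  const bySum = pts.slice().sort((a, b) => (a.x + a.y) - (b.x + b.y));
  const byDiff = pts.slice().sort((a, b) => (a.y - a.x) - (b.y - b.x));
  return [bySum[0], byDiff[0], bySum[3], byDiff[3]];
}

// Solve a*x = b by Gauss-Jordan elimination with partial pivoting.
function solveLinear(a: number[][], b: number[]): number[] {
  const n = b.length;
  const m = a.map((row, i) => row.concat([b[i]]));
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) throw new Error("Degenerate corner configuration");
    const tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = m[r][col] / m[col][col];
      for (let c = col; c <= n; c++) m[r][c] -= f * m[col][c];
    }
  }
  return m.map((row, i) => row[n] / row[i]);
}

// Compute the 3x3 homography (row-major, 9 numbers) mapping src[i] to dst[i].
function homography(src: Point[], dst: Point[]): number[] {
  const a: number[][] = [];
  const b: number[] = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = src[i];
    const { x: u, y: v } = dst[i];
    a.push([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.push(u);
    a.push([0, 0, 0, x, y, 1, -v * x, -v * y]); b.push(v);
  }
  return solveLinear(a, b).concat([1]); // last entry fixed to 1
}

// Map a single point through the homography (with perspective divide).
function applyHomography(h: number[], p: Point): Point {
  const w = h[6] * p.x + h[7] * p.y + h[8];
  return {
    x: (h[0] * p.x + h[1] * p.y + h[2]) / w,
    y: (h[3] * p.x + h[4] * p.y + h[5]) / w,
  };
}

// Hypothetical detected corners (unordered) and a 200x300 flattened target.
const detected: Point[] = [
  { x: 310, y: 95 }, { x: 120, y: 80 }, { x: 95, y: 390 }, { x: 330, y: 370 },
];
const cardCorners = orderCorners(detected);
const flatCard: Point[] = [
  { x: 0, y: 0 }, { x: 200, y: 0 }, { x: 200, y: 300 }, { x: 0, y: 300 },
];
const warp = homography(cardCorners, flatCard);
```

Note that the sum/difference ordering heuristic becomes ambiguous when a card is rotated by roughly 45 degrees, so a production version may need to order corners by angle around the centroid instead.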
3. Enhanced Classification (The "What" Problem)
Improve the precision of identity recognition:
- Unified Multi-Head Model: Combine Suit and Value models into a single network with two output heads to reduce latency and exploit shared features.
- Advanced Data Augmentation: Expand the training set with:
  - Motion Blur: Simulating handheld camera movement.
  - Perspective Distortions: To handle imperfect warping.
  - Lighting Variations: Simulating varied environmental lighting.
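- Confidence Thresholding: Reject predictions whose confidence falls below a minimum threshold to avoid false positives in noisy environments. The threshold should be tuned on held-out data, since raw model scores are not necessarily well calibrated.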
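A minimum confidence threshold can be sketched as follows. This is an illustrative example only: the function names, the suit label list, the sample logits, and the 0.8 threshold are assumptions, and the real models would produce their scores through TensorFlow.js. The sketch converts raw logits to probabilities with a softmax and returns `null` instead of a label when the top probability is too low.

```typescript
interface Prediction { label: string | null; confidence: number; }

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: number[]): number[] {
  const maxLogit = logits.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = logits.map((z) => Math.exp(z - maxLogit));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Return the top label only if its probability reaches minConfidence.
function classifyWithThreshold(
  logits: number[],
  labels: string[],
  minConfidence: number,
): Prediction {
  const probs = softmax(logits);
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  const confidence = probs[best];
  return { label: confidence >= minConfidence ? labels[best] : null, confidence };
}

// Hypothetical suit head output for a clear card and an ambiguous one.
const suits = ["clubs", "diamonds", "hearts", "spades"];
const clearCard = classifyWithThreshold([0.1, 0.2, 4.0, 0.3], suits, 0.8);
const noisyCard = classifyWithThreshold([1.0, 1.1, 1.2, 1.0], suits, 0.8);
// clearCard.label is "hearts"; noisyCard.label is null (rejected).
```

With two output heads, one plausible policy is to accept a card only when both the suit and the value head clear their thresholds.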
4. Temporal Stability (The "Flicker" Problem)
Prevent identity jumping in live mode:
- Object Tracking: Implement a Centroid Tracker or Kalman Filter to maintain card identity across frames instead of detecting from scratch every time.
- Temporal Smoothing: Use a "Voting" mechanism where a card's identity is only confirmed when the model's per-frame predictions agree (for example, by majority vote) over a sliding window of 5-10 frames.
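The voting mechanism could look like the sketch below. The class name `IdentityVoter`, the default window of 7 frames (chosen from the 5-10 range above), the `minAgreement` ratio, and the card labels such as "KH" are assumptions made for illustration. The voter keeps the most recent per-frame predictions for one tracked card and reports an identity only once the window is full and a single label holds a large enough share of the votes.

```typescript
class IdentityVoter {
  private recent: string[] = [];

  constructor(
    private windowSize: number = 7,     // frames kept in the sliding window
    private minAgreement: number = 0.6, // share of votes the winner must reach
  ) {}

  // Record the latest per-frame prediction and return the confirmed
  // identity, or null while no label has enough support.
  update(label: string): string | null {
    this.recent.push(label);
    if (this.recent.length > this.windowSize) this.recent.shift();
    if (this.recent.length < this.windowSize) return null;

    const counts: { [label: string]: number } = {};
    for (const l of this.recent) counts[l] = (counts[l] || 0) + 1;

    let winner: string | null = null;
    let winnerCount = 0;
    for (const key of Object.keys(counts)) {
      if (counts[key] > winnerCount) {
        winner = key;
        winnerCount = counts[key];
      }
    }
    return winnerCount / this.windowSize >= this.minAgreement ? winner : null;
  }
}

// Hypothetical stream: one misread frame ("QH") among King of Hearts frames.
const voter = new IdentityVoter(5, 0.6);
const confirmed = ["KH", "KH", "QH", "KH", "KH"].map((l) => voter.update(l));
// confirmed is [null, null, null, null, "KH"]
```

One voter instance per tracked card is assumed here, so this would sit behind the object tracker that keeps card identities associated across frames.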
Implementation Roadmap
| Phase | Focus | Key Change | Expected Impact |
|---|---|---|---|
| Phase 1 | Stability | Edge detection + Temporal smoothing | Reduced flickering and lighting sensitivity. |
| Phase 2 | Geometry | Perspective Warping (Flattening) | Significant boost in classification accuracy. |
| Phase 3 | Intelligence | Unified Model + Expanded Dataset | Higher precision and lower inference latency. |
| Phase 4 | Architecture | Full Object Detection Model (YOLO) | Industry-standard reliability and speed. |
Evaluation & Validation
To measure the impact of these improvements, the following metrics will be tracked:
- Precision & Recall: Measure how often predicted card identities (Suit + Value) are correct, and how many cards are missed, across diverse lighting environments.
- Latency: Track the time from frame capture to identity assignment to ensure real-time performance (under 100 ms).
- Stability Score: Percentage of frames in which a card's identity remains constant while the card is stationary.
- False Positive Rate: Frequency of "ghost" cards detected in empty table areas.
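Two of these metrics can be computed with a few lines of code. The sketch below is one possible interpretation, not a fixed definition: it treats the stability score as the share of frames that agree with the most frequent label for a stationary card, and computes precision and recall for a single card label. The function names and sample labels are hypothetical.

```typescript
// Percentage of frames that agree with the most frequent label.
function stabilityScore(frameLabels: string[]): number {
  if (frameLabels.length === 0) return 0;
  const counts: { [label: string]: number } = {};
  let best = 0;
  for (const l of frameLabels) {
    counts[l] = (counts[l] || 0) + 1;
    if (counts[l] > best) best = counts[l];
  }
  return (100 * best) / frameLabels.length;
}

// Precision and recall for one label; null means "no card detected/present".
function precisionRecall(
  predicted: (string | null)[],
  actual: (string | null)[],
  label: string,
): { precision: number; recall: number } {
  let tp = 0, fp = 0, fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === label && actual[i] === label) tp++;
    else if (predicted[i] === label) fp++;
    else if (actual[i] === label) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// Hypothetical run: a stationary King of Hearts misread once in four frames.
const score = stabilityScore(["KH", "KH", "QH", "KH"]); // 75
const kingOfHearts = precisionRecall(
  ["KH", "KH", "QH", null],
  ["KH", "QH", "QH", "KH"],
  "KH",
); // precision 0.5, recall 0.5
```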
Technical Infrastructure
Implementation will leverage the following tools:
- OpenCV.js: For Canny Edge Detection, Contour Approximation, and Perspective Transforms (Homography).
- TensorFlow.js: For the classification heads and potential YOLO implementation.
- Synthetic Dataset Generator: A script to generate warped and blurred card images to augment the training set without manual labeling.
Testing Strategy
- Baseline Benchmarking: Create a "Golden Set" of 100 static images with known labels to test every architectural change.
- Environmental Stress Tests: Test under three specific lighting scenarios: Low-light, Direct Overhead Light (shadows), and Natural Side Light.
- Integration Testing: Verify that the Perspective Correction doesn't introduce latency that disrupts the Temporal Smoothing window.