# Card Object Detection Reliability Improvement Plan

This document outlines a strategy for improving the reliability of the card detection pipeline, moving from brightness-based heuristics to a more robust computer vision approach.
## Current Limitations

- **Localization**: Relies on brightness thresholds, which are highly sensitive to lighting conditions and shadows.
- **Geometry**: Crops raw bounding boxes without correcting for perspective (the camera's angle to the table), forcing the ML models to cope with the distortion.
- **Stability**: Live detection starts from scratch on every frame, so card identities jitter (flicker) from frame to frame.
## Proposed Improvements
### 1. Robust Localization (The "Where" Problem)

Transition from brightness-based search to shape and edge detection:

- **Edge-based Detection**: Implement **Canny Edge Detection** and **Contour Approximation** to identify rectangular shapes regardless of absolute brightness.
- **Color Space Shift**: Move from RGB to **HSV (Hue, Saturation, Value)** or **LAB** color spaces to decouple lighting (Value/Lightness) from color information.
- **End-to-End Detection (Long-term)**: Evaluate lightweight object detection models (e.g., **YOLOv8-nano** or **SSD MobileNet**) to replace manual region finding.
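As a rough illustration of the contour-approximation step, the sketch below reimplements Ramer-Douglas-Peucker polygon simplification (the algorithm behind OpenCV's `approxPolyDP`) in plain TypeScript and accepts a contour as a card candidate only if it reduces to exactly four vertices. It assumes contours arrive as arrays of `{x, y}` points (for example, edges from Canny traced into contours). The helper names and the 2%-of-perimeter tolerance are illustrative assumptions, not tuned values; in the real pipeline OpenCV.js would do this work.

```typescript
type Point = { x: number; y: number };

// Perpendicular distance from p to the infinite line through a and b.
function perpendicularDistance(p: Point, a: Point, b: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
}

// Ramer-Douglas-Peucker simplification of an open polyline.
function simplify(points: Point[], epsilon: number): Point[] {
  if (points.length < 3) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  // Everything lies close to the chord: keep only the endpoints.
  if (maxDist <= epsilon || index === 0) return [first, last];
  // Otherwise split at the farthest point and simplify both halves.
  const left = simplify(points.slice(0, index + 1), epsilon);
  const right = simplify(points.slice(index), epsilon);
  return left.slice(0, -1).concat(right);
}

// Closed contours have no natural endpoints, so split at the point farthest
// from the first one and simplify the two resulting open chains.
function approximateClosedContour(contour: Point[], epsilon: number): Point[] {
  let far = 0;
  let best = -1;
  for (let i = 1; i < contour.length; i++) {
    const d = Math.hypot(contour[i].x - contour[0].x, contour[i].y - contour[0].y);
    if (d > best) {
      best = d;
      far = i;
    }
  }
  const a = simplify(contour.slice(0, far + 1), epsilon);
  const b = simplify(contour.slice(far).concat([contour[0]]), epsilon);
  // Drop the duplicated split points so each vertex appears once.
  return a.slice(0, -1).concat(b.slice(0, -1));
}

function perimeter(contour: Point[]): number {
  let total = 0;
  for (let i = 0; i < contour.length; i++) {
    const p = contour[i];
    const q = contour[(i + 1) % contour.length];
    total += Math.hypot(q.x - p.x, q.y - p.y);
  }
  return total;
}

// Returns the four corners if the contour simplifies to a quadrilateral, else null.
// The tolerance is a fraction of the perimeter (2% is an assumed starting point).
function findCardCorners(contour: Point[], epsilonRatio = 0.02): Point[] | null {
  if (contour.length < 4) return null;
  const approx = approximateClosedContour(contour, epsilonRatio * perimeter(contour));
  return approx.length === 4 ? approx : null;
}
```

Because the test is purely geometric, it does not depend on absolute brightness, which is the property the edge-based approach is after.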
### 2. Perspective Correction (The "Geometry" Problem)

Eliminate image skew to provide standardized input to the classifiers:

- **Four-Point Transform (Warping)**: Identify the four corners of the detected card contour and apply a **Perspective Transform (Homography)** to "flatten" the card into a normalized top-down rectangle.
- **Standardized Input**: Ensure the ML models always receive a centered, undistorted crop, reducing reliance on heavy geometric data augmentation.
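The four-point transform can be sketched as follows, assuming four corner points have already been found. The code orders the corners, solves the standard 8-unknown linear system for the homography (fixing h33 = 1), and then flattens a grayscale image by inverse mapping with nearest-neighbour sampling. OpenCV.js provides `getPerspectiveTransform` and `warpPerspective` for the same job; this plain TypeScript version only makes the math visible. The function names, the row-major grayscale `Float32Array` layout, and the sum/difference corner-ordering heuristic (which becomes ambiguous for cards rotated close to 45°) are assumptions.

```typescript
type Point = { x: number; y: number };

// Order corners as [top-left, top-right, bottom-right, bottom-left] using
// the x+y and y-x extremes. Ambiguous for cards rotated close to 45 degrees.
function orderCorners(pts: Point[]): [Point, Point, Point, Point] {
  const bySum = [...pts].sort((a, b) => a.x + a.y - (b.x + b.y));
  const byDiff = [...pts].sort((a, b) => a.y - a.x - (b.y - b.x));
  return [bySum[0], byDiff[0], bySum[3], byDiff[3]];
}

// Solve A x = b with Gauss-Jordan elimination and partial pivoting.
function solve(A: number[][], b: number[]): number[] {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("degenerate corner configuration");
    [M[col], M[pivot]] = [M[pivot], M[col]];
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

// Homography H (row-major 3x3, h33 = 1) mapping each src[i] to dst[i].
function homography(src: Point[], dst: Point[]): number[] {
  const A: number[][] = [];
  const b: number[] = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = src[i];
    const { x: u, y: v } = dst[i];
    A.push([x, y, 1, 0, 0, 0, -x * u, -y * u]);
    b.push(u);
    A.push([0, 0, 0, x, y, 1, -x * v, -y * v]);
    b.push(v);
  }
  return [...solve(A, b), 1];
}

function applyHomography(H: number[], p: Point): Point {
  const w = H[6] * p.x + H[7] * p.y + H[8];
  return {
    x: (H[0] * p.x + H[1] * p.y + H[2]) / w,
    y: (H[3] * p.x + H[4] * p.y + H[5]) / w,
  };
}

// Flatten a card into an outW x outH top-down crop. Each output pixel is mapped
// back into the source image (inverse mapping) and sampled with nearest neighbour.
function warpGray(
  src: Float32Array, srcW: number, srcH: number,
  corners: Point[], outW: number, outH: number,
): Float32Array {
  const rect = [
    { x: 0, y: 0 }, { x: outW - 1, y: 0 },
    { x: outW - 1, y: outH - 1 }, { x: 0, y: outH - 1 },
  ];
  const toSource = homography(rect, orderCorners(corners));
  const out = new Float32Array(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      const p = applyHomography(toSource, { x, y });
      const sx = Math.round(p.x);
      const sy = Math.round(p.y);
      if (sx >= 0 && sx < srcW && sy >= 0 && sy < srcH) out[y * outW + x] = src[sy * srcW + sx];
    }
  }
  return out;
}
```

For a standard poker-size card (2.5 × 3.5 in), an output of 250 × 350 pixels preserves the aspect ratio; the actual size would depend on what the classifiers expect. Bilinear sampling would give smoother crops than nearest neighbour at some extra cost.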
### 3. Enhanced Classification (The "What" Problem)

Improve the precision of identity recognition:

- **Unified Multi-Head Model**: Combine the Suit and Value models into a single network with two output heads, reducing latency and exploiting shared features.
- **Advanced Data Augmentation**: Expand the training set with:
  - **Motion Blur**: Simulates handheld camera movement.
  - **Perspective Distortions**: Covers imperfect warping.
  - **Lighting Variations**: Simulates varied environmental lighting.
- **Confidence Calibration**: Calibrate the model's confidence scores and enforce a minimum confidence threshold to avoid false positives in noisy environments.
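A minimal sketch of decoding a two-head output with a confidence gate. It assumes the unified model returns raw logits for 4 suits and 13 values, multiplies the two top-class probabilities into a joint confidence (treating the heads as independent, which is a simplification), and applies temperature scaling, one common calibration method, with a temperature assumed to be fitted on a held-out validation set. The label lists, the 0.6 threshold, and all function names are hypothetical.

```typescript
const SUITS = ["clubs", "diamonds", "hearts", "spades"];
const VALUES = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];

interface CardPrediction {
  suit: string;
  value: string;
  confidence: number;
}

// Numerically stable softmax; a temperature > 1 softens overconfident logits.
function softmax(logits: number[], temperature = 1): number[] {
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((z) => Math.exp(z - max));
  const sum = exps.reduce((acc, e) => acc + e, 0);
  return exps.map((e) => e / sum);
}

function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) if (xs[i] > xs[best]) best = i;
  return best;
}

// Decode both heads and reject the card when the joint confidence is too low.
// Joint confidence = P(suit) * P(value), which treats the heads as independent.
function decodeCard(
  suitLogits: number[], valueLogits: number[],
  minConfidence = 0.6, temperature = 1,
): CardPrediction | null {
  const pSuit = softmax(suitLogits, temperature);
  const pValue = softmax(valueLogits, temperature);
  const s = argmax(pSuit);
  const v = argmax(pValue);
  const confidence = pSuit[s] * pValue[v];
  if (confidence < minConfidence) return null;
  return { suit: SUITS[s], value: VALUES[v], confidence };
}
```

Returning `null` rather than a low-confidence guess lets the temporal smoothing stage treat the frame as an abstention instead of a vote for the wrong card.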
### 4. Temporal Stability (The "Flicker" Problem)

Prevent identity jumping in live mode:

- **Object Tracking**: Implement a **Centroid Tracker** or **Kalman Filter** to maintain card identity across frames instead of detecting from scratch every time.
- **Temporal Smoothing**: Use a "voting" mechanism in which a card's identity is confirmed only when the model's predictions agree across a sliding window of 5-10 frames.
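Tracking and voting can be combined roughly as below: a greedy nearest-centroid matcher keeps a stable track ID per card, and each track owns a sliding-window voter that changes the confirmed identity only once one label wins at least `minVotes` of the last `windowSize` frames. The window of 7 and threshold of 5 are assumptions picked from the 5-10 frame range; `minVotes` should exceed half the window so that at most one label can win. The 40-pixel match radius and 5-frame track timeout are likewise hypothetical. A Kalman filter would replace the raw centroid update with a predicted position, which is not shown here.

```typescript
// Sliding-window vote: the confirmed identity changes only when one label
// appears in at least minVotes of the last windowSize frames.
class IdentityVoter {
  private history: (string | null)[] = [];
  private confirmed: string | null = null;
  private windowSize: number;
  private minVotes: number;

  constructor(windowSize = 7, minVotes = 5) {
    this.windowSize = windowSize;
    this.minVotes = minVotes;
  }

  // label is null when the classifier abstained (e.g. low confidence).
  update(label: string | null): string | null {
    this.history.push(label);
    if (this.history.length > this.windowSize) this.history.shift();
    const counts = new Map<string, number>();
    for (const l of this.history) {
      if (l !== null) counts.set(l, (counts.get(l) ?? 0) + 1);
    }
    counts.forEach((count, l) => {
      if (count >= this.minVotes) this.confirmed = l;
    });
    return this.confirmed;
  }
}

interface Detection {
  x: number; // centroid of the detected card, in pixels
  y: number;
  label: string | null;
}

interface Track {
  id: number;
  x: number;
  y: number;
  missed: number;
  voter: IdentityVoter;
}

// Greedy nearest-centroid tracker: each detection joins the closest unmatched
// track within maxDistance pixels, otherwise it starts a new track.
class CentroidTracker {
  private tracks: Track[] = [];
  private nextId = 0;
  private maxDistance: number;
  private maxMissed: number;

  constructor(maxDistance = 40, maxMissed = 5) {
    this.maxDistance = maxDistance;
    this.maxMissed = maxMissed;
  }

  update(detections: Detection[]): { id: number; label: string | null }[] {
    const matched = new Set<number>();
    const results: { id: number; label: string | null }[] = [];
    for (const d of detections) {
      let best: Track | null = null;
      let bestDist = this.maxDistance;
      for (const t of this.tracks) {
        if (matched.has(t.id)) continue;
        const dist = Math.hypot(t.x - d.x, t.y - d.y);
        if (dist <= bestDist) {
          best = t;
          bestDist = dist;
        }
      }
      if (best === null) {
        best = { id: this.nextId++, x: d.x, y: d.y, missed: 0, voter: new IdentityVoter() };
        this.tracks.push(best);
      }
      best.x = d.x;
      best.y = d.y;
      best.missed = 0;
      matched.add(best.id);
      results.push({ id: best.id, label: best.voter.update(d.label) });
    }
    // Drop tracks that have gone unmatched for more than maxMissed frames.
    for (const t of this.tracks) if (!matched.has(t.id)) t.missed++;
    this.tracks = this.tracks.filter((t) => t.missed <= this.maxMissed);
    return results;
  }
}
```

Greedy matching is order-dependent; if cards are often placed close together, an optimal assignment (e.g. the Hungarian algorithm) would be the more robust choice.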
## Implementation Roadmap

| Phase | Focus | Key Change | Expected Impact |
| :--- | :--- | :--- | :--- |
| **Phase 1** | **Stability** | Edge detection + Temporal smoothing | Reduced flickering and lighting sensitivity. |
| **Phase 2** | **Geometry** | Perspective Warping (Flattening) | Significant boost in classification accuracy. |
| **Phase 3** | **Intelligence** | Unified Model + Expanded Dataset | Higher precision and lower inference latency. |
| **Phase 4** | **Architecture** | Full Object Detection Model (YOLO) | Industry-standard reliability and speed. |
## Evaluation & Validation

To measure the impact of these improvements, the following metrics will be tracked:

- **Precision & Recall**: Measure how accurately card identity (Suit + Value) is recognized across diverse lighting environments.
- **Latency**: Track the time from frame capture to identity assignment to ensure real-time performance (target: under 100 ms).
- **Stability Score**: Percentage of frames in which a stationary card keeps the same identity.
- **False Positive Rate**: Frequency of "ghost" cards detected in empty table areas.
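The accuracy-related metrics could be computed from per-frame results along these lines. This sketch assumes each frame is reduced to one ground-truth label and one prediction (`null` meaning "no card"), counts a misidentified card against precision as well as recall, and defines the stability score as the fraction of consecutive frame pairs in which a stationary card's prediction is unchanged; other definitions (such as agreement with the majority label) are equally reasonable. All names are hypothetical, and latency measurement is not covered.

```typescript
interface FrameResult {
  truth: string | null;     // ground-truth identity, or null for an empty table area
  predicted: string | null; // pipeline output, or null for "no card"
}

interface DetectionMetrics {
  precision: number;
  recall: number;
  falsePositiveRate: number;
}

function detectionMetrics(results: FrameResult[]): DetectionMetrics {
  let correct = 0;
  let predictedCount = 0;
  let actualCount = 0;
  let emptyFrames = 0;
  let ghosts = 0;
  for (const { truth, predicted } of results) {
    if (predicted !== null) predictedCount++;
    if (truth !== null) actualCount++;
    else emptyFrames++;
    // A wrong identity counts against precision as well as recall.
    if (predicted !== null && predicted === truth) correct++;
    // "Ghost" card: something was reported where no card exists.
    if (truth === null && predicted !== null) ghosts++;
  }
  return {
    precision: predictedCount > 0 ? correct / predictedCount : 0,
    recall: actualCount > 0 ? correct / actualCount : 0,
    falsePositiveRate: emptyFrames > 0 ? ghosts / emptyFrames : 0,
  };
}

// Fraction of consecutive frame pairs in which a stationary card's predicted
// identity did not change (1.0 = perfectly stable).
function stabilityScore(predictions: (string | null)[]): number {
  if (predictions.length < 2) return 1;
  let unchanged = 0;
  for (let i = 1; i < predictions.length; i++) {
    if (predictions[i] === predictions[i - 1]) unchanged++;
  }
  return unchanged / (predictions.length - 1);
}
```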
## Technical Infrastructure

Implementation will leverage the following tools:

- **OpenCV.js**: For Canny Edge Detection, Contour Approximation, and Perspective Transforms (Homography).
- **TensorFlow.js**: For the classification heads and a potential YOLO implementation.
- **Synthetic Dataset Generator**: A script that generates warped and blurred card images to augment the training set without manual labeling.
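Two augmentation steps the synthetic generator might apply are sketched below, assuming grayscale images stored row-major in a `Float32Array` with values in the 0-255 range: a horizontal box-filter motion blur (a simple stand-in for arbitrary-angle blur kernels) and a gain/bias lighting change. The function names and parameters are illustrative; a random perspective warp would typically be applied in the same pipeline, and because the card label is known before augmentation, no manual labeling is needed.

```typescript
// Horizontal motion blur: average over a (2 * radius + 1)-pixel window,
// clamping sample positions at the image edges.
function motionBlurHorizontal(
  img: Float32Array, width: number, height: number, radius: number,
): Float32Array {
  const out = new Float32Array(img.length);
  const len = 2 * radius + 1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let t = -radius; t <= radius; t++) {
        const sx = Math.min(width - 1, Math.max(0, x + t));
        sum += img[y * width + sx];
      }
      out[y * width + x] = sum / len;
    }
  }
  return out;
}

// Simulate brighter or darker environments: v' = clamp(gain * v + bias, 0, 255).
function adjustLighting(img: Float32Array, gain: number, bias: number): Float32Array {
  return img.map((v) => Math.min(255, Math.max(0, gain * v + bias)));
}
```

Sampling `radius`, `gain`, and `bias` randomly per image (within ranges observed in real footage) would give the variety the augmentation plan calls for.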
## Testing Strategy

- **Baseline Benchmarking**: Create a "Golden Set" of 100 static images with known labels and run every architectural change against it.
- **Environmental Stress Tests**: Test under three specific lighting scenarios: Low-light, Direct Overhead Light (shadows), and Natural Side Light.
- **Integration Testing**: Verify that Perspective Correction does not add enough latency to disrupt the Temporal Smoothing window.
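A golden-set harness could report overall and per-lighting accuracy and flag regressions against a stored baseline. The sketch assumes each golden image is tagged with one of the three stress-test lighting conditions and that the pipeline is exposed as a `classify` function returning a label or `null`; the tag names, label format, and one-point regression tolerance are hypothetical.

```typescript
type Lighting = "low-light" | "overhead" | "side";

interface GoldenSample<T> {
  input: T;          // e.g. an image tensor or file path; left generic here
  label: string;     // expected identity label
  lighting: Lighting;
}

interface GoldenReport {
  accuracy: number;
  byLighting: Partial<Record<Lighting, number>>;
}

// Run the pipeline over every golden sample and report overall and per-lighting accuracy.
function evaluateGoldenSet<T>(
  samples: GoldenSample<T>[],
  classify: (input: T) => string | null,
): GoldenReport {
  const buckets = new Map<Lighting, { correct: number; total: number }>();
  let correct = 0;
  for (const s of samples) {
    const ok = classify(s.input) === s.label;
    if (ok) correct++;
    const bucket = buckets.get(s.lighting) ?? { correct: 0, total: 0 };
    bucket.total++;
    if (ok) bucket.correct++;
    buckets.set(s.lighting, bucket);
  }
  const byLighting: Partial<Record<Lighting, number>> = {};
  buckets.forEach((b, lighting) => {
    byLighting[lighting] = b.correct / b.total;
  });
  return { accuracy: samples.length > 0 ? correct / samples.length : 0, byLighting };
}

// A change passes if overall accuracy stays within `tolerance` of the stored baseline.
function passesRegressionGate(report: GoldenReport, baselineAccuracy: number, tolerance = 0.01): boolean {
  return report.accuracy >= baselineAccuracy - tolerance;
}
```

Breaking accuracy down by lighting makes it visible when a change helps overall but hurts one scenario, which a single aggregate number would hide.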
|