Breaking the Limits of Single-View Vehicle Tracking: A Multi-Camera Collaborative System Pioneers a New Era of Smart Transportation
Digital technologies and artificial intelligence are accelerating the development of smart cities. The research project "Vehicle Tracking in Multi-Target Multi-Camera Tracking Systems," led by Prof. Jenq-Neng Hwang of the University of Washington in collaboration with the National Center for High-performance Computing (NCHC), uses deep learning and multi-camera collaboration to overcome the limitations of traditional vehicle tracking approaches and to advance smart traffic management.
On the academic side, the research won first place in the Multi-Modal Visual Pattern Recognition Challenge – Track 1 at the International Conference on Pattern Recognition (ICPR), one of the world's leading conferences in pattern recognition and computer vision. Building on this technology, the jointly developed "Digital City – Intelligent Traffic Congestion Early Warning and Police Operations Support System" (Figure 1) received the 2024 Future Tech Award. With NCHC's computing resources and data platform support, the system has significantly improved the accuracy and stability of vehicle tracking, establishing a model for future smart city traffic management.
Traditional vehicle tracking systems typically rely on single cameras or license plate recognition technologies, which face substantial limitations in real-world applications. Tracking accuracy often degrades under constrained camera coverage, poor lighting conditions, or occluded license plates. Moreover, in complex traffic scenarios involving multiple vehicles, heavy congestion, or highly dynamic vehicle behaviors, single-camera tracking becomes increasingly inadequate. As a result, developing a multi-target, multi-camera tracking system capable of overcoming these challenges has become a critical issue for enhancing traffic surveillance effectiveness.
The multi-target multi-camera vehicle tracking system uses video streams from multiple intersection cameras along the Hsinchu West Coast Expressway as its primary data source (Figure 2), combined with advanced image processing techniques. The system employs deep learning models, particularly convolutional neural networks (CNNs), for object detection and feature extraction, enabling cross-camera vehicle association and tracking. Its key innovation is multi-view collaborative operation, which covers the blind spots inherent to single-camera systems. When a vehicle moves from one camera's coverage area to another, the system automatically maintains tracking continuity and keeps the vehicle's identity consistent. The deep learning models also allow the system to distinguish vehicles with similar appearances and to adjust tracking strategies dynamically based on real-time traffic flow, ensuring stable performance in complex and changing road conditions (Figure 3).

Figure 1. Digital City – Intelligent Traffic Congestion Early Warning and Police Operations Support System

Figure 2. Experimental field in Hsinchu City

Figure 3. Continuous cross-camera vehicle tracking
In this project, NCHC provided high-performance computing resources, including NVIDIA V100 GPUs, to support the processing of large volumes of real-time video data and the training and optimization of deep learning models, such as the YOLOv8 vehicle detection model and the ResNet-based vehicle feature extraction model. NCHC also delivered a comprehensive data processing platform that integrates multi-camera video streams and ensures system stability and efficiency through reliable cloud storage and data management mechanisms.
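To make the idea of cross-camera vehicle association more concrete, the following is a minimal sketch, not the project's actual implementation. It assumes each tracked vehicle has already been reduced to an appearance embedding (for example by a ResNet-based feature extractor) and matches tracks between two cameras by cosine similarity using a simple greedy strategy. The function names, the 0.7 threshold, and the three-dimensional toy embeddings are all hypothetical; a real system would typically use much longer feature vectors and add spatial and temporal constraints.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate_tracks(tracks_cam_a, tracks_cam_b, min_similarity=0.7):
    """Greedy one-to-one matching of tracks between two cameras.

    tracks_cam_a / tracks_cam_b map a track ID to its appearance embedding.
    min_similarity is an assumed threshold, not a value from the project.
    Returns a dict mapping camera-A track IDs to camera-B track IDs.
    """
    # Score every cross-camera pair and keep those above the threshold.
    candidates = []
    for id_a, emb_a in tracks_cam_a.items():
        for id_b, emb_b in tracks_cam_b.items():
            sim = cosine_similarity(emb_a, emb_b)
            if sim >= min_similarity:
                candidates.append((sim, id_a, id_b))

    # Accept the most similar pairs first; each track is used at most once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    matches = {}
    used_b = set()
    for _sim, id_a, id_b in candidates:
        if id_a in matches or id_b in used_b:
            continue
        matches[id_a] = id_b
        used_b.add(id_b)
    return matches


# Toy embeddings (hypothetical values chosen only for illustration).
cam_a = {"a1": [0.9, 0.1, 0.0], "a2": [0.0, 0.2, 0.9]}
cam_b = {"b1": [0.1, 0.1, 0.95], "b2": [0.85, 0.2, 0.05], "b3": [0.0, 1.0, 0.0]}

matches = associate_tracks(cam_a, cam_b)
print(matches)
```

In this toy example, track `a1` is paired with `b2` and `a2` with `b1`, while `b3` stays unmatched because no camera-A track resembles it closely enough; an unmatched track would be treated as a vehicle seen by only one camera. Greedy matching is used here only for brevity, and an optimal assignment method could be substituted.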
Looking ahead, Prof. Hwang's research team will continue to optimize its deep learning algorithms to further improve tracking accuracy in complex traffic environments. The team also plans to broaden the system's applications by integrating it with Vision Language Models (VLMs), strengthening its computer vision capabilities and speeding up the response to unexpected events. These efforts will lay a critical foundation for smart transportation management in Taiwan's smart cities.