Computer Vision and Machine Learning
Based on presentations and discussions at the workshop held on 23 June 2016 in Oberkochen, Germany
Outreach of Computer Vision and Machine Learning
Computer Vision (CV) and Machine Learning (ML) have evolved tremendously over the last 15 years. One of the main drivers of this success is the application of machine learning methods to computer vision tasks (image registration, segmentation, 3D reconstruction, tracking, object detection, image classification, …). These days it is widely agreed that difficult computational problems in data analytics that cannot be solved analytically are best addressed with machine learning algorithms trained on data.
State-of-the-art
The current state of the art in CV allows the detection and tracking of single object classes (such as faces, pedestrians or cars) in unconstrained settings, at a level that enables smart cameras that recognize smiling people, driver assistance (pedestrian detection), surveillance applications and image-based web search [1, 2]. Image classification performs on par with humans on databases with as many as 1000 classes (ImageNet) [3], and objects (e.g. birds) can be classified into fine-grained species with an accuracy of over 80% for 200 classes [4]. This level is sufficient for an app that supports birders.
The field of structure-from-motion has reached performance levels that allow applications such as video editing and augmentation, as well as large-scale 3D reconstruction from community web databases (e.g. Flickr) with the accuracy of a laser scanner [5]. Image registration has matured to the point that photographs, e.g. taken with handheld cameras for panoramas, can be stitched seamlessly [6].
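To make the registration step concrete, the following minimal sketch estimates a 2D similarity transform (rotation, uniform scale and translation) from matched point pairs by least squares. It is a deliberately simplified stand-in, not the method of [6]: panorama stitching typically fits homographies to automatically matched image features. The function name `fit_similarity` and the sample points are hypothetical and chosen only for illustration.

```python
import cmath
import math


def fit_similarity(src, dst):
    """Least-squares fit of q ~ a*p + b, with 2D points treated as complex numbers.

    The complex factor a encodes rotation and uniform scale; b is the translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Closed-form solution of the complex linear regression on centred points.
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = q_mean - a * p_mean
    return a, b


# Hypothetical matches: dst is src rotated by 90 degrees, scaled by 2, shifted by (1, 1).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(1.0, 1.0), (1.0, 3.0), (-1.0, 1.0), (-1.0, 3.0)]
a, b = fit_similarity(src, dst)
scale = abs(a)                               # uniform scale factor
angle_deg = math.degrees(cmath.phase(a))     # rotation angle in degrees
```

With real photographs the matches would contain outliers, so a robust estimator would normally wrap a fit of this kind; the sketch assumes clean correspondences.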
Latest trends
During the last five years, two lines of successful research have emerged: i) the integration of depth sensors (such as Microsoft Kinect, e.g. [7]) and ii) the application of deep learning techniques to basic computer vision tasks [8]. In particular, the revival of deep learning methods has improved performance on many basic tasks by leveraging large amounts of data in a learning framework. Workshop participants agreed that the next wave of innovation is likely to happen in robotics, where methods based on reinforcement learning can potentially model decision-making processes.
On the computational side, the major trend is the advent of easily programmable interfaces for graphics processing units (GPUs). Interfaces such as CUDA and OpenCL are now in frequent use and allow previously slow algorithms to be parallelized and accelerated up to frame-rate speed. In particular, the training and evaluation of deep convolutional models are facilitated by GPUs. Undoubtedly, the current success of deep learning methods would not have been possible without modern GPUs.
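Why convolutional models map so well onto GPUs can be seen in a small sketch. The plain-Python function below (the name `conv2d_valid` and the toy data are hypothetical) computes a "valid" 2D convolution in the sense used by convolutional networks, i.e. a sliding-window sum without kernel flipping. Each output pixel depends only on its own input window, so all pixels could in principle be computed at the same time; here they are simply evaluated one after another on the CPU.

```python
def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution as used in CNN layers (no kernel flipping)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    def output_pixel(r, c):
        # Reads only the input window at (r, c); never depends on other outputs.
        return sum(image[r + i][c + j] * kernel[i][j]
                   for i in range(kh) for j in range(kw))

    # On a GPU, each (r, c) pair would typically be assigned to its own thread.
    return [[output_pixel(r, c) for c in range(out_w)] for r in range(out_h)]


# Toy 3x4 image and 2x2 kernel, for illustration only.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_valid(image, kernel)
print(result)  # [[-4, -4, 2], [-4, -4, 4]]
```

A CUDA or OpenCL version would keep the body of `output_pixel` essentially unchanged and replace the two outer loops with a grid of threads.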
References
[1] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. European Conference on Computer Vision (ECCV), 2014.
[2] R. Benenson, M. Omran, J. Hosang, and B. Schiele. ECCV workshop on computer vision for road scene understanding and autonomous driving.
[3] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.
[4] S. Branson, G. Van Horn, S. Belongie, and P. Perona. Bird Species Categorization Using Pose Normalized Deep Convolutional Nets. British Machine Vision Conference (BMVC), Nottingham, 2014.
[5] N. Snavely, S. M. Seitz, and R. Szeliski. Modeling the world from internet photo collections. International Journal of Computer Vision, 80(2), pages 189-210, 2008.
[6] M. Brown and D. Lowe. Automatic Panoramic Image Stitching using Invariant Features. International Journal of Computer Vision, 74(1), pages 59-73, 2007.
[7] R. Newcombe, D. Fox, and S. Seitz. DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time. Computer Vision and Pattern Recognition (CVPR), 2015.
[8] Y. LeCun, Y. Bengio, and G. E. Hinton. Deep Learning. Nature, 521, pages 436-444, 2015.
[9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Computer Vision and Pattern Recognition (CVPR), 2012.
[10] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1/2/3), pages 7-42, 2002.