
Classifying and Monitoring for Big Data


Daniel Keren
Department of Computer Science, Haifa University

August 21, 2017 at 9:30 AM
McConnell Engineering Room 437

The talk will address two topics relevant to big data: efficient classification in the presence of many categories, and monitoring highly dynamic, massive data streams.
First half

A canonical problem in machine learning is category classification (e.g., finding all instances of human faces, cars, etc. in an image). Typically, the input for training a classifier is a relatively small sample of positive examples and a much larger sample of negative examples, which in current applications can consist of images from thousands of categories.
The difficulty of the problem increases sharply with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples with a prior and then finds a hyperplane that separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting classifiers achieve a classification rate equal to or better than that of SVM, while requiring far less memory and lower computational complexity to train and apply.
Some of the material covered was presented in ICML (2015) and IEEE-PAMI (2016).

Second half

Traditional algorithms and data structures assume that data is relatively static and concentrated at a central server. Technological advancement has radically changed this picture, and today we must learn to handle BD3: Big, Dynamic, Distributed Data. Clearly, the classical solution of centralizing the data and then processing it at one location is impossible to implement, due to both communication and processing bottlenecks. I will present a general paradigm for approaching this problem, with a few applications (e.g., monitoring properties of large, dynamic, distributed graphs).
Some of the material covered was presented in KDD (2015, 2016, 2017), VLDB (2015), and IPDPS (2017).


Speaker Bio

Daniel Keren obtained a Ph.D. at the Hebrew University and spent three years doing post-doctoral work at Brown University. He has been with the Dept. of Computer Science, Haifa University, since 1994. His main interests are machine learning and monitoring of large, dynamic data.