19/09
19:00 to 22:00
Aula Magna
What is big data? For this talk, "big" refers to the number of samples (n) and/or the number of dimensions (p) in static sets of feature vector data, or to the size of the (similarity or distance) matrices used for relational clustering. The objectives of clustering in static sets of big numerical data are acceleration for loadable data and feasibility for non-loadable data. Three ways currently in favor to achieve these objectives are: (i) streaming (online) clustering, which sidesteps growth in (n) entirely; (ii) chunking and distributed processing; and (iii) sampling followed by a very fast (usually 1-2% of the overall processing time) non-iterative extension to the remainder of the data. Kernel-based methods are mentioned, but not covered in this talk.
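As a rough illustration of approach (i), the sketch below shows one common single-pass variant of hard c-means (a MacQueen-style running-mean update). The function name, initialization, and synthetic stream are assumptions for illustration, not material from the talk.

```python
import numpy as np

def streaming_hcm(stream, c):
    """Single-pass hard c-means: each point updates its nearest centroid and is
    then discarded, so memory stays O(c) no matter how large n grows."""
    it = iter(stream)
    # Seed the centroids with the first c points (an arbitrary choice for this sketch).
    centroids = np.array([next(it) for _ in range(c)], dtype=float)
    counts = np.ones(c)
    for x in it:
        k = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))  # nearest centroid
        counts[k] += 1
        centroids[k] += (x - centroids[k]) / counts[k]             # running-mean update
        # x is discarded here; the full data set never needs to be loadable
    return centroids

# Usage: 100,000 2-D points arrive one at a time and are never stored.
rng = np.random.default_rng(1)
points = (rng.standard_normal(2) + 6 * rng.integers(0, 3) for _ in range(100_000))
print(streaming_hcm(points, c=3))
```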
Four classical clustering methods have withstood the test of time. I call them the "Fantastic Four":
Gaussian Mixture Decomposition (GMD, 1898)
Hard c-means (often called "k-means", HCM, 1956)
Fuzzy c-means (reduces to HCM in the limit, FCM, 1973)
SAHN clustering, principally single linkage (SL, 1909)

This talk describes how sampling followed by non-iterative extension carries each of the Fantastic Four to the big data case. Three methods of sampling are covered: random, progressive, and minimax. The last portion of the talk summarizes a few of the many acceleration methods for each of the Fantastic Four.
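To make "sampling followed by non-iterative extension" concrete, here is a minimal sketch using hard c-means: the iterative clustering runs only on a small random sample, and the extension is a single nearest-centroid pass over the full data. The ~1% sample rate, function names, and synthetic data are assumptions for illustration, not details from the talk.

```python
import numpy as np

def hcm(X, c, iters=100, seed=0):
    """Plain Lloyd-style hard c-means; the iterative work happens only here,
    on the small sample."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        # Recompute means; keep the old centroid if a cluster happens to empty.
        new = np.array([X[labels == k].mean(0) if np.any(labels == k) else centroids[k]
                        for k in range(c)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def extend(X, centroids):
    """Non-iterative extension: one nearest-centroid pass over all of X.
    This is the cheap step, typically a small fraction of total time."""
    return np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)

# Usage: iterate only on a ~1% random sample, then label everything in one pass.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50_000, 2)) + m for m in (0, 6, 12)])
sample = X[rng.choice(len(X), len(X) // 100, replace=False)]
labels = extend(X, hcm(sample, c=3))
```

In the fuzzy and probabilistic cases, the extension step instead computes memberships (FCM) or posterior probabilities (GMD) non-iteratively from the sample-derived prototypes.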
For more information, contact:
Leticia Seijas: lseijas@fi.mdp.edu.ar
Daniela López De Luise: daniela_ldl@ieee.org