Data Science

A list of info, algorithms around data science.

K-means clustering

K-means is an unsupervised clustering algorithm that partition a set of observiations in k clusters.


From wikipedia:

Given a set of observations $(x_1, x_2, \ldots, x_n)$ where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets $S = {S_1, S_2, \ldots, S_k}$ so as to minimize the within-cluster sum of squares:

\arg_{S} \min \sum_{i=1}^{k}