From 3c62d2d3b1600c9ee86c29e54b3b5f53de65de3a Mon Sep 17 00:00:00 2001
From: BlackLight
Date: Thu, 18 Nov 2010 19:50:53 +0100
Subject: [PATCH] README added

---
 README | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..16840e0
--- /dev/null
+++ b/README
@@ -0,0 +1,80 @@
+fkmeans is a tiny C library that lets you run the k-means clustering algorithm
+over arbitrary sets of n-dimensional data. All you need to do is:
+
+- Include the file kmeans.h in your sources;
+
+- Represent your data set as a vector of vectors of double items (double**),
+  where each vector is an n-dimensional item of your data set;
+
+- If you want to run the k-means algorithm over your data and you already know
+  the number k of clusters it contains, or an estimate of it, execute some
+  code like this (in this example, the data set is 3-dimensional, i.e. it
+  contains N vectors whose size is 3, and we know it contains n_clus
+  clusters):
+
+    kmeans_t *km;
+    double **dataset;
+    ...
+    km = kmeans_new ( dataset, N, 3, n_clus );
+    kmeans ( km );
+    ...
+    kmeans_free ( km );
+
+  If you don't already know the number of clusters contained in your data set,
+  you can use the function kmeans_auto(), which automatically attempts to find
+  the best one using Schwarz's criterion. Be careful: this operation can be
+  very slow, especially when executed on a data set with many elements. The
+  example above would simply become something like:
+
+    kmeans_t *km;
+    double **dataset;
+    ...
+    km = kmeans_auto ( dataset, N, 3 );
+    ...
+    kmeans_free ( km );
+
+- Once the clustering has been performed, the clusters can be accessed directly
+  from your kmeans_t* structure, where they are held in a double*** field
+  named "clusters". Each vector in this field represents a cluster; the size
+  of cluster i is specified in the field cluster_sizes[i] of the structure.
+  Each cluster contains the items that form it, and each item is an
+  n-dimensional vector. The number of clusters is specified in the field "k"
+  of the structure, the number of dimensions of each element in the field
+  "dataset_dim", and the number of elements in the original data set in the
+  field "dataset_size". So, for example:
+
+    for ( i=0; i < km->k; i++ )
+    {
+        printf ( "cluster %d: [ ", i );
+
+        for ( j=0; j < km->cluster_sizes[i]; j++ )
+        {
+            printf ( "(" );
+
+            for ( k=0; k < km->dataset_dim; k++ )  /* one value per dimension */
+            {
+                printf ( "%f, ", km->clusters[i][j][k] );
+            }
+
+            printf ( "), ");
+        }
+
+        printf ( "]\n" );
+    }
+
+  The library also comes with a sample implementation, contained in "test.c";
+  typing "make" builds this example;
+
+- After you write your source, remember to include the file "kmeans.c",
+  containing the implementation of the library, in the list of your source
+  files;
+
+- That's all. Include "kmeans.h", write your code using
+  kmeans_new()+kmeans()+kmeans_free() or kmeans_auto()+kmeans_free(), explore
+  your clusters, remember to include "kmeans.c" in the list of your source
+  files, and you're ready for k-means clustering.
+
+Author: Fabio "BlackLight" Manganiello,
+        ,
+        http://0x00.ath.cx
+
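
The README asks for the data set as a double** of N vectors, but does not show how to build one. A minimal sketch of that layout follows, assuming nothing about fkmeans beyond the double** shape: dataset_alloc() and dataset_free() are hypothetical helper names, not part of the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of fkmeans: allocate n items of dim doubles
 * each, zero-initialized, laid out as the double** the README describes. */
double **dataset_alloc(size_t n, size_t dim)
{
    assert(n > 0 && dim > 0);

    double **dataset = malloc(n * sizeof *dataset);
    if (dataset == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        dataset[i] = calloc(dim, sizeof **dataset);
        if (dataset[i] == NULL) {
            /* Roll back the rows allocated so far. */
            while (i > 0)
                free(dataset[--i]);
            free(dataset);
            return NULL;
        }
    }
    return dataset;
}

/* Hypothetical helper: release a data set built by dataset_alloc(). */
void dataset_free(double **dataset, size_t n)
{
    if (dataset == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(dataset[i]);
    free(dataset);
}
```

A data set built this way, filled with dataset[i][d] = value, could then be passed to kmeans_new() or kmeans_auto() as in the README examples. The README does not say whether kmeans_free() also releases the data set, so check kmeans.c before calling dataset_free() yourself.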