README added

2025-07-13 15:48:07 +02:00 · 2010-11-18 19:50:53 +01:00
commit 3c62d2d3b1
parent 77f676e4d2
1 changed file with 80 additions and 0 deletions
--- a/80
+++ b/80
@@ -0,0 +1,80 @@
+fkmeans is a tiny C library that lets you run the k-means clustering
+algorithm over arbitrary sets of n-dimensional data. All you need to do is:
+
+- Include the file kmeans.h in your sources;
+
+- Represent your data set as a vector of vectors of doubles (double**),
+  where each inner vector is one n-dimensional item of your data set;
+
+- If you want to run the k-means algorithm over your data and you already
+  know the number k of clusters it contains, or an estimate of it, you can
+  run code like this (in this example the data set is 3-dimensional, i.e. it
+  contains N vectors of size 3, and we know it contains n_clus
+  clusters):
+
+    kmeans_t *km;
+    double **dataset;
+    ...
+    km = kmeans_new ( dataset, N, 3, n_clus );
+    kmeans ( km );
+    ...
+    kmeans_free ( km );
+
+  If you don't already know the number of clusters contained in your data set,
+  you can use the function kmeans_auto(), which attempts to find the best
+  value automatically using Schwarz's criterion. Be careful: this operation can
+  be very slow, especially on data sets with many elements. The example
+  above would simply become something like:
+
+    kmeans_t *km;
+    double **dataset;
+    ...
+    km = kmeans_auto ( dataset, N, 3 );
+    ...
+    kmeans_free ( km );
+
+- Once the clustering has been performed, the clusters can be accessed
+  directly from your kmeans_t* structure, where they are held in a double***
+  field named "clusters". Each vector in this field represents a cluster, whose
+  size is given by the field cluster_sizes[i] of the structure. Each cluster
+  contains the items that form it, each of which is an n-dimensional vector.
+  The number of clusters is given by the field "k" of the structure, the
+  number of dimensions of each element by the field "dataset_dim", and the
+  number of elements in the original data set by the field "dataset_size".
+  So, for example:
+
+    for ( i=0; i < km->k; i++ )
+    {
+	    printf ( "cluster %d: [ ", i );
+
+	    for ( j=0; j < km->cluster_sizes[i]; j++ )
+	    {
+		    printf ( "(" );
+
+		    for ( k=0; k < km->dataset_dim; k++ )
+		    {
+			    printf ( "%f, ", km->clusters[i][j][k] );
+		    }
+
+		    printf ( "), ");
+	    }
+
+	    printf ( "]\n" );
+    }
+
+  The library also comes with a sample program, contained in
+  "test.c"; typing "make" builds this example;
+
+- After you write your source, remember to add the file "kmeans.c",
+  which contains the implementation of the library, to the list of your
+  source files;
+
+- That's all. Include "kmeans.h", write your code using
+  kmeans_new()+kmeans()+kmeans_free() or kmeans_auto()+kmeans_free(), explore
+  your clusters, remember to include "kmeans.c" in the list of your source
+  files, and you're ready for k-means clustering.
+
+Author: Fabio "BlackLight" Manganiello,
+        <blacklight@autistici.org>,
+        http://0x00.ath.cx
+
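
The README asks for the data set as a double** (a vector of n-dimensional vectors) but does not show how to build one. The following is only a minimal sketch of one way to do it. The helpers dataset_alloc() and dataset_free() are hypothetical names introduced here for illustration; they are not part of fkmeans.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of fkmeans): allocate n vectors of dim
 * doubles each, all initialised to zero. Returns NULL if any allocation
 * fails, after releasing whatever was already allocated.
 */
double **dataset_alloc(size_t n, size_t dim)
{
    assert(n > 0 && dim > 0);

    double **dataset = malloc(n * sizeof *dataset);
    if (dataset == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        dataset[i] = calloc(dim, sizeof **dataset);
        if (dataset[i] == NULL) {
            /* Roll back the rows allocated so far. */
            while (i > 0)
                free(dataset[--i]);
            free(dataset);
            return NULL;
        }
    }

    return dataset;
}

/* Hypothetical helper (not part of fkmeans): release a data set of n vectors. */
void dataset_free(double **dataset, size_t n)
{
    if (dataset == NULL)
        return;

    for (size_t i = 0; i < n; i++)
        free(dataset[i]);
    free(dataset);
}
```

After filling dataset[i][j] with your values, the pointer can be passed as the "dataset" argument shown in the README, e.g. kmeans_new ( dataset, N, 3, n_clus ). The README does not say whether the library copies the data or keeps your pointer, so releasing the data set only after kmeans_free() is the conservative order.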