Mirror of https://github.com/BlackLight/fkmeans.git (synced 2024-11-23 20:25:10 +01:00)
Commit 3c62d2d3b1 (parent 77f676e4d2): README added, 1 file changed with 80 additions and 0 deletions

fkmeans is a tiny C library that lets you run the k-means clustering
algorithm on arbitrary sets of n-dimensional data. All you need to do is:

- Include the file kmeans.h in your sources;

- Represent your data set as an array of arrays of doubles (double**), where
  each inner array is one n-dimensional item of the data set;

- If you already know the number k of clusters in your data, or have an
  estimate of it, run the k-means algorithm with code like the following. In
  this example the data set is 3-dimensional, i.e. it contains N vectors of
  size 3, and is known to contain n_clus clusters:

        kmeans_t *km;
        double **dataset;
        ...
        /* data set, number of items, dimensions, number of clusters */
        km = kmeans_new ( dataset, N, 3, n_clus );
        kmeans ( km );      /* perform the clustering */
        ...
        kmeans_free ( km ); /* release the kmeans_t structure */

  If you don't know the number of clusters in advance, kmeans_auto() can try
  to find the best one automatically using Schwarz's criterion (the Bayesian
  information criterion, BIC). Be careful: this can be very slow, especially
  on data sets with many items. The example above then becomes something like:

        kmeans_t *km;
        double **dataset;
        ...
        /* data set, number of items, dimensions: no k is passed */
        km = kmeans_auto ( dataset, N, 3 );
        ...
        kmeans_free ( km );

- Once the clustering has been performed, the clusters can be accessed
  directly from your kmeans_t* structure through its double*** field
  "clusters". Each element clusters[i] is one cluster holding
  cluster_sizes[i] items, and each item is an n-dimensional vector. The
  number of clusters is stored in the field "k", the number of dimensions of
  each item in "dataset_dim", and the number of items in the original data
  set in "dataset_size". For example:

        /* Print every cluster as a list of points (needs <stdio.h>) */
        int i, j, k;

        for ( i=0; i < km->k; i++ )
        {
            printf ( "cluster %d: [ ", i );

            for ( j=0; j < km->cluster_sizes[i]; j++ )
            {
                printf ( "(" );

                /* each item has dataset_dim coordinates, not dataset_size */
                for ( k=0; k < km->dataset_dim; k++ )
                {
                    printf ( "%f, ", km->clusters[i][j][k] );
                }

                printf ( "), " );
            }

            printf ( "]\n" );
        }

  The library also ships with a sample program, "test.c"; running "make"
  builds it;

- When compiling your program, remember to add "kmeans.c", which contains
  the library's implementation, to your list of source files;

- That's all. Include "kmeans.h", write your code using
  kmeans_new() + kmeans() + kmeans_free() or kmeans_auto() + kmeans_free(),
  explore your clusters, remember to include "kmeans.c" in the list of your
  source files, and you're ready for k-means clustering.
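
Schwarz's criterion, which the README names for kmeans_auto(), is the
Bayesian information criterion (BIC). In its common form, for a model with p
free parameters fitted to N items with maximized likelihood \hat{L}, it reads
as follows; smaller is better, and the p ln N term penalizes models with more
clusters. If k is chosen by comparing runs with different k, the selected
value would be the minimizer shown on the right. Which likelihood model and
parameter count kmeans_auto() actually uses is not stated in this README, so
check kmeans.c. Schwarz's original formulation maximizes
ln \hat{L} - (p/2) ln N instead, which equals -BIC/2 and ranks models
identically.

```latex
\mathrm{BIC} = p \ln N - 2 \ln \hat{L},
\qquad
\hat{k} = \arg\min_{k} \mathrm{BIC}(k)
```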

Author: Fabio "BlackLight" Manganiello,
<blacklight@autistici.org>,
http://0x00.ath.cx