cluster_data
- cluster_data(data, method='kmeans', **cluster_kwargs)
Cluster data using a specified method.
The function returns a cluster object in the same format as scikit-learn clustering objects.
- Parameters:
- data : array-like or sparse matrix, shape (n_samples, n_features)
The input data to cluster.
- method : str, optional, default='kmeans'
The clustering method to use. Available methods are: ‘kmeans’, ‘dbscan’, ‘spectral’, ‘affinity’, ‘meanshift’, ‘birch’, ‘optics’, ‘hdbscan’, ‘agglomerative’, ‘minibatchkmeans’, ‘louvain’, ‘leiden’.
- cluster_kwargs : dict, optional
Additional keyword arguments to pass to the clustering method.
- Returns:
- cluster_class : clustering class
The cluster class object. The object has attributes labels_ (array of cluster labels for each point) and n_clusters_ (number of clusters found). These are standard attributes for sklearn clustering objects. For ‘louvain’ and ‘leiden’ methods, the object also has attribute modularity_ (modularity score of the clustering).
- Raises:
- ValueError
If the specified method is not in the list of available methods.
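The contract described above (dispatch on a method name, forward the keyword arguments, expose labels_ and n_clusters_, raise ValueError for unknown names) can be illustrated with a minimal sketch. This is not the library's implementation: cluster_data_sketch and the _METHODS registry are hypothetical names, only two of the documented methods are covered, and the way n_clusters_ is derived here (counting distinct labels and ignoring the DBSCAN noise label -1) is an assumption.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Hypothetical registry covering only two of the documented methods.
_METHODS = {"kmeans": KMeans, "dbscan": DBSCAN}


def cluster_data_sketch(data, method="kmeans", **cluster_kwargs):
    """Illustrative stand-in for cluster_data; not the library's code."""
    if method not in _METHODS:
        raise ValueError(
            f"Unknown method {method!r}; choose from {sorted(_METHODS)}"
        )
    # Forward the extra keyword arguments to the scikit-learn estimator.
    model = _METHODS[method](**cluster_kwargs).fit(data)
    # Assumption: n_clusters_ is the number of distinct labels,
    # excluding the noise label -1 that DBSCAN may assign.
    model.n_clusters_ = len(set(model.labels_) - {-1})
    return model


# Synthetic data: 150 points drawn around 3 centres.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)
km = cluster_data_sketch(X, method="kmeans", n_clusters=3, n_init=10, random_state=0)
print(km.labels_.shape, km.n_clusters_)
```

With the real function, the equivalent call would presumably be cluster_data(X, method='kmeans', n_clusters=3), with n_clusters forwarded through cluster_kwargs.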
Notes
For the ‘louvain’ and ‘leiden’ methods, the function builds a KNN graph of the data and runs the sknetwork (scikit-network) implementation of the Louvain or Leiden community detection algorithm on it. The main ‘cluster_kwargs’ for these two methods are ‘k_neighbours’ (int, default=15), ‘resolution’ (float, default=1) and ‘n_jobs’ (int, default=-1). See the scikit-network clustering documentation for more details.
For all other methods, the function uses the corresponding sklearn clustering method. For information on the clustering methods and their parameters, see scikit-learn.
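The KNN-graph step used by the ‘louvain’ and ‘leiden’ methods could look roughly like the sketch below. It uses scikit-learn's kneighbors_graph to build the graph; whether the function itself builds the graph this way, and whether it symmetrises the result, are assumptions made for illustration. The resulting sparse adjacency matrix is the kind of input a community detection algorithm would then be run on.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# Synthetic data: 60 points in 2 dimensions.
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

k_neighbours = 15  # default value quoted in the Notes

# Sparse (n_samples, n_samples) matrix with a 1 for each of a
# point's k nearest neighbours (the point itself is excluded).
knn = kneighbors_graph(
    X, n_neighbors=k_neighbours, mode="connectivity", include_self=False
)

# Assumption: make the graph undirected by keeping an edge
# whenever either endpoint lists the other as a neighbour.
adjacency = knn.maximum(knn.T)

print(adjacency.shape, adjacency.nnz)
```

A larger k_neighbours gives a denser graph and tends to merge communities, while the ‘resolution’ argument is applied later, by the community detection step itself.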