cluster_neighbourhoods

cluster_neighbourhoods(domains_to_analyse, label_name, populations_to_analyse=None, neighbourhood_source=None, include_boundaries=None, exclude_boundaries=None, boundary_exclude_distance=0, network_names=[], network_kwargs={'network_type': 'Delaunay'}, k_hops=3, force_labels_to_include=[], labels_to_ignore=[], transform_neighbourhood_composition='sqrt', neighbourhood_label_name='Neighbourhood ID', cluster_method='kmeans', cluster_parameters={'n_clusters': 8}, neighbourhood_enrichment_as='zscore', return_observation_matrix_and_labels=False)

Neighbourhood clustering of objects using label information in a MuSpAn domain.

The function generates a network from objects and locates their neighbourhoods using k-hop neighbourhoods on this network.
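The k-hop idea can be illustrated without MuSpAn. The sketch below is a minimal breadth-first search over a plain adjacency dictionary; the function name `k_hop_neighbourhood`, the toy network, and the choices to collect nodes within at most k edges and to exclude the source node are all assumptions made for illustration, not a description of the library's internals.

```python
from collections import deque


def k_hop_neighbourhood(adjacency, source, k_hops):
    """Return the nodes reachable from `source` in at most `k_hops` edges.

    The source node itself is excluded (an assumption of this sketch).
    """
    depth_of = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        depth = depth_of[node]
        if depth == k_hops:
            # Do not expand beyond the requested number of hops
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth_of:
                depth_of[neighbour] = depth + 1
                queue.append(neighbour)
    return {node for node in depth_of if node != source}


# Hypothetical toy network: a path 0 - 1 - 2 - 3 - 4
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

one_hop = k_hop_neighbourhood(adjacency, source=0, k_hops=1)
three_hop = k_hop_neighbourhood(adjacency, source=0, k_hops=3)
```

On a sparse network such as a Delaunay triangulation, several hops are needed to gather a reasonably sized neighbourhood, whereas on a proximity network a single hop already covers everything within the chosen distance.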

If a label name associated with categorical labels is provided, the composition of these labels within the neighbourhoods is calculated. If a list of label names associated with continuous labels is provided, the sum of these labels within the neighbourhoods is calculated.
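To make the two cases concrete, here is a minimal sketch of how one row of a neighbourhood summary might be built for a single source object. The helper names, the toy labels and the marker names are hypothetical, and the categorical case uses raw counts; whether the library normalises these to proportions is not stated above.

```python
from collections import Counter


def categorical_composition(neighbourhood, labels, categories):
    """Count how often each category occurs among the neighbourhood's objects."""
    counts = Counter(labels[node] for node in neighbourhood)
    return [counts.get(category, 0) for category in categories]


def continuous_sums(neighbourhood, values_by_label, label_names):
    """Sum each continuous label over the neighbourhood's objects."""
    return [
        sum(values_by_label[name][node] for node in neighbourhood)
        for name in label_names
    ]


# Hypothetical neighbourhood of one source object (object IDs 1, 2 and 3)
neighbourhood = {1, 2, 3}

# Categorical case: a single label name with one category per object
celltype = {0: 'A', 1: 'B', 2: 'A', 3: 'A'}
composition_row = categorical_composition(neighbourhood, celltype, ['A', 'B', 'C'])

# Continuous case: a list of label names with numeric values per object
markers = {
    'Marker 1': {0: 0.5, 1: 1.0, 2: 2.0, 3: 0.25},
    'Marker 2': {0: 1.0, 1: 0.0, 2: 0.5, 3: 0.5},
}
sum_row = continuous_sums(neighbourhood, markers, ['Marker 1', 'Marker 2'])
```

Stacking one such row per source object gives a matrix with one column per category (or per continuous label), which is the kind of input a clustering method expects.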

The size of the neighbourhoods is determined by the type of network and the number of hops from the source object. Neighbourhood enrichment is calculated as the z-score or log-fold change of the neighbourhood composition compared to the global composition. Neighbourhood labels are added to the objects in the domain.
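The exact formulas used for enrichment are not spelled out here, so the following is only one plausible reading, sketched in plain Python. It assumes that the composition of each cluster is its mean row, that the z-score compares this mean with the global mean in units of the global (population) standard deviation, and that the log-fold option is log2 of the ratio of cluster mean to global mean. The function name `enrichment` and the toy data are hypothetical, and guards against zero variance or zero means are omitted.

```python
import math
from statistics import mean, pstdev


def enrichment(observations, cluster_labels, method='zscore'):
    """Compare each cluster's mean composition with the global composition."""
    n_features = len(observations[0])
    columns = [[row[j] for row in observations] for j in range(n_features)]
    global_mean = [mean(column) for column in columns]
    global_std = [pstdev(column) for column in columns]

    result = {}
    for cluster in sorted(set(cluster_labels)):
        members = [
            row for row, label in zip(observations, cluster_labels)
            if label == cluster
        ]
        cluster_mean = [mean(row[j] for row in members) for j in range(n_features)]
        if method == 'zscore':
            result[cluster] = [
                (cluster_mean[j] - global_mean[j]) / global_std[j]
                for j in range(n_features)
            ]
        elif method == 'log-fold':
            result[cluster] = [
                math.log2(cluster_mean[j] / global_mean[j])
                for j in range(n_features)
            ]
        else:
            raise ValueError(f"Unknown method: {method!r}")
    return result


# Hypothetical observation matrix: 4 source objects x 2 label categories
observations = [[3, 1], [3, 1], [1, 3], [1, 3]]
cluster_labels = [0, 0, 1, 1]

zscores = enrichment(observations, cluster_labels, method='zscore')
log_folds = enrichment(observations, cluster_labels, method='log-fold')
```

In this toy case cluster 0 is enriched for the first category and depleted for the second under both measures, and cluster 1 shows the opposite pattern.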

This function allows for the analysis of multiple domains and the comparison of neighbourhoods across domains.

Parameters:

domains_to_analyse (list or object): A list of MuSpAn domains, or a single MuSpAn domain, to be analysed.
label_name (str, or list of str): The name of the label to be used for clustering. If a string is provided, it is assumed to be a categorical label. If a list of strings is provided, each label is assumed to be continuous (numeric).
populations_to_analyse (array-like, query-like, or None, optional): Populations to be analysed within the domains. If None, defaults to neighbourhood_source.
neighbourhood_source (array-like, query-like, or None, optional): Source of neighbourhoods within the domains. If None, defaults to populations_to_analyse.
include_boundaries (array-like, query-like, or None, optional): Boundaries to include in the analysis. If None, no boundaries are included.
exclude_boundaries (array-like, query-like, or None, optional): Boundaries to exclude from the analysis. If None, no boundaries are excluded.
boundary_exclude_distance (float, optional): Distance to exclude from the boundaries. Default is 0.
network_names (list of str, optional): Names of the networks to use for each domain. If provided, the number of names must match the number of domains. If not provided, a temporary network is generated for each domain using network_kwargs.
network_kwargs (dict, optional): Keyword arguments for network generation, passed through to generate_network. See generate_network for all options. Default is {'network_type': 'Delaunay'}.
k_hops (int, optional): Number of hops for neighbourhood construction. k-hop neighbourhoods are all nodes reachable in k edges from a source node. For proximity-based and k-nearest neighbour networks, set k_hops = 1. Default is 3.
force_labels_to_include (list, optional): Labels to forcefully include in the analysis. Default is an empty list.
labels_to_ignore (list, optional): Labels to ignore in the analysis. Default is an empty list.
transform_neighbourhood_composition (str, optional): Transformation method for the neighbourhood composition. Options are 'log' or 'sqrt'. If 'log' is selected, the transformation Mt = log(1e4*M + 1) is applied, where M is the original composition matrix. If 'sqrt' is selected, Mt = sqrt(M). Default is 'sqrt'.
neighbourhood_label_name (str, optional): Name for the neighbourhood label. Default is 'Neighbourhood ID'.
cluster_method (str, optional): Clustering method to use. The chosen method is passed to the helper function muspan.helpers.cluster_data(). Distance-based, hierarchical and graph-based methods are available; see muspan.helpers.cluster_data(). Default is 'kmeans'.
cluster_parameters (dict, optional): Parameters for the clustering method. These must match the chosen clustering method and are passed to the helper function muspan.helpers.cluster_data(). Default is dict(n_clusters=8).
neighbourhood_enrichment_as (str, optional): Method to calculate neighbourhood enrichment. Options are 'zscore' or 'log-fold'. The 'log-fold' option computes the log2 fold change. Default is 'zscore'.
return_observation_matrix_and_labels (bool, optional): Whether to return the observation matrix and resultant cluster labels. If True, these are added to the returned tuple. Default is False.
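The two options for transform_neighbourhood_composition can be written out directly from the formulas given above. This is a standard-library sketch rather than the library's code: the function name `transform_composition` and the toy matrix are hypothetical, and the logarithm is assumed to be the natural log, since the base is not stated.

```python
import math


def transform_composition(matrix, method='sqrt'):
    """Apply Mt = sqrt(M) or Mt = log(1e4*M + 1) element-wise to a nested list."""
    if method == 'sqrt':
        return [[math.sqrt(value) for value in row] for row in matrix]
    if method == 'log':
        # Natural logarithm is an assumption; the +1 keeps zero entries at zero
        return [[math.log(1e4 * value + 1) for value in row] for row in matrix]
    raise ValueError(f"Unknown method: {method!r}")


# Hypothetical composition matrix M (2 source objects x 2 label categories)
M = [[0.0, 0.25], [1.0, 4.0]]

M_sqrt = transform_composition(M, method='sqrt')
M_log = transform_composition(M, method='log')
```

Both transforms compress large values, so abundant labels dominate the clustering distances less than they would in the raw matrix, and both map zero to zero.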

Returns:

neighbourhood_enrichment_matrix (numpy.ndarray): Matrix representing the enrichment of neighbourhoods.
label_categories (list): List of consistent global label categories across all domains used to cluster the neighbourhoods.
cluster_categories (numpy.ndarray): Array of unique cluster labels.

If return_observation_matrix_and_labels is True:

observation_matrix (numpy.ndarray): The observation matrix (total source objects x unique labels) of label compositions within a k-hop neighbourhood for each object considered as a source. This is the raw matrix before clustering and transformation, and can be used to parameterise the clustering method or to visualise the neighbourhood space.
cluster_labels (numpy.ndarray): The resultant cluster labels for each source object from clustering the observation matrix. The order corresponds to the rows of observation_matrix.

Notes

Clustering of the spatially-resolved label information is performed using the muspan.helpers.cluster_data helper function. This supports a variety of clustering methods including distance-based, hierarchical and graph-based methods such as:

KMeans
DBSCAN
Spectral Clustering
Affinity Propagation
Mean Shift
Birch
OPTICS
HDBSCAN
MiniBatchKMeans
Louvain
Leiden

For more details on the clustering methods and their parameters, see muspan.helpers.cluster_data.

Examples

Minimal example of clustering neighbourhoods based on a categorical label:

import muspan as ms 

# load example domain
domain = ms.datasets.load_example_domain('Synthetic-Points-Exclusion')

# compute neighbourhoods and cluster them
enrichment_matrix, label_cats, cluster_cats = ms.networks.cluster_neighbourhoods(
    domain,
    label_name='Celltype',
    neighbourhood_label_name='example neighbourhoods',
    network_kwargs=dict(network_type='proximity', max_edge_distance=50),
    k_hops=1,
    cluster_method='leiden',
    cluster_parameters=dict(resolution=0.25, k_neighbours=100),
)

# visualise the domain colored by the computed neighbourhood clusters
ms.visualise.visualise(domain,color_by='example neighbourhoods')