cluster_neighbourhoods
- cluster_neighbourhoods(domains_to_analyse, label_name, populations_to_analyse=None, neighbourhood_source=None, include_boundaries=None, exclude_boundaries=None, boundary_exclude_distance=0, network_kwargs={'network_type': 'Delaunay'}, k_hops=3, force_labels_to_include=[], labels_to_ignore=[], transform_neighbourhood_composition=None, neighbourhood_label_name='Neighbourhood ID', cluster_method='kmeans', cluster_parameters={'n_clusters': 8}, neighbourhood_enrichment_as='zscore', return_observation_matrix=False)
Neighbourhood clustering of objects in a MuSpAn domain.
The function generates a network of objects and takes each object's neighbourhood to be its k-hop neighbourhood on this network. The composition of a given label within each neighbourhood is calculated, and these neighbourhood compositions are clustered to provide a new neighbourhood label.
The size of the neighbourhoods is determined by the type of network and the number of hops from the source object. Neighbourhood enrichment is calculated as the z-score or log-fold change of the neighbourhood composition compared to the global composition. Neighbourhood labels are added to the objects in the domain.
This function allows for the analysis of multiple domains and the comparison of neighbourhoods across domains.
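The k-hop neighbourhood composition described above can be illustrated with a minimal, library-independent sketch. This is not MuSpAn's implementation: the helper names `k_hop_neighbourhood` and `neighbourhood_composition` are hypothetical, the network is a hand-written adjacency dictionary rather than a Delaunay network, and the sketch assumes that the source object counts towards its own neighbourhood and that composition means label fractions.

```python
from collections import Counter, deque


def k_hop_neighbourhood(adjacency, source, k_hops):
    """Return the set of nodes reachable from `source` in at most `k_hops` edges.

    Assumption: the source node is included in its own neighbourhood.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k_hops:
            # Do not expand beyond the requested number of hops.
            continue
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


def neighbourhood_composition(adjacency, labels, source, k_hops):
    """Fraction of each label among the nodes in the k-hop neighbourhood."""
    members = k_hop_neighbourhood(adjacency, source, k_hops)
    counts = Counter(labels[node] for node in members)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy network: a path 0 - 1 - 2 - 3 - 4 with two label categories.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A"}

# The 2-hop neighbourhood of node 0 is {0, 1, 2}: two "A" objects and one "B".
composition = neighbourhood_composition(adjacency, labels, source=0, k_hops=2)
print(composition)
```

Stacking one such composition per source object, with one column per label category, gives a matrix of the shape that the clustering step operates on.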
- Parameters:
- domains_to_analyse : list or object
A list of MuSpAn domains or a single MuSpAn domain to be analysed.
- label_name : str
The name of the label to be used for clustering.
- populations_to_analyse : array-like, query-like, or None, optional
Populations to be analysed within the domains. If None, defaults to the neighbourhood_source.
- neighbourhood_source : array-like, query-like, or None, optional
Source of neighbourhoods within the domains. If None, defaults to populations_to_analyse.
- include_boundaries : array-like, query-like, or None, optional
Boundaries to include in the analysis. If None, no boundaries are included.
- exclude_boundaries : array-like, query-like, or None, optional
Boundaries to exclude from the analysis. If None, no boundaries are excluded.
- boundary_exclude_distance : float, optional
Distance to exclude from the boundaries. Default is 0.
- network_kwargs : dict, optional
Keyword arguments for the network generation, passed through to generate_network. See generate_network for all options. Default is {'network_type': 'Delaunay'}.
- k_hops : int, optional
Number of hops for neighbourhood construction. k-hop neighbourhoods are all nodes reachable in k edges from a source node. Default is 3.
- force_labels_to_include : list, optional
Labels to forcefully include in the analysis. Default is an empty list.
- labels_to_ignore : list, optional
Labels to ignore in the analysis. Default is an empty list.
- transform_neighbourhood_composition : str, optional
Transformation method for the neighbourhood composition. Options are ‘log’ or ‘sqrt’. If ‘log’ is selected, the transformation Mt = log(1e4*M + 1) is applied, where M is the original composition matrix. If ‘sqrt’ is selected, Mt = sqrt(M). Default is None, in which case no transformation is applied.
- neighbourhood_label_name : str, optional
Name for the neighbourhood label. Default is ‘Neighbourhood ID’.
- cluster_method : str, optional
Clustering method to use. Default is ‘kmeans’.
- cluster_parameters : dict, optional
Parameters for the clustering method. Default is dict(n_clusters=8).
- neighbourhood_enrichment_as : str, optional
Method to calculate neighbourhood enrichment. Options are ‘zscore’ or ‘log-fold’. The ‘log-fold’ option computes the log2 fold change. Default is ‘zscore’.
- return_observation_matrix : bool, optional
Whether to return the observation matrix. If True, an additional output is appended to the returned tuple. Default is False.
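The two options for transform_neighbourhood_composition can be written out directly from the formulas given above. The following is a minimal sketch, not the library's code: `transform_composition` is a hypothetical helper, the matrix is a plain list of rows, and `log` is assumed to mean the natural logarithm, which the description does not state.

```python
import math


def transform_composition(matrix, method=None):
    """Apply the 'log' or 'sqrt' transformation element-wise to a composition matrix.

    Assumption: 'log' denotes the natural logarithm.
    """
    if method is None:
        # No transformation requested: return an unchanged copy.
        return [list(row) for row in matrix]
    if method == "log":
        # Mt = log(1e4 * M + 1); zero entries stay at zero.
        return [[math.log(1e4 * value + 1) for value in row] for row in matrix]
    if method == "sqrt":
        # Mt = sqrt(M)
        return [[math.sqrt(value) for value in row] for row in matrix]
    raise ValueError(f"Unknown transformation: {method!r}")


# One neighbourhood containing only the second of two label categories in part.
composition_matrix = [[0.0, 0.25]]

sqrt_transformed = transform_composition(composition_matrix, "sqrt")
log_transformed = transform_composition(composition_matrix, "log")
print(sqrt_transformed)
print(log_transformed)
```

Both transformations compress large values relative to small ones, so rare label categories carry more weight in the clustering than they would in the untransformed matrix.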
- Returns:
- neighbourhood_enrichment_matrix : numpy.ndarray
Matrix representing the enrichment of neighbourhoods.
- consistent_global_labels : list
List of consistent global label categories across all domains used to cluster the neighbourhoods.
- unique_cluster_labels : numpy.ndarray
Array of unique cluster labels.
- If return_observation_matrix is True:
- observation_matrix : numpy.ndarray
The observation matrix (total source objects x unique labels) of label compositions within a k-hop neighbourhood for each object considered as a source. This is the raw matrix before clustering and transformation, and it can be used to parameterise the clustering method or to visualise the neighbourhood space.
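To make the relationship between the observation matrix and an enrichment matrix concrete, here is a minimal sketch of one plausible reading of "enrichment of the neighbourhood composition compared to the global composition". It is an assumption, not MuSpAn's documented formula: `enrichment_matrix` is a hypothetical helper, the z-score is taken as the cluster mean minus the global mean divided by the global (population) standard deviation, and the log-fold change as log2 of the cluster mean over the global mean.

```python
import math
from statistics import mean, pstdev


def enrichment_matrix(observations, cluster_ids, method="zscore"):
    """Per-cluster enrichment of each label relative to all source objects.

    observations: rows are source objects, columns are label categories.
    cluster_ids:  one cluster ID per row of `observations`.
    Assumption: no column has zero mean or zero spread (no pseudocount is added).
    """
    n_labels = len(observations[0])
    columns = [[row[j] for row in observations] for j in range(n_labels)]
    global_mean = [mean(column) for column in columns]
    global_std = [pstdev(column) for column in columns]

    clusters = sorted(set(cluster_ids))
    enrichment = []
    for cluster in clusters:
        members = [row for row, cid in zip(observations, cluster_ids) if cid == cluster]
        cluster_row = []
        for j in range(n_labels):
            cluster_mean = mean(row[j] for row in members)
            if method == "zscore":
                cluster_row.append((cluster_mean - global_mean[j]) / global_std[j])
            elif method == "log-fold":
                cluster_row.append(math.log2(cluster_mean / global_mean[j]))
            else:
                raise ValueError(f"Unknown enrichment method: {method!r}")
        enrichment.append(cluster_row)
    return clusters, enrichment


# Four source objects, two label categories, two neighbourhood clusters.
observations = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75], [0.25, 0.75]]
cluster_ids = [0, 0, 1, 1]

clusters, zscores = enrichment_matrix(observations, cluster_ids, "zscore")
_, log_folds = enrichment_matrix(observations, cluster_ids, "log-fold")
print(clusters, zscores, log_folds)
```

In this toy case, cluster 0 is enriched for the first label and depleted for the second, and cluster 1 shows the reverse pattern. In the sketch, each row of the result corresponds to one cluster and each column to one label category.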