inmoose.consensus_clustering.consensus_clustering.consensusClustering

class inmoose.consensus_clustering.consensus_clustering.consensusClustering(cluster, mink=2, maxk=10, nb_resampling_iteration=50, resample_proportion=0.5, n_bins=10)

Implementation of consensus clustering, following the paper by Monti et al. (2003): https://link.springer.com/content/pdf/10.1023%2FA%3A1023949509487.pdf

Parameters:

cluster (sklearn clustering class) – Clustering algorithm to use for consensus clustering. NOTE: the class must accept an n_clusters parameter at instantiation and provide a fit_predict method, which is invoked on the data.
mink (int) – smallest number of clusters to try, default = 2
maxk (int) – biggest number of clusters to try, default = 10
nb_resampling_iteration (int) – number of resampling iterations for each number of clusters, default = 50
resample_proportion (float) – proportion of the data to sample at each iteration, as a number between 0 and 1, default = 0.5
n_bins (int) – Number of bins used to compute histogram in compute_area_under_curve, default = 10

Attributes:

consensus_matrices (ndarray[float]) – consensus matrices for each k. NOTE: every consensus matrix is retained, as specified in the paper
Ak (array[float]) – area under CDF for each number of clusters (see paper: section 3.3.1. Consensus distribution.)
deltaK (array[float]) – changes in areas under CDF (see paper: section 3.3.1. Consensus distribution.)
bestK (int) – number of clusters that was found to be best
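
The quantities described for Ak and deltaK can be illustrated with a short NumPy sketch based on the definitions in section 3.3.1 of the paper. This is not the library's code: the function names `area_under_cdf` and `delta_k` are hypothetical, the CDF is evaluated on the sorted upper-triangle entries rather than on a histogram with n_bins bins, and the indexing of the relative change (from one value of k to the next one tried) is an assumption.

```python
import numpy as np


def area_under_cdf(consensus):
    """Area under the empirical CDF of the off-diagonal consensus values.

    Hypothetical helper, written from the paper's definition:
    A = sum_i (x_i - x_{i-1}) * CDF(x_i) over the sorted entries x_i.
    """
    n = consensus.shape[0]
    upper = np.triu_indices(n, k=1)          # each pair (i, j) with i < j once
    values = np.sort(consensus[upper])
    # Empirical CDF at each sorted value: fraction of entries <= that value.
    cdf = np.searchsorted(values, values, side="right") / values.size
    return float(np.sum(np.diff(values) * cdf[1:]))


def delta_k(areas):
    """Relative change in area between consecutive numbers of clusters.

    Assumption: the first entry is the first area itself, and each later
    entry is the change relative to the previous area.
    """
    areas = np.asarray(areas, dtype=float)
    deltas = np.empty_like(areas)
    deltas[0] = areas[0]
    deltas[1:] = (areas[1:] - areas[:-1]) / areas[:-1]
    return deltas
```

Under these assumptions, a number of clusters is usually judged good when adding one more cluster no longer produces a noticeable relative increase in area.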

__init__(cluster, mink=2, maxk=10, nb_resampling_iteration=50, resample_proportion=0.5, n_bins=10)
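
The note on cluster suggests that the constructor receives a clustering class (not an instance), which is then instantiated with n_clusters and used through fit_predict. A minimal sketch of a class meeting that contract is given here, using NumPy only. `NearestSeedClusterer` and `satisfies_cluster_contract` are hypothetical names made up for illustration; in practice a scikit-learn class such as KMeans, which accepts n_clusters and provides fit_predict, would be the usual choice.

```python
import numpy as np


class NearestSeedClusterer:
    """Hypothetical clusterer: the first n_clusters rows act as fixed centres."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit_predict(self, data):
        data = np.asarray(data, dtype=float)
        centres = data[: self.n_clusters]
        # Distance from every row to every centre, shape (n_rows, n_clusters).
        distances = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        return distances.argmin(axis=1)


def satisfies_cluster_contract(cls, data, k=2):
    """Check that cls(n_clusters=k).fit_predict(data) yields one label per row."""
    model = cls(n_clusters=k)
    labels = np.asarray(model.fit_predict(data))
    return labels.shape == (len(data),)
```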

Methods

`__init__`(cluster[, mink, maxk, ...])
`build_clusters_consensus_df`()	Compute cluster consensus for each k from mink to maxk and return a dataframe for use with plot_clusters_consensus
`compute_area_delta`()	Compute the differences between areas under CDFs
`compute_area_under_curve`()	Compute area under the CDFs curve
`compute_bestk`()	Get best number of clusters
`compute_clusters_consensus`(prediction, k)	For one prediction, compute clusters consensus, showing cluster stability.
`compute_consensus_clustering`(data, random_state)	Fits a consensus matrix for each number of clusters
`compute_consensus_mat`(connectivity_mat, ...)	Compute consensus matrix defined as the normalized sum of the connectivity matrices of all the resampled datasets
`compute_items_consensus`(prediction, k)	For one prediction, compute item consensus, showing most representative cluster items.
`compute_iteration_connectivity_matrix`(...)	Compute connectivity matrix
`compute_iteration_indicator_mat`(...)	Compute indicator matrix for one iteration
`compute_summary_statistics`(k)	For one prediction, compute summary statistics (cluster consensus and item consensus), showing cluster stability and the most representative items of each cluster.
`line_plots_cluster_consensus`(cons_clust_df, ...)	Line plots of the percentage and number of clusters with cluster consensus > threshold
`plot_clustermap`(k, saving_path[, col_color])	Compute and save a consensus clustering heatmap and dendrogram showing the consensus clustering stability.
`plot_clusters_consensus`(cons_clust_df, fig_path)	Plot cluster consensus results showing the stability of clusters
`plot_deltak`(fig_path)	Plot the curve showing the relative change in area under the curve
`predict`(k)	Predicts clusters for k clusters using the consensus matrix
`predict_data`(data)	Predicts clusters on the data, using the best number of clusters found
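
The relationship between the connectivity, indicator, and consensus matrices named in the method summaries can be sketched as follows, using the definitions from the paper. The three functions are hypothetical illustrations, not the signatures or internals of the methods listed here, and leaving pairs that were never sampled together at zero is an assumption.

```python
import numpy as np


def connectivity_matrix(labels, sampled_idx, n_items):
    """Entry (i, j) is 1 if items i and j were both sampled and share a label."""
    mat = np.zeros((n_items, n_items))
    same = (labels[:, None] == labels[None, :]).astype(float)
    mat[np.ix_(sampled_idx, sampled_idx)] = same
    return mat


def indicator_matrix(sampled_idx, n_items):
    """Entry (i, j) is 1 if items i and j were both present in the resample."""
    present = np.zeros(n_items)
    present[sampled_idx] = 1.0
    return np.outer(present, present)


def consensus_matrix(connectivity_sum, indicator_sum):
    """Sum of connectivity matrices divided elementwise by the sum of indicators.

    Pairs never sampled together (indicator sum of 0) are left at 0 here,
    which is an assumption of this sketch.
    """
    out = np.zeros_like(connectivity_sum)
    np.divide(connectivity_sum, indicator_sum, out=out, where=indicator_sum > 0)
    return out
```

Each entry of the result is the fraction of resampling iterations in which two items were clustered together, among the iterations where both were sampled.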