inmoose.consensus_clustering.consensus_clustering.consensusClustering
- class inmoose.consensus_clustering.consensus_clustering.consensusClustering(cluster, mink=2, maxk=10, nb_resampling_iteration=50, resample_proportion=0.5, n_bins=10)
Implementation of consensus clustering, following the paper by Monti et al. (2003): https://link.springer.com/content/pdf/10.1023%2FA%3A1023949509487.pdf
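The resampling scheme from the paper can be summarized as follows. In each iteration, a random subsample of the items (a fraction resample_proportion) is clustered. An indicator matrix counts how often each pair of items was sampled together, and a connectivity matrix counts how often the pair landed in the same cluster. The consensus matrix is the ratio of the two sums. What follows is a minimal NumPy sketch under those assumptions, not inmoose's own code. The names GapClusterer and consensus_matrix are hypothetical. GapClusterer is a toy stand-in for an sklearn-style class that is constructed with n_clusters and exposes fit_predict.

```python
import numpy as np


class GapClusterer:
    """Hypothetical toy clusterer: cuts the items at the (n_clusters - 1) largest
    gaps along the first feature. It follows the contract described for the
    `cluster` parameter: built with n_clusters, exposes fit_predict(data)."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit_predict(self, X):
        x = np.asarray(X, dtype=float)[:, 0]
        order = np.argsort(x)
        gaps = np.diff(x[order])
        # Sorted positions after which a new cluster starts
        cuts = np.sort(np.argsort(gaps)[::-1][: self.n_clusters - 1])
        labels_sorted = np.zeros(len(x), dtype=int)
        for c in cuts:
            labels_sorted[c + 1:] += 1
        labels = np.empty(len(x), dtype=int)
        labels[order] = labels_sorted
        return labels


def consensus_matrix(data, cluster, k, n_iter=50, proportion=0.5, seed=0):
    """Hypothetical helper: consensus matrix for one number of clusters k."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    connectivity = np.zeros((n, n))  # times i and j were clustered together
    indicator = np.zeros((n, n))     # times i and j were sampled together
    n_sub = int(round(proportion * n))
    for _ in range(n_iter):
        idx = rng.choice(n, size=n_sub, replace=False)
        labels = cluster(n_clusters=k).fit_predict(data[idx])
        same = (labels[:, None] == labels[None, :]).astype(float)
        connectivity[np.ix_(idx, idx)] += same
        indicator[np.ix_(idx, idx)] += 1.0
    # Consensus = co-clustered count / co-sampled count (0 where never co-sampled)
    return np.divide(connectivity, indicator,
                     out=np.zeros_like(connectivity), where=indicator > 0)


X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4],
              [10.0], [10.1], [10.2], [10.3], [10.4]])
M = consensus_matrix(X, GapClusterer, k=2)
```

Because the two groups in X are far apart, pairs from different groups are never clustered together, so their consensus is 0. Pairs within a group mostly stay together.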
- Parameters:
cluster (sklearn clustering class) – Clustering algorithm to use for consensus clustering. NOTE: the class must accept an n_clusters parameter at instantiation and provide a fit_predict method, which is invoked on the data.
mink (int) – smallest number of clusters to try, default = 2
maxk (int) – largest number of clusters to try, default = 10
nb_resampling_iteration (int) – number of resampling iterations for each number of clusters, default = 50
resample_proportion (float) – proportion of items to sample in each iteration, a number between 0 and 1, default = 0.5
n_bins (int) – number of bins used to compute the histogram in compute_area_under_curve, default = 10
- Attributes:
consensus_matrices (ndarray[float]) – consensus matrices for each k. NOTE: every consensus matrix is retained, as specified in the paper
Ak (array[float]) – area under the CDF for each number of clusters (see the paper, Section 3.3.1, Consensus distribution)
deltaK (array[float]) – changes in the areas under the CDFs (see the paper, Section 3.3.1, Consensus distribution)
bestK (int) – number of clusters found to be best
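To illustrate Ak and deltaK, the sketch below approximates the area under the empirical CDF of the off-diagonal consensus values using an n_bins histogram. It then takes the relative change (A(k+1) - A(k)) / A(k) between consecutive numbers of clusters. Both the histogram approximation and this simple form of the delta are assumptions. The paper's Δ(K) treats the smallest K specially, and inmoose's compute_area_under_curve and compute_area_delta may differ. The function names are hypothetical.

```python
import numpy as np


def area_under_cdf(consensus, n_bins=10):
    """Approximate area under the empirical CDF of the off-diagonal consensus values."""
    consensus = np.asarray(consensus, dtype=float)
    iu = np.triu_indices_from(consensus, k=1)   # each unordered pair counted once
    values = consensus[iu]
    hist, edges = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / hist.sum()          # CDF evaluated at each bin's right edge
    return float(np.sum(cdf * np.diff(edges)))  # step-function (Riemann) area


def relative_area_change(areas):
    """Relative change (A(k+1) - A(k)) / A(k) between consecutive numbers of clusters."""
    areas = np.asarray(areas, dtype=float)
    return (areas[1:] - areas[:-1]) / areas[:-1]
```

With n_bins = 10, a matrix whose off-diagonal values are all 0 gives an area of 1.0. A matrix of all ones gives 0.1 rather than 0, because the step approximation evaluates the CDF only at bin edges.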
- __init__(cluster, mink=2, maxk=10, nb_resampling_iteration=50, resample_proportion=0.5, n_bins=10)
Methods
__init__(cluster[, mink, maxk, ...])
build_clusters_consensus_df() – Compute the cluster consensus for each k from mink to maxk and return a dataframe to use in plot_clusters_consensus
compute_area_delta() – Compute the differences between the areas under the CDFs
compute_area_under_curve() – Compute the area under the CDF curves
compute_bestk() – Get the best number of clusters
compute_clusters_consensus(prediction, k) – For one prediction, compute the cluster consensus, showing cluster stability.
compute_consensus_clustering(data, random_state) – Fit a consensus matrix for each number of clusters
compute_consensus_mat(connectivity_mat, ...) – Compute the consensus matrix, defined as the normalized sum of the connectivity matrices of all the resampled datasets
compute_items_consensus(prediction, k) – For one prediction, compute the item consensus, showing the most representative items of each cluster.
compute_iteration_connectivity_matrix(...) – Compute the connectivity matrix for one iteration
compute_iteration_indicator_mat(...) – Compute the indicator matrix for one iteration
compute_summary_statistics(k) – For one prediction, compute summary statistics (cluster consensus and item consensus), showing cluster stability and the most representative items of each cluster.
line_plots_cluster_consensus(cons_clust_df, ...) – Line plots of the percentage and number of clusters with cluster consensus above a threshold
plot_clustermap(k, saving_path[, col_color]) – Compute and save a consensus clustering heatmap and dendrogram showing consensus clustering stability.
plot_clusters_consensus(cons_clust_df, fig_path) – Plot cluster consensus results showing the stability of clusters
plot_deltak(fig_path) – Plot the curve showing the relative change in the area under the CDF curve
predict(k) – Predict k clusters using the consensus matrix
predict_data(data) – Predict clusters on the data, using the best number of clusters found
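compute_clusters_consensus and compute_items_consensus correspond to two quantities from the paper. The cluster consensus m(k) is the mean consensus over distinct pairs of items within cluster k. The item consensus m_i(k) is the mean consensus between item i and the members of cluster k, excluding i itself. The sketch below computes both from a consensus matrix and a label vector, following those definitions. The function names and return shapes are hypothetical, and inmoose's methods may return different data structures.

```python
import numpy as np


def cluster_consensus(consensus, labels):
    """Mean consensus over distinct pairs of items within each cluster."""
    consensus = np.asarray(consensus, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        if len(members) < 2:
            result[c.item()] = np.nan  # undefined for singleton clusters
            continue
        sub = consensus[np.ix_(members, members)]
        iu = np.triu_indices(len(members), k=1)
        result[c.item()] = float(sub[iu].mean())
    return result


def item_consensus(consensus, labels):
    """Mean consensus of each item with each cluster's members, excluding itself.

    Returns an (n_items, n_clusters) array; columns follow np.unique(labels).
    """
    consensus = np.asarray(consensus, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n = len(labels)
    out = np.full((n, len(clusters)), np.nan)
    for j, c in enumerate(clusters):
        members = np.flatnonzero(labels == c)
        for i in range(n):
            others = members[members != i]
            if len(others) > 0:
                out[i, j] = consensus[i, others].mean()
    return out


M = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
labels = [0, 0, 1]
cc = cluster_consensus(M, labels)
ic = item_consensus(M, labels)
```

Counting the clusters whose cluster consensus exceeds a threshold is one way to obtain the kind of summary plotted by line_plots_cluster_consensus.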