inmoose.consensus_clustering.consensus_clustering.consensusClustering

class inmoose.consensus_clustering.consensus_clustering.consensusClustering(cluster, mink=2, maxk=10, nb_resampling_iteration=50, resample_proportion=0.5, n_bins=10)

Implementation of consensus clustering, following Monti et al. (2003): https://link.springer.com/content/pdf/10.1023%2FA%3A1023949509487.pdf

Parameters:
  • cluster (sklearn clustering class) – clustering algorithm to use for consensus clustering. NOTE: the class must accept an n_clusters parameter when instantiated and provide a fit_predict method, which is invoked on the data.

  • mink (int) – smallest number of clusters to try, default = 2

  • maxk (int) – biggest number of clusters to try, default = 10

  • nb_resampling_iteration (int) – number of resamplings for each cluster number, default = 50

  • resample_proportion (float) – proportion of the data to sample at each resampling iteration, a number between 0 and 1, default = 0.5

  • n_bins (int) – Number of bins used to compute histogram in compute_area_under_curve, default = 10

Attributes:
  • consensus_matrices (ndarray[float]) – consensus matrices for each k. NOTE: every consensus matrix is retained, as specified in the paper

  • Ak (array[float]) – area under CDF for each number of clusters (see paper: section 3.3.1. Consensus distribution.)

  • deltaK (array[float]) – changes in areas under CDF (see paper: section 3.3.1. Consensus distribution.)

  • bestK (int) – number of clusters that was found to be best
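The requirement on cluster can be illustrated with a toy class. The sketch below is not part of inmoose: ThresholdClusterer is a hypothetical name, and the code only shows the interface the note describes, namely construction with n_clusters and a fit_predict method that returns one label per item.

```python
class ThresholdClusterer:
    """Hypothetical toy clusterer for 1-D data (illustration only)."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit_predict(self, data):
        # Split the observed value range into n_clusters equal-width
        # intervals and label each item by the interval it falls in.
        values = [float(v) for v in data]
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_clusters or 1.0  # avoid a zero width
        return [min(int((v - lo) / width), self.n_clusters - 1) for v in values]


labels = ThresholdClusterer(n_clusters=2).fit_predict([0.1, 0.2, 0.9, 1.0])
print(labels)
```

Any scikit-learn clusterer with the same two features, for example sklearn.cluster.KMeans, fits this description. Since the note says the class "is to be instantiated", it is presumably the class itself, not an instance, that is passed as cluster.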

__init__(cluster, mink=2, maxk=10, nb_resampling_iteration=50, resample_proportion=0.5, n_bins=10)

Methods

__init__(cluster[, mink, maxk, ...])

build_clusters_consensus_df()

Compute cluster consensus for each k from mink to maxk and return a dataframe to use in plot_clusters_consensus

compute_area_delta()

Compute the differences between areas under CDFs
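As a rough illustration of what these differences look like, here is a minimal sketch of one common reading of the paper's delta: the first value is the area itself, and each later value is the relative change from the previous area. The function name area_deltas is hypothetical, and the library's own indexing and output length may differ.

```python
def area_deltas(areas):
    """Relative changes between consecutive areas under the CDF.

    areas[0] belongs to the smallest tested k; its delta is the area itself.
    """
    deltas = [areas[0]]
    for prev, curr in zip(areas, areas[1:]):
        deltas.append((curr - prev) / prev)
    return deltas


deltas = area_deltas([0.25, 0.5, 0.55])
print(deltas)
```

A large delta means that adding one more cluster changed the consensus distribution noticeably; values close to zero suggest that further increases in k bring little.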

compute_area_under_curve()

Compute area under the CDFs curve
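The idea can be sketched in plain Python, assuming the CDF is estimated from a histogram of the upper-triangle consensus values with n_bins equal-width bins on [0, 1]. The name area_under_cdf is hypothetical, and the library's exact binning and integration rule may differ.

```python
def area_under_cdf(consensus, n_bins=10):
    n = len(consensus)
    # Upper-triangle entries (i < j): each pair of items counted once.
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    # Histogram over [0, 1] with n_bins equal-width bins.
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    # The running fraction is the empirical CDF at each bin's right edge;
    # summing CDF * bin width gives a Riemann-sum estimate of the area.
    area, cumulative = 0.0, 0
    for c in counts:
        cumulative += c
        area += (cumulative / len(values)) / n_bins
    return area


# Four items forming two perfectly stable clusters: {0, 1} and {2, 3}.
clean = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
area = area_under_cdf(clean)
print(area)
```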

compute_bestk()

Get best number of clusters

compute_clusters_consensus(prediction, k)

For one prediction, compute the cluster consensus, showing cluster stability.
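In the paper, the consensus of a cluster is the average consensus value over all pairs of items assigned to that cluster. A minimal sketch under that definition follows; cluster_consensus and cluster_label are hypothetical names, and this is not the library's implementation.

```python
def cluster_consensus(consensus, labels, cluster_label):
    # Items assigned to the cluster of interest.
    members = [i for i, lab in enumerate(labels) if lab == cluster_label]
    # Every unordered pair of distinct members.
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    if not pairs:
        return float("nan")  # undefined for clusters with fewer than two items
    return sum(consensus[i][j] for i, j in pairs) / len(pairs)


consensus = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
score = cluster_consensus(consensus, labels=[0, 0, 1], cluster_label=0)
print(score)
```

Values near 1 indicate that the members of the cluster were grouped together in almost every resampling in which they were sampled together.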

compute_consensus_clustering(data, random_state)

Fits a consensus matrix for each number of clusters

compute_consensus_mat(connectivity_mat, ...)

Compute consensus matrix defined as the normalized sum of the connectivity matrices of all the resampled datasets
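The normalization can be sketched as follows: each entry is the number of resamplings in which two items were clustered together, divided by the number of resamplings in which both were sampled. The function name consensus_matrix is hypothetical, and setting never-co-sampled pairs to 0 is an assumption.

```python
def consensus_matrix(connectivity_mats, indicator_mats):
    n = len(connectivity_mats[0])
    consensus = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            together = sum(m[i][j] for m in connectivity_mats)
            sampled = sum(m[i][j] for m in indicator_mats)
            # Assumption: pairs never sampled together get consensus 0.
            consensus[i][j] = together / sampled if sampled else 0.0
    return consensus


# Iteration 1 samples items 0 and 1 and clusters them together.
conn1 = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
ind1 = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
# Iteration 2 samples all three items and groups items 1 and 2.
conn2 = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
ind2 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

result = consensus_matrix([conn1, conn2], [ind1, ind2])
print(result)
```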

compute_items_consensus(prediction, k)

For one prediction, compute item consensus, showing most representative cluster items.
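Following the paper's definition, the consensus of an item with respect to a cluster is its average consensus value with the members of that cluster, the item itself excluded. The sketch below uses hypothetical names (item_consensus, cluster_label) and is only meant to make the definition concrete.

```python
def item_consensus(consensus, labels, item, cluster_label):
    # Members of the cluster, leaving out the item being scored.
    others = [
        j for j, lab in enumerate(labels)
        if lab == cluster_label and j != item
    ]
    if not others:
        return float("nan")  # no other member to compare with
    return sum(consensus[item][j] for j in others) / len(others)


consensus = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
labels = [0, 0, 1]
own = item_consensus(consensus, labels, item=0, cluster_label=0)
other = item_consensus(consensus, labels, item=2, cluster_label=0)
print(own, other)
```

Items with a high consensus for their own cluster and a low one for every other cluster are the most representative members.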

compute_iteration_connectivity_matrix(...)

Compute connectivity matrix
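In the paper, the connectivity matrix of one resampling has a 1 wherever two sampled items received the same cluster label and a 0 elsewhere. A minimal sketch, assuming the sampled items are given as indices into the full dataset; connectivity_matrix and its arguments are hypothetical names.

```python
def connectivity_matrix(n_items, sampled_indices, labels):
    """labels[a] is the cluster label of item sampled_indices[a]."""
    mat = [[0] * n_items for _ in range(n_items)]
    for a, i in enumerate(sampled_indices):
        for b, j in enumerate(sampled_indices):
            if labels[a] == labels[b]:
                mat[i][j] = 1
    return mat


# Items 0, 2 and 3 were sampled; items 2 and 3 share a cluster.
conn = connectivity_matrix(4, sampled_indices=[0, 2, 3], labels=[0, 1, 1])
for row in conn:
    print(row)
```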

compute_iteration_indicator_mat(...)

Compute indicator matrix for one iteration
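The indicator matrix records which pairs of items were both present in a given resampling. The sketch below assumes sampling without replacement of a resample_proportion fraction of the items; the helper names are hypothetical, and the library may sample differently.

```python
import random


def resample_indices(n_items, resample_proportion, rng):
    # Assumption: draw a fixed-size subset without replacement.
    size = int(n_items * resample_proportion)
    return sorted(rng.sample(range(n_items), size))


def indicator_matrix(n_items, sampled_indices):
    sampled = set(sampled_indices)
    return [
        [1 if i in sampled and j in sampled else 0 for j in range(n_items)]
        for i in range(n_items)
    ]


rng = random.Random(0)  # seeded for reproducibility
picked = resample_indices(6, 0.5, rng)
indicator = indicator_matrix(6, picked)
print(picked)
```

Summing these matrices over all iterations gives the denominator used to normalize the consensus matrix.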

compute_summary_statistics(k)

For one prediction, compute summary statistics (cluster consensus and item consensus), showing cluster stability and the most representative cluster items.

line_plots_cluster_consensus(cons_clust_df, ...)

Line plots of the percentage and number of clusters with cluster consensus > threshold

plot_clustermap(k, saving_path[, col_color])

Compute and save a consensus clustering heatmap and dendrogram showing the consensus clustering stability.

plot_clusters_consensus(cons_clust_df, fig_path)

Plot cluster consensus results showing the stability of clusters

plot_deltak(fig_path)

Plot the curve showing the relative change in area under the curve

predict(k)

Predicts clusters for k clusters, using the consensus matrix

predict_data(data)

Predicts clusters on the data, using the best number of clusters found