inmoose.deseq2.collapseReplicates

inmoose.deseq2.collapseReplicates(obj, groupby, run, renameCols=True)

Collapse technical replicates in an AnnData or DESeqDataSet

Collapse the samples (rows) in obj by summing within levels of a grouping factor groupby. The purpose of this function is to sum up read counts from technical replicates to create an object with a single row of read counts for each sample. Note: by “technical replicates”, we mean multiple sequencing runs of the same library, in contrast to “biological replicates”, in which multiple libraries are prepared from separate biological units. Optionally renames the rows of the returned object with the levels of the grouping factor.

Parameters:
  • obj (AnnData or DESeqDataSet)

  • groupby (Factor) – a grouping factor, with one entry per row of obj

  • run (??, optional) – the names of each unique row in obj. If provided, a new column “runsCollapsed” will be added to obj.obs, which pastes together the names of run for each group

  • renameCols (bool, optional) – whether to rename the rows of the returned object using the levels of the grouping factor

Returns:

an object with as many rows as levels in groupby. This object has count data which is summed from the various rows which are grouped together, and its .obs is subset using the first row for each group in groupby.

Return type:

AnnData or DESeqDataSet

Examples

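>>> import numpy as np
>>> from inmoose.deseq2 import collapseReplicates, makeExampleDESeqDataSet
>>> from inmoose.utils import Factor
>>> dds = makeExampleDESeqDataSet(m=12)
>>> # make data with nine samples, three of which have two technical replicates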
>>> dds.obs["sample"] = Factor(np.repeat(np.arange(1,10), [2,1,1,2,1,1,2,1,1]))
>>> dds.obs["run"] = [f"run{i}" for i in range(12)]
>>> ddsColl = collapseReplicates(dds, dds.obs["sample"], dds.obs["run"])
>>> # examine the sample metadata and row names of the collapsed data
>>> ddsColl.obs
???
>>> ddsColl.obs_names
???
>>> # check that the sum of the counts for the first level of "sample"
... # is the same as the counts in the first row of ddsColl
>>> matchFirstLevel = dds.obs["sample"] == dds.obs["sample"].categories[0]
>>> np.all(np.sum(dds[matchFirstLevel,:].counts(), axis=0) == ddsColl[0,:].counts())
True
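
The collapsing semantics described above can also be illustrated without inmoose. The following is a minimal sketch using plain pandas and NumPy, not the library's implementation: the toy counts, the sample labels "A", "B", "C", and the comma used to join run names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 5 sequencing runs (rows) x 3 genes (columns)
run_names = [f"run{i}" for i in range(5)]
counts = pd.DataFrame(
    np.array([[1, 0, 4], [2, 1, 0], [5, 5, 5], [0, 3, 1], [7, 0, 2]]),
    index=run_names,
    columns=["gene0", "gene1", "gene2"],
)
# Sample metadata: runs 0-1 belong to sample A, run 2 to B, runs 3-4 to C
obs = pd.DataFrame(
    {"sample": ["A", "A", "B", "C", "C"], "run": run_names},
    index=run_names,
)

# Sum the counts within each level of the grouping factor
collapsed_counts = counts.groupby(obs["sample"]).sum()

# Keep the first metadata row of each group, indexed by the group level
collapsed_obs = (
    obs.groupby("sample").head(1)
    .set_index("sample", drop=False)
    .loc[collapsed_counts.index]
)

# Paste together the run names of each group (separator is an assumption)
runs = obs.groupby("sample")["run"].agg(",".join)
collapsed_obs = collapsed_obs.assign(
    runsCollapsed=runs.loc[collapsed_obs.index].to_numpy()
)

print(collapsed_counts)
print(collapsed_obs)
```

In this sketch, the row for sample "A" holds the element-wise sum of run0 and run1, and its "runsCollapsed" entry lists both run names; sample "B" has a single run, so its counts are unchanged.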