pasilla
This module is a port of the R Bioconductor pasilla package, version 1.31.0.
This module provides per-exon and per-gene read counts, computed for selected genes, from the RNA-Seq data presented in [Brooks2011]. The experiment studied the effect of RNAi knockdown of Pasilla, the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2, on the transcriptome. The vignette of the R package describes how the data provided here were derived from the RNA-Seq reads deposited in the NCBI Gene Expression Omnibus (GEO) under accession numbers GSM461176 to GSM461181.
We describe below how to load the data and build an AnnData object (NB: the snippet below is also wrapped in the pasilla() function for convenience):
import importlib.resources
import pandas as pd
import anndata as ad
data_dir = importlib.resources.files("inmoose.data.pasilla")
cts = pd.read_csv(data_dir.joinpath("pasilla_gene_counts.tsv"), sep='\t', index_col=0)
anno = pd.read_csv(data_dir.joinpath("pasilla_sample_annotation.csv"), index_col=0)
# The columns of `cts` and the rows of `anno` use different labels and are
# not in the same order. We first need to harmonize them before building the
# AnnData object.
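# first, strip the trailing "fb" suffix from the annotation labels
# (e.g. "treated1fb" -> "treated1")
anno.index = [i[:-2] for i in anno.index]
# second, reorder the annotation rows to follow the column order of `cts`
anno = anno.reindex(cts.columns)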
# we are now ready to build the AnnData object: samples are observations
# (rows of X) and genes are variables, hence the transpose of `cts`
adata = ad.AnnData(cts.T, obs=anno)
adata
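The harmonization step above relies on two assumptions that can fail silently: that every annotation label ends with "fb", and that every count column has a matching annotation row (pandas' `DataFrame.reindex` fills unmatched labels with NaN rather than raising). A minimal, more defensive sketch of the same alignment follows. It runs on small hypothetical data frames instead of the packaged files, so the gene names, sample labels and values are illustrative only, and it swaps `i[:-2]` for `str.removesuffix` (Python 3.9+).

```python
import pandas as pd

# Hypothetical stand-ins for the packaged files (illustration only):
# `cts` has genes as rows and samples as columns, while `anno` has one row
# per sample, labelled with an extra "fb" suffix and in a different order.
cts = pd.DataFrame(
    {"untreated1": [10, 0, 5], "treated1": [8, 2, 7]},
    index=["gene_a", "gene_b", "gene_c"],
)
anno = pd.DataFrame(
    {"condition": ["treated", "untreated"]},
    index=["treated1fb", "untreated1fb"],
)

# Strip the suffix only where it is present, instead of unconditionally
# dropping the last two characters of every label.
anno.index = [label.removesuffix("fb") for label in anno.index]

# reindex() would insert all-NaN rows for count columns that have no
# annotation, so fail loudly before reordering if any are missing.
missing = cts.columns.difference(anno.index)
if len(missing) > 0:
    raise ValueError(f"samples without annotation: {list(missing)}")
anno = anno.reindex(cts.columns)

# After alignment, the samples (rows of cts.T) and the rows of anno match
# one-to-one, which is what AnnData expects for X and obs.
assert cts.T.index.equals(anno.index)
print(anno)
```

With the real files, the same checks can be applied unchanged to the `cts` and `anno` frames read above, before calling `ad.AnnData`.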
Code documentation
- inmoose.data.pasilla.pasilla()
Retrieve the pasilla dataset as an AnnData object.
The pasilla dataset is an RNA-Seq experiment on the effect of RNAi knockdown of Pasilla, the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2, on the transcriptome [Brooks2011].
References
[Brooks2011] A.N. Brooks, L. Yang, M.O. Duff, K.D. Hansen, J.W. Park, S. Dudoit, S.E. Brenner, B.R. Graveley. 2011. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Research. doi:10.1101/gr.108662.110