inmoose.limma.classifyTestsF

inmoose.limma.classifyTestsF(self, cor_matrix=None, df=inf, p_value=0.01, fstat_only=False)

For each gene, classify a series of related t-statistics as significantly up or down using nested F-tests.

This function implements the nestedF multiple testing option offered by decideTests(). Users should generally use decideTests() rather than calling classifyTestsF() directly because, by itself, classifyTestsF() does not incorporate any multiple testing adjustment across genes. Instead, it simply tests across contrasts for each gene individually.

classifyTestsF() uses a nested F-test approach giving particular attention to correctly classifying genes that have two or more significant t-statistics, i.e. which are differentially expressed in two or more conditions. For each row of tstat, the overall F-statistic is constructed from the t-statistics as for FStat. At least one contrast will be classified as significant if and only if the overall F-statistic is significant. If the overall F-statistic is significant, then the function makes a best choice as to which t-statistics contributed to this result. The methodology is based on the principle that any t-statistic should be called significant if the F-test is still significant for that row when all the larger t-statistics are set to the same absolute size as the t-statistic in question.
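The step-down rule described above can be sketched in plain NumPy. This is an illustration of the stated principle, not the inmoose implementation: the helper name nested_f_classify and its f_crit argument are hypothetical, the critical value of the overall F-test is assumed to be supplied by the caller, and the correlation matrix is assumed to be full rank.

```python
import numpy as np

def nested_f_classify(tstat, f_crit, cor_matrix=None):
    """Hypothetical helper: classify each t-statistic as -1, 0 or 1.

    tstat      : (genes x contrasts) matrix of t-statistics
    f_crit     : critical value of the overall F-test (scalar or one per gene),
                 assumed to be computed elsewhere for the chosen p-value
    cor_matrix : correlation matrix of the contrasts (assumed full rank)
    """
    tstat = np.atleast_2d(np.asarray(tstat, dtype=float))
    n_genes, n_contrasts = tstat.shape
    if cor_matrix is None:
        inv_cor = np.eye(n_contrasts)
    else:
        inv_cor = np.linalg.inv(np.asarray(cor_matrix, dtype=float))
    f_crit = np.broadcast_to(np.asarray(f_crit, dtype=float), (n_genes,))

    def fstat(x):
        # Quadratic form t' R^{-1} t divided by the number of contrasts.
        return x @ inv_cor @ x / n_contrasts

    calls = np.zeros((n_genes, n_contrasts), dtype=int)
    for i in range(n_genes):
        x = tstat[i].copy()
        if np.isnan(x).any():
            continue  # rows with missing values are left unclassified
        order = np.argsort(-np.abs(x))  # largest |t| first
        for j, idx in enumerate(order):
            # Shrink every larger t-statistic to the size of the current one,
            # keeping its sign, then re-test the row.
            bigger = order[:j]
            x[bigger] = np.sign(x[bigger]) * abs(x[idx])
            if fstat(x) <= f_crit[i]:
                break  # this and all smaller t-statistics stay at 0
            calls[i, idx] = int(np.sign(x[idx]))
    return calls

# Two contrasts with infinite degrees of freedom: F(2, inf) is chi-square(2)/2,
# whose upper-tail quantile at level p is exactly -log(p).
f_crit = -np.log(0.01)
t = np.array([[4.0, 1.0], [3.0, -3.0], [1.0, 1.5]])
calls = nested_f_classify(t, f_crit)
print(calls)
```

In the first row only the large t-statistic is called, because the row is no longer significant once 4.0 is shrunk to 1.0. In the second row both are called, and the third row fails the overall F-test, so nothing is called.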

Compared to conventional multiple testing methods, the nested F-test approach achieves better consistency between related contrasts. (For example, if B is judged to be different from C, then at least one of B or C should be different from A.) The approach was first used by [Michaud2008]. The nested F-test approach provides weak control of the family-wise error rate, i.e. it correctly controls the type I error rate of calling any contrast as significant if all the null hypotheses are true. In other words, it provides error rate control at the overall F-test level but does not provide strict error rate control at the individual contrast level.

Usually, self is a limma linear model fitted object, from which a matrix of t-statistics can be extracted, but it can also be a numeric matrix of t-statistics. In either case, rows correspond to genes and columns to coefficients or contrasts. The cor_matrix is the same as the correlation matrix of the coefficients from which the t-statistics were calculated and df is the degrees of freedom of the t-statistics. All statistics for the same gene must have the same degrees of freedom.

If fstat_only=True, this function just returns the vector of overall F-statistics for each gene.
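As a rough illustration of how an overall F-statistic can be built from a row of t-statistics and their correlation matrix, the sketch below computes t' R^{-1} t divided by the rank of R. The helper name overall_fstat and its tol argument are hypothetical, and the eigenvalue cut-off for rank-deficient matrices is an assumption rather than a documented inmoose behaviour.

```python
import numpy as np

def overall_fstat(tstat, cor_matrix=None, tol=1e-8):
    """Hypothetical helper: per-row F-statistic and its numerator df."""
    tstat = np.atleast_2d(np.asarray(tstat, dtype=float))
    n_contrasts = tstat.shape[1]
    if cor_matrix is None:
        # Uncorrelated contrasts: F is the mean of the squared t-statistics.
        return (tstat ** 2).sum(axis=1) / n_contrasts, n_contrasts
    values, vectors = np.linalg.eigh(np.asarray(cor_matrix, dtype=float))
    keep = values / values.max() > tol  # drop near-null directions
    rank = int(keep.sum())
    # Project onto the retained eigenvectors, scaling by 1/sqrt(eigenvalue).
    z = tstat @ vectors[:, keep] / np.sqrt(values[keep])
    return (z ** 2).sum(axis=1) / rank, rank

t = np.array([[3.0, 4.0], [0.5, -1.0]])
f_indep, df1 = overall_fstat(t)
f_corr, df1_corr = overall_fstat(t, cor_matrix=[[1.0, 0.5], [0.5, 1.0]])
print(f_indep, df1)
print(f_corr, df1_corr)
```

With the identity correlation the statistic reduces to the average squared t-statistic. A positive correlation between the two contrasts lowers the F-statistic for the first row, where both t-statistics share a sign.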

Parameters:

self (MArrayLM or ndarray) – matrix of t-statistics, or a MArrayLM object from which the t-statistics may be extracted
cor_matrix (ndarray) – covariance matrix of each row of t-statistics. Will be extracted automatically from the MArrayLM object, but otherwise defaults to the identity matrix.
df (array_like) – array of degrees of freedom for the t-statistics. Should be broadcastable to the shape of the matrix of t-statistics. Will be extracted automatically from the MArrayLM object, but otherwise defaults to np.inf.
p_value (float) – value between 0 and 1 giving the desired size of the test
fstat_only (bool) – if True then return the overall F-statistic as for FStat instead of classifying the test results.

Returns:

If fstat_only=False, then an object of class TestResults, which is essentially a matrix with elements -1, 0 or 1 depending on whether each t-statistic is classified as significantly negative, not significant or significantly positive respectively. If fstat_only=True, then an array of F-statistics is returned with attributes df1 and df2 giving the corresponding degrees of freedom.

Return type:

TestResults or ndarray