inmoose.limma.squeezeVar

inmoose.limma.squeezeVar(var, df, covariate=None, robust=False, winsor_tail_p=(0.05, 0.1))

Squeeze a set of sample variances together by computing empirical Bayes posterior means. This function implements the empirical Bayes algorithms proposed by [Smyth2004] and [Phipson2016].

A conjugate Bayesian hierarchical model is assumed for a set of sample variances. The hyperparameters are estimated by fitting a scaled F-distribution to the sample variances. The function returns the posterior variances and the estimated hyperparameters.
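The hyperparameter fit can be illustrated with the moment-matching estimator on log-variances described in [Smyth2004]. The following is a minimal sketch, not inmoose's implementation: the helper names fit_f_dist_moments and trigamma_inverse are hypothetical, NumPy and SciPy are assumed to be available, and the covariate, robust and zero-variance cases are left out.

```python
import numpy as np
from scipy.special import digamma, polygamma


def trigamma_inverse(x, tol=1e-8, max_iter=50):
    """Solve trigamma(y) = x for y > 0 by Newton's method (scalar x > 0)."""
    y = 0.5 + 1.0 / x  # starting value, as in limma's trigammaInverse
    for _ in range(max_iter):
        tri = polygamma(1, y)
        # Newton step on 1/trigamma(y) - 1/x
        dif = tri * (1.0 - tri / x) / polygamma(2, y)
        y += dif
        if -dif / y < tol:
            break
    return float(y)


def fit_f_dist_moments(var, df):
    """Hypothetical helper: moment estimates (var_prior, df_prior) of a
    scaled F fit to var. No covariate, no robustness, var must be > 0."""
    var = np.asarray(var, dtype=float)
    df = np.broadcast_to(np.asarray(df, dtype=float), var.shape)
    # Centred log-variances: E[e] = log(s0^2) - digamma(d0/2) + log(d0/2)
    e = np.log(var) - digamma(df / 2) + np.log(df / 2)
    emean = e.mean()
    # Var[e] = trigamma(d/2) + trigamma(d0/2); remove the known sampling part
    evar = e.var(ddof=1) - np.mean(polygamma(1, df / 2))
    if evar > 0:
        df_prior = 2.0 * trigamma_inverse(evar)
        var_prior = float(np.exp(emean + digamma(df_prior / 2) - np.log(df_prior / 2)))
    else:
        # No variability beyond sampling noise: infinite prior degrees of freedom
        df_prior = np.inf
        var_prior = float(np.exp(emean))
    return var_prior, df_prior
```

The inverse-trigamma step follows the Newton iteration used in limma's trigammaInverse. When the excess log-variance is not positive, the sketch returns an infinite df_prior, which means every variance is squeezed fully to var_prior.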

Specifically, the sample variances var are assumed to follow scaled chi-squared distributions, conditional on the true variances, and a scaled inverse chi-squared prior is assumed for the true variances. The scale and degrees of freedom of this prior distribution are estimated from the values of var.
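In the notation of [Smyth2004], with s_g², d_g, s_0², d_0 and s̃_g² standing for var[g], df[g], var_prior, df_prior and var_post[g] (a mapping assumed here for illustration), the model, the marginal scaled F distribution it implies, and the posterior value for each variance can be written as:

```latex
\begin{aligned}
s_g^2 \mid \sigma_g^2 &\sim \frac{\sigma_g^2}{d_g}\,\chi^2_{d_g}, &
\frac{1}{\sigma_g^2} &\sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0}, \\
s_g^2 &\sim s_0^2\, F_{d_g,\,d_0}, &
\tilde{s}_g^2 &= \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}
  = \left( \mathrm{E}\left[ \sigma_g^{-2} \mid s_g^2 \right] \right)^{-1}.
\end{aligned}
```

The posterior value is the reciprocal of the posterior mean of the precision 1/σ_g², so it is a degrees-of-freedom-weighted average of the prior and sample variances.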

The effect of this function is to squeeze the variances towards a common value, or towards a global trend if a covariate is provided. The squeezed variances have a smaller expected mean squared error with respect to the true variances than the sample variances themselves.
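Once the hyperparameters are known, squeezing is a weighted average: var_post = (df_prior * var_prior + df * var) / (df_prior + df), with var_post equal to var_prior when df_prior is infinite. The sketch below assumes NumPy; posterior_var is a hypothetical name, and the simulation uses the true hyperparameters rather than estimated ones to illustrate the mean squared error claim.

```python
import numpy as np


def posterior_var(var, df, var_prior, df_prior):
    """Hypothetical helper: df-weighted average of prior and sample variances.
    var_prior and df_prior may be scalars or per-element arrays."""
    var = np.asarray(var, dtype=float)
    df = np.broadcast_to(np.asarray(df, dtype=float), var.shape)
    var_prior = np.broadcast_to(np.asarray(var_prior, dtype=float), var.shape)
    df_prior = np.broadcast_to(np.asarray(df_prior, dtype=float), var.shape)
    out = np.empty(var.shape)
    inf = np.isinf(df_prior)
    # Infinite prior df: the variance is replaced by the prior value
    out[inf] = var_prior[inf]
    fin = ~inf
    out[fin] = (df_prior[fin] * var_prior[fin] + df[fin] * var[fin]) / (df_prior[fin] + df[fin])
    return out


# Simulate from the hierarchical model with known hyperparameters
rng = np.random.default_rng(0)
n, d, d0, s02 = 20000, 4.0, 10.0, 0.5
sigma2 = d0 * s02 / rng.chisquare(d0, n)  # true variances drawn from the prior
var = sigma2 * rng.chisquare(d, n) / d    # observed sample variances
var_post = posterior_var(var, d, s02, d0)
mse_raw = np.mean((var - sigma2) ** 2)
mse_post = np.mean((var_post - sigma2) ** 2)
print(f"MSE raw: {mse_raw:.3f}  MSE squeezed: {mse_post:.3f}")
```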

If covariate is not None, then the scale parameter of the prior distribution is assumed to depend on the covariate. If the covariate is average log-expression, then the effect is an intensity-dependent trend similar to that in [Sartor2006].
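A trended prior can be sketched by regressing the centred log-variances on the covariate: the fitted values give a per-element var_prior, and the residual variance gives df_prior. limma performs this regression with natural splines; the sketch below swaps in a plain polynomial fit via np.polyfit for simplicity. It assumes NumPy and SciPy, and fit_trended_prior, its degree argument and trigamma_inverse are hypothetical names.

```python
import numpy as np
from scipy.special import digamma, polygamma


def trigamma_inverse(x, tol=1e-8, max_iter=50):
    """Solve trigamma(y) = x for y > 0 by Newton's method (scalar x > 0)."""
    y = 0.5 + 1.0 / x
    for _ in range(max_iter):
        tri = polygamma(1, y)
        dif = tri * (1.0 - tri / x) / polygamma(2, y)
        y += dif
        if -dif / y < tol:
            break
    return float(y)


def fit_trended_prior(var, df, covariate, degree=2):
    """Hypothetical helper: per-element var_prior following a polynomial
    trend in covariate, plus a single df_prior."""
    var = np.asarray(var, dtype=float)
    df = np.broadcast_to(np.asarray(df, dtype=float), var.shape)
    covariate = np.asarray(covariate, dtype=float)
    e = np.log(var) - digamma(df / 2) + np.log(df / 2)
    # Polynomial trend in place of limma's natural-spline regression
    coef = np.polyfit(covariate, e, degree)
    emean = np.polyval(coef, covariate)
    # Residual variance (corrected for fitted coefficients) minus sampling part
    evar = np.sum((e - emean) ** 2) / (e.size - (degree + 1))
    evar -= np.mean(polygamma(1, df / 2))
    if evar > 0:
        df_prior = 2.0 * trigamma_inverse(evar)
        var_prior = np.exp(emean + digamma(df_prior / 2) - np.log(df_prior / 2))
    else:
        df_prior = np.inf
        var_prior = np.exp(emean)
    return var_prior, df_prior
```

With covariate set to average log-expression, var_prior then traces an intensity-dependent trend, and the posterior variances are pulled towards that trend rather than towards a single value.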

robust=True implements the robust empirical Bayes procedure of [Phipson2016], which allows some of the var values to be outliers.

Parameters:
  • var (array_like) – 1-D array of independent sample variances

  • df (array_like) – 1-D array of degrees of freedom for the sample variances

  • covariate (array_like, optional) – if not None, var_prior depends on this numeric covariate; otherwise, var_prior is constant.

  • robust (bool) – whether the estimation of df_prior and var_prior should be robustified against outlier sample variances

  • winsor_tail_p (float or pair of floats) – left and right tail proportions of var to Winsorize. Only used when robust=True
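To show what the two tail proportions mean, here is a generic quantile-based Winsorization in NumPy. It is an illustration only: winsorize is a hypothetical name, treating a single float as applying to both tails is an assumption, and the robust fit may apply Winsorization internally to transformed variances rather than to var directly.

```python
import numpy as np


def winsorize(x, tail_p=(0.05, 0.1)):
    """Hypothetical helper: clip the lowest tail_p[0] and highest tail_p[1]
    proportions of x to the corresponding sample quantiles."""
    x = np.asarray(x, dtype=float)
    # A single float is assumed to apply to both tails
    p_lo, p_hi = np.broadcast_to(np.asarray(tail_p, dtype=float), (2,))
    lo = np.quantile(x, p_lo)
    hi = np.quantile(x, 1.0 - p_hi)
    return np.clip(x, lo, hi)
```

With the default (0.05, 0.1), the lowest 5% and the highest 10% of the values are pulled in to the 5th and 90th percentiles, so the right tail, where large outlier variances occur, is trimmed more heavily.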

Returns:

a dictionary with keys:

  • "var_post", 1-D array of posterior variances. Same length as var.

  • "var_prior", location or scale of the prior distribution. 1-D array of the same length as var if covariate is not None, otherwise a single value.

  • "df_prior", degrees of freedom of the prior distribution. 1-D array of the same length as var if robust=True, otherwise a single value.

Return type:

dict