This function computes the Likelihood Ratios for the gene sets under scrutiny, as well as estimations of genes dynamics inside those gene sets through mixed models.
Usage
TcGSA.LR(
expr,
gmt,
design,
subject_name = "Patient_ID",
time_name = "TimePoint",
crossedRandom = FALSE,
covariates_fixed = "",
time_covariates = "",
time_func = "linear",
group_name = "",
separateSubjects = FALSE,
minGSsize = 10,
maxGSsize = 500
)
# S3 method for class 'TcGSA'
print(x, ...)Arguments
- expr
a matrix or dataframe of gene expression. Its dimension are \(n\)x\(p\), with the \(p\) samples in column and the \(n\) genes in row.
- gmt
a gmt object containing the gene sets definition. See
GSA.read.gmtand definition on https://docs.gsea-msigdb.org/.- design
a matrix or dataframe containing the experimental variables that used in the model, namely
subject_name,time_name, andcovariates_fixedandtime_covariatesif applicable. Its dimension are \(p\)x\(m\) and its row are is in the same order as the columns ofexpr.- subject_name
the name of the factor variable from
designthat contains the information on the repetition units used in the mixed model, such as the patient identifiers for instance. Default is'Patient_ID'. See Details.- time_name
the name of a numeric variable from
designthat contains the information on the time replicates (the time points at which gene expression was measured). Default is'TimePoint'. See Details.- crossedRandom
logical flag indicating whether the random effects of the subjects and of the time points should be modeled as one crossed random effect or as two separated random effects. Default is
FALSE. See details.- covariates_fixed
a character vector with the names of numeric or factor variables from the
designmatrix that should appear as fixed effects in the model. See details. Default is"", which corresponds to no covariates in the model.- time_covariates
a character vector with the names of numeric or factor variables from the
designmatrix that should appear as fixed effects interaction with thetime_namevariable in the model. See details. Default is"", which corresponds to no covariates in the model.- time_func
the form of the time trend. Can be either one of
"linear","cubic","splines"or specified by the user, or the column name of a factor variable fromdesign. If specified by the user, it must be as an expression using only names of variables from thedesignmatrix with only the three following operators:+,*,/. The"splines"form corresponds to the natural cubic B-splines (see alsons). If there are only a few timepoints, a"linear"form should be sufficient. Otherwise, the"cubic"form is more parsimonious than the"splines"form, and should be sufficiently flexible. If the column name of a factor variable fromdesignis supplied, then time is considered as discrete in the analysis. If the user specify a formula using column names from design, both factor and numeric variables can be used.- group_name
in the case of several treatment groups, the name of a factor variable from the
designmatrix. It indicates to which treatment group each sample belongs to. Default is"", which means that there is only one treatment group. See Details.- separateSubjects
logical flag indicating that the analysis identifies gene sets that discriminates patients rather than gene sets than have a significant trend over time. Default is
FALSE. See Details.- minGSsize
the minimum number of genes in a gene set. If there are less genes than this number in one of the gene sets under scrutiny, the Likelihood Ratio of this gene set is not computed (the mixed model are not fitted). Default is
10genes as the minimum.- maxGSsize
the maximum number of genes in a gene set. If there are more genes than this number in one of the gene sets under scrutiny, the Likelihood Ratio of this gene set is not computed (the mixed model are not fitted). This is to avoid very long computation times. Default is
500genes as the maximum.- x
an object of class '
TcGSA'.- ...
further arguments passed to or from other methods.
Value
TcGSA.LR returns a tcgsa object, which is a list with
the 5 following elements:
fit a data frame that contains the 3 following variables:
LR: the likelihood ratio between the model under the null hypothesis and the model under the alternative hypothesis.CVG_H0: convergence status of the model under the null hypothesis.CVG_H1: convergence status of the model under the alternative hypothesis.
time_func: a character string passing along the value of thetime_funcargument used in the call.GeneSets_gmt: agmtobject passing along the value of thegmtargument used in the call.group.var: a factor passing along thegroup_namevariable from thedesignmatrix.separateSubjects: a logical flag passing along the value of theseparateSubjectsargument used in the call.Estimations: a list of 3 dimensions arrays. Each element of the list (i.e. each array) corresponds to the estimations of gene expression dynamics for each of the gene sets under scrutiny (obtained from mixed models). The first dimension of those arrays is the genes included in the concerned gene set, the second dimension is thePatient_ID, and the third dimension is theTimePoint. The values inside those arrays are estimated gene expressions.time_DF: the degree of freedom of the natural splines functions
Details
This Time-course Gene Set Analysis aims at identifying gene sets that are not
stable over time, either homogeneously or heterogeneously (see Hejblum
et al, 2012)in terms of their probes. And when the argument
separateSubjects is TRUE, instead of identifying gene sets that
have a significant trend over time, TcGSA identifies gene sets that
have significantly different trends over time depending on Subjects.
References
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
See also
summary.TcGSA, plot.TcGSA,
and TcGSA.LR.parallel for an implementation using
parallel computing
Examples
if(interactive()){
data(data_simu_TcGSA)
tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE)
tcgsa_sim_1grp
summary(tcgsa_sim_1grp)
plot(x=tcgsa_sim_1grp, expr=expr_1grp,
Subject_ID=design$Patient_ID, TimePoint=design$TimePoint,
baseline=1,
B=100,
time_unit="H"
)
tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE,
group_name="group.var")
tcgsa_sim_2grp
}