This function can plot different representations of the gene expression in a specific gene set, stratified on all subjects.
Usage
plotPat.1GS(
expr,
gmt,
Subject_ID,
TimePoint,
geneset.name,
baseline = NULL,
group.var = NULL,
Group_ID_paired = NULL,
ref = NULL,
group_of_interest = NULL,
FUNcluster = NULL,
clustering_metric = "euclidian",
clustering_method = "ward",
B = 500,
max_trends = 4,
aggreg.fun = "median",
na.rm.aggreg = TRUE,
trend.fun = "median",
methodOptiClust = "firstSEmax",
verbose = TRUE,
clustering = TRUE,
time_unit = "",
title = NULL,
y.lab = NULL,
desc = TRUE,
lab.cex = 1,
axis.cex = 1,
main.cex = 1,
y.lab.angle = 90,
x.axis.angle = 45,
y.lim = NULL,
x.lim = NULL,
gg.add = list(theme())
)
Arguments
- expr
either a matrix or dataframe of gene expression upon which dynamics are to be calculated, or a list of gene sets estimation of gene expression. In the case of a matrix or dataframe, its dimension are \(n\) x \(p\), with the \(p\) sample in column and the \(n\) genes in row. In the case of a list, its length should correspond to the number of gene sets under scrutiny and each element should be an 3 dimension array of estimated gene expression, such as for the list returned in the
'Estimations'
element ofTcGSA.LR
. See details.- gmt
a gmt object containing the gene sets definition. See
GSA.read.gmt
and definition on www.software.broadinstitute.org.- Subject_ID
a factor of length \(p\) that is in the same order as the columns of
expr
(when it is a dataframe) and that contains the patient identifier of each sample.- TimePoint
a numeric vector or a factor of length \(p\) that is in the same order as
TimePoint
and the columns ofexpr
(when it is a dataframe), and that contains the time points at which gene expression was measured.- geneset.name
a character string containing the name of the gene set to be plotted, that must appear in the
"geneset.names"
element ofgmt
.- baseline
a character string which is the value of
TimePoint
that can be used as a baseline. Default isNULL
, in which case no time point is used as a baseline value for gene expression. Has to beNULL
when comparing two treatment groups.- group.var
in the case of several treatment groups, this is a factor of length \(p\) that is in the same order as
Timepoint
,Subject_ID
and the columns ofexpr
. It indicates to which treatment group each sample belongs to. Default isNULL
, which means that there is only one treatment group. See Details.- Group_ID_paired
a character vector of length \(p\) that is in the same order as
Timepoint
,Subject_ID
,group.var
and the columns ofexpr
. This argument must not beNULL
in the case of a paired analysis, and must beNULL
otherwise. Default isNULL
.- ref
the group which is used as reference in the case of several treatment groups. Default is
NULL
, which means that reference is the first group in alphabetical order of the labels ofgroup.var
. See Details.- group_of_interest
the group of interest, for which dynamics are to be computed in the case of several treatment groups. Default is
NULL
, which means that group of interest is the second group in alphabetical order of the labels ofgroup.var
.- FUNcluster
a function which accepts as first argument a matrix
x
and as second argument the number of clusters desiredk
, and which returns a list with a component named'cluster'
which is a vector of lengthn = nrow(x)
of integers in 1:k, determining the clustering or grouping of the n observations. Default isNULL
, in which case a hierarchical clustering is performed via the functionagnes
, using the metricclustering_metric
and the methodclustering_method
. See'FUNcluster'
inclusGap
and Details.- clustering_metric
character string specifying the metric to be used for calculating dissimilarities between observations in the hierarchical clustering when
FUNcluster
isNULL
. The currently available options are"euclidean"
and"manhattan"
. Default is"euclidean"
. Seeagnes
. Also, a"sts"
option is available in TcGSA. It implements the 'Short Time Series' distance [Moller-Levet et al., Fuzzy Clustering of short time series and unevenly distributed sampling points, Advances in Intelligent Data Analysis V:330-340 Springer, 2003] designed specifically for clustering time series.- clustering_method
character string defining the agglomerative method to be used in the hierarchical clustering when
FUNcluster
isNULL
. The six methods implemented are"average"
([unweighted pair-]group average method, UPGMA),"single"
(single linkage),"complete"
(complete linkage),"ward"
(Ward's method),"weighted"
(weighted average linkage). Default is"ward"
. Seeagnes
.- B
integer specifying the number of Monte Carlo ("bootstrap") samples used to compute the gap statistics. Default is
500
. SeeclusGap
.- max_trends
integer specifying the maximum number of different clusters to be tested. Default is
4
.- aggreg.fun
a character string such as
"mean"
,"median"
or the name of any other defined statistics function that returns a single numeric value. It specifies the function used to aggregate the observations before the clustering. Default is tomedian
.- na.rm.aggreg
a logical flag indicating whether
NA
should be remove to prevent propagation throughaggreg.fun
. Can be useful to set to TRUE with unbalanced design as those will generate structuralNA
s in$Estimations
. Default isTRUE
.- trend.fun
a character string such as
"mean"
,"median"
or the name of any other function that returns a single numeric value. It specifies the function used to calculate the trends of the identified clustered. Default is tomedian
.- methodOptiClust
character string indicating how the "optimal" number of clusters is computed from the gap statistics and their standard deviations. Possible values are
"globalmax"
,"firstmax"
,"Tibs2001SEmax"
,"firstSEmax"
and"globalSEmax"
. Default is"firstSEmax"
. See'method'
inclusGap
, Details and Tibshirani et al., 2001 in References.- verbose
logical flag enabling verbose messages to track the computing status of the function. Default is
TRUE
.- clustering
logical flag. If
FALSE
, there is no clustering representation; ifTRUE
, the lines are colored according to which cluster they belong to. Default isTRUE
. See Details.- time_unit
the time unit to be displayed (such as
"Y"
,"M"
,"W"
,"D"
,"H"
, etc) next to the values ofTimePoint
on the x-axis. Default is""
.- title
character specifying the title of the plot. If
NULL
, a title is automatically generated, if""
, no title appears. Default isNULL
.- y.lab
character specifying the annotation of the y axis. If
NULL
, an annotation is automatically generated, if""
, no annotation appears. Default isNULL
.- desc
a logical flag. If
TRUE
, a line is added to the title of the plot with the description of the gene set plotted (from the gmt file). Default isTRUE
.- lab.cex
a numerical value giving the amount by which lab labels text should be magnified relative to the default
1
.- axis.cex
a numerical value giving the amount by which axis annotation text should be magnified relative to the default
1
.- main.cex
a numerical value giving the amount by which title text should be magnified relative to the default
1
.- y.lab.angle
a numerical value (in [0, 360]) giving the orientation by which y-label text should be turned (anti-clockwise). Default is
90
. Seeelement_text
.- x.axis.angle
a numerical value (in [0, 360]) giving the orientation by which x-axis annotation text should be turned (anti-clockwise). Default is
45
.- y.lim
a numeric vector of length 2 giving the range of the y-axis. See
plot.default
.- x.lim
if numeric, will create a continuous scale, if factor or character, will create a discrete scale. Observations not in this range will be dropped. See
xlim
.- gg.add
A list of instructions to add to the
ggplot2
instructions. See+.gg
. Default islist(theme())
, which adds nothing to the plot.
Value
A dataframe the 2 following variables:
ProbeID
which contains the IDs of the probes of the plotted gene set.Cluster
which to which cluster the probe belongs to.
If clustering
is FALSE
, then Cluster
is NA
for all the probes.
Details
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If indiv
is 'genes', then each line of the plot is the median of a
gene expression over the patients. On the other hand, if indiv
is
'patients', then each line of the plot is the median of a patient genes
expression in this gene set.
This function uses the Gap statistics to determine the optimal number of
clusters in the plotted gene set. See
clusGap
.
References
Tibshirani, R., Walther, G. and Hastie, T., 2001, Estimating the number of data clusters via the Gap statistic, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 63, 2: 411--423.
Examples
if(interactive()){
data(data_simu_TcGSA)
tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE)
plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint,
Subject_ID=design$Patient_ID, gmt=gmt_sim,
geneset.name="Gene set 4",
clustering=FALSE,
time_unit="H",
lab.cex=0.7)
plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint,
Subject_ID=design$Patient_ID, gmt=gmt_sim,
geneset.name="Gene set 4",
clustering=FALSE, baseline=1,
time_unit="H",
lab.cex=0.7)
colval <- c(hsv(0.56, 0.9, 1),
hsv(0, 0.27, 1),
hsv(0.52, 1, 0.5),
hsv(0, 0.55, 0.97),
hsv(0.66, 0.15, 1),
hsv(0, 0.81, 0.55),
hsv(0.7, 1, 0.7),
hsv(0.42, 0.33, 1)
)
n <- length(colval); y <- 1:n
op <- par(mar=rep(1.5,4))
plot(y, axes = FALSE, frame.plot = TRUE,
xlab = "", ylab = "", pch = 21, cex = 8,
bg = colval, ylim=c(-1,n+1), xlim=c(-1,n+1),
main = "Color scale"
)
par(op)
plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint,
Subject_ID=design$Patient_ID, gmt=gmt_sim,
geneset.name="Gene set 5",
time_unit="H",
title="",
gg.add=list(scale_color_manual(values=colval)),
lab.cex=0.7
)
plotPat.1GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint,
Subject_ID=design$Patient_ID, gmt=gmt_sim,
geneset.name="Gene set 3",
time_unit="H",
lab.cex=0.7
)
}