Supplementary information for:

# 'A quantitative method for reverse engineering gene networks from microarray experiments using regulatory strengths'

## Alberto de la Fuente, Paul Brazhnik and Pedro Mendes

## Biochemical Networks Modeling Group

##
Virginia Bioinformatics Institute

alf@vt.edu brazhnik@vt.edu mendes@vt.edu

[Abstract]

[Analysis of a 3-gene network]

[Analysis of three genes in a 5-gene network]

[Analysis of a 5-gene network in which complex formation occurs]

[Analysis of the 3-gene network with control coefficients and elasticities
]

We propose a method for reverse engineering gene regulatory networks from
microarray gene expression data. The method is based on metabolic control
analysis and requires data from microarray gene expression experiments
where the rate of change of a single gene has been perturbed. Gene
regulatory interactions are quantified through "regulatory strengths" which are
determined from co-responses of messenger RNA to a common perturbation.
The application of the method is illustrated by analyzing data produced with
computer simulations of gene regulatory networks. Evidence is shown
indicating that the method performs well in some cases even when
perturbations are as large as two-fold. A global gene regulatory network can
be uncovered by this method if applied systematically to all genes in a
genome. Alternatively, it can be applied to a subset of genes in which case it
identifies a phenomenological gene network that contains, beside direct
interactions, contributions from indirect interactions (via the genes that were
not considered or through complex formation of gene products).

To demonstrate the method we here perform the analysis on a model of a network of only three genes (figure 1).
The main motivation for employing the method on simulated data is that in this case the structure of the system
that produced the data is exactly known, in contrast to data produced by biological systems.
We think that all methods used in gene expression analysis should be tested to simulated data before applied
with confidence to the much larger biological data sets.
The model is described by non-linear ordinary differential equations as given by equation 1.

Figure 1: A three gene regulatory network

(1)

Where [A], [B] and [C] are the concentrations of the mRNA species.
*V*_{a}, *V*_{b} and *V*_{c} are basal rates of transcription,
the *K*_{iB} and *K*_{iB'} are inhibition constants,
*K*_{aC} and *K*_{aA} are activation constants, *k*_{a}, *k*_{b}, and *k*_{c}
are first-order degradation constants. A reference steady state for
[A], [B] and [C] was calculated setting all parameter
values arbitrarily to unity, exept *K*_{iB} , *K*_{iB'} , *K*_{aA}
and *K*_{aC} were set to 0.1.

The method requires a first measurement of concentrations of mRNA in a certain reference steady state. Perturbations are made by increasing the rate of transcription of single genes and the effect on the steady state concentrations of all mRNA species are measured in response to that perturbation. Perturbations of where made by increasing the rate constants V by 10% in individual experiments. The data is given in table 1.

The method requires a first measurement of concentrations of mRNA in a certain reference steady state. Perturbations are made by increasing the rate of transcription of single genes and the effect on the steady state concentrations of all mRNA species are measured in response to that perturbation. Perturbations of where made by increasing the rate constants V by 10% in individual experiments. The data is given in table 1.

Table 1: Steady state responses of mRNA concentrations in the 3-gene
network to a 10% increase in transcription rates

Steady state concentrations | A | B | C |

Steady state 0 | 0.164881 | 0.506499 | 0.102634 |

Perturbation rate a | 0.179707 | 0.512106 | 0.104963 |

Perturbation rate b | 0.156757 | 0.512106 | 0.104963 |

Perturbation rate c | 0.160838 | 0.521745 | 0.109093 |

The co-control coefficient defined for the mRNA concentrations of two genes i and j when a rate of transcription

(2)

where FR is the fluorescence ratio as measured in micro arrays and gene-chips:

(3)

where *F* is the fluorescence measured in the state before the purturbation (reference state) and *F'*
the the fluorescence in the state after the perturbation, which is assumed to be proportional with the
concentrations of mRNA in the reference state, *[mRNA]*_{i} and in the perturbed state *[mRNA]*_{i}' respectively.

While the co-control coefficients are defined for infinitisimal perturbations, the right hand side of the equation represents the finite estimation for the co-control coefficient. This is the form of the equation we used to calculate the co-control coefficients from the data in table 1.

Three different co-control matrices, each with respect to another variable, are calculated.

One with respect to the control coefficients on A. In this matrix the control coefficients are devided by the control coefficients of A:

(4) While the co-control coefficients are defined for infinitisimal perturbations, the right hand side of the equation represents the finite estimation for the co-control coefficient. This is the form of the equation we used to calculate the co-control coefficients from the data in table 1.

Three different co-control matrices, each with respect to another variable, are calculated.

One with respect to the control coefficients on A. In this matrix the control coefficients are devided by the control coefficients of A:

Using the data from Table 1 we calculated the following values for the co-control coefficients:

(5)

One with respect to the control coefficients on B. In this matrix the control coefficients are devided by the control coefficients of B:

(6)

Using the data from Table 1 we calculated the following values for the co-control coefficients:

(7)

One with respect to the control coefficients on C. In this matrix the control coefficients are devided by the control coefficients of C:

(8)

Using the data from Table 1 we calculated the following values for the co-control coefficients:

(9)

To describe interactions between genes we use a coefficient called the regulatory strength. The regulatory strength, as defined by equation 3 in the paper,

(10)

describes how much of the concentration of a certain mRNA is regulated by the concentration of another mRNA or its own concentration. Inverting the co-control matrices gives the matrices containing the regulatory strenths:

By inverting the co-control matrix in A we obtain the Regulatory strength matrix to A:

(11)

By inverting the co-control matrix in B we obtain the Regulatory strength matrix to B:

(12)

By inverting the co-control matrix in C we obtain the Regulatory strength matrix to C:

(13)

To infer genetic networks we are only interested in the direct effects of genes on eachother. In the regulatory strength matrices
only some rowes represent direct effects, the other elements represent regulatory effects that are indirect. The top row
of the regulatory strength matrix in A gives direct effects on A, the middel row of the regulatory strength martix in B gives the direct
effects on B and the bottom row in the regulatory strength matrix in C gives the direct effects on C. Mathematically the relevant rows
can be joined in a single matrix as follows:

(14)

The obtained matrix is one with direct regulatory strengths in its elements. This matrix represent the genetic network and
is called the genetic interaction matrix. For the system under investigation we obtained:

(15)

Using this matrix we draw the structure of the genetic network. We consider regulatory strengths lower than 0.055 to be zero,
because we know that errors are obtained by making the step from infinitisimal to finite perturbations. We obtained the following
network:

Figure2: The network obtained from the gene interaction matrix

Table2: The values for the regulatory coefficients determined in three different series of perturbation experiments

Theoretical | 1.1x | 0.5x | 2x | |

-0.901 | -0.903 | -0.879 | -0.916 | |

-0.752 | -0.757 | -0.711 | -0.787 | |

0 | 0.001 | -0.005 | -0.004 | |

0 | 0.001 | -0.006 | 0.005 | |

-0.638 | -0.639 | -0.635 | -0.646 | |

0.315 | 0.311 | 0.348 | 0.288 | |

0.241 | 0.236 | 0.279 | 0.213 | |

-0.533 | -0.544 | -0.444 | -0.607 | |

-0.638 | -0.639 | -0.628 | -0.651 |

The model can be downloaded here as a
Gepasi file. The co-control
coeffients can be calculated from the steady state concentrations
of mRNA using a excel spreadsheet, which can be be downloaded
here. The calculated co-control
matrices can be inverted to obtain the regulatory strenght matrices
by using for example Mathematica or a Java based web program for
matrix operations (here).

In cases where one cannot measure the expression of all the genes or when it is not convenient or when their transcription rate
is not easily perturbed), the method has to be applied only to a subset of the genes. In those cases only the rates of
transcription of those genes and their mRNA concentrations are measured. Nevertheless, a larger network is responsible
for the observations and so one can no longer be sure if the interactions detected by the method are really direct or happen
to be the result of the hidden variables. To explore such a scenario we constructed the 5-gene network of Fig. 2A
with the kinetics given in equation 16. Then perturbed the transcription rates of genes C, D and E by 0.5x, 1.1x
and 2x perturbations and "measured" their corresponding relative mRNA responses. We calculated all the direct-effect
regulatory strengths in the same way as for the 3-gene network (above). The results are given in table 3.

Figure3A: The complete 5-gene network

(16)

Table3: The values for the regulatory coefficients determined in three different series of perturbation experiments

*The calculated value differs from the theoretical value due to indirect interaction through a hidden variable

*The calculated value differs from the theoretical value due to indirect interaction through a hidden variable

Theoretical | 1.1x | 0.5x | 2x | |

-2.15 | -2.31 | -1.65 | -4.57 | |

0 | 0.013 | -0.049 | 0.149 | |

-1.44 | -1.61 | 0.926 | -3.878 | |

0* | -0.302 | -0.07 | -1.36 | |

-1.04 | -1.04 | -1.07 | -0.984 | |

0 | -0.052 | 0.145 | -1.01 | |

-1.65 | -1.81 | -1.11 | -4 | |

0.268 | 0.267 | 0.352 | 0.336 | |

-2.15 | -2.3 | -1.69 | -4.43 |

Using the result of the 1.1x perturbation and a threshold value of 0.055 we could draw the following network:

Figure 3B: The 3-gene network obtained by analysis of three genes of the 5-gene network

Comparing the networks of Fig. 2, it is clear that all the direct interactions on the original network between genes
C, D and E were all recovered from the regulatory strengths, In addition there is now a new arrow from C to D
that does not exist in the original network (Fig. 2A). In the real system, C influences B and B influences D,
but because B was not included in the analysis these two interactions collapse into a single one which is,
of course, only apparent. What this reveals is that if one only considers in the analysis a subset of genes,
then the interactions may not be really direct, but rather pass through the action of hidden variables.
Incidentally this is the same as if one considers that in most cases it is not the mRNAs that interact with
gene transcription but rather their protein products. Because the proteins are not being considered here, they
are just as well hidden variables and their action is included in the arrows of the gene regulatory network.

The model can be downloaded here as a
Gepasi file. The co-control
coeffients can be calculated from the steady state concentrations
of mRNA using a excel spreadsheet, which can be be downloaded
here. The calculated co-control
matrices can be inverted to obtain the regulatory strenght matrices
by using for example Mathematica or a Java based web program for
matrix operations (here).

Our model of gene networks is a purely additive one, which stems from our use of metabolic control analysis
and this is based on a first-order Taylor approximation. However it is not hard to conceive
(or find examples) where the regulation of the expression of a gene depends on a combination of several other
genes. For example it may be that the products of gene A and gene B have to bind before being able to activate
gene C. Because these may be frequent, it is relevant to see how our method performs with such networks. For
that we have created the gene network of Fig. 3A. The main feature of this network is that there is the formation
of two "adduct" (AE and BD) that are regulators of other genes.

Figure 4A: The complete 5-network in which two regulatory complexes are formed

The system is discribed by the following differential equations:

(17)

Table4: The values for the regulatory coefficients determined for the model with complex formation

*The calculated value differs from the theoretical value due to indirect interaction through a hidden variable

*The calculated value differs from the theoretical value due to indirect interaction through a hidden variable

Theoretical | 1.1x | 0.5x | 2x | |

-0.942 | -0.952 | -0.904 | -1.07 | |

0* | -0.264 | -0.176 | -0.347 | |

0 | -0.041 | 0.24 | -0.39 | |

0* | -0.258 | -0.209 | -0.304 | |

0* | -0.066 | 0.005 | -0.239 | |

0 | 0.003 | 0.011 | -0.049 | |

-0.842 | -0.839 | -0.865 | -0.825 | |

-0.738 | -0.748 | -0.706 | -0.865 | |

0* | -0.257 | -0.211 | -0.305 | |

0 | -0.004 | 0.011 | -0.057 | |

0* | -0.384 | -0.277 | -0.581 | |

0 | -0.003 | 0.008 | -0.039 | |

-1.27 | -1.29 | -1.14 | -1.59 | |

0 | 0.001 | -0.011 | 0.006 | |

0* | -0.389 | -0.264 | -0.652 | |

0 | -0.005 | -0.021 | -0.063 | |

0.476* | 0.394 | 0.499 | 0.324 | |

0.476* | -0.023 | 0.144 | -0.212 | |

-0.888 | -0.887 | -0.896 | -0.879 | |

0 | -0.005 | -0.021 | -0.074 | |

0* | -0.602 | -0.422 | -0.895 | |

0 | -0.004 | -0.003 | -0.068 | |

-1.23 | -1.28 | -1.02 | -1.66 | |

0.267 | 0.257 | 0.351 | 0.199 | |

-1.41 | -1.44 | -1.28 | -1.76 |

We performed 0.5x, 1.1x and 2x perturbations on
all the transcription rates and observed the response on the expression levels of A, B, C, D and E (the levels of
AE and BD were not taken into account).

We used the results obtained with the 1.1x perturbation to draw the following network:

We used the results obtained with the 1.1x perturbation to draw the following network:

Figure 4B: The network obtained by applying the method on the 1.1x perturbation data

The calculated regulatory strengths describe a gene regulatory network
(Fig. 4B). This includes interactions from B to A, from D to A, from A on C and from E on C - these correspond
to the interactions that the complexes have on target genes, but are here shown as additive. This is similar to
the previous case of hidden variables: since we do not consider the complexes, there are extra arrows going directly
from the complex components to the target genes. There are also additional interactions of A on E and E on A, and B
on D and D on B which are all negative.

The model can be downloaded here as a
Gepasi file. The co-control
coeffients can be calculated from the steady state concentrations
of mRNA using a excel spreadsheet, which can be be downloaded
here. The calculated co-control
matrices can be inverted to obtain the regulatory strenght matrices
by using for example Mathematica or a Java based web program for
matrix operations (here).

When the exact magnitude of the transcription rate perturbations is known we can use regular
metabolic control analysis. We can than split up the regulatory
effects into its two constituent parts: an elasticity coefficient (how a rate of transcription changes with a
change in an mRNA) and a control coefficient (how the steady state concentration of a mRNA
changes with a change in transcription rate). Control coefficients can be calculated from
micro array data, provided that the exact magnitudes of the perturbations in transcription
rates are known. This requires measurements of the changes in transcription rates or a system
to quantitatively and consistently increase the rate of transcription. The elasticity coefficients
can be obtained by inverting the matrix of control coefficients:

(18)

We analysed the 3-gene network (above) and calculated the control matrix from the data in Table 1:

(18)

We analysed the 3-gene network (above) and calculated the control matrix from the data in Table 1:

(19)

By inverting this matrix the elasticity coefficients were obtained:

(20)

From this matrices we can draw the wiring diagram and assign quantitative measures to each
interaction in terms of the elasticities (at the dashed arrows) and the control coefficients
(at the solid arrows).

Figure5: The network obtained from the control and elasticity matrices

Table5: The values for the control and elasticity coefficients determined in three different series of perturbation experiments

Theoretical | 1.1x | 0.5x | 2x | |

0.9 | 0.86 | 1.18 | 0.62 | |

0.638 | 0.6 | 0.93 | 0.4 | |

0.638 | 0.61 | 0.85 | 0.44 | |

-1 | -1.05 | -0.742 | -1.49 | |

-0.835 | -0.88 | -0.601 | -1.28 | |

0 | 0.001 | -0.004 | 0.006 | |

0 | 0.001 | -0.007 | -0.011 | |

-1 | -1.06 | -0.681 | -1.588 | |

0.494 | 0.516 | 0.373 | 0.708 | |

0.378 | 0.387 | 0.328 | 0.475 | |

-0.835 | -0.892 | -0.522 | -1.35 | |

-1 | -1.05 | -0.739 | -1.451 |