Supplementary information for:

'A quantitative method for reverse engineering gene networks from microarray experiments using regulatory strengths'

Alberto de la Fuente, Paul Brazhnik and Pedro Mendes

alf@vt.edu    brazhnik@vt.edu    mendes@vt.edu

Biochemical Networks Modeling Group

     Virginia Bioinformatics Institute


Index

[Abstract]
[Analysis of a 3-gene network]
[Analysis of three genes in a 5-gene network]
[Analysis of a 5-gene network in which complex formation occurs]
[Analysis of the 3-gene network with control coefficients and elasticities ]


Abstract

We propose a method for reverse engineering gene regulatory networks from microarray gene expression data. The method is based on metabolic control analysis and requires data from microarray gene expression experiments where the rate of change of a single gene has been perturbed. Gene regulatory interactions are quantified through "regulatory strengths" which are determined from co-responses of messenger RNA to a common perturbation. The application of the method is illustrated by analyzing data produced with computer simulations of gene regulatory networks. Evidence is shown indicating that the method performs well in some cases even when perturbations are as large as two-fold. A global gene regulatory network can be uncovered by this method if applied systematically to all genes in a genome. Alternatively, it can be applied to a subset of genes in which case it identifies a phenomenological gene network that contains, beside direct interactions, contributions from indirect interactions (via the genes that were not considered or through complex formation of gene products).


Analysis of a 3-gene network

To demonstrate the method we here perform the analysis on a model of a network of only three genes (figure 1). The main motivation for employing the method on simulated data is that in this case the structure of the system that produced the data is exactly known, in contrast to data produced by biological systems. We think that all methods used in gene expression analysis should be tested to simulated data before applied with confidence to the much larger biological data sets. The model is described by non-linear ordinary differential equations as given by equation 1.




Figure 1: A three gene regulatory network

(1)       
Where [A], [B] and [C] are the concentrations of the mRNA species. Va, Vb and Vc are basal rates of transcription, the KiB and KiB' are inhibition constants, KaC and KaA are activation constants, ka, kb, and kc are first-order degradation constants. A reference steady state for [A], [B] and [C] was calculated setting all parameter values arbitrarily to unity, exept KiB , KiB' , KaA and KaC were set to 0.1.

The method requires a first measurement of concentrations of mRNA in a certain reference steady state. Perturbations are made by increasing the rate of transcription of single genes and the effect on the steady state concentrations of all mRNA species are measured in response to that perturbation. Perturbations of where made by increasing the rate constants V by 10% in individual experiments. The data is given in table 1.


Table 1: Steady state responses of mRNA concentrations in the 3-gene network to a 10% increase in transcription rates
Steady state concentrations A B C
Steady state 0 0.164881 0.506499 0.102634
Perturbation rate a 0.179707 0.512106 0.104963
Perturbation rate b 0.156757 0.512106 0.104963
Perturbation rate c 0.160838 0.521745 0.109093


The co-control coefficient defined for the mRNA concentrations of two genes i and j when a rate of transcription vm is perturbed is defined as:

(2)       

where FR is the fluorescence ratio as measured in micro arrays and gene-chips:

(3)       

where F is the fluorescence measured in the state before the purturbation (reference state) and F' the the fluorescence in the state after the perturbation, which is assumed to be proportional with the concentrations of mRNA in the reference state, [mRNA]i and in the perturbed state [mRNA]i' respectively.
While the co-control coefficients are defined for infinitisimal perturbations, the right hand side of the equation represents the finite estimation for the co-control coefficient. This is the form of the equation we used to calculate the co-control coefficients from the data in table 1.

Three different co-control matrices, each with respect to another variable, are calculated.

One with respect to the control coefficients on A. In this matrix the control coefficients are devided by the control coefficients of A:

(4)       


Using the data from Table 1 we calculated the following values for the co-control coefficients:

(5)       


One with respect to the control coefficients on B. In this matrix the control coefficients are devided by the control coefficients of B:

(6)       


Using the data from Table 1 we calculated the following values for the co-control coefficients:

(7)       


One with respect to the control coefficients on C. In this matrix the control coefficients are devided by the control coefficients of C:

(8)       


Using the data from Table 1 we calculated the following values for the co-control coefficients:

(9)       


To describe interactions between genes we use a coefficient called the regulatory strength. The regulatory strength, as defined by equation 3 in the paper,

(10)       


describes how much of the concentration of a certain mRNA is regulated by the concentration of another mRNA or its own concentration. Inverting the co-control matrices gives the matrices containing the regulatory strenths:

By inverting the co-control matrix in A we obtain the Regulatory strength matrix to A:

(11)       

By inverting the co-control matrix in B we obtain the Regulatory strength matrix to B:

(12)       

By inverting the co-control matrix in C we obtain the Regulatory strength matrix to C:

(13)       

To infer genetic networks we are only interested in the direct effects of genes on eachother. In the regulatory strength matrices only some rowes represent direct effects, the other elements represent regulatory effects that are indirect. The top row of the regulatory strength matrix in A gives direct effects on A, the middel row of the regulatory strength martix in B gives the direct effects on B and the bottom row in the regulatory strength matrix in C gives the direct effects on C. Mathematically the relevant rows can be joined in a single matrix as follows:


(14)       

The obtained matrix is one with direct regulatory strengths in its elements. This matrix represent the genetic network and is called the genetic interaction matrix. For the system under investigation we obtained:


(15)       


Using this matrix we draw the structure of the genetic network. We consider regulatory strengths lower than 0.055 to be zero, because we know that errors are obtained by making the step from infinitisimal to finite perturbations. We obtained the following network:



Figure2: The network obtained from the gene interaction matrix


Table2: The values for the regulatory coefficients determined in three different series of perturbation experiments
Theoretical 1.1x 0.5x 2x
-0.901 -0.903 -0.879 -0.916
-0.752 -0.757 -0.711 -0.787
0 0.001 -0.005 -0.004
0 0.001 -0.006 0.005
-0.638 -0.639 -0.635 -0.646
0.315 0.311 0.348 0.288
0.241 0.236 0.279 0.213
-0.533 -0.544 -0.444 -0.607
-0.638 -0.639 -0.628 -0.651


The model can be downloaded here as a Gepasi file. The co-control coeffients can be calculated from the steady state concentrations of mRNA using a excel spreadsheet, which can be be downloaded here. The calculated co-control matrices can be inverted to obtain the regulatory strenght matrices by using for example Mathematica or a Java based web program for matrix operations (here).

Analysis of three genes in a 5-gene network



In cases where one cannot measure the expression of all the genes or when it is not convenient or when their transcription rate is not easily perturbed), the method has to be applied only to a subset of the genes. In those cases only the rates of transcription of those genes and their mRNA concentrations are measured. Nevertheless, a larger network is responsible for the observations and so one can no longer be sure if the interactions detected by the method are really direct or happen to be the result of the hidden variables. To explore such a scenario we constructed the 5-gene network of Fig. 2A with the kinetics given in equation 16. Then perturbed the transcription rates of genes C, D and E by 0.5x, 1.1x and 2x perturbations and "measured" their corresponding relative mRNA responses. We calculated all the direct-effect regulatory strengths in the same way as for the 3-gene network (above). The results are given in table 3.



Figure3A: The complete 5-gene network


(16)       

Table3: The values for the regulatory coefficients determined in three different series of perturbation experiments
*The calculated value differs from the theoretical value due to indirect interaction through a hidden variable
Theoretical 1.1x 0.5x 2x
-2.15 -2.31 -1.65 -4.57
0 0.013 -0.049 0.149
-1.44 -1.61 0.926 -3.878
0* -0.302 -0.07 -1.36
-1.04 -1.04 -1.07 -0.984
0 -0.052 0.145 -1.01
-1.65 -1.81 -1.11 -4
0.268 0.267 0.352 0.336
-2.15 -2.3 -1.69 -4.43


Using the result of the 1.1x perturbation and a threshold value of 0.055 we could draw the following network:


Figure 3B: The 3-gene network obtained by analysis of three genes of the 5-gene network


Comparing the networks of Fig. 2, it is clear that all the direct interactions on the original network between genes C, D and E were all recovered from the regulatory strengths, In addition there is now a new arrow from C to D that does not exist in the original network (Fig. 2A). In the real system, C influences B and B influences D, but because B was not included in the analysis these two interactions collapse into a single one which is, of course, only apparent. What this reveals is that if one only considers in the analysis a subset of genes, then the interactions may not be really direct, but rather pass through the action of hidden variables. Incidentally this is the same as if one considers that in most cases it is not the mRNAs that interact with gene transcription but rather their protein products. Because the proteins are not being considered here, they are just as well hidden variables and their action is included in the arrows of the gene regulatory network.


The model can be downloaded here as a Gepasi file. The co-control coeffients can be calculated from the steady state concentrations of mRNA using a excel spreadsheet, which can be be downloaded here. The calculated co-control matrices can be inverted to obtain the regulatory strenght matrices by using for example Mathematica or a Java based web program for matrix operations (here).

Analysis of a 5-gene network in which complex formation occurs


Our model of gene networks is a purely additive one, which stems from our use of metabolic control analysis and this is based on a first-order Taylor approximation. However it is not hard to conceive (or find examples) where the regulation of the expression of a gene depends on a combination of several other genes. For example it may be that the products of gene A and gene B have to bind before being able to activate gene C. Because these may be frequent, it is relevant to see how our method performs with such networks. For that we have created the gene network of Fig. 3A. The main feature of this network is that there is the formation of two "adduct" (AE and BD) that are regulators of other genes.

Figure 4A: The complete 5-network in which two regulatory complexes are formed


The system is discribed by the following differential equations:

(17)       

Table4: The values for the regulatory coefficients determined for the model with complex formation
*The calculated value differs from the theoretical value due to indirect interaction through a hidden variable
Theoretical 1.1x 0.5x 2x
-0.942 -0.952 -0.904 -1.07
0* -0.264 -0.176 -0.347
0 -0.041 0.24 -0.39
0* -0.258 -0.209 -0.304
0* -0.066 0.005 -0.239
0 0.003 0.011 -0.049
-0.842 -0.839 -0.865 -0.825
-0.738 -0.748 -0.706 -0.865
0* -0.257 -0.211 -0.305
0 -0.004 0.011 -0.057
0* -0.384 -0.277 -0.581
0 -0.003 0.008 -0.039
-1.27 -1.29 -1.14 -1.59
0 0.001 -0.011 0.006
0* -0.389 -0.264 -0.652
0 -0.005 -0.021 -0.063
0.476* 0.394 0.499 0.324
0.476* -0.023 0.144 -0.212
-0.888 -0.887 -0.896 -0.879
0 -0.005 -0.021 -0.074
0* -0.602 -0.422 -0.895
0 -0.004 -0.003 -0.068
-1.23 -1.28 -1.02 -1.66
0.267 0.257 0.351 0.199
-1.41 -1.44 -1.28 -1.76


We performed 0.5x, 1.1x and 2x perturbations on all the transcription rates and observed the response on the expression levels of A, B, C, D and E (the levels of AE and BD were not taken into account).

We used the results obtained with the 1.1x perturbation to draw the following network:

Figure 4B: The network obtained by applying the method on the 1.1x perturbation data


The calculated regulatory strengths describe a gene regulatory network (Fig. 4B). This includes interactions from B to A, from D to A, from A on C and from E on C - these correspond to the interactions that the complexes have on target genes, but are here shown as additive. This is similar to the previous case of hidden variables: since we do not consider the complexes, there are extra arrows going directly from the complex components to the target genes. There are also additional interactions of A on E and E on A, and B on D and D on B which are all negative.


The model can be downloaded here as a Gepasi file. The co-control coeffients can be calculated from the steady state concentrations of mRNA using a excel spreadsheet, which can be be downloaded here. The calculated co-control matrices can be inverted to obtain the regulatory strenght matrices by using for example Mathematica or a Java based web program for matrix operations (here).

Analysis of the 3-gene network with control coefficients and elasticities



When the exact magnitude of the transcription rate perturbations is known we can use regular metabolic control analysis. We can than split up the regulatory effects into its two constituent parts: an elasticity coefficient (how a rate of transcription changes with a change in an mRNA) and a control coefficient (how the steady state concentration of a mRNA changes with a change in transcription rate). Control coefficients can be calculated from micro array data, provided that the exact magnitudes of the perturbations in transcription rates are known. This requires measurements of the changes in transcription rates or a system to quantitatively and consistently increase the rate of transcription. The elasticity coefficients can be obtained by inverting the matrix of control coefficients:

(18)       

We analysed the 3-gene network (above) and calculated the control matrix from the data in Table 1:


(19)       

By inverting this matrix the elasticity coefficients were obtained:

(20)       

From this matrices we can draw the wiring diagram and assign quantitative measures to each interaction in terms of the elasticities (at the dashed arrows) and the control coefficients (at the solid arrows).



Figure5: The network obtained from the control and elasticity matrices


Table5: The values for the control and elasticity coefficients determined in three different series of perturbation experiments
Theoretical 1.1x 0.5x 2x
0.9 0.86 1.18 0.62
0.638 0.6 0.93 0.4
0.638 0.61 0.85 0.44
-1 -1.05 -0.742 -1.49
-0.835 -0.88 -0.601 -1.28
0 0.001 -0.004 0.006
0 0.001 -0.007 -0.011
-1 -1.06 -0.681 -1.588
0.494 0.516 0.373 0.708
0.378 0.387 0.328 0.475
-0.835 -0.892 -0.522 -1.35
-1 -1.05 -0.739 -1.451


The model can be downloaded here as a Gepasi file. The co-control coeffients can be calculated from the steady state concentrations of mRNA using a excel spreadsheet, which can be be downloaded here. The calculated co-control matrices can be inverted to obtain the regulatory strenght matrices by using for example Mathematica or a Java based web program for matrix operations (here).