updated October 28, 2007
(site moved, links updated)
A Collection of Artificial Gene Networks
Few real gene networks are known in detail to date. All that is available are portions of the gene networks of some well-studied organisms, such as E. coli. But many scientific questions depend on the particular details of gene networks. One of our main research themes is to develop methods to uncover the structure of gene networks. To find out whether those methods work at all, we started testing them with simulated experiments on artificial gene networks. Several of our collaborators have asked us for such synthetic data sets, and this small project of generating artificial gene expression data has snowballed into a project of its own.
Artificial gene networks are useful for:
- generating data sets to test inference methods, and other gene expression analyses
- gaining understanding of the underlying factors involved in gene expression dynamics
We use the biochemical simulator Gepasi (now superseded by COPASI) to generate data for the dynamics of our gene networks. Gepasi uses a continuous representation of biochemical reactions, based on differential equations. In the beginning we generated the gene network models in arbitrary ways. These were mostly small networks, and yet the process was tedious. Since then we have created gene network generators that build models according to several characteristics of interest. We call this system A-Biochem.
We make the networks and data available. Gene network models are supplied in their original Gepasi file format, ready to be used by that simulator. We also supply them in SBML Level 1 format, which other simulators can read. Diagrams and statistics are also available for each network. Each artificial gene network has a web page of its own!
Boolean? Not really...
Gene networks are often modeled as Boolean networks, i.e. the level of each gene product is represented as 0 or 1. Alternatively, they have been modeled with 0 representing no active transcription and 1 representing active transcription. Boolean models are adequate for many purposes, but we aim to represent gene networks in a little more detail: our models represent the levels of gene products and the rates of reaction as continuous values, using ordinary differential equations. It is possible that the SBML files we created could be used by Boolean simulators. If you manage to use them in this way, you are most welcome to (and we would like to know!).
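To make the contrast concrete, here is a minimal sketch of a two-gene network in which the product of gene 1 represses gene 2, written both as a Boolean rule and as a pair of ordinary differential equations. The parameter names and values (`v_max`, `k_i`, `k_deg`), the hyperbolic repression term and the simple Euler integrator are all assumptions made for illustration; they are not taken from our Gepasi models.

```python
def boolean_step(g1_on):
    """Boolean view: gene 2 is ON only when its repressor, gene 1, is OFF."""
    return not g1_on


def simulate_continuous(steps=20000, dt=0.01):
    """Continuous view: integrate two ODEs with the forward Euler method."""
    # Hypothetical parameters, chosen only for illustration.
    v_max = 1.0   # maximal transcription rate
    k_i = 0.5     # inhibition constant of gene 1's product on gene 2
    k_deg = 0.2   # first-order degradation rate constant
    g1, g2 = 0.0, 0.0
    for _ in range(steps):
        # Gene 1 is transcribed at a constant rate and degraded linearly.
        dg1 = v_max - k_deg * g1
        # Gene 2's transcription falls smoothly as gene 1's product rises.
        dg2 = v_max * k_i / (k_i + g1) - k_deg * g2
        g1 += dt * dg1
        g2 += dt * dg2
    return g1, g2


if __name__ == "__main__":
    print("Boolean: gene 1 ON -> gene 2 ON?", boolean_step(True))
    print("Continuous steady state (g1, g2):", simulate_continuous())
```

In the Boolean version gene 2 is simply off; in the continuous version it settles at a low but non-zero level that depends on the kinetic parameters, which is the kind of detail the continuous models are meant to capture.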
What makes gene networks different?
Precisely because so few real gene networks are known, it is important that we generate gene networks covering a wide range of properties, such as network sizes, topologies and kinetics.
The topologies that we have generated to date are:
- Erdős-Rényi-like random networks, which have been extensively studied as gene networks by Stuart Kauffman
- Ordered lattices
- Watts-Strogatz (small-world) networks
- Albert-Barabási (scale-free) networks
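The four topology families listed above can be sketched with a few lines of standard-library Python. This is a simplified illustration, not the generator code used in this project: the function names, the use of directed (regulator, target) edges, and the particular rewiring and preferential-attachment rules are assumptions.

```python
import random


def random_network(n, n_edges, rng):
    """Erdős-Rényi-like: choose n_edges distinct directed edges uniformly."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return set(rng.sample(pairs, n_edges))


def ring_lattice(n, k):
    """Ordered lattice: gene i regulates its k next neighbours on a ring (k < n)."""
    return {(i, (i + d) % n) for i in range(n) for d in range(1, k + 1)}


def small_world(n, k, p, rng):
    """Watts-Strogatz-like: rewire each lattice edge's target with probability p."""
    edges = ring_lattice(n, k)
    for i, j in sorted(edges):  # iterate over a snapshot of the lattice
        if rng.random() < p:
            free = [t for t in range(n) if t != i and (i, t) not in edges]
            if free:
                edges.remove((i, j))
                edges.add((i, rng.choice(free)))
    return edges


def scale_free(n, m, rng):
    """Preferential attachment: each new gene links to m existing genes,
    picked with probability roughly proportional to their current degree."""
    edges = set()
    pool = list(range(m))  # each node appears once per edge end it holds
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in sorted(chosen):
            edges.add((new, t))
            pool.extend([new, t])
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(random_network(10, 20, rng)), "random edges")
    print(len(ring_lattice(10, 2)), "lattice edges")
    print(len(small_world(10, 2, 0.3, rng)), "small-world edges")
    print(len(scale_free(10, 2, rng)), "scale-free edges")
```

Each function returns a set of (regulator, target) pairs, so networks of the same size and edge count but different topology can be compared directly.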
The rate laws of transcription and mRNA degradation determine to a large extent the dynamics of gene networks. Rate laws are mathematical expressions that relate the rate of a reaction (transcription, etc.) to the concentrations of several substances (effectors). The rate of transcription responds to the concentrations of nucleotides, RNA polymerase, and transcription factors. In our models we ignore the effect of the nucleotides and polymerase (i.e. they are assumed to be constant); all other effects come from other gene products and can be positive (activation) or negative (inhibition). What we vary between networks is the way in which activation and inhibition work. The networks will cover several phenomena:
- all inhibitors, all activators, and various ratios of inhibitors to activators
- independent (competitive) effects, where activations may overcome inhibitions and vice versa
- dominant inhibition, where the effect of activators does not overcome inhibitions
- dominant activation, the opposite of the previous one
- cooperativity effects (sigmoidal inhibition and activation)
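As an illustration of what such a rate law can look like, the sketch below combines activators and inhibitors multiplicatively, with a Hill exponent `n` to produce sigmoidal (cooperative) responses. This is one common phenomenological form and is an assumption made for illustration: it is not necessarily the expression used in our models, and it does not attempt to reproduce the dominant-inhibition or dominant-activation variants. All names and parameter values are hypothetical.

```python
def hill_activation(a, k, n=1):
    """Saturating activation term in [0, 1); n > 1 gives a sigmoidal response."""
    return a ** n / (k ** n + a ** n)


def hill_inhibition(i, k, n=1):
    """Inhibition term in (0, 1]; equals 0.5 when the inhibitor level is k."""
    return k ** n / (k ** n + i ** n)


def transcription_rate(v_basal, activators, inhibitors, k=1.0, n=1):
    """Multiplicative rate law (illustrative assumption).

    Each activator scales the basal rate by a factor between 1 and 2,
    and each inhibitor scales it by a factor between 0 and 1.
    """
    rate = v_basal
    for a in activators:
        rate *= 1.0 + hill_activation(a, k, n)
    for i in inhibitors:
        rate *= hill_inhibition(i, k, n)
    return rate


if __name__ == "__main__":
    # One activator and one inhibitor, both at a concentration equal to k.
    print(transcription_rate(2.0, [1.0], [1.0]))
    # Cooperativity: below k, a higher exponent suppresses the response.
    print(hill_activation(0.5, 1.0, n=1), hill_activation(0.5, 1.0, n=4))
```

Changing the lists of activators and inhibitors covers the "all activators", "all inhibitors" and mixed cases, while raising `n` above 1 switches from hyperbolic to sigmoidal kinetics.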
This project was funded by the Virginia Bioinformatics Institute and the National Science Foundation (grant BES-0120306).