# How EDISON Contributed to Network Inference

No, not this Edison. I’m talking about my software package EDISON (Estimation of Directed Interactions from Sequences Of Non-homogeneous gene expression)*.

**What it does**

Network inference. More precisely, network inference from time series data.

Imagine you have a dataset to analyse where a number of variables have been measured at regular intervals. You might ask, what are the relationships between these variables. In other words, if you were to link every two variables that depend on each other, what would the network you obtain look like. EDISON uses the observed data to infer the underlying network.

Now there are plenty of packages for network inference. What makes EDISON stand out is that it can also answer a second question: do the relationships between the variables change over time. The software package can identify changepoints in the time series of measurements where the structure of the underlying network changes.

**Under the Hood**

The underlying model for EDISON is a dynamic Bayesian network (DBN) model with a changepoint process. Without going into too many details, the changepoint process partitions the time series into segments, with the network \(M^h\) in each segment \(h\) modelling the data with a different DBN:

$$X^h \sim DBN(M^h, \theta_h)$$

with \(\theta_h\) the parameters of the network. Within each DBN, the relationship between variables is modelled by a linear regression:

$$X_i(t) = a^h_{i0} + \sum_{j \in M^h_i} a^h_{ij}X_j(t-1) + \epsilon_i(t)$$

where \(X_i(t)\) is the value of variable \(i\) at time \(t\). Below is a schematic illustration of the changepoint process.

**Practical Uses**

There are many potential uses for network inference, and many situations where the network could change over time. In my work, I have used this package for inferring gene regulatory networks; in other words, finding out which genes influence which other genes. This can change over time, for example if the environment changes, or if the organism is still developing. I have also looked at protein signalling networks that show how proteins pass on messages within the cell. This function is often disrupted or changed in tumor cells, hence the need for changepoint methods.

**Where to get it**

EDISON is an R package and is available for free on CRAN:

http://cran.r-project.org/web/packages/EDISON/index.html

*Historical anecdote about the name: We came up with EDISON as a good-natured bit of rivalry with the group of Eric Xing, which had developed the package TESLA that serves a similar purpose. Yes, we realise that Thomas Edison was the mean one and Nicolai Tesla was the nice one.

Is this package capable of handling 10,000’s of genes? I’m having difficulties using R packages (like GRENITS) for constructing a DBN on a full gene expression data set. I understand this is because BNs are NP-hard. If your package isn’t meant to work with such a large data set, do you have any suggestions for how to subset the full data set assuming no prior knowledge?

No, this package is designed for small systems with tens of genes. I have run a version of this on a larger set of genes (~8,000), but this required specifying in advance a small (several hundred) set of transcription factors.

Without any prior knowledge, you have the option of dimensionality reduction via some criterion (i.e. explained variance in the case of PCA), or clustering of variables (which is essentially a different form of dimensionality reduction, but can be more interpretable). You might also filter out a number of your genes according to variance, if you think low or high variance genes are of less importance.

Thank you so much!