Software Tutorial: Hunting for Loops in Transcription Factor Networks

In this tutorial, we will build a Jupyter Notebook to analyze loops in the E. coli transcription factor network, which can be downloaded here. If you would like to jump to the end of the analysis, you can download the complete Jupyter Notebook here.

You will also need the helper Python file network_loader.py, which defines the open_network function used below.

Before running this tutorial, make sure that the following software and packages are installed.

Warning: Take care when installing igraph, and follow the instructions on its website. When installing via pip or conda, specify “python-igraph” instead of “igraph”.

Software            Version¹   Check installation
Python 3            3.7        python --version
Jupyter Notebook    4.4.0      jupyter --version
python-igraph       0.8.0      conda list or pip list
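
If you prefer to check versions from inside the notebook, the following cell is a minimal sketch. It reads the Python version from sys.version_info and, assuming the installed python-igraph release exposes igraph.__version__ (as recent releases do), reports the igraph version as well. The function name installed_versions is our own, not part of any library.

```python
import sys

def installed_versions():
    """Report the Python version and, if importable, the python-igraph version."""
    versions = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        import igraph  # provided by the "python-igraph" package
        # Assumption: this release exposes igraph.__version__; fall back if not.
        versions["igraph"] = getattr(igraph, "__version__", "unknown")
    except ImportError:
        versions["igraph"] = None  # python-igraph is not installed
    return versions

print(installed_versions())
```

A value of None for "igraph" means the import failed, which usually indicates that python-igraph was not installed into the environment the notebook is using.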

Create a blank Jupyter notebook titled loops.ipynb and add the code below to it. First, we import the transcription factor network and count its nodes, edges, and self-loops (edges that start and end at the same node).

# NOTE: when installing via pip or conda, install python-igraph
from igraph import *
from network_loader import *
import random

txt_file = 'network_tf_tf_clean.txt'

network, vertex_names = open_network(txt_file)

# count nodes, edges, and self-loops (is_loop returns one boolean per edge)
print("Number of nodes: ", len(network.vs))
print("Number of edges: ", len(network.es))
print("Number of self-loops: ", sum(Graph.is_loop(network)))

If you run your notebook, you should obtain the following statistics.

  • Number of nodes: 197
  • Number of edges: 477
  • Number of self-loops: 130
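
To see what the self-loop count measures, here is a small pure-Python sketch on a toy directed edge list. The node labels A, B, and C are hypothetical and do not come from the E. coli data. Like Graph.is_loop, it produces one boolean per edge, and summing those booleans counts the self-loops.

```python
# Toy directed edge list (hypothetical labels, not from the E. coli data).
edges = [("A", "A"), ("A", "B"), ("B", "C"), ("C", "C"), ("C", "A")]

# Collect every node that appears as a source or a target.
nodes = {u for edge in edges for u in edge}

# An edge is a self-loop when its source and target are the same node.
is_loop = [source == target for source, target in edges]

print("Number of nodes: ", len(nodes))          # 3
print("Number of edges: ", len(edges))          # 5
print("Number of self-loops: ", sum(is_loop))   # 2
```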

We can also create a visualization of the network by adding the following line of code to our notebook.

plot(network, vertex_label=vertex_names, vertex_label_size=8,
     edge_arrow_width=1, edge_arrow_size=0.5, autocurve=True)

Running the notebook now produces the following network.

[Figure: visualization of the E. coli transcription factor network]

Our plan is to compare this network against a random network. The following code calls igraph's Graph.Erdos_Renyi function to generate a random directed network with the same number of nodes (197) and edges (477), allowing self-loops, and then plots it. Before generating the network, it calls random.seed(), a function from Python's built-in random module that takes an integer and uses it to initialize a (pseudo)random number generator. Because python-igraph uses Python's random module as its default random number generator, fixing the seed means that rerunning the code produces the same random network. There is nothing special about the input value 42 here (or is there?).

random.seed(42)
g = Graph.Erdos_Renyi(197,m=477,directed=True, loops=True)
plot(g, edge_arrow_width=1, edge_arrow_size=0.5, autocurve=True)

The resulting network is shown in the figure below.

[Figure: a random directed network with 197 nodes and 477 edges]
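
To make the model behind Graph.Erdos_Renyi(197, m=477, directed=True, loops=True) concrete, here is a minimal pure-Python sketch of the directed G(n, m) model with self-loops allowed: every ordered pair of nodes, including a node paired with itself, is a candidate edge, and m distinct candidates are chosen uniformly at random. The function gnm_directed is a hypothetical helper written for illustration, not igraph's implementation. It also uses its own random.Random instance, so even with seed 42 it will not reproduce the igraph network above.

```python
import random

def gnm_directed(n, m, loops=True, seed=None):
    """Sample a directed G(n, m) random graph as a list of (source, target) pairs.

    Every ordered pair of nodes (including (v, v) when loops=True) is a
    candidate edge, and m distinct candidates are chosen uniformly at random.
    """
    rng = random.Random(seed)  # private generator, independent of random.seed()
    candidates = [(u, v) for u in range(n) for v in range(n) if loops or u != v]
    if m > len(candidates):
        raise ValueError("m exceeds the number of possible edges")
    return rng.sample(candidates, m)

edges = gnm_directed(197, 477, loops=True, seed=42)
print("Number of edges: ", len(edges))
print("Number of self-loops: ", sum(u == v for u, v in edges))
```

Sampling without replacement guarantees that no edge appears twice, which matches a network in which each regulatory interaction is recorded at most once.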

We can count the nodes, edges, and self-loops of the random network g with the same code we used for the real network.

# count nodes, edges, and self-loops (is_loop returns one boolean per edge)
print("Number of nodes: ", len(g.vs))
print("Number of edges: ", len(g.es))
print("Number of self-loops: ", sum(Graph.is_loop(g)))

This code produces the following statistics for the random network.

  • Number of nodes: 197
  • Number of edges: 477
  • Number of self-loops: 5

The random network has far fewer self-loops (5) than the real transcription factor network (130).
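
A short calculation shows why a random network has so few self-loops. Assume the G(n, m) model used by Graph.Erdos_Renyi with loops=True, in which every ordered pair of nodes is an equally likely candidate edge. Of the n² possible directed edges, n are self-loops, so each of the m edges is a self-loop with probability n/n² = 1/n, and by linearity of expectation:

```latex
\mathbb{E}[\text{number of self-loops}]
  = m \cdot \frac{n}{n^2}
  = \frac{m}{n}
  = \frac{477}{197}
  \approx 2.42
```

This is of the same order as the 5 self-loops in the random network above, and far below the 130 observed in E. coli.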

STOP: Change the input integer to random.seed to any integer you like. How does it affect the number of nodes, edges, and self-loops? Try changing the input to a few different values.

Regardless of what seed value we use, the number of self-loops in a random graph is far lower than in the real E. coli network. Back in the main text, we will discuss the significance of this difference and see whether we can determine why autoregulation has arisen.
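
One way to check the claim across many seeds is sketched below. It repeats the random-network experiment for 1,000 seeds and compares the average self-loop count with m/n. It is written in pure Python rather than with igraph, so individual counts will differ from igraph's for the same seed, and it only summarizes the counts; it is not the statistical analysis from the main text. The helper count_self_loops is hypothetical.

```python
import random
import statistics

def count_self_loops(n, m, seed):
    """Sample a directed G(n, m) graph (self-loops allowed) and count its self-loops."""
    rng = random.Random(seed)
    # Encode the ordered pair (u, v) as the integer u * n + v; choosing m distinct
    # integers from range(n * n) is the same as choosing m distinct directed edges.
    chosen = rng.sample(range(n * n), m)
    # k encodes a self-loop exactly when its source (k // n) equals its target (k % n).
    return sum(1 for k in chosen if k // n == k % n)

n, m = 197, 477  # node and edge counts of the E. coli TF network
counts = [count_self_loops(n, m, seed) for seed in range(1000)]
print("Mean self-loops over 1000 seeds:", statistics.mean(counts))
print("Largest count observed:", max(counts))
print("Expected value m / n:", m / n)
```

Even the largest count across all seeds stays in the single digits or low double digits, nowhere near 130.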


  1. Other versions may be compatible with this code, but those listed are known to work for this tutorial.