Walkthrough of a Complete cMonkey₂ Run

Setting up for a cMonkey₂ run

Input required. The only required inputs are:

The gene expression matrix file (tab-separated values file)

The KEGG organism code (e.g., eco for E. coli; hal for H. salinarum)
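As an illustration of the first input, here is a minimal sketch of reading a tab-separated ratio matrix. The layout (a header row of condition names, gene names in the first column) is an assumption for illustration, and the gene names, condition names, and the `read_ratios` helper are hypothetical, not part of cMonkey₂.

```python
import csv
import io

# Hypothetical miniature ratio matrix: the first column holds gene names,
# the remaining columns hold one expression ratio per condition.
EXAMPLE_TSV = (
    "GENE\tcond1\tcond2\tcond3\n"
    "geneA\t0.5\t-1.2\t0.1\n"
    "geneB\t1.5\t0.3\t-0.7\n"
)


def read_ratios(handle):
    """Parse a tab-separated matrix into (conditions, {gene: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    conditions = header[1:]          # skip the gene-name column label
    rows = {}
    for record in reader:
        if not record:               # tolerate blank lines
            continue
        rows[record[0]] = [float(value) for value in record[1:]]
    return conditions, rows


conditions, ratios = read_ratios(io.StringIO(EXAMPLE_TSV))
print(conditions)
print(ratios["geneA"])
```

To read a real file, the same function can be given an open file handle instead of the in-memory string.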

Input automatically downloaded. Given the KEGG organism code, cMonkey₂ downloads the following data from online databases and caches it locally:

Genome sequences and annotations (currently from RSAT)

Operon membership predictions (for prokaryotes)

STRING functional gene associations

Input optional. Custom files may be used as additional input (or to replace downloaded files):

Network

Genome

Annotation

Starting the run

cMonkey₂ is typically started by running the front-end cmonkey2 with additional parameters.

Example:

cmonkey2 --organism hal --rsat_base_url http://networks.systemsbiology.net/rsat ratios.tsv

The following phases will be executed:

Initialization

Scoring

Post-processing

Initialization steps

Before cMonkey₂ runs the scoring iterations, it ensures that the required data is in place via the following steps:

normalize the ratio matrix

retrieve organism information and data from RSAT

build gene synonym lookup table

retrieve gene locations and parse promoter sequences

download and normalize operon network from Microbes Online (if necessary)

download and normalize STRING network (if necessary)

initialize the starting biclusters
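The first step in the list above, normalizing the ratio matrix, can be pictured with a small sketch. It assumes one common form of normalization (centering each gene's row on its mean and scaling it to unit standard deviation); the exact procedure cMonkey₂ applies is not stated here, and `normalize_row` is a hypothetical helper.

```python
import statistics


def normalize_row(values):
    """Center a row on its mean and scale it to unit standard deviation.

    Assumption for illustration only: cMonkey2 may normalize differently.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant row carries no signal; map it to all zeros.
        return [0.0 for _ in values]
    return [(value - mean) / sd for value in values]


row = [1.0, 2.0, 3.0]
normalized = normalize_row(row)
print(normalized)
```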

What is happening at each iteration

These are the essential steps in a cMonkey₂ iteration:

run row scoring functions

run column scoring function

combine the scores via the combiner

use scores to update cluster membership
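The steps above can be sketched for the row (gene) side as follows. This is a simplified illustration, not the cMonkey₂ implementation: the weighted-sum combiner, the "highest score wins" membership rule, and the names `combine` and `update_membership` are all assumptions, and the score values are made up.

```python
def combine(score_matrices, weights):
    """Weighted sum of score matrices, each given as {gene: [score per cluster]}."""
    combined = {}
    for scores, weight in zip(score_matrices, weights):
        for gene, per_cluster in scores.items():
            acc = combined.setdefault(gene, [0.0] * len(per_cluster))
            for k, value in enumerate(per_cluster):
                acc[k] += weight * value
    return combined


def update_membership(combined):
    """Assign each gene to the cluster with the highest combined score.

    Assumption: higher means better in this sketch.
    """
    return {
        gene: max(range(len(scores)), key=scores.__getitem__)
        for gene, scores in combined.items()
    }


# Hypothetical outputs of two row scoring functions for two clusters.
expression_scores = {"geneA": [0.9, 0.1], "geneB": [0.2, 0.8]}
network_scores = {"geneA": [0.5, 0.5], "geneB": [0.9, 0.1]}

combined = combine([expression_scores, network_scores], weights=[1.0, 0.5])
membership = update_membership(combined)
print(membership)
```

Column (condition) scores would go through the same kind of update, with conditions in place of genes.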

Each scoring function can be individually configured to follow a specific schedule. The weights that define how much its results contribute to the final combined score can be updated on that schedule as well.

Note that motif scoring has two schedules: one for running the MEME suite pipeline (typically every 100 iterations) and another for applying the results of that pipeline to the current cluster memberships.
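To make the idea of schedules and scheduled weights concrete, here is a minimal sketch. The helpers `make_schedule` and `weight_at`, the linear weight ramp, and the interval of 10 for applying motif results are hypothetical choices for illustration; only the "every 100 iterations" figure comes from the text above, and this is not the cMonkey₂ configuration format.

```python
def make_schedule(start, interval):
    """Return a predicate that is true at `start` and every `interval` iterations after."""
    def is_due(iteration):
        return iteration >= start and (iteration - start) % interval == 0
    return is_due


def weight_at(iteration, num_iterations, initial, final):
    """Linearly ramp a weight from `initial` (iteration 1) to `final` (last iteration)."""
    fraction = (iteration - 1) / max(num_iterations - 1, 1)
    return initial + (final - initial) * fraction


# Two separate schedules for motif scoring, as described above.
meme_schedule = make_schedule(start=100, interval=100)   # run the MEME pipeline
apply_schedule = make_schedule(start=100, interval=10)   # hypothetical: apply its results

meme_runs = [i for i in range(1, 301) if meme_schedule(i)]
print(meme_runs)
print(weight_at(1, 2000, 0.0, 1.0), weight_at(2000, 2000, 0.0, 1.0))
```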

Inside a scoring module

The building blocks of a cMonkey₂ run are scoring modules. A scoring module is a Python class that implements the methods

def compute(self, iteration_result, reference_matrix)
def compute_force(self, iteration_result, reference_matrix)

The only difference between these two methods is that compute_force() always performs a computation, while compute() only runs in the iterations it is scheduled for.
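The relationship between the two methods can be sketched with a stand-in class. This is not the cMonkey₂ base class: `ScheduledScoring`, its `do_compute` hook, the dictionary used as `iteration_result`, and the choice to return the previous result on off-schedule iterations are all assumptions made for illustration.

```python
class ScheduledScoring:
    """Hypothetical stand-in for a scoring module, not the cMonkey2 base class."""

    def __init__(self, is_due):
        self.is_due = is_due          # callable: iteration number -> bool
        self.last_result = None       # result of the most recent computation
        self.num_computations = 0

    def do_compute(self, iteration_result, reference_matrix):
        # Placeholder for the real scoring work.
        self.num_computations += 1
        return {"computed_at": iteration_result["iteration"]}

    def compute(self, iteration_result, reference_matrix=None):
        """Compute only on scheduled iterations; otherwise reuse the last result."""
        if self.is_due(iteration_result["iteration"]):
            self.last_result = self.do_compute(iteration_result, reference_matrix)
        return self.last_result

    def compute_force(self, iteration_result, reference_matrix=None):
        """Always compute, regardless of the schedule."""
        self.last_result = self.do_compute(iteration_result, reference_matrix)
        return self.last_result


scoring = ScheduledScoring(is_due=lambda iteration: iteration % 5 == 0)
for i in range(1, 11):
    scoring.compute({"iteration": i})
scheduled_count = scoring.num_computations   # only iterations 5 and 10 computed

forced = scoring.compute_force({"iteration": 11})
print(scheduled_count, scoring.num_computations, forced)
```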

The result of these calls is a DataMatrix object, where the columns represent the clusters, and the rows represent the genes or conditions, depending on whether the scoring function is a row or a column scoring function. Each cell in the matrix contains the corresponding score for a gene/condition in a specific cluster.

cMonkey₂ comes with a number of built-in scoring modules, which are part of the standard scoring setup. It is possible to create user-defined scoring modules; in this case it is recommended to inherit from the ScoringFunctionBase class, which provides a lot of useful functionality.

Monitoring progress

Throughout the computation, cMonkey₂ writes out its progress into its result database. Users can directly query the data contained in the database using regular data extraction tools. cMonkey₂ also comes with a web browser-based monitoring application that can be used to view a graphical representation of the run.
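As a sketch of querying the result database directly, the snippet below lists the tables in a database file. It assumes the result database is an SQLite file, which the text above does not state, and it makes no assumption about the actual schema. The `list_tables` helper and the `demo_progress` table exist only so the example can run on its own.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


# Build a throwaway database so the sketch is self-contained.
# The table below is hypothetical and unrelated to the cMonkey2 schema.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE demo_progress (iteration INTEGER, note TEXT)")
conn.commit()
conn.close()

tables = list_tables(demo_path)
print(tables)
```

Pointing `list_tables` at the database file of an actual run would be a reasonable first step before writing more specific queries.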
