Result Database Schema

Introduction

cmonkey-python writes the results of its computation to an SQLite database. This choice was made, because SQLite is a free, open source and portable data store which is available on many systems and has programming interfaces to a large number of programming languages. Another important aspect is that the entire database is stored in a single file, which can be easily copied, archived and analyzed. In this section the database structure and its function is explained in further detail.

Tables

Note 1: The tables ending in _stats are only used in the cluster_viewer application and are subject to change.

Note 2: SQLite is different from other RDBMS in that each table has an implicit column rowid that acts like an auto incremented integer valued primary key. It is normally not shown in the frontend, but we will add it here for clarity

run_infos

rowid          int
start_time     timestamp
finish_time    timestamp
num_iterations int
last_iteration int
organism       text
species        text
ncbi_code      int
num_rows       int
num_columns    int
num_clusters   int
git_sha        text

This table represents the current information about a cmonkey run and only stores a single entry that is continuously updated until a run is finished.

row_names, column_names

rowid     int
order_num int
name      text

These two tables are structurally identical. They reflect the structure of the input gene expression matrix, to preserve the order of the rows and columns, their order is stored as well.

row_members, column_members

rowid     int
iteration int
cluster   int
order_num int

These tables contain the row and column members for each iteration and cluster. The element order_num references an order_num in its respective row_names/column_names table.

cluster_stats

rowid     int
iteration int
cluster   int
num_rows  int
num_cols  int
residual  decimal

Stores the residual values, number of rows and columns for each iteration and cluster.

motif_infos

rowid     int
iteration int
cluster   int
seqtype   text
motif_num int
evalue    decimal

Basic information about a motif that cmonkey thinks is associated with a specific cluster.

meme_motif_sites

rowid         int
motif_info_id int  /* references motif_infos.rowid */
seq_name      int
reverse       boolean
start         int
pvalue        decimal
flank_left    text
seq           text
flank_right   text

Detailed positional MEME information for a motif.

motif_annotations

rowid         int
motif_info_id int  /* references motif_infos.rowid */
iteration     int
gene_num      int
position      int
reverse       boolean
pvalue        decimal

Positional information for a motif that was obtained from MAST.

motif_pssm_rows

rowid         int
motif_info_id int  /* references motif_infos.rowid */
iteration     int
row           int
a             decimal
c             decimal
g             decimal
t             decimal

Rows of the PSSM for a motif.

global_background

rowid       int
subsequence text
pvalue      decimal

If the run uses a global background file, this table stores the entries that were generated.

statstypes

rowid      int
category   text
name       text

iteration_stats

rowid      int
statstype  int
iteration  int
score      decimal