Set Enrichment Setup

How to setup set enrichment

Set Enrichment is an optional scoring mechanism where the user can provide input files containing genes that are grouped in sets.

Input file Formats

Supported set file formats are either JSON or CSV.

The JSON format is as follows:

{
    "set1": ["<gene1>", "<gene2>", ...],
    "set2": ["<gene1>", "<gene2>", ...],
    ...
}

The CSV format is specified like this:

set1,<gene1>;<gene2>;...
set2,<gene1>;<gene2>;...

Configuration

Users who want to use the set enrichment function need to specify it in a custom pipeline file

{
    "row-scoring": {
        "id": "combiner",
        "function": { "module": "cmonkey.scoring", "class": "ScoringFunctionCombiner" },
        "args": {
            "functions": [
                { "id": "Rows",
                    "function": { "module": "cmonkey.microarray", "class": "RowScoringFunction" }
                },
                { "id": "SetEnrichment",
                    "function": { "module": "cmonkey.set_enrichment", "class": "ScoringFunction" }
                },
                ...
            ]
        }
    },
    ...
}

In addition, the user needs to provide the Enrichment settings in an .ini file:

[SetEnrichment]
schedule = 1,7
scaling_rvec=seq(1e-5, 0.5, length=num_iterations*3/4)
set_types = settype1

[SetEnrichment-settype1]
set_file = example_data/eco/eco-merged-set.json
weight = 1.0

As shown, the SetEnrichment section specifies the schedule and the scaling within the row scoring functions as well as a single set type. The user can specify any number of set types, by providing a list of names in the set_types setting, separated by comma. Set type specific settings are then made in the section "SetEnrichment-", currently these are the set file and the weight that indicates the percentage that scoring on a set type contributes to set enrichment scoring.