Set Enrichment Setup ==================== How to setup set enrichment --------------------------- Set Enrichment is an optional scoring mechanism where the user can provide input files containing genes that are grouped in sets. Input file Formats ~~~~~~~~~~~~~~~~~~ Supported set file formats are either JSON or CSV. The JSON format is as follows: .. highlight:: none :: { "set1": ["", "", ...], "set2": ["", "", ...], ... } The CSV format is specified like this: .. highlight:: none :: set1,;;... set2,;;... Configuration ~~~~~~~~~~~~~ Users who want to use the set enrichment function need to specify it in a custom pipeline file .. highlight:: none :: { "row-scoring": { "id": "combiner", "function": { "module": "cmonkey.scoring", "class": "ScoringFunctionCombiner" }, "args": { "functions": [ { "id": "Rows", "function": { "module": "cmonkey.microarray", "class": "RowScoringFunction" } }, { "id": "SetEnrichment", "function": { "module": "cmonkey.set_enrichment", "class": "ScoringFunction" } }, ... ] } }, ... } In addition, the user needs to provide the Enrichment settings in an .ini file: .. highlight:: none :: [SetEnrichment] schedule = 1,7 scaling_rvec=seq(1e-5, 0.5, length=num_iterations*3/4) set_types = settype1 [SetEnrichment-settype1] set_file = example_data/eco/eco-merged-set.json weight = 1.0 As shown, the ``SetEnrichment`` section specifies the schedule and the scaling within the row scoring functions as well as a single set type. The user can specify any number of set types, by providing a list of names in the set_types setting, separated by comma. Set type specific settings are then made in the section ``"SetEnrichment-"``, currently these are the set file and the weight that indicates the percentage that scoring on a set type contributes to set enrichment scoring.