Title: | Reverse Ecology Analysis on Microbiome |
---|---|
Description: | An implementation of the reverse ecology framework. Reverse ecology refers to the use of genomics to study ecology with no a priori assumptions about the organism(s) under consideration, linking organisms to their environment. It allows researchers to reconstruct the metabolic networks and study the ecology of poorly characterized microbial species from their genomic information, and has substantial potential for microbial community ecological analysis. |
Authors: | Yang Cao [aut, cre] |
Maintainer: | Yang Cao <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.99.3 |
Built: | 2025-02-24 03:32:07 UTC |
Source: | https://github.com/yiluheihei/revecor |
A dataset containing the KEGG orthology annotation profiles of seven oral species, downloaded from the Integrated Microbial Genomes (IMG) database.
A list with seven elements, each of which represents the annotation profile of one species.
This dataset contains the KEGG orthology annotation information of seven oral species whose interactions are well characterized; the human oral microbiota is relatively well described. The seven species are: Aggregatibacter actinomycetemcomitans D7S-1; Actinomyces oris K20; Fusobacterium nucleatum polymorphum ATCC 10953; Porphyromonas gingivalis ATCC 33277; Streptococcus gordonii str. Challis substr. CH1; Streptococcus oralis SK23, ATCC 35037; and Veillonella atypica ACS-134-V-Col7a. For more annotation information on these species, see img.jgi.doe.gov/.
Aa, Aggregatibacter actinomycetemcomitans D7S-1
Ao, Actinomyces oris K20
Fn, Fusobacterium nucleatum polymorphum ATCC 10953
Pg, Porphyromonas gingivalis ATCC 33277
Sg, Streptococcus gordonii str. Challis substr. CH1
So, Streptococcus oralis SK23, ATCC 35037
Va, Veillonella atypica ACS-134-V-Col7a
data(anno.species)
Calculates the metabolic competition and complementarity indices among all metabolic networks.
calculateCooperationIndex(g, ..., threshold = 0, p = FALSE, nperm = 1000)
g |
an igraph object that represents a metabolic network |
... |
a list of metabolic networks, or additional networks appended to g |
threshold |
the cutoff on the confidence score for a compound to be included in a seed set, default is 0 |
p |
a logical value that determines whether a p value is calculated to assess the significance of the index, default is FALSE |
nperm |
the number of permutations of the metabolic network node labels, used for the p value calculation, default is 1000 |
The metabolic competition index is defined as the fraction of compounds in the seed set of one species' metabolic network that are also included in its partner's, whereas the metabolic complementarity index is the fraction of compounds in one species' seed set that appear in the metabolic network, but not in the seed set, of its partner. The biosynthetic support score represents the extent to which the metabolic requirements of a potential parasitic organism can be supported by the biosynthetic capacity of a potential host. It is measured by calculating the fraction of the source components of a in which at least one of the compounds can be found in the network of b. Because seed compounds are associated with a confidence score (1/size of SCC), each fraction is calculated as a normalized weighted sum.
The element in the ith row and jth column of the returned matrix represents the metabolic competition index or complementarity index of the ith metabolic network on the jth metabolic network.
a cooperation index matrix whose numbers of rows and columns are equal to the number of species to be compared; for more, see Details.
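As a minimal sketch of how such a matrix is laid out, the base R code below fills an asymmetric matrix from a pairwise index function. The helper `pairwise_index`, the toy index `toy_overlap`, and the compound identifiers are hypothetical and are not part of RevEcoR; leaving the diagonal as `NA` is likewise an assumption made for illustration.

```r
# Hypothetical toy data: one character vector of compounds per species.
species <- list(
  sp1 = c("C1", "C2"),
  sp2 = c("C2", "C3", "C4", "C5")
)

# Toy index (assumption): unweighted fraction of a's compounds found in b.
toy_overlap <- function(a, b) mean(a %in% b)

# Fill element [i, j] with the index of species i on species j.
pairwise_index <- function(items, index_fun) {
  n <- length(items)
  m <- matrix(NA_real_, n, n, dimnames = list(names(items), names(items)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) m[i, j] <- index_fun(items[[i]], items[[j]])
    }
  }
  m
}

m <- pairwise_index(species, toy_overlap)
print(m)
```

The matrix is generally not symmetric, because the index of species i on species j uses the seed set of i as its denominator.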
complementarityIndex, competitionIndex
## Not run:
## metabolic network reconstruction and seed set identity of sample data anno.species
net <- lapply(anno.species, reconstructGsMN)
interactions <- calculateCooperationIndex(net)
## End(Not run)
Compose multiple functions, in infix and prefix forms.
compose(...)
f %.% g
... |
n functions to apply in order from right to left |
f, g |
two functions to compose for the infix form |
This function is taken from Hadley Wickham's package pryr; for more details see https://github.com/hadley/pryr
Hadley Wickham
not_null <- `!` %.% is.null
not_null(4)
not_null(NULL)
add1 <- function(x) x + 1
compose(add1, add1)(8)
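The right-to-left application order described above can be illustrated with a small base R sketch. The name `compose_sketch` is hypothetical; this is not the pryr or RevEcoR implementation, only one way to obtain the same behaviour.

```r
# Sketch (assumption): build a function that applies its arguments
# from right to left, so compose_sketch(f, g)(x) equals f(g(x)).
compose_sketch <- function(...) {
  fs <- list(...)
  function(x) {
    for (f in rev(fs)) x <- f(x)
    x
  }
}

add1 <- function(x) x + 1
double <- function(x) x * 2

add_then_check <- compose_sketch(add1, double)   # add1(double(x))
not_null_sketch <- compose_sketch(`!`, is.null)  # !is.null(x)

result <- add_then_check(3)
print(result)
print(not_null_sketch(4))
```

Reversing the list before looping is what makes the last function supplied run first.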
Calculate the confidence score of a seed set.
confidencescore(object)
## S4 method for signature 'seedset'
confidencescore(object)
object |
a seedset object |
a list
## Not run:
confidencescore(seed.set)
## End(Not run)
The genome-scale metabolic network (GsMN) whose seed set is calculated.
getGsMN(object)
## S4 method for signature 'seedset'
getGsMN(object)
object |
a seedset object |
an igraph object
## Not run:
getGsMN(seed.set)
## End(Not run)
This function obtains the organism-specific pathway maps and parses them to extract metabolic data: reactions, substrates, and products.
getOrgMetabolicData(org)
org |
character, the KEGG organism code, e.g. "buc". |
Function getOrgMetabolicData downloads the metabolic data of a given organism from the KEGG database with the REST-style KEGG API. The enzyme reactions that take place in this organism (org) and their metabolites (substrates and products), which are used for organism-specific genome-scale metabolic network reconstruction, can be obtained with this function.
a data frame with three columns: enzyme reaction names, substrates, and products
## Not run:
metabolic.data <- getOrgMetabolicData("buc")
## End(Not run)
Analyze a given metabolic network and identify the seed compounds of the organism.
getSeedSets(g, threshold = 0)
g |
an igraph object which represents a given organism-specific metabolic network |
threshold |
numeric constant ranging from 0 to 1, default is 0. |
All compounds in the same source SCC are equally likely to be included in the seed set, so each of these compounds is assigned a confidence level, C = 1/(size of source SCC), denoting the compound's probability of being a seed. The threshold is used to determine whether a compound should be a seed.
a two-length list consisting of the network and the seed set compounds of the given organism-specific metabolic network, ....
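The confidence level C = 1/(size of source SCC) can be illustrated with a short base R sketch. The data, the helper `select_seeds`, and the use of `>=` when comparing against the threshold are assumptions made for illustration; the package may apply the cutoff differently.

```r
# Hypothetical source SCCs: each element holds the compounds of one component.
source_sccs <- list(
  "cpdA",
  c("cpdB", "cpdC"),
  c("cpdD", "cpdE", "cpdF", "cpdG")
)

# Confidence level of every compound in a component: 1 / component size.
confidence <- 1 / lengths(source_sccs)

# Keep the components whose confidence reaches the threshold
# (assumption: the comparison is inclusive).
select_seeds <- function(sccs, threshold = 0) {
  sccs[1 / lengths(sccs) >= threshold]
}

all_seeds <- select_seeds(source_sccs)
strict_seeds <- select_seeds(source_sccs, threshold = 0.5)
print(confidence)
print(strict_seeds)
```

With the default threshold of 0 every source component is kept, while a threshold of 0.5 drops the four-compound component, whose compounds each have a confidence of 0.25.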
## Not run:
## get metabolic annotated data of a specific species
metabolic.data <- getOrgMetabolicData("buc")
## metabolic network reconstruction
net <- reconstructGsMN(metabolic.data)
## seed set identification
seed.set <- getSeedSets(net)
## End(Not run)
A dataset containing the KEGG orthology annotation profiles of 116 prevalent gut species, downloaded from the Integrated Microbial Genomes (IMG) database.
A list with 116 elements, each of which represents the annotation profile of one species.
This dataset focuses on a list of 116 prevalent gut species whose genome sequences are available in the IMG database and whose sequence coverage is more than 1. The annotation profiles of these 116 species were collected from the IMG database.
With an in-house R script, we obtained genomic data for all organisms from the Integrated Microbial Genomes (IMG) project. For each species, the list of genes mapped to the Kyoto Encyclopedia of Genes and Genomes orthologous groups (KEGG KOs) was downloaded. For more annotation information on these species, see img.jgi.doe.gov/.
Calculates the metabolic complementarity index and the metabolic competition index based on species metabolic networks.
complementarityIndex(g1, g2, seed.set1, seed.set2, threshold = 0, p = FALSE, nperm = 1000)
competitionIndex(g1, g2, seed.set1, seed.set2, threshold = 0, p = FALSE, nperm = 1000)
g1 |
igraph object, a species-specific metabolic network. |
g2 |
igraph object, a species-specific metabolic network, the complementary network of g1 |
seed.set1 |
seeds slot of a seedset object, the seeds of the metabolic network g1 |
seed.set2 |
seeds slot of a seedset object, the seeds of the metabolic network g2 |
threshold |
the cutoff on the confidence score for a compound to be included in a seed set, default is 0. |
p |
a logical value that determines whether a p value is calculated to assess the significance of the index, default is FALSE. |
nperm |
the number of permutations of the metabolic network node labels, used for calculating the p value of the complementarity index, default is 1000. |
The metabolic competition index is defined as the fraction of compounds in the seed set of one species' metabolic network that are also included in its partner's, whereas the metabolic complementarity index is the fraction of compounds in one species' seed set that appear in the metabolic network, but not in the seed set, of its partner. Because seed compounds are associated with a confidence score (1/size of SCC), each fraction is calculated as a normalized weighted sum.
Based on the metabolic networks and seed sets of the species, these functions help predict the interaction of species 1 in the presence of species 2.
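The two definitions can be written out as a small base R sketch. It assumes that "included in its partner" refers to the partner's seed set, and it uses hypothetical toy data and helper names (`seed_confidence`, `weighted_fraction`); it is not the package's implementation and it omits the permutation-based p value.

```r
# Hypothetical seed sets: one element per source SCC.
seeds_a <- list(c("C1", "C2"), "C3", "C4")
seeds_b <- list("C3", c("C5", "C6"))
# Hypothetical set of all compounds in the network of species B.
nodes_b <- c("C1", "C3", "C5", "C6", "C7")

# Confidence score of each seed compound: 1 / size of its source SCC.
seed_confidence <- function(seeds) {
  sizes <- lengths(seeds)
  setNames(rep(1 / sizes, times = sizes), unlist(seeds))
}

# Normalized weighted sum: weight of the selected seeds over total weight.
weighted_fraction <- function(conf, keep) sum(conf[keep]) / sum(conf)

conf_a <- seed_confidence(seeds_a)
seed_b <- unlist(seeds_b)

# Seeds of A that are also seeds of B.
competition <- weighted_fraction(conf_a, names(conf_a) %in% seed_b)
# Seeds of A that B contains in its network but not in its seed set.
complementarity <- weighted_fraction(
  conf_a, names(conf_a) %in% setdiff(nodes_b, seed_b)
)

print(c(competition = competition, complementarity = complementarity))
```

Because every source SCC contributes a total weight of 1, the denominator equals the number of source components of species A.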
a list of length two when p is TRUE: the complementarity or competition index, which ranges from 0 to 1, and the p value of the complementarity index. When p is FALSE, a single value of the complementarity or competition index.
getSeedSets, calculateCooperationIndex
## Not run:
## metabolic network reconstruction and seed set identity of sample data anno.species
net <- lapply(anno.species, reconstructGsMN)
seed.sets <- lapply(net, getSeedSets)
seed.sets <- lapply(seed.sets, function(x) x@seeds)
## calculate the complementarity index of the first species
complementarity.index <- complementarityIndex(net[[1]], net[[2]],
  seed.sets[[1]], seed.sets[[2]])
competition.index <- competitionIndex(net[[1]], net[[2]],
  seed.sets[[1]], seed.sets[[2]])
## End(Not run)
Metabolic information of the KEGG organism buc, which consists of enzymatic reactions and metabolites.
A data frame with 418 observations on three variables.
[,1] .attrs.name, character (reaction: R)
[,2] substrate.name, list (substrates: cpd)
[,3] product.name, list (products: cpd)
buc metabolic information:
.attrs.name: Enzymatic reactions in which the organism is involved.
substrate.name: Substrates of the corresponding reaction.
product.name: Products of the corresponding reaction.
Metabolic information of the KEGG organism ptr, which consists of enzymatic reactions and metabolites.
A data frame with 1858 observations on three variables.
[,1] .attrs.name, character (reaction: R)
[,2] substrate.name, list (substrates: cpd)
[,3] product.name, list (products: cpd)
ptr metabolic information:
.attrs.name: Enzymatic reactions in which the organism is involved.
substrate.name: Substrates of the corresponding reaction.
product.name: Products of the corresponding reaction.
This function uses Kosaraju's algorithm to calculate the strongly connected components decomposition of a given network.
KosarajuSCC(g)
g |
an igraph object to be decomposed |
a list whose length is equal to the number of SCCs; each element represents one SCC
AV Aho, JE Hopcroft, JD Ullman: The design and analysis of computer algorithms, 1974
## Not run:
metabolic.data <- getOrgMetabolicData("buc")
## metabolic network reconstruction
net <- reconstructGsMN(metabolic.data)
scc <- KosarajuSCC(net)
## End(Not run)
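For readers unfamiliar with Kosaraju's algorithm, the following base R sketch shows its two depth-first passes on a plain edge list. The function `kosaraju_scc` and the toy graph are hypothetical; the sketch works on a character matrix rather than an igraph object and is not the code used by KosarajuSCC.

```r
# Sketch of Kosaraju's algorithm on a two-column (from, to) character matrix.
kosaraju_scc <- function(edges) {
  nodes <- unique(as.vector(edges))
  # Adjacency lists of the graph and of its transpose.
  out_adj <- split(edges[, 2], factor(edges[, 1], levels = nodes))
  in_adj <- split(edges[, 1], factor(edges[, 2], levels = nodes))

  # Pass 1: depth-first search recording the order in which nodes finish.
  visited <- setNames(rep(FALSE, length(nodes)), nodes)
  finish_order <- character(0)
  dfs_forward <- function(v) {
    visited[v] <<- TRUE
    for (w in out_adj[[v]]) if (!visited[w]) dfs_forward(w)
    finish_order <<- c(finish_order, v)
  }
  for (v in nodes) if (!visited[v]) dfs_forward(v)

  # Pass 2: search the transposed graph in decreasing finishing order;
  # every search tree is one strongly connected component.
  visited[] <- FALSE
  components <- list()
  current <- character(0)
  dfs_reverse <- function(v) {
    visited[v] <<- TRUE
    current <<- c(current, v)
    for (w in in_adj[[v]]) if (!visited[w]) dfs_reverse(w)
  }
  for (v in rev(finish_order)) {
    if (!visited[v]) {
      current <- character(0)
      dfs_reverse(v)
      components[[length(components) + 1]] <- current
    }
  }
  components
}

# Toy graph: a cycle A -> B -> C -> A that feeds a cycle D <-> E.
edges <- rbind(
  c("A", "B"), c("B", "C"), c("C", "A"),
  c("C", "D"), c("D", "E"), c("E", "D")
)
components <- kosaraju_scc(edges)
print(components)
```

The recursive searches keep the sketch short but limit it to small graphs; an iterative stack would be needed for large networks.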
Calculate the number of seed source components.
len(object)
## S4 method for signature 'seedset'
len(object)
object |
a seedset object |
an integer
## Not run:
len(seed.set)
## End(Not run)
Non-seed compounds of the network.
nonseed(object)
## S4 method for signature 'seedset'
nonseed(object)
object |
a seedset object |
a vector
## Not run:
nonseed(seed.set)
## End(Not run)
Reconstruction of a genome-scale metabolic network (GsMN) whose nodes represent compounds and whose edges represent reactions.
reconstructGsMN(metabolic.data, RefData = RefDbcache, threshold = 10, is.gaint = TRUE)
metabolic.data |
a data frame or a character vector |
RefData |
The reference metabolic data. No reference data is needed when the organism's metabolic data was collected from the KEGG database; in that case RefData is set to NULL. Otherwise RefDbcache, an internal dataset in this package, is used as the reference metabolic data for genome-scale metabolic network reconstruction. |
threshold |
numeric, Nodes belonging to components with fewer than the value of threshold nodes will be ignored. This is a good option for networks that contain many small and trivial components. Default is 10. |
is.gaint |
logical, Ignore all nodes except those in the giant component: selecting the only main largest component (connected set of nodes) of the network. All smaller components will be ignored. This is a good option for networks with a dominant component. Default is TRUE. |
The input of this function can take two forms. If the organism is included in the KEGG database, its metabolic data can be obtained with getOrgMetabolicData, which returns a data frame. Otherwise, metabolic.data can be a character vector that contains the KEGG Orthology annotation of the organism; for example, the KO annotation profile can be downloaded from https://img.jgi.doe.gov for species detected in a human microbiome that are not contained in the KEGG organism database. Functions such as read.table and read.delim can be used to read the KO annotation profile.
igraph object
## not run (organism in KEGG)
## metabolic.data <- getOrgMetabolicData("buc")
## g <- reconstructGsMN(metabolic.data)

## species detected in a human microbiome
annodir <- system.file("extdata", "koanno.tab", package = "RevEcoR")
metabolic.data <- read.delim2(file = annodir, stringsAsFactors = FALSE)
## load the reference metabolic data
data(RefDbcache)
g2 <- reconstructGsMN(metabolic.data, RefData = RefDbcache)
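To make the "compounds as nodes, reactions as edges" representation concrete, the base R sketch below turns a toy reaction table into a directed edge list. The table, its column names, and the rule that every substrate is linked to every product of the same reaction are assumptions for illustration; reconstructGsMN may build its edges differently.

```r
# Hypothetical reaction table with list columns, loosely modelled on the
# three-column layout of the KEGG organism datasets.
reactions <- data.frame(reaction = c("R1", "R2"), stringsAsFactors = FALSE)
reactions$substrate <- list(c("A", "B"), "C")
reactions$product <- list("C", c("D", "E"))

# One directed edge per (substrate, product) pair of each reaction.
edge_list <- do.call(rbind, lapply(seq_len(nrow(reactions)), function(i) {
  pairs <- expand.grid(
    from = reactions$substrate[[i]],
    to = reactions$product[[i]],
    stringsAsFactors = FALSE
  )
  pairs$reaction <- reactions$reaction[i]
  pairs
}))

print(edge_list)
```

An edge list of this shape is the kind of input that graph constructors such as igraph's graph_from_data_frame accept, with the reaction name carried along as an edge attribute.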
Reference data for global metabolic construction
The reference metabolic pathway data contains KOs, substrates, and products, as well as a constructed reference global network, which is used for metabolic network reconstruction.
The format is a list of 7 elements: KO, substrate, product, user, date, version, and reference network.
This dataset contains the following information:
KO, all KEGG orthology entries in KEGG metabolic pathways.
substrate, substrates of enzymatic reactions in all KEGG metabolic pathways.
product, products of enzymatic reactions in all KEGG metabolic pathways.
user, the user who downloaded this data.
date, the date on which this data was downloaded.
version, the R version used to obtain it.
network, the global network which is reconstructed based on all the metabolites.
https://www.bioconductor.org/packages/release/bioc/html/mmnet.html
This package implements applications of reverse ecology. Reverse ecology refers to the use of genomics to study ecology with no a priori assumptions about the organism(s) under consideration, linking organisms to their environment. The package predicts the cooperation among species and hosts.
seedset-class
Object representing the seed sets of a given metabolic network.
GsMN,
an igraph network
seeds,
a character list representing the seeds of a given metabolic network, composed of KEGG compound indices.
getGsMN, signature(object = "seedset"): get the genome-scale metabolic network whose seed set is calculated
len, signature(object = "seedset"): return the number of source SCCs
seedSize, signature(object = "seedset"): return the size of each source SCC
nonseed, signature(object = "seedset"): the non-seeds of the GsMN
show, signature(object = "seedset"): show a short summary of a seedset object
confidencescore, signature(object = "seedset"): confidence score of the seed set
getSeedSets, getGsMN, len, nonseed, seedSize, confidencescore
## Not run:
## generate a metabolic network in igraph class and a seed set of this graph
annodir <- system.file("extdata", "koanno.tab", package = "RevEcoR")
metabolic.data <- read.delim2(file = annodir, stringsAsFactors = FALSE)
g <- reconstructGsMN(metabolic.data)
seeds <- getSeedSets(g)@seeds
seed.set <- new("seedset", GsMN = g, seeds = seeds)
## End(Not run)
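The slot and accessor layout described for the seedset class can be mimicked with a small S4 sketch that runs without RevEcoR or igraph. The class `toySeedset` and the generics `toyLen`, `toySeedSize`, and `toyConfidence` are hypothetical stand-ins, and the network slot is left untyped; the real class definition may differ.

```r
library(methods)

# Hypothetical stand-in for the seedset class: a network slot (untyped here)
# and a list of seed components.
setClass("toySeedset", representation(GsMN = "ANY", seeds = "list"))

# Number of source components.
setGeneric("toyLen", function(object) standardGeneric("toyLen"))
setMethod("toyLen", "toySeedset", function(object) length(object@seeds))

# Size of each source component.
setGeneric("toySeedSize", function(object) standardGeneric("toySeedSize"))
setMethod("toySeedSize", "toySeedset", function(object) lengths(object@seeds))

# Confidence score per component, 1 / size, as described for seed sets.
setGeneric("toyConfidence", function(object) standardGeneric("toyConfidence"))
setMethod("toyConfidence", "toySeedset",
          function(object) 1 / lengths(object@seeds))

toy <- new("toySeedset", GsMN = NULL, seeds = list(c("C1", "C2"), "C3"))
print(toyLen(toy))
print(toySeedSize(toy))
print(toyConfidence(toy))
```

Keeping the accessors as generics means other classes could supply their own methods under the same names.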
Calculate the size of each seed source component.
seedSize(object)
## S4 method for signature 'seedset'
seedSize(object)
object |
a seedset object |
a vector representing the size of each source seed component of the network
## Not run:
seedSize(seed.set)
## End(Not run)
Show a short summary of seedset object
## S4 method for signature 'seedset'
show(object)
object |
a seedset object |
## Not run:
show(seed.set)
## End(Not run)