Metadata about signaling pathway resources

This collection was created during the construction of OmniPath, when we considered more than 50 resources and selected those containing literature curation effort. OmniPath is a network of signaling pathways that aims to combine all high-quality, manually curated efforts. The descriptions here cite the relevant sentences about the curation protocols from the original articles and webpages. URLs pointing to the articles and webpages, and some additional metadata, are provided where available. The resources with a green title are included by default in OmniPath. pypath methods are listed where available; to learn more, please see the pypath documentation. This list covers only network resources. pypath is able to process and integrate many other resources; please see the paper and the documentation for details.

We searched for license information in the main, About, Download and FAQ sections of the webpages, and ran Google searches for the database name and license. Where we could not find anything about licensing, we assumed no license. Unfortunately, due to today's restrictive copyright legislation, users do not have the freedom to use, modify and redistribute the data without a license explicitly granting these rights to them, despite the clear intention of the authors to make their data public and statements on the webpage like "free to use" or "available for download".


ACSN – Atlas of Cancer Signalling Networks

Category || Subcategory >>> Literature curated || Reaction

Last updated: 2015

Updated in years: 2008, 2014, 2015, 2016

Created by Curie

Contact:

License: No license

Webpages

Articles

PubMed

Taxons: Human

Quotes

The map curator studies the body of literature dedicated to the biological process or molecular mechanism of interest. The initial sources of information are the major review articles from high-impact journals that represent the consensus view on the studied topic and also provide a list of original references. The map curator extracts information from review papers and represents it in the form of biochemical reactions in CellDesigner. This level of details reflects the ‘canonical’ mechanisms. Afterwards, the curator extends the search and analyses original papers from the list provided in the review articles and beyond. This information is used to enrich the map with details from the recent discoveries in the field. The rule for confident acceptance and inclusion of a biochemical reaction or a process is the presence of sufficient evidences from more than two studies, preferably from different scientific groups. The content of ACSN is also verified and compared with publicly available databases such as REACTOME, KEGG, WikiPathways, BioCarta, Cell Signalling and others to ensure comprehensive representation of consensus pathways and links on PMIDs of original articles confirmed annotated molecular interactions.

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions


AlzPathway

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Updated in years: 2012, 2015

Created by Tokyo Bioinf

Contact:

License: CC-Attribution-3.0

Articles

Webpages

PubMed

Quotes

We collected 123 review articles related to AD accessible from PubMed. We then manually curated these review articles, and have built an AD pathway map by using CellDesigner. Molecules are distinguished by the following types: proteins, complexes, simple molecules, genes, RNAs, ions, degraded products, and phenotypes. Gene symbols are pursuant to the HGNC symbols. Reactions are also distinguished by the following categories: state transition, transcription, translation, heterodimer association, dissociation, transport, unknown transition, and omitted transition. All the reactions have evidences to the references in PubMed ID using the MIRIAM scheme. All the references used for constructing the AlzPathway are listed in the ‘References for AlzPathway’. Cellular types are distinguished by the followings: neuron, astrocyte, and microglial cells. Cellular compartments are also distinguished by the followings: brain blood barrier, presynaptic, postsynaptic, and their inner cellular localizations.

References can be fetched only from the XML formats, not from the SIF file. Besides approx. 150 protein-protein interactions, the dataset also contains interactions of many small molecules, denoted by PubChem IDs.
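
The SIF limitation noted above follows from the format itself: each line carries only a source node, a relation label and one or more targets, so there is no field for references. A minimal sketch of reading such lines, assuming the common tab- or space-delimited SIF layout; the function name `read_sif` and the sample rows are illustrative and not taken from the AlzPathway files or from pypath.

```python
from typing import Iterable, Iterator, Tuple


def read_sif(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (source, relation, target) triples from SIF-style lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Fields are tab-delimited if the line contains a tab, otherwise space-delimited.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a lone node name defines no edge
        source, relation = fields[0], fields[1]
        # One line may list several targets for the same source and relation.
        for target in fields[2:]:
            yield source, relation, target


# Invented sample rows, for illustration only.
sample = [
    "APP\tstate_transition\tBACE1",
    "MAPT\tassociation\tGSK3B\tCDK5",
    "ORPHAN_NODE",
]
edges = list(read_sif(sample))
```

Each triple holds names and a relation only, so PubMed IDs would have to be joined in from a separate source such as the XML export.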

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


ARN

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2014

Updated in years: 2014

Created by NetBiol Group

Contact:

License: CC-Attribution-NonCommercial-ShareAlike-3.0

Webpages

Articles

PubMed

Taxons: Human

Quotes

From Korcsmaros 2010: ... we first listed signaling proteins and interactions from reviews and then added further signaling interactions of the listed proteins. We used reviews as a starting point, manually looked up interactions three times, and manually searched for interactions of known signaling proteins with no signaling interactions so far in the database.


Ataxia

Category || Subcategory >>> High-throughput || Interaction

Last updated: 2010

Created by Shaw Lab

Contact:

License: CC-Attribution-2.5

Webpages

Articles

Taxons: Human

Quotes

In order to expand the interaction dataset, we added relevant direct protein–protein interactions from currently available human protein–protein interaction networks (Rual et al., 2005; Stelzl et al., 2005). We also searched public databases, including BIND (Bader et al., 2003), DIP (Xenarios et al., 2002), HPRD (Peri et al., 2003), MINT (Zanzoni et al., 2002), and MIPS (Pagel et al., 2005), to identify literature-based binary interactions involving the 54 ataxia-associated baits and the 561 interacting prey proteins. We identified 4796 binary protein–protein interactions for our Y2H baits and prey proteins (Table S4) and incorporated them in the Y2H protein–protein interaction map (Figures 4A–4C).

The Ataxia network doesn't contain original manual curation effort. The integrated data are very old.


Awan 2007

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2007

Created by Wang Group

Contact:

License: No license

Articles

PubMed

Direct data import from: BioCarta, CA1

Quotes

To construct the human cellular signalling network, we manually curated signalling pathways from literature. The signalling data source for our pathways is the BioCarta database (http://www.biocarta.com/genes/allpathways.asp), which, so far, is the most comprehensive database for human cellular signalling pathways. Our curated pathway database recorded gene names and functions, cellular locations of each gene and relationships between genes such as activation, inhibition, translocation, enzyme digestion, gene transcription and translation, signal stimulation and so on. To ensure the accuracy and the consistency of the database, each referenced pathway was cross-checked by different researchers and finally all the documented pathways were checked by one researcher. In total, 164 signalling pathways were documented (supplementary Table 2). Furthermore, we merged the curated data with another literature-mined human cellular signalling network. As a result, the merged network contains nearly 1100 proteins (SupplementaryNetworkFile). To construct a signalling network, we considered relationships of proteins as links (activation or inactivation as directed links and physical interactions in protein complexes as neutral links) and proteins as nodes.
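
The last sentence of the quote describes a mixed graph: directed, signed links for activation and inactivation, and undirected neutral links for physical interactions. One possible way to represent this is sketched here; the class `Link`, the relation labels and the example records are assumptions made for illustration, not the authors' data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    sign: int       # +1 activation, -1 inhibition, 0 neutral
    directed: bool


# Hypothetical relation labels mapped to link signs.
SIGNS = {"activation": 1, "inhibition": -1, "physical": 0}


def make_link(a: str, b: str, relation: str) -> Link:
    sign = SIGNS[relation]
    if sign == 0:
        # Neutral links have no direction: sort the endpoints so A-B equals B-A.
        a, b = sorted((a, b))
        return Link(a, b, 0, False)
    return Link(a, b, sign, True)


# Illustrative records, not taken from the published network.
records = [
    ("EGFR", "GRB2", "physical"),
    ("GRB2", "EGFR", "physical"),      # same neutral link, reversed order
    ("RAF1", "MAP2K1", "activation"),
    ("DUSP6", "MAPK1", "inhibition"),
]
links = {make_link(*r) for r in records}
nodes = {p for link in links for p in (link.source, link.target)}
```

Storing the links in a set collapses the two spellings of the same neutral link into one entry, while directed links keep their orientation.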


BioCarta

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2006

Updated in years: 2006

Created by Community

Contact:

License: BioCarta webpage Terms and Conditions of Use (pathways are not owned by BioCarta and are free to use)

Webpages

Taxons: Human

Quotes

Community built pathway database based on expert curation.

This resource includes a huge number of pathways, each curated by experts from a few reviews. The data is not available for download from the original webpage, only second-hand, for example from NCI-PID, in NCI-XML format. However, these files don't contain any references, which makes the use of the BioCarta dataset problematic. Also, some pathways were reviewed a long time ago and are possibly outdated.


BioGRID – Biological General Repository for Interaction Datasets

Category || Subcategory >>> High-throughput || Interaction

Last updated: 2016

Updated in years: 2003, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016

Created by Tyers Lab

Contact:

License: BioGRID License, non-free

Webpages

Articles

PubMed

Collections

Methods in pypath

Data source (URLs and files)

Data format definition

Interactions


Ma'ayan 2005 – Human Hippocampal CA1 Region Neurons Signaling Network

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2005

Updated in years: 2005

Created by Iyengar Lab

Contact:

License: No license

Articles

PubMed

Taxons: Human, Mouse

Nodes: 545, Edges: 1259

Quotes

We used published research literature to identify the key components of signaling pathways and cellular machines, and their binary interactions. Most components (~80%) have been described in hippocampal neurons or related neuronal cells. Other components are from other cells, but are included because they are key components in processes known to occur in hippocampal neurons, such as translation. We then established that these interactions were both direct and functionally relevant. All of the connections were individually verified by at least one of the authors of this paper by reading the relevant primary paper(s). We developed a system made of 545 components (nodes) and 1259 links (connections). We used arbitrary but consistent rules to sort components into various groups. For instance, transcription factors are considered a as part of the transcriptional machinery, although it may also be equally valid to consider them as the most downstream component of the central signaling network. Similarly the AMPA receptor-channel (AMPAR) is considered part of the ion channels in the electrical response system since its activity is essential to defining the postsynaptic response, although it binds to and is activated by glutamate, and hence can be also considered a ligand gated receptor-channel in the plasma membrane. The links were specified by two criteria: function and biochemical mechanism. Three types of functional links were specified. This follows the rules used for representation of pathways in Science’s STKE (S1). Links may be activating, inhibitory or neutral. Neutral links do not specify directionality between components, and are mostly used to represent scaffolding and anchoring undirected or bidirectional interactions. The biochemical specification includes defining the reactions as non-covalent binding interactions or enzymatic reactions. Within the enzymatic category, reactions were further specified as phosphorylation, dephosphorylation, hydrolysis, etc. 
These two criteria for specification are independent and were defined for all interactions. For the analyses in this study we only used the functional criteria: activating, inhibitory or neutral specifications. We chose papers that demonstrated direct interactions that were supported by either biochemical or physiological effects of the interactions. From these papers we identified the components and interactions that make up the system we analyzed. During this specification process we did not consider whether these interactions would come together to form higher order organizational units. Each component and interaction was validated by a reference from the primary literature (1202 papers were used). A list of authors who read the papers to validate the components and interactions is provided under authors contributions.

One of the earliest manually curated networks, available in easily accessible tabular format, including UniProt IDs and PubMed references.

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Interactions


CancerCellMap

Category || Subcategory >>> Literature curated || Interaction

Last updated: 2006

Created by Bader Lab

Contact:

License: CC-Attribution-2.5

Webpages

Collections

Taxons: Human, Mouse, Rat

Quotes

Manually curated data, unpublished. A team of M.Sc. and Ph.D. biologists at the Institute of Bioinformatics in Bangalore, India read original research papers and hand-entered the pathway data into our database. The quality of the Cancer Cell Map pathways is very high. Half of the pathways were reviewed by experts at Memorial Sloan-Kettering Cancer Center and were found to contain only a few errors, which were subsequently fixed. A pathway is a collection of all genes/proteins that have been described as pathway members in any publication and all the interactions between them that can be found described in the literature.

One of the earliest manually curated datasets, now only available second-hand, e.g. from PathwayCommons. Included in many other resources. Contains binary interactions with PubMed references.

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Interactions


CARFMAP

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Contact:

License: CC-Attribution-4.0

Articles

Webpages

PubMed


ConsensusPathDB

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Updated in years: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015

Contact:

License: Constituting resources carry their own licenses. "Due to several licensing issues, we are not allowed to release the complete integrated network (including signaling, metabolism and gene regulation)."

Webpages

Articles

PubMed

Collections

Taxons: Human, Mouse, Yeast

Quotes

Interaction data in ConsensusPathDB currently originates from 12 interaction databases and comprises physical interactions, biochemical reactions and gene regulations. Importantly, the source of physical entities and interactions is always recorded, which allows linking to the original data in the source database.

In order to assess the content overlap of the source databases and to reduce redundancy, we have applied a method to merge identical physical entities and identify similar interactions. The method is straightforward and efficient for the integration of networks from any single species. Simple physical entities of the same type (genes, proteins, transcripts, metabolites) are compared on the basis of common database identifiers like UniProt, Ensembl, Entrez, ChEBI, etc. Since different databases tend to annotate physical entities with different identifier types (e.g. some databases annotate proteins with UniProt identifiers, others with Ensembl identifiers), we first translated the annotations to a uniform identifier type, which is a UniProt entry name in case of proteins, Ensembl gene ID in case of genes and transcripts, and KEGG/ChEBI ID in case of metabolites. Protein complexes are compared according to their individual protein composition. Simple physical entities with the same identifier, and complexes with the same composition, are merged in ConsensusPathDB. Information provided by the according source databases for the merged entities is stored in a complementary manner.

Functional interactions of physical entities are also compared with each other. Here, we distinguish between primary and secondary interaction participants. Primary participants are substrates and products in case of biochemical reactions, interactors in case of physical interactions and target genes in case of gene regulation. All other participants, e.g. enzymes and interaction modifiers, are secondary participants. If the primary participants of two or more interactions match, these interactions are considered similar. Two similar interactions may have different stoichiometry, modification and/or localization of the participants. To allow for flexibility, similar interactions are marked as such in the database, but the decision whether they should be considered identical despite mismatching details is left to the user and depends on his specific problem. Moreover, ConsensusPathDB does not provide any additional quality control filters. All interactions provided by the different database sources are treated in the same way.
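
The similarity rule quoted above can be sketched as grouping interactions by their set of primary participants. This is a simplified illustration: the record layout (`id`, `primary`, `secondary`) and the identifiers are hypothetical, and treating primary participants as an unordered set ignores the substrate/product distinction that a real implementation may keep.

```python
from collections import defaultdict


def group_similar(interactions):
    """Group interaction IDs whose primary participants match exactly."""
    groups = defaultdict(list)
    for inter in interactions:
        # Secondary participants (enzymes, modifiers) are deliberately ignored.
        key = frozenset(inter["primary"])
        groups[key].append(inter["id"])
    # Only groups with at least two members contain "similar" interactions.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records from three source databases.
interactions = [
    {"id": "dbA:1", "primary": ["P1", "P2"], "secondary": ["E1"]},
    {"id": "dbB:7", "primary": ["P2", "P1"], "secondary": []},
    {"id": "dbC:3", "primary": ["P1", "P3"], "secondary": ["E1"]},
]
similar = group_similar(interactions)
```

The function only marks candidates as similar; whether they count as identical despite differing details is left to the caller, in line with the quoted description.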

ConsensusPathDB comprises data from 32 resources. The format is an easy-to-use, tab-delimited text file with UniProtKB names and PubMed IDs. However, the dataset is extremely large, and it includes several databases containing HTP data.


CORUM – Comprehensive Resource of Mammalian protein complexes

Category || Subcategory >>> Literature curated || Complexes

Last updated: 2012

Updated in years: 2007, 2009

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Taxons: Human, Mouse, Rat

Quotes

The CORUM database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by critical reading of the scientific literature from expert annotators. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes.

In order to provide a high-quality dataset of mammalian protein complexes, all entries are manually created. Only protein complexes which have been isolated and characterized by reliable experimental evidence are included in CORUM. To be considered for CORUM, a protein complex has to be isolated as one molecule and must not be a construct derived from several experiments. Also, artificial constructs of subcomplexes are not taken into account. Since information from high-throughput experiments contains a significant fraction of false-positive results, this type of data is excluded. References for relevant articles were mainly found in general review articles, cross-references to related protein complexes within analysed literature and comments on referenced articles in UniProt.

CORUM is not part of the OmniPath pathways network, because we did not apply any complex expansion. However, it has an interface built into the pypath module.

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data input methods


CST Pathways – Cell Signaling Technology Pathways

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Updated in years: 2005, 2015

Created by CST

Contact:

License: No license

Webpages

Quotes

On these resource pages you can find signaling pathway diagrams, research overviews, relevant antibody products, publications, and other research resources organized by topic. The pathway diagrams associated with these topics have been assembled by CST scientists and outside experts to provide succinct and current overviews of selected signaling pathways.

The pathway diagrams are based on good quality, manually curated data, probably from review articles. However, they are available only in graphical (PDF and InDesign) formats. There is no programmatic way to obtain the interactions and references, as was confirmed by the authors, whom I contacted by mail. Wang's HumanSignalingNetwork includes the data from this resource, which has probably been entered manually, but Wang's data doesn't have source annotations, despite being compiled from multiple sources. The start date of this project was estimated using the Internet Archive Wayback Machine.


Cui 2007

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2007

Created by Wang Group

Contact:

License: No license

Articles

PubMed

Taxons: Human

Nodes: 1528, Edges: 4249

Direct data import from: Awan2007, CancerCellMap

Quotes

To build up the human signaling network, we manually curated the signaling molecules (most of them are proteins) and the interactions between these molecules from the most comprehensive signaling pathway database, BioCarta (http://www.biocarta.com/). The pathways in the database are illustrated as diagrams. We manually recorded the names, functions, cellular locations, biochemical classifications and the regulatory (including activating and inhibitory) and interaction relations of the signaling molecules for each signaling pathway. To ensure the accuracy of the curation, all the data have been crosschecked four times by different researchers. After combining the curated information with another literature‐mined signaling network that contains ∼500 signaling molecules (Ma'ayan et al, 2005)[this is the CA1], we obtained a signaling network containing ∼1100 proteins (Awan et al, 2007). We further extended this network by extracting and adding the signaling molecules and their relations from the Cancer Cell Map (http://cancer.cellmap.org/cellmap/), a database that contains 10 manually curated signaling pathways for cancer. As a result, the network contains 1634 nodes and 5089 links that include 2403 activation links (positive links), 741 inhibitory links (negative links), 1915 physical links (neutral links) and 30 links whose types are unknown (Supplementary Table 9). To our knowledge, this network is the biggest cellular signaling network at present.

Excellent signaling network with good topology for all those who don't mind using data of unknown origin. Supposedly a manually curated network, but the data files don't include article references. Merging the CA1 network with CancerCellMap and BioCarta (also without references) makes the origin of the data untraceable.


dbPTM

Category || Subcategory >>> Literature curated || Post-translational modification

Last updated: 2015

Updated in years: 2005, 2009, 2012, 2015

Created by ISBLab

Contact:

License: No license

Webpages

Articles

PubMed

Collections

Taxons: Human, Metazoa, Bacteria, Plants, Yeast

Quotes

Due to the inaccessibility of database contents in several online PTM resources, a total 11 biological databases related to PTMs are integrated in dbPTM, including UniProtKB/SwissProt, version 9.0 of Phospho.ELM, PhosphoSitePlus, PHOSIDA, version 6.0 of O-GLYCBASE, dbOGAP, dbSNO, version 1.0 of UbiProt, PupDB, version 1.1 of SysPTM and release 9.0 of HPRD.

With the high throughput of MS-based methods in post-translational proteomics, this update also includes manually curated MS/MS-identified peptides associated with PTMs from research articles through a literature survey. First, a table list of PTM-related keywords is constructed by referring to the UniProtKB/SwissProt PTM list (http://www.uniprot.org/docs/ptmlist.txt) and the annotations of RESID (28). Then, all fields in the PubMed database are searched based on the keywords of the constructed table list. This is then followed by downloading the full text of the research articles. For the various experiments of proteomic identification, a text-mining system is developed to survey full-text literature that potentially describes the site-specific identification of modified sites. Approximately 800 original and review articles associated with MS/MS proteomics and protein modifications are retrieved from PubMed (July 2012). Next, the full-length articles are manually reviewed for precisely extracting the MS/MS peptides along with the modified sites. Furthermore, in order to determine the locations of PTMs on a full-length protein sequence, the experimentally verified MS/MS peptides are then mapped to UniProtKB protein entries based on its database identifier (ID) and sequence identity. In the process of data mapping, MS/MS peptides that cannot align exactly to a protein sequence are discarded. Finally, each mapped PTM site is attributed with a corresponding literature (PubMed ID).
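
The mapping step at the end of this quote (exact alignment of a peptide to the full-length sequence, discarding peptides that do not align) can be sketched as a plain substring search. This is an illustration under stated assumptions, not the dbPTM pipeline: the function name `map_site` is made up, positions are taken to be 1-based, only the first occurrence of the peptide is considered, and the toy sequence is invented.

```python
from typing import Optional


def map_site(protein_seq: str, peptide: str, site_in_peptide: int) -> Optional[int]:
    """Return the 1-based protein position of a modified residue, or None.

    site_in_peptide is the 1-based position of the modified residue
    within the peptide.
    """
    start = protein_seq.find(peptide)  # 0-based offset of the first exact match
    if start == -1:
        return None  # peptide does not align exactly: discard it
    return start + site_in_peptide


# Toy example: the serine is residue 3 of the peptide and residue 6 of the protein.
protein = "MKTAYSPLR"
position = map_site(protein, "AYSPL", 3)
unmapped = map_site(protein, "AYSPX", 3)
```

A peptide that occurs more than once in the protein would need extra handling, which this sketch leaves out.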

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions

Enzyme-substrate relationships and PTMs


DeathDomain

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2012

Updated in years: 2011, 2012

Created by Myoungji University

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Taxons: Human

Nodes: 99, Edges: 175

Quotes

The PubMed database was used as the primary source for collecting information and constructing the DD database. After finding synonyms for each of the 99 DD superfamily proteins using UniProtKB and Entrez Gene, we obtained a list of articles using each name of the proteins and its synonyms on a PubMed search, and we selected the articles that contained evidence for physical binding among the proteins denoted. We also manually screened information that was in other databases, such as DIP, IntAct, MINT, STRING and Entrez Gene. All of the 295 articles used for database construction are listed on our database website.

Detailed dataset with many references. Sadly, the data can be extracted only by parsing HTML. This is no more difficult than parsing XML formats, but HTML pages are not intended for this purpose.

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


DEPOD – Human Dephosphorylation Database

Category || Subcategory >>> Literature curated || Post-translational modification

Last updated: 2016

Updated in years: 2013, 2014, 2016

Created by EMBL & EMBL-EBI

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Taxons: Human

Quotes

DEPOD the human DEPhOsphorylation Database (version 1.0) is a manually curated database collecting human active phosphatases, their experimentally verified protein and non-protein substrates and dephosphorylation site information, and pathways in which they are involved. It also provides links to popular kinase databases and protein-protein interaction databases for these phosphatases and substrates. DEPOD aims to be a valuable resource for studying human phosphatases and their substrate specificities and molecular mechanisms; phosphatase-targeted drug discovery and development; connecting phosphatases with kinases through their common substrates; completing the human phosphorylation/dephosphorylation network.

Nice manually curated dataset with PubMed references, in easily accessible MITAB format with UniProt IDs. It comprises 832 dephosphorylation reactions on protein substrates, and a few hundred on small molecules.

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Enzyme-substrate relationships and PTMs


DIP – Database of Interacting Proteins

Category || Subcategory >>> Literature curated || Interaction

Last updated: 2016

Updated in years: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016

Created by UCLA, Eisenberg Group

Contact:

License: CC-Attribution-NoDerivs-3.0

Articles

Webpages

PubMed

Collections

Quotes

In the beginning (around 2000), it was an entirely manually curated database:

Currently protein–protein interactions are entered into the DIP only following publication in peer-reviewed journals. Entry is done manually by the curator, followed by automated tests that show the proteins and citations exist. Interactions are double-checked by a second curator and flagged accordingly in the database.

From 2001, it contains high-throughput interactions:

Because the reliability of experimental evidence varies widely, methods of quality assessment have been developed and utilized to identify the most reliable subset of the interactions. This CORE set can be used as a reference when evaluating the reliability of high-throughput protein-protein interaction data sets, for development of prediction methods, as well as in the studies of the properties of protein interaction networks.

The 'core' dataset contains manually curated interactions from small-scale studies. Interactions are well annotated with PubMed IDs, evidence, and mechanism (binding, chemical reaction, etc.). The format is easily accessible (MITAB).
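
As a rough illustration of why MITAB is convenient, the sketch below pulls UniProt accessions, PubMed IDs and the detection method out of one tab-delimited row. It assumes the 15-column PSI-MI TAB 2.5 column order (interactor IDs in columns 1 and 2, detection method in column 7, publication IDs in column 9); the function name is made up, the sample row is invented, and the actual DIP files may differ in detail.

```python
def parse_mitab_line(line):
    """Extract UniProt accessions, PubMed IDs and detection method from one MITAB row."""
    cols = line.rstrip("\n").split("\t")

    def pick(field, prefix):
        # MITAB packs several "db:value" entries into one column, separated by "|".
        return [
            item.split(":", 1)[1]
            for item in field.split("|")
            if item.startswith(prefix + ":")
        ]

    return {
        "a": pick(cols[0], "uniprotkb"),
        "b": pick(cols[1], "uniprotkb"),
        "method": cols[6],
        "pubmed": pick(cols[8], "pubmed"),
    }


# Invented row with 15 columns; accessions and PubMed IDs are placeholders.
row = "\t".join([
    "DIP-1N|uniprotkb:P11111",
    "DIP-2N|uniprotkb:Q22222",
    "-", "-", "-", "-",
    "MI:0018(two hybrid)",
    "-",
    "pubmed:1234567|pubmed:7654321",
    "taxid:9606(Homo sapiens)",
    "taxid:9606(Homo sapiens)",
    "MI:0915(physical association)",
    "MI:0465(dip)",
    "DIP-1E",
    "-",
])
record = parse_mitab_line(row)
```

Interactors without a `uniprotkb:` entry come back as empty lists, which a caller could use to skip rows that cannot be mapped.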

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


DOMINO

Category || Subcategory >>> Literature curated || Post-translational modification

Last updated: 2006

Updated in years: 2006

Created by Cesareni Group

Contact:

License: CC-Attribution-2.5

Webpages

Articles

PubMed

Collections

Taxons: Human, Yeast, C. elegans, Mouse, Rat, HIV, D. melanogaster, A. thaliana, X. laevis, B. taurus, G. gallus, O. cuniculus, Plasmodium falciparum

Quotes

DOMINO aims at annotating all the available information about domain-peptide and domain–domain interactions. The core of DOMINO, of July 24, 2006 consists of more than 3900 interactions extracted from peer-reviewed articles and annotated by expert biologists. A total of 717 manuscripts have been processed, thus covering a large fraction of the published information about domain–peptide interactions. The curation effort has focused on the following domains: SH3, SH2, 14-3-3, PDZ, PTB, WW, EVH, VHS, FHA, EH, FF, BRCT, Bromo, Chromo and GYF. However, interactions mediated by as many as 150 different domain families are stored in DOMINO.

Methods in pypath

Data source (URLs and files)

Domain-domain interactions

Domain-motif interactions

Data format definition

Data input methods

Interactions

Enzyme-substrate relationships and PTMs


ELM

Category || Subcategory >>> Literature curated || Post-translational modification

Last updated: 2014

Updated in years: 2003, 2008, 2009, 2012, 2013, 2014, 2016

Created by ELM Consortium

Contact:

License: ELM Software License Agreement, non-free

Webpages

Articles

PubMed

Collections

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Domain-motif interactions

Data format definition

Data input methods

Interactions


Guide to Pharmacology

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Updated in years: 2007, 2008, 2009, 2011, 2013, 2014, 2015, 2016

Contact:

License: CC-Attribution-ShareAlike-3.0

Webpages

Articles

PubMed

Collections

Quotes

Presently, the resource describes the interactions between target proteins and 6064 distinct ligand entities (Table 1). Ligands are listed against targets by their action (e.g. activator, inhibitor), and also classified according to substance types and their status as approved drugs. Classes include metabolites (a general category for all biogenic, non-peptide, organic molecules including lipids, hormones and neurotransmitters), synthetic organic chemicals (e.g. small molecule drugs), natural products, mammalian endogenous peptides, synthetic and other peptides including toxins from non-mammalian organisms, antibodies, inorganic substances and other, not readily classifiable compounds.

The new database was constructed by integrating data from IUPHAR-DB and the published GRAC compendium. An overview of the curation process is depicted as an organizational flow chart in Figure 2. New information was added to the existing relational database behind IUPHAR-DB and new webpages were created to display the integrated information. For each new target, information on human, mouse and rat genes and proteins, including gene symbol, full name, location, gene ID, UniProt and Ensembl IDs was manually curated from HGNC, the Mouse Genome Database (MGD) at Mouse Genome Informatics (MGI), the Rat Genome Database (RGD), UniProt and Ensembl, respectively. In addition, ‘Other names’, target-specific fields such as ‘Principal transduction’, text from the ‘Overview’ and ‘Comments’ sections and reference citations (downloaded from PubMed; http://www.ncbi.nlm.nih.gov/pubmed) were captured from GRAC and uploaded into the database against a unique Object ID.

Methods in pypath

Data source (URLs and files)

Data format definition


HPRD – Human Protein Reference Database

Category || Subcategory >>> Literature curated || Post-translational modification

Last updated: 2010

Updated in years: 2002, 2005, 2009, 2010

Contact:

License: No license. Everything in HPRD is free as long as it is not used for commercial purposes. Commercial entities will have to pay a fee under a licensing arrangement which will be used to make this database even better. Commercial users should send an e-mail for details. This model of HPRD is similar to the SWISS-PROT licensing arrangement. We do not have any intentions to profit from HPRD. Our goal is to promote science by creating the infrastructure of HPRD. We hope to keep it updated with the assistance of the entire biomedical community. Any licensing fee, if generated, will be used to annotate HPRD better and to add more entries and features.

Webpages

Articles

PubMed

Collections

Quotes

The information about protein-protein interactions was cataloged after a critical reading of the published literature. Exhaustive searches were done based on keywords and medical subject headings (MeSH) by using Entrez. The type of experiments that served as the basis for establishing protein-protein interactions was also annotated. Experiments such as coimmunoprecipitation were designated in vivo, GST fusion and similar “pull-down” type of experiments were designated in vitro, and those identified by yeast two-hybrid were annotated as yeast two-hybrid.

Posttranslational modifications were annotated based on the type of modification, site of modification, and the modified residue. In addition, the upstream enzymes that are responsible for modifications of these proteins were reported if described in the articles. The most commonly known and the alternative subcellular localization of the protein were based on the literature. The sites of expression of protein and/or mRNA were annotated based on published studies.

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions

Enzyme-substrate relationships and PTMs


HumanSignalingNetwork – Human Signaling Network version 6

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2014

Updated in years: 2009, 2010, 2011, 2012, 2013, 2014

Created by Wang Group

Contact:

License: No license

Webpages

Taxons: Human, Mouse, Rat

Direct data import from: Cui2007, BioCarta, CST, NCI-PID, iHOP

Quotes

Composed from multiple manually curated datasets, and contains its own manual curation effort. The methods are unclear, and the dataset has not been published in a peer-reviewed paper. Based on Cui et al. 2007.

Wang Lab has manually curated human signaling data from literature since 2005. The data sources include BioCarta, CST Signaling pathways, NCI Pathway Interaction Database, iHOP, and many review papers. The contents are updated every year.

iHOP is not literature curated, but is a literature mining platform.

This network aims to merge multiple manually curated networks. Unfortunately, a precise description of the sources and methods is missing, and the dataset does not include references. Moreover, the data file has neither a header nor a key, so users can only guess at the meaning of the columns and values.

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Interactions


HuPho – Human Phosphatase Portal

Category || Subcategory >>> High throughput and literature curated || Post-translational modification

Last updated: 2015

Updated in years: 2012, 2015

Created by Cesareni Group

Contact:

License: No license

Webpages

Articles

PubMed

Collections

Quotes

In order to offer a proteome-wide perspective of the phosphatase interactome, we have embarked on an extensive text-mining-assisted literature curation effort to extend phosphatase interaction information that was not yet covered by protein–protein interaction (PPI) databases. Interaction evidence captured by expert curators was annotated in the protein interaction database MINT according to the rapid curation standard. This data set was next integrated with protein interaction information from three additional major PPI databases, IntAct, BioGRID and DIP. These databases are part of the PSIMEx consortium and adopt a common data model and common controlled vocabularies, thus facilitating data integration. Duplicated entries were merged and redundant interactions have been removed.

As a result, from the HuPho website it is possible to explore experimental evidence from 718 scientific articles reporting 4600 experiments supporting protein interactions where at least one of the partners is a phosphatase. Since some interactions are supported by more than one piece of evidence, the actual number of non-redundant interactions is smaller, 2500 at the time of writing this paper. Moreover, 199 phosphatases have at least one reported ligand, while 53 have none. Interaction evidence is fairly evenly distributed in the four PSIMEx resources suggesting a substantial lack of overlap among the data curated by each database.
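The gap between the number of experiments and the number of non-redundant interactions reported above comes from several pieces of evidence supporting the same pair. A minimal sketch of such a merge is shown below; the record layout, the function name and the identifiers are placeholders chosen for illustration, not HuPho's actual data model.

```python
from collections import defaultdict

def merge_evidence(records):
    """Group evidence records by unordered protein pair.

    `records` is an iterable of (protein_a, protein_b, reference) tuples;
    this layout is an assumption made for the example.
    """
    merged = defaultdict(set)
    for a, b, reference in records:
        pair = tuple(sorted((a, b)))  # A-B and B-A are the same interaction
        merged[pair].add(reference)
    return dict(merged)

# Placeholder identifiers, not real database entries.
evidence = [
    ("P1", "S1", "ref_a"),
    ("S1", "P1", "ref_b"),
    ("P2", "S2", "ref_c"),
]
interactions = merge_evidence(evidence)
print(len(evidence), "evidence records,", len(interactions), "interactions")
```

Keeping the references in a set per pair preserves the supporting evidence while the keys give the non-redundant interaction list.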

The database is dynamically updated, so it is up to date at any given time. For this reason it is marked as updated in 2015, even though there has been no new release since 2012.


InnateDB

Category || Subcategory >>> Literature curated || Interaction

Last updated: 2015

Updated in years: 2008, 2010, 2013, 2014, 2015

Created by Brinkman Lab, Hancock Lab, Lynn Group

Contact:

License: Design Science License

Articles

Webpages

PubMed

Collections

Quotes

InnateDB (www.innatedb.com) is a database and integrated analysis platform specifically designed to facilitate systems-level analyses of the mammalian innate immune response (Lynn et al. 2008; 2010, 2013). To enrich our knowledge of innate immunity networks and pathways, the InnateDB curation team has contextually annotated >25,000 human and mouse innate immunity-relevant molecular interactions through the review of >5,000 biomedical articles. Curation adheres to the MIMIx guidelines and new interactions are added weekly. Importantly, interactions are curated between molecules with a documented role in an innate immunity relevant biological process or pathway and all other interactors regardless of whether the interacting molecule has any known role in innate immunity. This approach captures interactions between the innate immune system and other systems.

InnateDB is not limited to data on the innate immune system. It is a comprehensive database of human, mouse and bovine molecular interactions and pathways, consisting of more than 300,000 molecular interactions and 3,000+ pathways, integrated from major public molecular interaction and pathway databases. InnateDB is also an analysis platform offering user-friendly bioinformatics tools, including pathway and ontology analysis, network visualization and analysis and the ability to upload and analyze user-supplied gene expression or other quantitative data in a network and/or pathway context. The platform has a global profile and is utilised by >10,000 users per annum and is widely cited. A mirror of the site hosted in Australia is also available at innatedb.sahmri.com.

Note that new interactions and gene annotations are added to InnateDB on an almost weekly basis so the data is being continuously updated.

Probably the largest manually curated binary protein interaction dataset, developed by a dedicated full-time team of curators. Formats are clear and accessible, comprising UniProt IDs, PubMed references, experimental evidence and mechanisms.

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


IntAct – IntAct Molecular Interaction Database

Category || Subcategory >>> Literature curated and high-throughput || Interaction

Last updated: 2016

Updated in years: 2003, 2006, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016

Created by EBI

Contact:

License: Apache License, Version 2.0

Articles

Webpages

PubMed

Collections

Direct data import from: InnateDB, MINT

Quotes

The information within the IntAct database primarily consists of protein–protein interaction (PPI) data. The majority of the PPI data within the database is annotated to IMEx standards, as agreed by the IMEx consortium. All such records contain a full description of the experimental conditions in which the interaction was observed. This includes full details of the constructs used in each experiment, such as the presence and position of tags, the minimal binding region defined by deletion mutants and the effect of any point mutations, referenced to UniProtKB, the underlying protein sequence database. Protein interactions can be described down to the isoform level, or indeed to the post-translationally cleaved mature peptide level if such information is available in the publication, using the appropriate UniProtKB identifiers.

Each entry in IntAct is peer reviewed by a senior curator, and not released until accepted by that curator. Additional rule-based checks are run at the database level, and manually fixed when necessary. Finally, on release of the data, the original author of each publication is contacted and asked to comment on the representation of their data; again manual updates are made to the entry should the author highlight any errors.

All binary interactions evidences in the IntAct database, including those generated by Spoke expansion of co-complex data, are clustered to produce a non-redundant set of protein pairs (R. C. Jimenez et al., manuscript in preparation). Each binary pair is then scored, using a simple addition of the cumulated value of a weighted score for the interaction detection method and the interaction type for each interaction evidence associated with that binary pair, as described using the PSI-MI CV terms. The scores are given in Table 1, all children of each given parent receives that score. Only experimental data is scored, inferred interactions, for example, would be excluded. Any low confidence data or data manually tagged by a curator for exclusion from the process, would not be scored. Isoforms and post-processed protein chains are regarded as distinct proteins for scoring purposes.
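The additive scheme described in this quote can be sketched as follows. This is an illustration only: the weights are placeholders (the real values are in Table 1 of the quoted paper, which is not reproduced here), the parent-child relations stand in for the PSI-MI controlled vocabulary hierarchy, and the function and field names are assumptions rather than IntAct's implementation.

```python
# Placeholder weights; the actual values are in Table 1 of the quoted paper.
METHOD_WEIGHTS = {"biochemical": 3.0, "imaging": 1.0}
TYPE_WEIGHTS = {"direct interaction": 5.0, "association": 1.0}

# Hypothetical child -> parent relations standing in for the PSI-MI CV tree.
PARENTS = {"pull down": "biochemical", "physical association": "association"}

def weight(term, table):
    """Return a term's weight, inheriting from its parents when needed."""
    while term is not None:
        if term in table:
            return table[term]
        term = PARENTS.get(term)
    return 0.0

def score_pair(evidences):
    """Add up method and type weights over the scorable evidences of a pair."""
    total = 0.0
    for ev in evidences:
        if ev.get("inferred") or ev.get("excluded"):
            continue  # only experimental, non-excluded evidence is scored
        total += weight(ev["method"], METHOD_WEIGHTS)
        total += weight(ev["type"], TYPE_WEIGHTS)
    return total

evidences = [
    {"method": "pull down", "type": "physical association"},
    {"method": "imaging", "type": "direct interaction", "inferred": True},
    {"method": "imaging", "type": "association"},
]
print(score_pair(evidences))
```

Child terms inherit the weight of their nearest scored parent, mirroring the statement that all children of a given parent receive that parent's score.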

We cannot draw a sharp distinction between low- and high-throughput methods, and throughput is neither the only nor the best measure of the quality of experimental data. IntAct came up with a good solution for estimating the confidence of interactions: the MI-score system gives a comprehensive way to synthesize information from multiple experiments, weighting interactions by experimental method, interaction type and number of pieces of evidence.

Methods in pypath

Data source (URLs and files)

Data format definition


KEGG – Kyoto Encyclopedia of Genes and Genomes

Category || Subcategory >>> Literature curated || Reaction network

Last updated: 2016

Updated in years: 2000, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016

Contact:

License: KEGG License, non-free

Webpages

Articles

Collections

Quotes

Since 2011, KEGG data has not been freely available. The downloadable KGML files contain binary interactions, most of them between large complexes. No references are available.

Methods in pypath

Data source (URLs and files)

Data input methods

Miscellaneous


Laudanna – Compiled Datasets for Network Analysis from Laudanna Lab

Category || Subcategory >>> Combined || Mixed

Last updated: 2014

Updated in years: 2014

Created by Laudanna Lab

Contact:

License: No license

Webpages

Direct data import from: BioGRID, ConsensusPathDB, dbPTM, DIP, HumanSignalingNetwork, IntAct, MINT, MPPI, PathwayCommons, phospho.ELM, PhosphoPoint, PhosphoSite, SignaLink

Quotes

Data sets are compiled from public databases and from literature and manually curated for accuracy. They are intended for network reconstruction, topological and multidimensional analysis in cell biology.

Methods in pypath

Data source (URLs and files)

Data input methods

Miscellaneous


Li 2012

Category || Subcategory >>> High-throughput || Yeast 2 hybrid

Last updated: 2012

Created by Wang Lab

Contact:

License: No license.

Articles

PubMed

Taxons: Human

Quotes

Human phosphotyrosine signaling network.

We manually collected the experimentally determined human TK–substrate interactions and substrate–SH2/PTB domain interactions from the literature (see Supplemental Materials), as well as the Phospho.ELM and PhosphoSitePlus databases. [71 references, 585 circuits]

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Domain-motif interactions

Data format definition

Data input methods

Interactions

Enzyme-substrate relationships and PTMs


Lit-BM-13

Category || Subcategory >>> High-throughput || Yeast 2 hybrid

Last updated: 2013

Created by CCSB

Contact:

License: No license. "This dataset is freely available to the research community through the search engine or via download."

Articles

Webpages

PubMed

Quotes

High-quality non-systematic Literature dataset. In 2013, we extracted interaction data from BIND, BioGRID, DIP, HPRD, MINT, IntAct, and PDB to generate a high-quality binary literature dataset comprising ~11,000 protein-protein interactions that are binary and supported by at least two traceable pieces of evidence (publications and/or methods) (Rolland et al Cell 2014). Although this dataset does not result from a systematic investigation of the interactome search space and should thus be used with caution for any network topology analyses, it represents valuable interactions for targeted studies and is freely available to the research community through the search engine or via download.
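The "at least two traceable pieces of evidence" criterion can be illustrated with a small filter. This is a sketch under stated assumptions: the record layout and function name are hypothetical, and counting each distinct (publication, method) combination as one piece of evidence is one possible reading of "publications and/or methods", not necessarily the rule used by the authors.

```python
from collections import defaultdict

def well_supported_pairs(records, min_evidence=2):
    """Keep unordered pairs backed by at least `min_evidence` pieces of evidence.

    Each record is (protein_a, protein_b, publication, method). Treating every
    distinct (publication, method) combination as one piece of evidence is an
    assumption made for this sketch.
    """
    support = defaultdict(set)
    for a, b, publication, method in records:
        support[frozenset((a, b))].add((publication, method))
    return {pair for pair, ev in support.items() if len(ev) >= min_evidence}

# Placeholder records, not real data.
records = [
    ("A", "B", "pub1", "method_x"),
    ("B", "A", "pub2", "method_x"),
    ("A", "C", "pub1", "method_x"),
]
print(well_supported_pairs(records))
```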

Methods in pypath

Data source (URLs and files)

Data input methods


LMPID

Category || Subcategory >>> Literature curated || Post-translational modifications

Last updated: 2015

Updated in years: 2015

Created by Bose Institute

Contact:

License: No license. If you are using this database please cite Sarkar 2015.

Webpages

Articles

PubMed

Collections

Quotes

LMPID (Linear Motif mediated Protein Interaction Database) is a manually curated database which provides comprehensive experimentally validated information about the LMs mediating PPIs from all organisms on a single platform. About 2200 entries have been compiled by detailed manual curation of PubMed abstracts, of which about 1000 LM entries were being annotated for the first time, as compared with the Eukaryotic LM resource.

Methods in pypath

Data source (URLs and files)

Domain-motif interactions

Data format definition

Data input methods

Interactions


Macrophage

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2010

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Quotes

Ongoing analysis of macrophage-related datasets and an interest in consolidating our knowledge of a number of signalling pathways directed our choice of pathways to be mapped (see Figure 1). Public and propriety databases were initially used as resources for data mining, but ultimately all molecular interaction data was sourced from published literature. Manual curation of the literature was performed to firstly evaluate the quality of the evidence supporting an interaction and secondly, to extract the necessary and additional pieces of information required to 'understand' the pathway and construct an interaction diagram. We have drawn pathways based on our desire to model pathways active in a human macrophage and therefore all components have been depicted using standard human gene nomenclature (HGNC). However, our understanding of the pathway components and the interactions between them, have been drawn largely from a consensus view of literature knowledge. As such the pathways presented here are based on data derived from a range of different cellular systems and mammalian species (human and mouse).

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


MatrixDB

Category || Subcategory >>> Literature curated || Interaction

Last updated: 2015

Updated in years: 2009, 2011, 2015

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Taxons: Mammalia

Quotes

Protein data were imported from the UniProtKB/Swiss-Prot database (Bairoch et al., 2005) and identified by UniProtKB/SwissProt accession numbers. In order to list all the partners of a protein, interactions are associated by default to the accession number of the human protein. The actual source species used in experiments is indicated in the page reporting interaction data. Intracellular and membrane proteins were included to obtain a comprehensive network of the partners of extracellular molecules. Indeed, ECM proteins and GAGs bind to a number of membrane proteins or cell-associated proteoglycans and some of them interact with intracellular partners upon internalization (Dixelius et al., 2000). ECM proteins were identified by the UniProtKB/Swiss-Prot keyword ‘extracellular matrix’ and by the GO terms ‘extracellular matrix’, ‘proteinaceous extracellular matrix’ and their child terms. The proteins annotated with the GO terms ‘extracellular region’ and ‘extracellular space’, which are used for proteins found in biological fluids, were not included because circulating molecules do not directly contribute to the extracellular scaffold. Additionally, 96 proteins were manually (re-)annotated through literature curation. MatrixDB integrates 1378 interactions from the Human Protein Reference Database (HPRD, Prasad et al., 2009), 211 interactions from the Molecular INTeraction database (MINT, Chatr-Aryamontri et al., 2007), 46 interactions from the Database of Interacting Proteins (DIP, Salwinski et al., 2004), 232 interactions from IntAct (Kerrien et al., 2007a) and 839 from BioGRID (Breitkreutz et al., 2008) involving at least one extracellular biomolecule of mammalian origin. We added 283 interactions from manual literature curation and 65 interactions from protein and GAG array experiments.

Very nice! Note: interactions imported from IMEx databases or any other database are collected separately, in the PSICQUIC-extended dataset. The MatrixDB-core dataset is curated manually by the MatrixDB team.

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


MINT – Molecular Interaction Database

Category || Subcategory >>> Literature curated and high-throughput || Interaction

Last updated: 2015

Updated in years: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015

Contact:

License: CC-Attribution-2.5

Webpages

Articles

Collections


MPPI – The MIPS Mammalian Protein-Protein Interaction Database

Category || Subcategory >>> Literature curated || Interaction

Last updated: 2005

Updated in years: 2000, 2005

Created by MIPS Munich

Contact:

License: No license. "You are free to use the database as you please including full download of the dataset for your own analyses as long as you cite the source properly (Pagel et al. 2005)."

Articles

Webpages

PubMed

Collections

Taxons: Human, Mammalia

Quotes

The first and foremost principle of our MPPI database is to favor quality over completeness. Therefore, we decided to include only published experimental evidence derived from individual experiments as opposed to large-scale surveys. High-throughput data may be integrated later, but will be marked to distinguish it from evidence derived from individual experiments.

This database contains hundreds of interactions curated manually from original papers. The format is perfect, with UniProt IDs and PubMed references.

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


NCI-PID – NCI-Nature Pathway Interaction Database

Category || Subcategory >>> Literature curated || Reaction network

Last updated: 2012

Updated in years: 2008, 2012

Created by NCI

Contact:

License: No license

Webpages

Articles

PubMed

Collections

Taxons: Human

Direct data import from: BioCarta, Reactome

Quotes

In curating, editors synthesize meaningful networks of events into defined pathways and adhere to the PID data model for consistency in data representation: molecules and biological processes are annotated with standardized names and unambiguous identifiers; and signaling and regulatory events are annotated with evidence codes and references. To ensure accurate data representation, editors assemble pathways from data that is principally derived from primary research publications. The majority of data in PID is human; however, if a finding discovered in another mammal is also deemed to occur in humans, editors may decide to include this finding, but will also record that the evidence was inferred from another species. Prior to publication, all pathways are reviewed by one or more experts in a field for accuracy and completeness.

From the NCI-XML, interactions with references, directions and signs can be extracted. Complexes are omitted.

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions


Negatome

Category || Subcategory >>> Literature curated || Negative

Last updated: 2013

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Quotes

Annotation of the manual dataset was performed analogous to the annotation of protein–protein interactions and protein complexes in previous projects published by our group. Information about NIPs was extracted from scientific literature using only data from individual experiments but not from high-throughput experiments. Only mammalian proteins were considered. Data from high-throughput experiments were omitted in order to maintain the highest possible standard of reliability.

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition

Miscellaneous


NetPath

Category || Subcategory >>> Literature curated || Reaction network

Last updated: 2015

Updated in years: 2010, 2011, 2012, 2013, 2014, 2015

Created by Pandey Lab, IOB Bangalore

Contact:

License: CC-Attribution-2.5

Articles

Webpages

PubMed

Collections

Direct data import from: CancerCellMap

Includes data from: CancerCellMap

Quotes

The initial annotation process of any signaling pathway involves gathering and reading of review articles to achieve a brief overview of the pathway. This process is followed by listing all the molecules that are reported to be involved in the pathway under annotation. Information regarding potential pathway authorities are also gathered at this initial stage. Pathway experts are involved in initial screening of the molecules listed to check for any obvious omissions. In the second phase, annotators manually perform extensive literature searches using search keys, which include all the alternative names of the molecules involved, the name of the pathway, the names of reactions, and so on. In addition, the iHOP resource is also used to perform advanced PubMed-based literature searches to collect the reactions that were reported to be implicated in a given pathway. The collected reactions are manually entered using the PathBuilder annotation interface, which is subjected to an internal review process involving PhD level scientists with expertise in the areas of molecular biology, immunology and biochemistry. However, there are instances where a molecule has been implicated in a pathway in a published report but the associated experimental evidence is either weak or differs from experiments carried out by other groups. For this purpose, we recruit several investigators as pathway authorities based on their expertise in individual signaling pathways. The review by pathway authorities occasionally leads to correction of errors or, more commonly, to inclusion of additional information. Finally, the pathway authorities help in assessing whether the work of all major laboratories has been incorporated for the given signaling pathway.

The formats are unclear. The tab-delimited format contains the pathway memberships of genes and PubMed references, but not the interaction partners. The Excel file is odd: it is not actually an Excel table, and contains only a few rows from the tab-delimited file. The PSI-MI XML is much better; with a simple parser, many details can be extracted.
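As an illustration of how little code such a parser needs, the sketch below reads interaction partners from a tiny inline document laid out like PSI-MI XML 2.5. The sample, the namespace string and the assumption that participants point to interactors through interactorRef elements are illustrative; whether NetPath's files follow exactly this layout has not been verified here, and this is not the parser used in pypath.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written sample with placeholder interactors "A" and "B".
SAMPLE = """<entrySet xmlns="net:sf:psidev:mi">
  <entry>
    <interactorList>
      <interactor id="1"><names><shortLabel>A</shortLabel></names></interactor>
      <interactor id="2"><names><shortLabel>B</shortLabel></names></interactor>
    </interactorList>
    <interactionList>
      <interaction id="10">
        <participantList>
          <participant id="100"><interactorRef>1</interactorRef></participant>
          <participant id="101"><interactorRef>2</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>"""

NS = {"mi": "net:sf:psidev:mi"}  # assumed PSI-MI 2.5 namespace

def parse_interactions(xml_text):
    """Return one tuple of interactor labels per interaction element."""
    root = ET.fromstring(xml_text)
    # Map interactor ids to their short labels.
    labels = {
        node.get("id"): node.findtext("mi:names/mi:shortLabel", namespaces=NS)
        for node in root.iterfind(".//mi:interactor", NS)
    }
    result = []
    for interaction in root.iterfind(".//mi:interaction", NS):
        refs = [ref.text for ref in interaction.iterfind(".//mi:interactorRef", NS)]
        result.append(tuple(labels.get(ref, ref) for ref in refs))
    return result

print(parse_interactions(SAMPLE))
```

References and other details could be collected in the same loop once the relevant element names in the actual files are known.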

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions


NRF2ome

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2013

Updated in years: 2013

Created by NetBiol Group

Contact:

License: CC-Attribution-NonCommercial-ShareAlike-3.0

Webpages

Articles

PubMed

Taxons: Human

Quotes

From Korcsmaros 2010: ... we first listed signaling proteins and interactions from reviews and then added further signaling interactions of the listed proteins. We used reviews as a starting point, manually looked up interactions three times, and manually searched for interactions of known signaling proteins with no signaling interactions so far in the database.


PANTHER – Pathway Analysis Through Evolutionary Relationships

Category || Subcategory >>> Literature curated || Reaction network

Last updated: 2016

Updated in years: 2000, 2001, 2002, 2003, 2005, 2006, 2010, 2011, 2012, 2014, 2016

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Quotes

References are captured at three levels. First, each pathway as a whole requires a reference. For signaling pathways, at least three references, usually review papers, are required in order to provide a more objective view of the scope of the pathway. For metabolic pathways, a textbook reference is usually sufficient. Second, references are often associated to each molecule class in the pathway. Most of these references are OMIM records or review papers. Third, references are provided to support association of specific protein sequences with a particular molecule class, e.g., the SWISS-PROT sequence P53_HUMAN annotated as an instance of the molecule class ‘‘P53’’ appearing in the pathway class ‘‘P53 pathway’’. These are usually research papers that report the experimental evidence that a particular protein or gene participates in the reactions represented in the pathway diagram.

There are three major properties that make this infrastructure differ from other pathway curation systems, such as from Reactome and EcoCyc. First, the pathway diagrams are drawn with CellDesigner software. There are two advantages to using CellDesigner. First, controlled graphical notations are used to draw the pathway diagram, and the software automatically creates a computational representation that is compatible with the SBML standard. Second, a pathway diagram can be viewed with an exact, one-to-one correspondence with the ontological representation of the pathways stored in the back-end. The second property is that the scope of the pathway is defined first based on literature, and pathway components (proteins, genes, RNAs) are treated as ontology terms, or molecule classes, rather than specific instances. This means that multiple proteins from the same organism or different organisms can potentially play the same given role in a pathway. The advantage is that the work flow is more similar to the thinking process of the biologists who are the users of our curation software module. The third major property is that the curation software is designed to be simple enough to be used directly by bench biologists after a brief training course. All other pathway databases we are aware of employ highly trained curators, who of course cannot be experts in all areas of biology. The current set of PANTHER pathways has been curated by more than 40 different external experts from the scientific community; they must only have demonstrated their expertise with publications in the relevant field.


PathwayCommons

Category || Subcategory >>> Combined || Interaction

Last updated: 2016

Updated in years: 2010, 2011, 2012, 2013, 2014, 2015, 2016

Created by Bader Lab, MSKCC cBio

Contact:

License: Constituting databases carry their own licenses.

Webpages

Articles

PubMed

Collections

Direct data import from: Reactome, NCI-PID, CancerCellMap, BioCarta, HPRD, PhosphoSite, PANTHER, DIP, IntAct, BioGRID, BIND, CORUM

Quotes

Pathway Commons is a collection of publicly available pathway information from multiple organisms. It provides researchers with convenient access to a comprehensive collection of biological pathways from multiple sources represented in a common language for gene and metabolic pathway analysis.


PDZBase

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2004

Updated in years: 2004

Created by Weinstein Group

Contact:

License: No license.

Webpages

Articles

PubMed

Collections

Taxons: Human

Quotes

PDZBase is a database that aims to contain all known PDZ-domain-mediated protein-protein interactions. Currently, PDZBase contains approximately 300 such interactions, which have been manually extracted from >200 articles.

PDZBase currently contains ∼300 interactions, all of which have been manually extracted from the literature, and have been independently verified by two curators. The extracted information comes from in vivo (co-immunoprecipitation) or in vitro experiments (GST-fusion or related pull-down experiments). Interactions identified solely from high throughput methods (e.g. yeast two-hybrid or mass spectrometry) were not included in PDZBase. Other prerequisites for inclusion in the database are: that knowledge of the binding sites on both interacting proteins must be available (for instance through a truncation or mutagenesis experiment); that interactions must be mediated directly by the PDZ-domain, and not by any other possible domain within the protein.
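The inclusion rules in this quote amount to a conjunction of three checks, sketched below. The field names, the method labels and the function name are hypothetical and chosen only for the example; PDZBase's own curation was manual.

```python
# Hypothetical method labels counted as low-throughput evidence in this sketch.
LOW_THROUGHPUT = {"co-immunoprecipitation", "gst pull-down"}

def eligible(record):
    """Apply the three quoted inclusion rules to one candidate record."""
    # Rule 1: not supported solely by high-throughput methods.
    has_ltp = any(method in LOW_THROUGHPUT for method in record["methods"])
    # Rule 2: binding sites known on both partners.
    # Rule 3: interaction mediated directly by the PDZ domain.
    return bool(has_ltp and record["binding_sites_known"] and record["pdz_mediated"])

# Placeholder candidates, not real database entries.
candidates = [
    {"methods": ["yeast two-hybrid"],
     "binding_sites_known": True, "pdz_mediated": True},
    {"methods": ["yeast two-hybrid", "co-immunoprecipitation"],
     "binding_sites_known": True, "pdz_mediated": True},
    {"methods": ["gst pull-down"],
     "binding_sites_known": False, "pdz_mediated": True},
]
print([eligible(c) for c in candidates])
```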

Methods in pypath

Data source (URLs and files)

Data format definition

Interactions


phospho.ELM

Category || Subcategory >>> Literature curated || Post-translational modification

Last updated: 2010

Updated in years: 2004, 2007, 2010

Contact:

License: phospho.ELM Academic License, non-free

Webpages

Articles

PubMed

Collections

Quotes

Phospho.ELM http://phospho.elm.eu.org is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. The Phospho.ELM entries store information about substrate proteins with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1,703 phosphorylation site instances for 556 phosphorylated proteins. (Diella 2004)

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions

Enzyme-substrate relationships and PTMs


PhosphoPoint

Category || Subcategory >>> Literature curated and prediction || Post-translational modification

Last updated: 2008

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Taxons: Human

Quotes

We have integrated three existing databases, including Phospho.ELM (release 6.0, total 9236 phosphorylation sites), HPRD (release 6, total 8992 phosphorylation sites), SwissProt (release 51.5, total 6529 phosphorylation sites), and our manually curated 400 kinase–substrate pairs, which are primarily from review articles.

Among these phosphorylation sites, 7843 (6152+995+696) are from high-throughput (HTP) screening, 6329 (3828+1152+1349) are from low-throughput (LTP) analysis, and only 679 (420+97+162) are both from HTP and LTP screening. One special note is that there are 887 phosphorylation sites, which do not have annotation from literature in the SwissProt database and it is not possible distinguish whether these are from HTP or LTP.
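The sums given in parentheses in this quote can be checked directly. The three addends in each group presumably correspond to the three integrated databases, but the quote does not say so explicitly, so the snippet below only verifies the arithmetic and uses labels of its own.

```python
# Addends exactly as they appear in the quoted sentence.
counts = {
    "HTP": (6152, 995, 696),
    "LTP": (3828, 1152, 1349),
    "both": (420, 97, 162),
}
totals = {label: sum(parts) for label, parts in counts.items()}
print(totals)
```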

It contains 400 manually curated interactions and many more from HTP methods. The manually curated set cannot be distinguished in the data formats offered.

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Miscellaneous


PhosphoSite – PhosphoSitePlus

Category || Subcategory >>> Literature curated and high throughput || Post-translational modification

Last updated: 2016

Updated in years: 2011, 2015, 2016

Created by CST

Contact:

License: CC-NonCommercial-ShareAlike

Articles

Webpages

PubMed

Collections

Taxons: Human, Mouse, Eubacteria, Eukarya

Quotes

PSP integrates both low- and high-throughput (LTP and HTP) data sources into a single reliable and comprehensive resource. Nearly 10,000 journal articles, including both LTP and HTP reports, have been manually curated by expert scientists from over 480 different journals since 2001.

Information from nearly 13 000 papers and 600 different journals characterizing modification sites with LTP methods has been curated into PSP.

Information is gathered from published literature and other sources. Published literature is searched semi-automatically with multiple intelligent search algorithms to identify reports that potentially identify phosphorylation sites in human, mouse or other species. Each identified report is then scanned by our highly trained curatorial staff (all with PhDs and extensive research experience in cell biology or related disciplines) to select only those papers that either identify new physiological phosphorylation sites or those that illuminate the biological function of the phosphorylation event. Records that are selected for inclusion into PhosphoSite are placed in the curatorial queue for processing. Note: while we gather records that describe both in vitro and in vivo phosphorylation events, we only finally submit records about in vitro sites when we have additional hard evidence that the site is also phosphorylated in vivo.

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions

Miscellaneous

Enzyme-substrate relationships and PTMs


Reactome

Category || Subcategory >>> Literature curated || Reaction network

Last updated: 2016

Updated in years: 2004, 2008, 2010, 2012, 2014, 2015, 2016

Contact:

License: CC-Attribution-4.0

Webpages

Articles

PubMed

Collections

Quotes

Once the content of the module is approved by the author and curation staff, it is peer-reviewed on the development web-site, by one or more bench biologists selected by the curator in consultation with the author. The peer review is open and the reviewers are acknowledged in the database by name. Any issues raised in the review are resolved, and the new module is scheduled for release.

No binary interactions can be exported programmatically from any format of the Reactome dataset. Reactome's curation method does not cover binary interactions; the inferred lists on the webpage are based on automatic expansion of complexes and reactions, and are therefore unreliable. Because this information is lacking, references cannot be assigned to interactions.
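To illustrate why interactions inferred from complexes need caution, here is a minimal sketch of one common expansion scheme, often called matrix expansion, in which every pair of complex members becomes a binary interaction. It is an assumption that this resembles what the Reactome webpage does; the function name `matrix_expand` and the member labels are hypothetical.

```python
from itertools import combinations


def matrix_expand(members):
    """Return every unordered pair among the members of one complex.

    A complex with n distinct members yields n * (n - 1) / 2 pairs,
    whether or not the two proteins in a pair touch each other directly.
    """
    return list(combinations(sorted(set(members)), 2))


# Hypothetical four-member complex; the labels are placeholders.
pairs = matrix_expand(["A", "B", "C", "D"])
print(len(pairs))  # 6 pairs from 4 members
print(pairs)
```

The number of inferred pairs grows quadratically with complex size, and none of them carries its own literature reference, which is consistent with the concerns stated above.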

Data integration in pypath: dynamic

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions


SignaLink

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Updated in years: 2010, 2012, 2016

Created by NetBiol Group

Contact:

License: CC-Attribution-NonCommercial-ShareAlike-3.0

Webpages

Articles

PubMed

Collections

Taxons: Human, D. melanogaster, C. elegans

Quotes

In each of the three organisms, we first listed signaling proteins and interactions from reviews (and from WormBook in C.elegans) and then added further signaling interactions of the listed proteins. To identify additional interactions in C.elegans, we examined all interactions (except for transcription regulation) of the signaling proteins listed in WormBase and added only those to SignaLink that we could manually identify in the literature as an experimentally verified signaling interaction. For D.melanogaster, we added to SignaLink those genetic interactions from FlyBase that were also reported in at least one yeast-2-hybrid experiment. For humans, we manually checked the reliability and directions for the PPIs found with the search engines iHop and Chilibot.
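The D. melanogaster rule quoted above (keep a genetic interaction only if at least one yeast-2-hybrid experiment also reports it) amounts to a set intersection over unordered pairs. The sketch below shows that idea only. It is not SignaLink's actual code, and the function names and gene labels are hypothetical.

```python
def undirected(pairs):
    """Turn (a, b) tuples into a set of order-independent pairs."""
    return {frozenset(pair) for pair in pairs}


def supported_by_y2h(genetic_pairs, y2h_pairs):
    """Keep genetic interactions that also appear among the Y2H pairs."""
    return undirected(genetic_pairs) & undirected(y2h_pairs)


# Placeholder identifiers, not real FlyBase records.
genetic = [("geneA", "geneB"), ("geneB", "geneC")]
y2h = [("geneB", "geneA"), ("geneC", "geneD")]

kept = supported_by_y2h(genetic, y2h)
print(kept)
```

Using `frozenset` makes the comparison independent of the order in which the two partners are listed in either source.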

SignaLink assigns proteins to signaling pathways using the full texts of pathway reviews (written by pathway experts). While most signaling resources consider 5–15 reviews per pathway, SignaLink uses a total of 170 review papers, i.e. more than 20 per pathway on average. Interactions were curated from a total of 941 articles (PubMed IDs are available at the website). We added a small number of proteins based on InParanoid ortholog clusters. For curation, we used a self-developed graphical tool and Perl/Python scripts. The current version of SignaLink was completed in May 2008 based on WormBase (version 191), FlyBase (2008.6), Ensembl, UniProt and the publications listed on the website.

The curation protocol of SignaLink (Fig. 1A) contains several steps aimed specifically at reducing data and curation errors. We used reviews as a starting point, manually looked up interactions three times, and manually searched for interactions of known signaling proteins with no signaling interactions so far in the database.

For OmniPath we used the literature curated part of SignaLink version 3, which is not yet published. Version 2 is publicly available, and pypath has format definitions to load it as an alternative.


Signor – Signaling Network Open Resource

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2015

Updated in years: 2015

Created by Cesareni Group

Contact:

License: No license

Webpages

Articles

PubMed

Collections

Direct data import from: SignaLink3, PhosphoSite

Quotes

SIGNOR, the SIGnaling Network Open Resource, organizes and stores in a structured format signaling information published in the scientific literature. The captured information is stored as binary causative relationships between biological entities and can be represented graphically as activity flow. The entire network can be freely downloaded and used to support logic modeling or to interpret high content datasets. The core of this project is a collection of more than 11000 manually-annotated causal relationships between proteins that participate in signal transduction. Each relationship is linked to the literature reporting the experimental evidence. In addition each node is annotated with the chemical inhibitors that modulate its activity. The signaling information is mapped to the human proteome even if the experimental evidence is based on experiments on mammalian model organisms.

Methods in pypath

Data source (URLs and files)

Data format definition

Interactions

Enzyme-substrate relationships and PTMs


SPIKE – Signaling Pathway Integrated Knowledge Engine

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2012

Updated in years: 2008, 2011, 2012

Created by Shamir Group, Shiloh Group

Contact:

License: No license

Articles

Webpages

PubMed

Collections

Quotes

SPIKE’s data on relationships between entities come from three sources: (i) Highly curated data submitted directly to SPIKE database by SPIKE curators and experts in various biomedical domains. (ii) Data imported from external signaling pathway databases. At present, SPIKE database imports such data from Reactome, KEGG, NetPath and The Transcription Factor Encyclopedia (http://www.cisreg.ca/cgi-bin/tfe/home.pl). (iii) Data on protein–protein interactions (PPIs) imported either directly from wide-scale studies that recorded such interactions [to date, PPI data were imported from Stelzl et al., Rual et al. and Lim et al.] or from external PPI databases [IntAct and MINT]. Relationship data coming from these different sources vary greatly in their quality and this is reflected by a quality level attribute, which is attached to each relationship in SPIKE database (Supplementary Data). Each relationship in SPIKE is linked to at least one PubMed reference that supports it.

As of August 2010, the SPIKE database contains 20 412 genes/proteins, 542 complexes (327 of high quality), 320 protein families (167 of high quality) and 39 small molecules. These entities are linked by 34 338 interactions (of which 2400 are of high quality) and 6074 regulations (4420 of high quality). These are associated with 5873 journal references in total.

Each of the maps is constructed by a domain expert; typically the same expert will also be responsible later for keeping it up-to-date. The expert reads the relevant literature and identifies those interactions and regulations that are pertinent to the pathway.

The regulations and interactions in the database are assigned quality values ranging from 1 to 4. In general, relationships (regulations and interactions) derived from highly focused biochemical studies are assigned high quality (2 or 1) while those derived from high-throughput experiments are assigned lower quality (4 or 3). The curator uses best judgment to assign a quality level. For example, relationships mentioned in two independent research reports, or cited repeatedly in reviews written by leading authorities will get quality 1. Relationships with cited concrete references and those imported en masse from external curated signaling DBs are initially assigned quality 2 but later can be changed to the highest quality after the curator has read and was convinced by the cited papers. Data imported from protein-protein interaction DBs and datasets are assigned quality 3 or 4, depending on the experimental technique.
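The quality scheme quoted above can be used to select a high-confidence subset. The sketch below assumes that each relationship has already been parsed into a record with a `quality` field from 1 to 4. The `Relationship` class, its field names and the example records are hypothetical and do not reflect SPIKE's file format or pypath's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: str     # "interaction" or "regulation"
    quality: int  # 1 (best) to 4 (weakest), following the quoted scheme


def high_quality(relationships, threshold=2):
    """Keep relationships whose quality level is between 1 and threshold."""
    return [r for r in relationships if 1 <= r.quality <= threshold]


# Placeholder records for illustration only.
records = [
    Relationship("P1", "P2", "regulation", 1),
    Relationship("P2", "P3", "interaction", 2),
    Relationship("P3", "P4", "interaction", 4),
]

selected = high_quality(records)
print([(r.source, r.target, r.quality) for r in selected])
```

With the default threshold of 2, this keeps the levels that the quote calls high quality and drops the levels assigned to data from high-throughput experiments.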

Data integration in pypath: static

Methods in pypath

Data source (URLs and files)

Data format definition


STRING

Category || Subcategory >>> High-throughput and prediction || Interaction

Last updated: 2016

Updated in years: 2016, 2015, 2013, 2011, 2009, 2007, 2005, 2003, 2000

Created by Bork Lab

Contact:

License: CC-Attribution-3.0 or CC-Attribution-NonCommercial-ShareAlike-3.0

Webpages

Articles

PubMed

Collections


TLR

Category || Subcategory >>> Literature curated || Model

License: No license

Articles


TRIP – Mammalian Transient Receptor Potential Channel-Interacting Protein Database

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2014

Updated in years: 2010, 2012

Contact:

License: CC-Attribution-ShareAlike-3.0

Articles

Webpages

PubMed

Collections

Taxons: Human, Mouse, Rat

Nodes: 468, Edges: 744

Quotes

The literature on TRP channel PPIs found in the PubMed database serve as the primary information source for constructing the TRIP Database. First, a list of synonyms for the term ‘TRP channels’ was constructed from UniprotKB, Entrez Gene, membrane protein databases (Supplementary Table S2) and published review papers for nomenclature. Second, using these synonyms, a list of articles was obtained through a PubMed search. Third, salient articles were collected through a survey of PubMed abstracts and subsequently by search of full-text papers. Finally, we selected articles that contain evidence for physical binding among the proteins denoted. To prevent omission of relevant papers, we manually screened information in other databases, such as DIP, IntAct, MINT, STRING, BioGRID, Entrez Gene, IUPHAR-DB and ISI Web of Knowledge (from Thomson Reuters). All 277 articles used for database construction are listed in our database website.

A good manually curated dataset focusing on TRP channel proteins, with ~800 binary interactions. The provided formats are not well suited for bioinformatics use because of the non-standard protein names, which contain Greek letters and formulas that only a human can interpret. By processing the HTML of 5-6 different tables, with a couple of hundred lines of code, one has a chance to compile a usable table.
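One small part of such a cleanup is replacing Greek letters in protein names before attempting any identifier lookup. The following is a minimal sketch of that single step. The mapping, the function name `ascii_name` and the example string are assumptions for illustration, not the processing used in pypath.

```python
# Assumption: only these lowercase Greek letters need handling;
# extend the mapping if other characters turn up in the tables.
GREEK_TO_LATIN = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "ε": "epsilon",
    "ζ": "zeta",
    "η": "eta",
    "θ": "theta",
    "ι": "iota",
    "κ": "kappa",
}


def ascii_name(name):
    """Spell out Greek letters in a protein name and trim whitespace."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in name).strip()


# Hypothetical label as it might appear in an HTML table cell.
print(ascii_name(" PKCα "))
```

Mapping the cleaned names to standard identifiers such as UniProt accessions would still need a separate lookup step, which is not shown here.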

Methods in pypath

Data source (URLs and files)

Data format definition

Data input methods

Interactions


Vidal HI-III

Category || Subcategory >>> High-throughput || Yeast 2 hybrid

Last updated: 2016

Updated in years: 2012, 2014, 2016

Contact:

License: No license. "This dataset is freely available to the research community through the search engine or via download."

Articles

Webpages

PubMed


WikiPathways

Category || Subcategory >>> Literature curated || Reaction network

Last updated: 2016

Updated in years: 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016

Contact:

License: CC-Attribution-3.0

Webpages

Articles

Collections

Quotes

The goal of WikiPathways is to capture knowledge about biological pathways (the elements, their interactions and layout) in a form that is both human readable and amenable to computational analysis.

The data is not accessible. Interactions are available in BioPAX format, but without references.


Zaman 2013

Category || Subcategory >>> Literature curated || Pathway

Last updated: 2013

Created by Wang Lab

Contact:

License: No license

Articles

PubMed

Quotes

The human signaling network (Version 4, containing more than 6,000 genes and more than 50,000 relations) includes our previous data obtained from manually curated signaling networks (Awan et al., 2007; Cui et al., 2007; Li et al., 2012) and by PID (http://pid.nci.nih.gov/) and our recent manual curations using the iHOP database (http://www.ihop-net.org/UniPub/iHOP/).



Dénes Türei, 2017. Feedback: omnipath@googlegroups.com
