STRING

STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases.

This ingest uses a specific STRING release (currently 11.5), reading the <taxon>.protein.links.detailed.<version>.txt.gz files for a subset of species, each identified by its NCBI taxon ID. Input rows are filtered on the combined_score field, using the threshold currently recorded in the protein_links.yaml file. The taxon-specific entrez_2_string mapping files are used to map the protein subject and object nodes (protein1 and protein2) onto Entrez gene IDs.
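The filter-and-map step can be sketched in Python as follows. This is a minimal illustration, not the ingest's actual code. It assumes three things: the detailed links file is whitespace-separated with a header row; the threshold has already been read from protein_links.yaml; and the entrez_2_string mapping has been loaded into a dict from STRING protein ID to a list of Entrez gene IDs. The helper names `filter_and_map` and `read_links` are hypothetical, and so is the choice to keep scores equal to the threshold.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple


def filter_and_map(
    lines: Iterable[str],
    threshold: int,
    string_to_entrez: Dict[str, List[str]],
) -> Iterator[Tuple[str, str, int]]:
    """Yield (subject_entrez, object_entrez, combined_score) for rows that pass
    the score filter and whose two proteins both map onto Entrez gene IDs."""
    rows = iter(lines)
    header = next(rows).split()  # protein1 protein2 ... combined_score
    idx = {name: i for i, name in enumerate(header)}
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        score = int(fields[idx["combined_score"]])
        if score < threshold:  # assumption: a score equal to the threshold is kept
            continue
        subjects = string_to_entrez.get(fields[idx["protein1"]], [])
        objects = string_to_entrez.get(fields[idx["protein2"]], [])
        # A STRING protein may map to several genes (or none); emit every pairing.
        for s in subjects:
            for o in objects:
                yield s, o, score


def read_links(
    path: str, threshold: int, mapping: Dict[str, List[str]]
) -> Iterator[Tuple[str, str, int]]:
    # gzip.open in text mode ("rt") decompresses and decodes line by line.
    with gzip.open(path, "rt") as handle:
        yield from filter_and_map(handle, threshold, mapping)
```

Rows whose proteins have no Entrez mapping are silently dropped here. A real ingest might prefer to count or log them.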

Special note about Entrez mapping files

A separate Entrez-to-STRING identifier mapping file is not available for Rattus norvegicus (Norway rat, NCBI taxon ID 10116). The mappings are available, less conveniently, inside the aggregated 'all_organisms' entrez_2_string file. See the notes in the STRING section of the download.yaml configuration file for guidance on preparing the required mapping file for a local run of the ingest.
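One way to carve the rat rows out of the aggregated file is sketched below. The download.yaml notes remain the authoritative instructions. This sketch assumes that the all_organisms entrez_2_string file is tab-separated, that its first column holds the NCBI taxon ID, and that header lines start with "#". The function names and file paths are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def select_taxon(lines: Iterable[str], taxon: str = "10116") -> Iterator[str]:
    """Keep header lines plus the rows whose first tab-separated column equals taxon.
    Assumes column 1 is the NCBI taxon ID (not confirmed by this document)."""
    for line in lines:
        if line.startswith("#"):
            yield line  # preserve the header so the output keeps the same layout
        elif line.split("\t", 1)[0] == taxon:
            yield line  # exact match, so e.g. "101160" is not picked up


def write_taxon_mapping(src: str, dst: str, taxon: str = "10116") -> None:
    # Hypothetical paths: use the file names that download.yaml expects.
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(select_taxon(fin, taxon))
```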

Source File Fields

  • protein1
  • protein2
  • neighborhood
  • fusion
  • cooccurence
  • coexpression
  • experimental
  • database
  • textmining
  • combined_score

Biolink Captured:

  • biolink:Gene

    • id (NCBIGene Entrez ID)
  • biolink:PairwiseGeneToGeneInteraction

    • id (random uuid)
    • subject (gene.id)
    • predicate (biolink:interacts_with)
    • object (gene.id)
    • aggregating_knowledge_source (["infores:monarchinitiative"])
    • primary_knowledge_source (infores:string)
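The Biolink fields listed above can be illustrated with plain Python dicts. This is a sketch only. It does not use the biolink-model or Koza classes the ingest may rely on, and the `NCBIGene:` CURIE prefix and the `category` key are assumptions about how the IDs and classes are serialized.

```python
import uuid
from typing import Dict, List, Union


def make_gene(entrez_id: str) -> Dict[str, str]:
    """Plain-dict stand-in for a biolink:Gene node keyed by its Entrez ID."""
    return {"id": f"NCBIGene:{entrez_id}", "category": "biolink:Gene"}


def make_interaction(
    subject_entrez: str, object_entrez: str
) -> Dict[str, Union[str, List[str]]]:
    """Plain-dict stand-in for a biolink:PairwiseGeneToGeneInteraction edge."""
    return {
        "id": str(uuid.uuid4()),  # random UUID, as listed above
        "category": "biolink:PairwiseGeneToGeneInteraction",
        "subject": f"NCBIGene:{subject_entrez}",
        "predicate": "biolink:interacts_with",
        "object": f"NCBIGene:{object_entrez}",
        "aggregating_knowledge_source": ["infores:monarchinitiative"],
        "primary_knowledge_source": "infores:string",
    }
```

Because each edge gets a fresh random UUID, running the ingest twice produces different edge IDs for the same interaction.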

Citation

Damian Szklarczyk, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-Cepas, Milan Simonovic, Alexander Roth, Alberto Santos, Kalliopi P. Tsafou, Michael Kuhn, Peer Bork, Lars J. Jensen, Christian von Mering. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Research, Volume 43, Issue D1, 28 January 2015, Pages D447-D452. https://doi.org/10.1093/nar/gku1003

License

BSD-3-Clause