EpiGraphDB version 1.0
The EpiGraphDB platform has been updated with a new major release (version 1.0). This is the first release since version 0.3 in 2020 (what a year!) and the first since the publication of the journal article in Bioinformatics. We believe the underlying integration pipeline, data structure and architecture of the EpiGraphDB platform have now progressed to a sufficiently stable state, and we are pleased to announce this major release as version 1.0!
In the following sections we highlight a few key new features and changes in this update. For more detailed and technical changes, please visit the changelog in the platform documentation.
New and overhauled data sources
ClinVar is a public archive of reports of genetic variants and interpretations of their clinical relevance to disease. The variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups.
We import ClinVar data (extracted on 2021-01-12) as gene-disease associations, available as the [GENE_TO_DISEASE] relationship in EpiGraphDB. The sources of information for the gene-disease relationship include OMIM, GeneReviews, and a limited amount of curation by NCBI staff.
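As a rough illustration of how the new relationship could be inspected, the sketch below builds a Cypher query that samples relationships of a given type. The helper name `relationship_query` is hypothetical, and the arrow direction in the pattern is an assumption inferred from the relationship name; only the relationship type [GENE_TO_DISEASE] comes from this post. The query string would still need to be sent to the graph, for example through the EpiGraphDB API.

```python
import re


def relationship_query(rel_type: str, limit: int = 10) -> str:
    """Build a Cypher query that samples relationships of one type.

    Hypothetical helper for illustration; it is not part of EpiGraphDB.
    """
    # Relationship types cannot be supplied as Cypher parameters, so the
    # name is validated before being interpolated into the query text.
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", rel_type):
        raise ValueError(f"unexpected relationship type: {rel_type!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Assumption: the relationship points from the first node to the second.
    return (
        f"MATCH (source)-[r:{rel_type}]->(target) "
        f"RETURN source, r, target LIMIT {int(limit)}"
    )


query = relationship_query("GENE_TO_DISEASE", limit=5)
print(query)
```

Leaving node labels out of the pattern keeps the sketch free of assumptions about how gene and disease nodes are named in the graph.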
Mapping between EBI GWAS Catalog GWAS traits and EFO terms
To complement the existing semantic mapping between GWAS traits and (Efo) ontology terms, we have added the official mapping between EBI GWAS Catalog traits (available as “ebi-a” studies in OpenGWAS) and EFO terms.
Such mapping is available as
We have incorporated the latest MR-EvE evidence into EpiGraphDB.
The MR-EvE evidence is represented as
With this update, [MR_EVE_MR] evidence has increased from 583,619 records to 25,804,945 records (for further details, visit the metadata and metrics).
For further examples regarding the MR-EvE evidence, take a look at the MR view and the confounder view on the EpiGraphDB WebUI, as well as the underlying API endpoints in the EpiGraphDB API.
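For readers who prefer to query the MR-EvE evidence programmatically, here is a minimal sketch of how a request URL for the API might be assembled. The base URL, the `/mr` path and the parameter names `exposure_trait`, `outcome_trait` and `pval_threshold` are assumptions that should be checked against the API documentation; the helper `mr_url` and the example traits are purely illustrative. No request is sent here.

```python
from urllib.parse import urlencode

# Assumption: public base URL of the EpiGraphDB API.
API_BASE = "https://api.epigraphdb.org"


def mr_url(exposure_trait: str, outcome_trait: str,
           pval_threshold: float = 1e-5) -> str:
    """Assemble a GET URL for an MR query (hypothetical helper)."""
    # Assumption: these parameter names match the MR endpoint.
    params = {
        "exposure_trait": exposure_trait,
        "outcome_trait": outcome_trait,
        "pval_threshold": pval_threshold,
    }
    # urlencode escapes spaces and other reserved characters.
    return f"{API_BASE}/mr?{urlencode(params)}"


url = mr_url("Body mass index", "Coronary heart disease")
print(url)
```

The resulting URL can be passed to any HTTP client; the response format is not shown here because it depends on the API version.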
The Reactome data source has been overhauled and simplified. We now make use of the protein and pathway data sets available to download here.
Literature derived evidence
In addition to the newer version of SemMedDB (semmedVER42_R), we used SemRep to create semantic triples from the medRxiv and bioRxiv titles and abstracts. This resulted in renaming the literature nodes and relationships in the graph, e.g. instead of
(SemmedTriple) we now have
and instead of
(SemmedTerm) we now have
each with a
Relationships between the new nodes are named after the data source, e.g.
in place of
We have refactored our entire graph build pipeline to improve transparency, reliability and robustness. For this we use the neo4j-build-pipeline, which uses defined schemas and tests to ensure the graph is consistent and clean. More details can be found in the following blog posts: https://www.biocompute.org.uk/post/neo4j_data_integration/ and https://neo4j.com/blog/neo4j-data-integration-pipeline-using-snakemake-and-docker/.
In addition, the source code for the Graph, WebUI and API is now hosted on GitHub under the MRCIEU organisation. We plan to write a separate blog post on the technologies behind the EpiGraphDB platform in the near future.
For further information on the software side of the EpiGraphDB project (as well as other software projects developed in the MRC IEU) please visit MRC IEU’s GitHub Pages.
EpiGraphDB can be accessed and interacted with in the following ways: