Systematic comparison of Mendelian randomization studies and randomized controlled trials using electronic databases

Posted by Tom Gaunt on Saturday, April 16, 2022



Mendelian Randomization (MR) uses genetic instrumental variables to make causal inferences. Although it is sometimes referred to as “nature’s randomized trial”, it rests on distinct assumptions, which makes comparing the results of MR studies with those of actual randomized controlled trials (RCTs) invaluable.

In this medRxiv pre-print we mined ClinicalTrials.Gov, PubMed and EpigraphDB databases and carried out a series of 26 manual literature comparisons among 54 MR and 77 RCT publications.

What we did

We downloaded all results for all trials within ClinicalTrials.Gov, filtering them as illustrated in the figure. Based on a comprehensive search of PubMed, we selected MR and RCT publications and used EpigraphDB to collect the drug-target associations and semantic triples associated with them. We then mapped MR studies to the corresponding RCTs to evaluate where their results agreed and where they disagreed.
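The mapping step can be pictured with a small sketch. This is not the method used in the pre-print: it assumes each study has already been reduced to a single exposure (or intervention) label and a single outcome label, and it pairs studies by exact match on the normalised labels. The record fields, identifiers and labels below are hypothetical placeholders.

```python
from collections import defaultdict


def normalise(term):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def match_studies(mr_studies, rct_studies):
    """Map each MR study id to the ids of RCTs sharing its exposure/outcome pair."""
    # Index RCTs by their (intervention, outcome) pair.
    index = defaultdict(list)
    for rct in rct_studies:
        key = (normalise(rct["intervention"]), normalise(rct["outcome"]))
        index[key].append(rct["id"])

    # Look up each MR study's (exposure, outcome) pair in the index.
    matches = {}
    for mr in mr_studies:
        key = (normalise(mr["exposure"]), normalise(mr["outcome"]))
        if key in index:
            matches[mr["id"]] = index[key]
    return matches


# Hypothetical toy records, not taken from the study.
toy_mr = [
    {"id": "MR-1", "exposure": "Exposure A", "outcome": "Outcome X"},
    {"id": "MR-2", "exposure": "Exposure B", "outcome": "Outcome Y"},
]
toy_rct = [
    {"id": "RCT-1", "intervention": "exposure  a", "outcome": "outcome x"},
    {"id": "RCT-2", "intervention": "Exposure C", "outcome": "Outcome Y"},
]

toy_matches = match_studies(toy_mr, toy_rct)
```

Exact label matching is the weakest possible link between the two study designs. In practice the labels would need to be harmonised through a shared vocabulary before pairing, which is where annotation coverage becomes the limiting factor.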

We found that only 11% of completed RCTs identified in ClinicalTrials.Gov submitted their results to the database. Similarly low coverage was revealed for Semantic Medline (SemMedDB) semantic triples derived from MR and RCT publications – 25% and 12%, respectively.
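To make the coverage figure concrete, here is a minimal sketch of how such a proportion could be computed from trial registry records. It assumes each record carries a status and an optional results-submission date. The `Trial` class, its field names and the toy records are illustrative assumptions and do not reproduce the analysis code or the real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    trial_id: str                        # hypothetical identifier
    overall_status: str                  # e.g. "Completed", "Recruiting"
    results_first_submitted: Optional[str]  # date string, or None if no results


def results_coverage(trials):
    """Return the fraction of completed trials that have submitted results."""
    completed = [t for t in trials if t.overall_status == "Completed"]
    if not completed:
        return 0.0
    with_results = sum(
        1 for t in completed if t.results_first_submitted is not None
    )
    return with_results / len(completed)


# Hypothetical toy records, not real registry entries.
toy_trials = [
    Trial("TOY-1", "Completed", "2020-01-15"),
    Trial("TOY-2", "Completed", None),
    Trial("TOY-3", "Completed", None),
    Trial("TOY-4", "Completed", None),
    Trial("TOY-5", "Recruiting", None),
]

coverage = results_coverage(toy_trials)
```

Trials that are not completed are excluded from the denominator, so the one recruiting trial in the toy data does not affect the result.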

Owing to insufficient annotation with MeSH ontology, only trials of pharmaceutical interventions, among the intervention types that MR can mimic, could be automatically matched to MR results. A manual survey of the literature highlighted the potential for triangulation across a number of exposure/outcome pairs if similar challenges can be addressed. We conclude that careful triangulation of MR with RCT evidence should consider the similarity of phenotypes across study designs, intervention intensity and duration, study population demography and health status, the comparator group, the intervention goal and the quality of evidence.


‘Systematic comparison of Mendelian randomization studies and randomized controlled trials using electronic databases’ by Maria K. Sobczyk, George Davey Smith and Tom R. Gaunt in medRxiv.

Code and data availability

Code used to carry out the analysis is available on GitHub. ClinicalTrials.Gov data was accessed via AACT, and the analysed data subset is available in Supplementary Datasets 1 & 2. pQTL and eQTL MR analysis results are available via EpigraphDB.