CCU005: Data management and analysis records

Project lead:
Angela Wood, University of Cambridge

Since the start of the COVID-19 pandemic, national healthcare data have rapidly become more available and accessible for research. Consequently, for the first time we are analysing data from over 65 million patients across the UK to help us understand more about COVID-19. Our analyses have the advantage of covering individuals with and without different health-related problems across all age groups, ethnicities, geographies and socioeconomic settings. Our results will be directly relevant to everyone living in the UK.

However, there are a number of challenges and limitations to using routinely collected healthcare data for research. The problems mainly arise because electronic health records are designed for clinical purposes, and do not necessarily provide an accurate picture of the true health status of all patients at all times. If we do not address these problems properly in the analysis, we will get biased results and draw incorrect conclusions.

We aim to identify the challenges and limitations in the analysis of population-wide healthcare data and to provide solutions that address them. Ultimately, we want to ensure that results arising from population-wide healthcare data are accurately reported.


Linked electronic health records for research on a nationwide cohort including over 54 million people in England

  • BMJ publication 08/04/21 can be viewed here
  • BMJ editorial 08/04/21 can be viewed here and public contributor opinion piece here
  • The press release explaining this research can be viewed here
  • medRxiv preprint 26/02/21 can be viewed here
  • Code and phenotypes used to produce this paper are available in GitHub here

Harmonising electronic health records for reproducible research: challenges, solutions and recommendations from a UK-wide COVID-19 research collaboration

  • BMC Medical Informatics and Decision Making publication 16/01/23 can be viewed here
  • Preprint 28/10/22 can be viewed here
  • Related GitHub repo can be accessed here