CCU013: High-throughput electronic health record phenotyping approaches

Project lead:
Spiros Denaxas, UCL

When a patient visits their GP or is admitted to hospital, information about their symptoms, diagnoses, lab test results and prescriptions is entered and stored in ‘Electronic Health Records’ (‘EHRs’). These EHRs are a valuable resource that allows researchers and clinicians to analyse the health data of large numbers of patients, with the aim of using this information to improve patient health and care.

However, because information in these EHRs is entered by different health workers around the UK, the amount of detail varies and the records can contain many inconsistencies. This means that researchers first need to spend considerable time and effort obtaining the most relevant information from these EHRs before they can start to analyse them effectively. Examples include identifying which patients do or do not have a particular disease, or extracting individual characteristics – such as whether a patient has high blood pressure or is a smoker – from billions of rows of data.

To address this, the project will create and evaluate different ways of extracting this valuable information from complex EHRs, so that the records can be analysed as effectively as possible by the CVD-COVID-UK consortium. As our understanding of COVID-19 is developing rapidly, being able to access accurate information for research more quickly is especially important. This includes accurately understanding the longer-term impact of COVID-19 infection on patients – known as ‘long COVID’ – which affects multiple organs in the body.

The approaches developed in this project will benefit all of the research being undertaken by the CVD-COVID-UK consortium, and will be shared with the wider scientific and medical community by publishing the results openly. This will maximise the benefits of using information from EHRs and ensure that research can be reproduced. Most importantly, it will speed up the analysis of health information in EHRs and directly benefit patients and healthcare.


COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

  • The Lancet Digital Health publication 08/06/22 can be viewed here
  • medRxiv preprint 09/11/21 can be viewed here
  • Code and phenotypes used to produce this paper are available in GitHub here