CCU013: High-throughput electronic health record phenotyping approaches

Project lead:
Spiros Denaxas, UCL

When a patient visits their GP or is admitted to hospital, information about their symptoms, diagnoses, lab test results and prescriptions is entered and stored in ‘Electronic Health Records’ (‘EHRs’). These EHRs are a valuable resource that lets researchers and clinicians analyse the health data of large numbers of patients, with the aim of using this information to improve patient health and care.

However, because information in these EHRs is entered by different health workers around the UK, the amount of detail varies and the records can contain many inconsistencies. This means that researchers must first spend considerable time and effort extracting the most relevant information from these EHRs before they can analyse them effectively. Examples include identifying which patients do or do not have a particular disease, or extracting individual details – such as whether a patient has high blood pressure or is a smoker – from billions of rows of data.

To address this, the project will create and evaluate different ways of extracting this valuable information from complex EHRs, so that the records can be analysed as effectively as possible by the CVD-COVID-UK consortium. As our understanding of COVID-19 is developing rapidly, being able to access accurate information for research more quickly is especially important. This includes accurately understanding the longer-term impact of COVID-19 infection on patients – known as ‘long COVID’ – which affects multiple organs in the body.

The approaches developed in this project will benefit all of the research being undertaken by the CVD-COVID-UK consortium, and will be shared with the wider scientific and medical community by publishing the results openly. This will maximise the benefits of using information from EHRs and ensure that research can be reproduced. Most importantly, it will speed up the analysis of health information in EHRs, and so directly benefit patients and healthcare.

View this project on the Health Data Research Gateway

Sub-projects

CCU013_01: Characterising COVID-19 related events in a nationwide electronic health record cohort of 55.9 million people in England

CCU013_02: High-throughput phenome-wide analyses of electronic health records in primary care and secondary care to identify phenotypes associated with COVID-19 disease severity and acute post-infection sequelae (PheWAS study)

CCU013_03: Understanding the risk of adverse health events following severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in 2020 (prior to vaccines becoming available) and in the era of delta among the fully vaccinated and the electively unvaccinated; a PheWAS approach (CONVALESCENCE long COVID study)

CCU013_04: Sex and socioeconomic deprivation differences in disease prevalence and mortality and impact of COVID-19

CCU013_05: Defining a COVID-19 reinfection phenotype from national electronic health records

CCU013_06: Characterisation and subtyping of patients hospitalised with long COVID using routinely collected electronic health records

CCU013_07: Development and validation of a prediction tool for mortality and clinical deterioration of hospitalised COVID-19 patients

Outputs

COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

  • The Lancet Digital Health publication 08/06/22 can be viewed here
  • medRxiv preprint 09/11/21 can be viewed here
  • Code and phenotypes used to produce this paper are available on GitHub here
