What is whole population data?
We use ‘whole population data’ to mean the collection of health records for an entire nation. This includes information provided by patients, and collected by the NHS, as part of their care and support. For example, when patients visit their GP or go to hospital, information about their appointment or visit, and what actions were taken, is normally recorded on a computer in a structured way (i.e. within pre-specified fields designed to capture specific details). These records are often referred to as ‘electronic health records’.
Professor Angela Wood is the Theme Lead for Whole Population Data.
Why is ‘whole population data’ important for cardiovascular disease research?
Whole population data is important for cardiovascular research because it ensures that findings are representative of the whole population and all demographic groups, including diverse ethnicities, ages, genders, socioeconomic backgrounds, and geographic locations. This leads to more reliable results that are relevant across the population and can inform health policy and healthcare.
Having data from the whole population also makes it possible to study rare diseases or conditions that may not be adequately represented in smaller datasets.
However, accessing, analysing and interpreting this large-scale, nationally collated health data is challenging. There are complexities and deficiencies in data access approvals, provisioning, quality and linkage, as well as in analytic environments, analysis tools and capability.
What are we doing?
Our aim is to improve the accessibility and use of high-quality, linked health datasets at UK population-wide scale for cardiovascular research.
Enabling access to linked whole population data
We have partnered with NHS England to enable, for the first time, analysis of linked health datasets for the whole population of England. This is possible within NHS England’s Trusted Research Environment (TRE) for England, which enables research using linked, de-identified health records in a secure environment, enhancing the privacy and security of data.
Research access to similar health data is in place in Wales and in Scotland, while it is in the process of being set up in Northern Ireland.
The power of linked population data has been demonstrated by our research, including:
- The first three-nation (England, Scotland and Wales) analysis, published in Nature Medicine, used trends in dispensed medicines data to demonstrate the impact of COVID-19 on the management of cardiovascular diseases.
- The first whole UK population study of 68 million people revealed the impact of COVID-19 under-vaccination.
- Studies on populations that are typically underserved in cardiovascular disease research, such as: an analysis of ethnicity recording across different routinely collected NHS datasets; an investigation into COVID-19 infection and vaccination during pregnancy and their role in cardiovascular-related maternal health outcomes; an analysis of the impact of COVID-19 on congenital heart disease procedures in children; and the first analysis of rare diseases at a national scale (>58 million people) and outcomes after COVID-19 infection, enabling research on even very rare conditions (<1 case per million).
Driver research programme: CVD-COVID-UK/COVID-IMPACT
The CVD-COVID-UK/COVID-IMPACT research programme is led and coordinated by the Centre. The programme provides streamlined national scale access for whole population research to improve our understanding of the relationship between COVID-19 and cardiovascular disease.
This programme currently supports more than 80 cardiovascular studies within a consortium of over 400 members from more than 50 NHS and academic institutions, and has resulted in over 40 publications.
Faster, more efficient, higher quality, and reproducible cardiovascular data science research
Our health data science team enables researchers, regardless of their data science expertise, to address cardiovascular research questions and generate insights. The team makes health data science resources available, including reusable curation and analysis pipelines, and has developed an interactive Dataset Summary Dashboard that provides researchers with information on the datasets provisioned for the CVD-COVID-UK/COVID-IMPACT research programme, to aid feasibility assessment and study planning.
All code, protocols, and phenotype algorithms are publicly available, prior to publication, through GitHub, HDR Innovation Gateway and the HDR UK Phenotype Library.
Facilitating research across nations
Research across the four nations of the UK is still very challenging, because the environments in which these analyses are run are still separate, and the data is not formatted in the same way.
Analysis across multiple nations or datasets, including internationally, is much simpler if the data is formatted in the same way or uses the same data model. We are a data partner in the European Health Data Evidence Network (EHDEN) and have worked with edenceHealth to map key datasets to the OMOP (Observational Medical Outcomes Partnership) common data model. This will facilitate analysis not only across the four nations but also internationally.
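To illustrate why a common data model helps, here is a minimal Python sketch. It is not the Centre's actual mapping: the two source layouts ("nation A" and "nation B"), their field names and the sample records are all hypothetical. The target fields (person_id, year_of_birth, gender_concept_id) follow the OMOP person table in heavily simplified form, and 8507 and 8532 are the standard OMOP concept IDs for male and female.

```python
from collections import Counter

# Standard OMOP concept IDs for gender.
OMOP_MALE = 8507
OMOP_FEMALE = 8532


def nation_a_to_person(rec):
    """Map a hypothetical 'nation A' record (text sex, ISO date of birth)."""
    sex_map = {"male": OMOP_MALE, "female": OMOP_FEMALE}
    return {
        "person_id": rec["pid"],
        "year_of_birth": int(rec["dob"][:4]),  # "YYYY-MM-DD" -> YYYY
        "gender_concept_id": sex_map[rec["sex"].lower()],
    }


def nation_b_to_person(rec):
    """Map a hypothetical 'nation B' record (numeric gender code, birth year)."""
    code_map = {1: OMOP_MALE, 2: OMOP_FEMALE}  # assumed local coding
    return {
        "person_id": rec["id"],
        "year_of_birth": rec["birth_year"],
        "gender_concept_id": code_map[rec["gender_code"]],
    }


def count_by_gender(people):
    """One analysis function that works on any data in the shared format."""
    return Counter(p["gender_concept_id"] for p in people)


# Made-up example records in the two source layouts.
nation_a_records = [
    {"pid": 1001, "dob": "1950-03-02", "sex": "Female"},
    {"pid": 1002, "dob": "1962-11-30", "sex": "Male"},
]
nation_b_records = [
    {"id": 2001, "birth_year": 1948, "gender_code": 2},
]

people = [nation_a_to_person(r) for r in nation_a_records]
people += [nation_b_to_person(r) for r in nation_b_records]
counts = count_by_gender(people)
```

Once each source has its own small mapping function, the analysis code is written once against the shared format. In practice each nation's data stays in its own secure environment, so the same analysis would be run separately in each environment rather than on pooled records as in this toy example.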
Areas of work
Find out more about our data-led research.
Defining Disease
Developing methods to define cardiovascular health and disease in computable form through a collaborative network of expertise that provides a world-leading, open, cardiovascular phenotype library of tools and protocols.
Enhancing Cohorts
Facilitating the linkage of large, ‘omics-rich’ cohorts to electronic health records to better understand the causes of cardiovascular diseases.
Data Enabled Clinical Trials
Supporting the development of efficient, cost-effective trials, using routine health data to recruit and follow patients with cardiovascular conditions.
Imaging
Better use of unstructured data: addressing the challenges of accessing, improving and using unstructured data, for example from cardiac and brain imaging, medical free text and electrocardiograms.
Smartphones and Wearables
Exploring how data from apps and wearables, linked to other health datasets, can inform trajectories of cardiovascular health and disease.
CVD-COVID-UK / COVID-IMPACT
One of seven National Flagship Projects approved by the NIHR-BHF Cardiovascular Partnership, linking population healthcare datasets across the UK to understand the relationship between COVID-19 and cardiovascular diseases.
Diabetes Data Science Catalyst
This exciting partnership between the BHF Data Science Centre, Diabetes UK and HDR UK aims to improve our understanding of the link between cardiovascular diseases and diabetes.
Stroke Data Science Catalyst
This partnership between the BHF Data Science Centre, HDR UK and the Stroke Association will enable researchers to securely access, link and analyse existing UK health data, speeding up the search for better stroke prevention, treatments and care.
Kidney Data Science Catalyst
This partnership between the BHF Data Science Centre, Kidney Research UK and HDR UK will enable researchers to securely access, link and analyse existing UK health data, speeding up the search for better kidney and cardiovascular disease prevention, treatments, and care.