Whole Population Data

Better use of nationally-collated, structured, coded data: accessing, improving and using linked, national, population-wide health data.

What is whole population data?

We use ‘whole population data’ to mean the collection of health records for an entire nation. This includes information provided by patients, and collected by the NHS, as part of their care and support. For example, when patients visit their GP or go to hospital, information about the appointment or visit, and what actions were taken, is normally recorded on a computer in a structured way (i.e. within pre-specified fields designed to capture certain details). These records are often referred to as ‘electronic health records’.

Why is ‘whole population data’ important for cardiovascular disease research?

Whole population data is important for cardiovascular research because it ensures that findings are representative of all demographic groups, across ethnicity, age, gender, socioeconomic background and geographic location. This leads to more reliable results that are relevant across the population and can inform health policy and healthcare.

Having data from the whole population also makes it possible to study rare diseases or conditions that may not be adequately represented in smaller datasets.

However, access to, and analysis and interpretation of, this large-scale nationally-collated health data is challenging. There are complexities and deficiencies in the processes for data access approvals, provisioning, quality, linkage, analytic environment, analysis tools and capability.

What are we doing?

Our aim is to improve the accessibility and use of high-quality, linked health datasets at UK population-wide scale for cardiovascular research.

Enabling access to linked whole population data

We have partnered with NHS England to enable, for the first time, analysis of linked health datasets for the whole population of England. This is possible within NHS England’s Trusted Research Environment (TRE) for England, which enables research using linked, de-identified health records in a secure environment, enhancing the privacy and security of the data.
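To illustrate what "linked de-identified health records" means in practice, here is a minimal sketch in Python. It assumes each record already carries the same pseudonymised person identifier across sources; the field names, values and the `link_records` helper are hypothetical and do not reflect the actual datasets or linkage method used in the TRE.

```python
from collections import defaultdict

# Hypothetical, made-up records. "person_id" stands in for a pseudonymised
# identifier; no real dataset or field names are implied.
gp_records = [
    {"person_id": "p001", "date": "2020-01-15", "code": "hypertension"},
    {"person_id": "p002", "date": "2020-02-03", "code": "diabetes"},
]
hospital_records = [
    {"person_id": "p001", "date": "2020-06-20", "code": "stroke"},
    {"person_id": "p003", "date": "2020-07-11", "code": "heart_failure"},
]


def link_records(sources):
    """Group events from several sources by pseudonymised person identifier."""
    linked = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            linked[record["person_id"]].append(
                {"source": source_name, "date": record["date"], "code": record["code"]}
            )
    # Sort each person's events chronologically (ISO dates sort correctly as strings).
    for events in linked.values():
        events.sort(key=lambda event: event["date"])
    return dict(linked)


linked = link_records({"gp": gp_records, "hospital": hospital_records})
for person_id, events in sorted(linked.items()):
    print(person_id, [(e["source"], e["code"]) for e in events])
```

Linking in this way lets a researcher follow one person's pathway across GP and hospital care without ever seeing a name, address or NHS number.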

Research access to similar health data is in place in Wales and in Scotland, while it is in the process of being set up in Northern Ireland.

The power of linked population data has been demonstrated by our research, including:

Driver research programme: CVD-COVID-UK/COVID-IMPACT

The CVD-COVID-UK/COVID-IMPACT research programme is led and coordinated by the Centre. The programme provides streamlined national scale access for whole population research to improve our understanding of the relationship between COVID-19 and cardiovascular disease.

This programme currently supports more than 80 cardiovascular studies, in a consortium of over 400 members, across more than 50 NHS and academic institutions, resulting in over 40 publications.

Faster, more efficient, higher quality, and reproducible cardiovascular data science research

Our health data science team enables researchers, regardless of their data science expertise, to address cardiovascular research questions and generate insights. The team makes health data science resources available, including reusable curation and analysis pipelines, and has developed an interactive Dataset Summary Dashboard. The dashboard gives researchers information on the datasets provisioned for the CVD-COVID-UK/COVID-IMPACT research programme, to aid the feasibility assessment and planning of studies.

All code, protocols, and phenotype algorithms are publicly available, prior to publication, through GitHub, HDR Innovation Gateway and the HDR UK Phenotype Library.

Facilitating research across nations

Research across the four nations of the UK remains very challenging, because the environments in which these analyses are run are separate and the data is not formatted in the same way.

Analysis across multiple datasets or nations, including internationally, is much simpler if the data is formatted in the same way or uses the same data model. We are a data partner in the European Health Data Evidence Network (EHDEN) and have worked with edenceHealth to map key datasets to the OMOP (Observational Medical Outcomes Partnership) common data model. This will facilitate analysis not only across the four nations but also internationally.
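The benefit of a common data model can be sketched in a few lines of Python. The example below is illustrative only: the source rows, the `MAPPINGS` table and the helper functions are hypothetical, and the target field names are loosely modelled on the OMOP condition_occurrence table. A real OMOP mapping also translates source codes to standard concept vocabularies, which is omitted here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical source rows: two nations record the same kind of event with
# different field names and date formats (all values are made up).
nation_a_rows = [{"patient": "a1", "diag_date": "15/01/2020", "diag": "I21"}]
nation_b_rows = [{"pid": "b7", "event_dt": "2020-03-02", "icd10": "I63"}]

# Per-source mapping from common field names to source field names,
# plus the date format each source uses.
MAPPINGS = {
    "nation_a": {
        "person_id": "patient",
        "condition_start_date": "diag_date",
        "condition_source_value": "diag",
        "date_format": "%d/%m/%Y",
    },
    "nation_b": {
        "person_id": "pid",
        "condition_start_date": "event_dt",
        "condition_source_value": "icd10",
        "date_format": "%Y-%m-%d",
    },
}


def to_common_model(row, mapping):
    """Convert one source row into the shared, simplified record shape."""
    parsed = datetime.strptime(
        row[mapping["condition_start_date"]], mapping["date_format"]
    )
    return {
        "person_id": row[mapping["person_id"]],
        "condition_start_date": parsed.date().isoformat(),
        "condition_source_value": row[mapping["condition_source_value"]],
    }


def count_by_condition(rows):
    """One analysis function that works on any source once it is mapped."""
    return Counter(row["condition_source_value"] for row in rows)


common_rows = [to_common_model(r, MAPPINGS["nation_a"]) for r in nation_a_rows]
common_rows += [to_common_model(r, MAPPINGS["nation_b"]) for r in nation_b_rows]
counts = count_by_condition(common_rows)
print(common_rows)
print(counts)
```

Once every source is mapped, the same analysis code can be run unchanged in each nation's environment, which is the main practical gain of a shared data model.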

Areas of work

Find out more about our data-led research.