Ethnicity data from health records of over 61 million people studied in detail for first time

22 Feb 2024

In research published today in Nature Scientific Data, ethnicity data from general practice and hospital records of more than 61 million people in England has been studied in detail for the first time.

Researchers assessed the available detail of ethnicity data from different sources of NHS records in England. They showed that much more detailed classification of ethnicity is possible than health researchers typically use. They also highlighted that ethnicity information was missing for almost 1 in 10 patients, while around 12% of patients had conflicting ethnicity codes in their patient records.

The study was led by researchers at the University of Oxford, University College London and the Centre for Ethnic Health Research, and made possible through the support of Health Data Research UK (HDR UK) and the British Heart Foundation (BHF) Data Science Centre. The researchers analysed de-identified data on ethnicity and other characteristics from general practice and hospital health records, accessed safely within NHS England’s Secure Data Environment (SDE). It is the first part of a three-phase project aiming to reduce bias in AI health prediction models.

Sara Khalid, Associate Professor of Health Informatics and Biomedical Data Science at NDORMS, explained: ‘Because AI-based healthcare technology depends on the data that is fed into it, a lack of representative data can lead to biased models that ultimately produce incorrect health assessments. Better data from real-world settings, such as the data we have collected, can lead to better technology and ultimately better health for all.’ 

Professor Cathie Sudlow, Chief Scientist of Health Data Research UK and Director of the BHF Data Science Centre, said: ‘We are delighted to be supporting hundreds of researchers to harness the power of the UK’s rich health data. This study on ethnicity recording highlights how different sources of health data from the whole English population can be accessed and analysed in a safe and secure way, providing insights that are relevant to everyone. The findings will empower health professionals, patients, carers and policy makers to make better decisions that will benefit people of all ages, ethnic groups, and social backgrounds across the country.’  

The project, which is part of the UK Government’s COVID-19 Data and Connectivity National Core Study, led by HDR UK, will now focus on using these detailed results to better describe how different ethnicities were impacted by the COVID-19 pandemic.

The team have now compiled their findings into a research-ready database. The study, ‘Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity’, can be accessed now in Scientific Data.
