A huge amount of health information is generated during routine interactions with the NHS and is stored in electronic health records. Data research to improve public health depends on computational code to interpret the complex coded information found in these records. Here Dr Jackie MacArthur talks about the BHF Data Science Centre’s work in this area, led by Prof Spiros Denaxas, and a flagship report setting out guidelines to improve the reporting and sharing of this code.
Interpreting complex health data
The UK’s electronic health records, which cover the entire population across their lifespan, have the unique potential to transform public health. However, for this potential to be realised, researchers need the tools to analyse the health data of vast numbers of people. Researchers can interpret the complex information in someone’s medical record and extract clinically relevant information by creating algorithms: sets of computer rules that aid data processing.
These clinically relevant characteristics are often referred to as phenotypes. Phenotypes can include any observable or measurable characteristic of a person, such as whether someone has been diagnosed with a particular disease, their blood pressure, weight measurement, or smoking status.
The computer code that interprets this information and determines a person’s phenotype is often called a phenotyping algorithm.
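To make the idea concrete, here is a minimal sketch of what a very simple phenotyping algorithm might look like in Python. It is illustrative only: the code strings, the `RecordEntry` structure and the `has_phenotype` function are hypothetical placeholders, not real clinical codes or part of any BHF Data Science Centre tool. Real phenotyping algorithms typically combine many more rules and data sources.

```python
from dataclasses import dataclass

# Hypothetical codelist: placeholder strings, NOT real clinical codes.
EXAMPLE_DISEASE_CODES = {"EX-DX-001", "EX-DX-002"}


@dataclass(frozen=True)
class RecordEntry:
    """One coded entry in a (simplified, hypothetical) health record."""
    patient_id: str
    code: str


def has_phenotype(entries, patient_id, codelist):
    """Return True if any entry for this patient carries a code in the codelist."""
    return any(
        entry.patient_id == patient_id and entry.code in codelist
        for entry in entries
    )


# Toy data for illustration only.
entries = [
    RecordEntry("p1", "EX-DX-001"),
    RecordEntry("p1", "EX-OTHER-009"),
    RecordEntry("p2", "EX-OTHER-009"),
]

p1_flag = has_phenotype(entries, "p1", EXAMPLE_DISEASE_CODES)
p2_flag = has_phenotype(entries, "p2", EXAMPLE_DISEASE_CODES)
```

In this toy example, patient `p1` is flagged because one of their entries matches the codelist, while `p2` is not. Both the list of codes and the rule applied to them need to be shared for someone else to reproduce the result.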
Making phenotyping algorithms available and accessible
The BHF Data Science Centre enables data-led research to improve heart and circulatory health. Part of this includes supporting researchers to use enormous health data sets by ensuring phenotyping algorithms are available to meet their needs.
Currently, most phenotyping algorithms are not readily accessible because of limited sharing and a lack of defined standards. This limits their availability, quality and utility, presenting challenges for researchers who want to re-use phenotyping algorithms or interpret research results generated using them.
We want to make sure that phenotyping algorithms that have already been developed are findable, accessible, interoperable (usable across systems), and reusable (FAIR) by everyone. This is vital to ensure that research is interpretable and reproducible, and will make research more efficient and minimise waste. We also want to make sure that phenotyping algorithms are fully described, so researchers can understand what they do and how to use them.
Listening to the research community
We carried out community engagement and workshops to identify researchers’ needs. These workshops brought together researchers across the cardiovascular research community, including clinical trialists, health data scientists and clinicians. Based on our findings, we are excited to report the first set of recommendations in this area, which aim to ensure that phenotyping algorithms are available and support research.
Our recommendations include that phenotyping algorithms should be made available via a single, centrally accessible repository and should be fully described using the information we outline. We also encourage the agreement of a community standard for reporting and sharing phenotyping algorithms, and hope that our recommendations act as a catalyst for this.
“Establishing a standardized approach for defining and curating EHR phenotyping algorithms is critical for performing large, federated studies and for ensuring that research using such complex data is reproducible and FAIR. Our report takes the first step towards this direction by providing a set of actionable recommendations for the international community.”
Professor Spiros Denaxas, Professor of Biomedical Informatics at University College London and Associate Director at the BHF Data Science Centre
The next steps
We are collaborating with the HDR UK Phenotype Library, an open access resource for phenotyping algorithms, to promote efficient curation and sharing of phenotyping algorithms using electronic health records.
We are also ensuring that all phenotyping algorithms developed or used as part of BHF Data Science Centre research are shared with the community. To help achieve this, we are encouraging developers and users of phenotyping algorithms to:
- Share their phenotyping algorithms publicly and freely, ideally via an open access repository.
- Follow best practices for curation by annotating phenotyping algorithms with the data and metadata in our recommended list.
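As a rough illustration of what annotating a phenotyping algorithm with metadata could look like in practice, the Python sketch below defines a small metadata record and checks which fields are still empty before sharing. The field names are assumptions chosen for illustration; they are not the recommended list from the report, which should be consulted for the actual items.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class PhenotypeMetadata:
    """Hypothetical metadata record; field names are illustrative assumptions."""
    name: str
    description: str
    data_source: str
    coding_system: str
    version: str
    authors: list = field(default_factory=list)


def missing_fields(meta):
    """Return the names of fields that are empty, in declaration order."""
    return [key for key, value in asdict(meta).items() if not value]


# Toy record with two fields left blank on purpose.
draft = PhenotypeMetadata(
    name="Example phenotype",
    description="Illustrative record, not a real phenotyping algorithm.",
    data_source="Example primary care dataset",
    coding_system="Example coding system",
    version="",
    authors=[],
)

incomplete = missing_fields(draft)
```

A check like this could be run before submitting to a repository, so that incomplete descriptions are caught early rather than after publication.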
Our ambition is that these recommendations will increase the sharing of high-quality phenotyping algorithms. This will maximise the potential of phenotyping algorithms to produce high-quality and efficient research, and thus the benefits of this research in terms of healthcare improvements.
Read more about BHF Data Science Centre work to define standards and best practice here.
Full report available here: