Phenotyping

The BHF Data Science Centre enables data-led research to improve cardiometabolic health and healthcare. Part of this includes supporting researchers to effectively and efficiently use electronic health record data by ensuring phenotyping algorithms are available to meet their needs. 

Phenotyping algorithms

All phenotyping algorithms created or used in research we support are shared via the HDR UK Phenotype Library and on each project's GitHub repository, in accordance with our Publication and Dissemination Policy and the CVD-COVID-UK Ways of Working.

You can access all BHF Data Science Centre phenotypes in the Phenotype Library here 

We encourage all developers and users of phenotyping algorithms to: 

  • Share their phenotyping algorithms publicly and freely, ideally via an open access repository. 
  • Follow best practices for curation by annotating phenotyping algorithms with the data and metadata in our recommended list. 

You can read our full recommendations here

Submitting to the Phenotype Library

The Phenotype Library is an openly accessible, searchable repository of electronic health record phenotyping algorithms. It contains structured data and metadata describing each phenotyping algorithm so that researchers can identify, interpret and re-use algorithms.

Preparation of submission materials 

The following materials are required for submission: 

  • A file containing the list of medical ontology terms (e.g. code list of ICD-10 or SNOMED terms) for each phenotyping algorithm. An example code list file (CSV format) can be found here. 
  • Metadata (YAML file) describing each phenotyping algorithm. This should include a description of the algorithm and its implementation, data sources used, associated publications/authors etc. This information will be presented alongside the code list. An example of a submitted phenotype with metadata can be found here. 

Details of the metadata (YAML file) and code list specification for the Phenotype Library can be found on the Phenotype Library’s website here. 
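As a rough illustration of preparing the first of these materials, the sketch below writes a small code list to CSV using Python's standard library. The column names (`code`, `description`, `coding_system`) and the helper `write_code_list` are assumptions made for this example only; the Phenotype Library's own specification defines the columns it actually expects and should be treated as authoritative.

```python
import csv
import io

# Hypothetical column layout for illustration; check the Phenotype Library
# specification for the columns it actually requires.
FIELDNAMES = ["code", "description", "coding_system"]


def write_code_list(rows, handle):
    """Write code list rows (dicts keyed by FIELDNAMES) to a file-like object as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        code = row["code"].strip()
        if not code:
            raise ValueError("every row needs a non-empty code")
        # Store the code without stray surrounding whitespace.
        writer.writerow({**row, "code": code})


# Two ICD-10 categories used purely as example content.
example_rows = [
    {"code": "I21", "description": "Acute myocardial infarction", "coding_system": "ICD-10"},
    {"code": "I22", "description": "Subsequent myocardial infarction", "coding_system": "ICD-10"},
]

buffer = io.StringIO()
write_code_list(example_rows, buffer)
code_list_csv = buffer.getvalue()
print(code_list_csv)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.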

Submission to the BHF Data Science Centre’s collection in the Phenotype Library 

We recommend submission via the interface, but API access is also available. 

Submission via the Phenotype Library creation interface 

  • Sign up to the Phenotype Library by submitting a request here
  • Access the creation interface here
    • Documentation on using the creation interface for Phenotypes based on structured health data can be found here
  • Ensure the ‘collections’ field is populated with ‘BHF Data Science Centre’ and ‘Phenotype Library’

Submission via the Phenotype Library API  

  • Compile the metadata into YAML format (template and example metadata files are provided above) 
  • Specify the BHF Data Science Centre and Phenotype Library collections in the YAML file (collection IDs 20 and 18)
  • The API documentation can be found here, and its reference data can be found here
  • Client packages are available in both Python and R:
    • R Package Client is available here
    • Python Client is available here
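Before submitting through the API, it can help to confirm that both collection IDs quoted above are present in the metadata. The following is a minimal sketch that works on metadata already loaded into a Python dictionary (for example, after parsing the YAML file). The key name `collections` and the helper `missing_collections` are assumptions for illustration, not part of the Phenotype Library clients; consult the template metadata file for the real field name and layout.

```python
# Collection IDs quoted in the submission steps above.
REQUIRED_COLLECTION_IDS = {20, 18}


def missing_collections(metadata):
    """Return the required collection IDs that the metadata does not declare.

    `metadata` is a dict as obtained by loading the YAML file. The
    "collections" key name is an assumption made for this sketch.
    """
    declared = {int(c) for c in metadata.get("collections", [])}
    return REQUIRED_COLLECTION_IDS - declared


# Hypothetical draft metadata that lists only one of the two collections.
draft = {"name": "Example phenotype", "collections": [18]}
print(sorted(missing_collections(draft)))
```

An empty result means both collections are declared and the file is ready for the next step.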

Additional support 

Documentation on the Phenotype Library can be found both here and here 

Any questions or problems that occur during the submission process should be directed to the Phenotype Library, via their contact page. 

Sharing via GitHub

We recommend that researchers include information describing all phenotyping algorithms used in their research in any GitHub or similar code repository for their project. At a minimum, this should include:

  • A file citing the accession IDs (including version) or DOI of all phenotyping algorithms that are already included in the Phenotype Library or other repository. 
  • For any phenotyping algorithms not included in a repository:
    • A file containing the list of medical ontology terms/codes (code list) for each phenotyping algorithm. 
    • Metadata describing each phenotyping algorithm. 

At a minimum, this metadata should include the information represented in the Phenotype Library, as set out in the submission materials above.
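One way to keep the citation file in a project repository consistent is to check that every listed phenotype carries either a DOI or an accession ID together with its version. The sketch below assumes a simple list of dictionaries; the field names (`name`, `accession_id`, `version`, `doi`), the function `check_phenotype_references` and the placeholder identifiers are all hypothetical and are not a format prescribed by the Phenotype Library.

```python
def check_phenotype_references(entries):
    """Return a list of problems found in phenotype citation entries.

    An entry is acceptable if it gives a DOI, or an accession ID
    together with a version.
    """
    problems = []
    for i, entry in enumerate(entries):
        name = entry.get("name") or f"entry {i}"
        has_doi = bool(entry.get("doi"))
        has_versioned_id = bool(entry.get("accession_id")) and bool(entry.get("version"))
        if not (has_doi or has_versioned_id):
            problems.append(f"{name}: needs a DOI or an accession ID with version")
    return problems


# Placeholder identifiers only; substitute the real accession IDs and versions.
references = [
    {"name": "Phenotype A", "accession_id": "PH-PLACEHOLDER", "version": "1"},
    {"name": "Phenotype B", "accession_id": "PH-PLACEHOLDER"},
]
problems = check_phenotype_references(references)
for problem in problems:
    print(problem)
```

A check like this could run as part of a repository's automated tests so that an incomplete citation is caught before the code is shared.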