Phenotyping algorithms
All phenotyping algorithms created or used in research we support are shared via the HDR UK Phenotype Library and in each project's GitHub repository, in accordance with our Publication and Dissemination Policy and the CVD-COVID-UK Ways of Working.
You can access all BHF Data Science Centre phenotypes in the Phenotype Library here.
We encourage all developers and users of phenotyping algorithms to:
- Share their phenotyping algorithms publicly and freely, ideally via an open access repository.
- Follow best practices for curation by annotating phenotyping algorithms with the data and metadata in our recommended list.
You can read our full recommendations here.
Submitting to the Phenotype Library
The Phenotype Library is an openly accessible, searchable repository of electronic health record phenotyping algorithms. It holds structured data and metadata describing each phenotyping algorithm so that researchers can identify, interpret and reuse algorithms.
Preparation of submission materials
The following materials are required for submission:
- A file containing the list of medical ontology terms (e.g. code list of ICD-10 or SNOMED terms) for each phenotyping algorithm. An example code list file (CSV format) can be found here.
- Metadata (YAML file) describing each phenotyping algorithm. This should include a description of the algorithm and its implementation, the data sources used, and associated publications and authors, among other details. This information is presented alongside the code list. An example of a submitted phenotype with metadata can be found here.
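As a minimal sketch, a code list can be written with Python's standard `csv` module. The column names used here (`code`, `description`, `coding_system`) and the function name `write_code_list` are hypothetical, and the two ICD-10 codes are illustrative only, not a validated phenotype; the columns the Phenotype Library actually expects are defined in its code list specification.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Hypothetical column layout; check the Phenotype Library code list
# specification for the columns it actually requires.
FIELDNAMES = ["code", "description", "coding_system"]


def write_code_list(rows: Iterable[Tuple[str, str]], coding_system: str, out: TextIO) -> int:
    """Write (code, description) pairs as CSV and return the number of data rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for code, description in rows:
        writer.writerow({"code": code, "description": description, "coding_system": coding_system})
        count += 1
    return count


if __name__ == "__main__":
    # Illustrative ICD-10 codes only; not a validated phenotype definition.
    example_rows = [
        ("I21", "Acute myocardial infarction"),
        ("I22", "Subsequent myocardial infarction"),
    ]
    buffer = io.StringIO()
    write_code_list(example_rows, "ICD-10", buffer)
    print(buffer.getvalue())
```

When writing to a file on disk rather than an in-memory buffer, open it with `newline=""`, as the `csv` module documentation recommends.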
Details of the metadata (YAML file) and code list specification for the Phenotype Library can be found on the Phenotype Library’s website here.
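To draft a metadata file before consulting the full specification, a standard-library-only sketch like the one below may help. It handles only a flat mapping whose values are strings, integers or lists of those, and it renders strings as JSON-style double-quoted scalars, which YAML also accepts. The function name `to_simple_yaml` and every field name in the example (`name`, `definition`, `data_sources`, `publications`) are hypothetical placeholders; the real field names and structure are set by the Phenotype Library's metadata specification.

```python
import json
from typing import Dict, List, Union

Scalar = Union[str, int]
Value = Union[Scalar, List[Scalar]]


def _scalar(value: Scalar) -> str:
    """Render a scalar; strings become double-quoted (JSON-style) YAML scalars."""
    if isinstance(value, bool):  # bool is a subclass of int, so reject it first
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return json.dumps(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_simple_yaml(metadata: Dict[str, Value]) -> str:
    """Serialise a flat mapping to block-style YAML (keys assumed to be plain identifiers)."""
    lines = []
    for key, value in metadata.items():
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")
                continue
            lines.append(f"{key}:")
            lines.extend(f"  - {_scalar(item)}" for item in value)
        else:
            lines.append(f"{key}: {_scalar(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    draft = {
        "name": "Example phenotype",
        "definition": "Free-text description of the algorithm.",
        "data_sources": ["Hypothetical source A"],
        "publications": [],
    }
    print(to_simple_yaml(draft))
```

For nested metadata or anything beyond a first draft, a full YAML library is a better fit than this hand-rolled serialiser.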
Submission to the BHF Data Science Centre’s collection in the Phenotype Library
We recommend submitting via the creation interface, but API access is also available.
Submission via the Phenotype Library creation interface
- Sign up to the Phenotype Library by submitting a request here.
- Access the creation interface here.
- Documentation on using the creation interface for phenotypes based on structured health data can be found here.
- Ensure the ‘collections’ field is populated with both ‘BHF Data Science Centre’ and ‘Phenotype Library’.
Submission via the Phenotype Library API
- Compile the metadata into YAML format (template and example metadata files are provided above).
- Specify the BHF Data Science Centre and Phenotype Library collections in the YAML file (collection IDs 20 and 18).
- The API documentation can be found here, and its reference data here.
- Client packages are available in both R and Python:
  - The R client package is available here.
  - The Python client package is available here.
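Before an API submission, it may be worth checking that the draft metadata lists both collections. A minimal sketch, assuming the collections are held under a key named `collections` as a list of integer IDs (an assumption; confirm the exact field in the metadata specification) and using the IDs 20 and 18 given above. The helper names are hypothetical, and the sketch does not call the API or either client package.

```python
from typing import Any, Dict, List

# Collection IDs as given in the submission guidance above.
REQUIRED_COLLECTION_IDS = {20, 18}


def missing_collections(metadata: Dict[str, Any], key: str = "collections") -> List[int]:
    """Return the required collection IDs absent from metadata[key], sorted."""
    present = set(metadata.get(key) or [])
    return sorted(REQUIRED_COLLECTION_IDS - present)


def add_required_collections(metadata: Dict[str, Any], key: str = "collections") -> Dict[str, Any]:
    """Return a copy of metadata with any missing required collection IDs appended."""
    updated = dict(metadata)  # shallow copy so the caller's dict is left unchanged
    current = list(updated.get(key) or [])
    current.extend(missing_collections(updated, key))
    updated[key] = current
    return updated


if __name__ == "__main__":
    draft = {"name": "Example phenotype", "collections": [20]}
    print(missing_collections(draft))
    print(add_required_collections(draft)["collections"])
```

Returning a new dictionary rather than editing the draft in place keeps the check side-effect free, so it can be run repeatedly while the metadata is being compiled.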
Additional support
Documentation on the Phenotype Library can be found here and here.
Any questions or problems that arise during the submission process should be directed to the Phenotype Library via its contact page.
Sharing via GitHub
We recommend that researchers include information describing all phenotyping algorithms used in their research in the GitHub (or similar) code repository for their project. At a minimum, this should include:
- A file citing the accession IDs (including version) or DOIs of all phenotyping algorithms already included in the Phenotype Library or another repository.
- For any phenotyping algorithms not included in a repository:
  - A file containing the list of medical ontology terms/codes (code list) for each phenotyping algorithm.
  - Metadata describing each phenotyping algorithm.
At a minimum, this metadata should include the information captured in the Phenotype Library, as listed under Preparation of submission materials above.
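For algorithms already held in the Phenotype Library or another repository, the citation file can be as simple as a CSV. A minimal sketch follows; the class and function names, the column names and the placeholder identifiers are all hypothetical. Each entry must carry either an accession ID together with its version, or a DOI, and all entries are validated before anything is written.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Optional, TextIO


@dataclass
class PhenotypeCitation:
    name: str
    accession_id: Optional[str] = None  # hypothetical: the repository's accession ID
    version: Optional[str] = None       # version of that accession
    doi: Optional[str] = None

    def validate(self) -> None:
        """Require either an accession ID with its version, or a DOI."""
        has_accession = bool(self.accession_id and self.version)
        if not (has_accession or self.doi):
            raise ValueError(f"{self.name}: give an accession ID with version, or a DOI")


def write_citations(citations: Iterable[PhenotypeCitation], out: TextIO) -> None:
    """Validate every citation first, then write them all as CSV."""
    rows = list(citations)
    for citation in rows:
        citation.validate()
    writer = csv.writer(out)
    writer.writerow(["name", "accession_id", "version", "doi"])
    for c in rows:
        writer.writerow([c.name, c.accession_id or "", c.version or "", c.doi or ""])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Placeholder identifiers only; substitute the real accession IDs or DOIs.
    write_citations(
        [PhenotypeCitation("Example phenotype", accession_id="EXAMPLE-ID", version="1")],
        buffer,
    )
    print(buffer.getvalue())
```

Validating before writing means a single incomplete entry stops the whole file from being produced, rather than leaving a partially written citation list in the repository.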