treeArches

treeArches is a framework to automatically build and extend reference atlases while enriching them with an updatable hierarchy of cell-type annotations across different datasets.

treeArches consists of two main steps: (i) removing the batch effects between datasets and (ii) matching the annotated cell types to construct a cell-type hierarchy.
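
The matching step (ii) can be illustrated with a deliberately simplified, hypothetical sketch. It is not the procedure treeArches implements and does not use the scArches API. It assumes that step (i) has already placed the cells of two annotated studies, S1 and S2, in a shared, batch-corrected latent space. A nearest-centroid classifier trained on each study then labels the other study's cells, and relationships are read off the cross-prediction fractions. A one-to-one link means "same cell type", and one S1 type linked to several S2 types means those S2 types are subtypes of it. The function names, the 0.3 threshold, and the toy 2-D data are all illustrative assumptions.

```python
import math
from collections import defaultdict


def centroids(X, labels):
    """Mean latent vector per cell type (X: list of coordinate tuples)."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(X, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def cross_fractions(X, labels, ref_centroids):
    """For each cell type in (X, labels), the fraction of its cells whose
    nearest reference centroid belongs to each reference cell type."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for x, y in zip(X, labels):
        nearest = min(ref_centroids, key=lambda r: math.dist(x, ref_centroids[r]))
        counts[y][nearest] += 1
        totals[y] += 1
    return {y: {r: n / totals[y] for r, n in row.items()} for y, row in counts.items()}


def match_cell_types(X1, y1, X2, y2, threshold=0.3):
    """Hypothetical toy matcher: link S1 and S2 cell types and label each link."""
    f21 = cross_fractions(X2, y2, centroids(X1, y1))  # S2 cells -> S1 types
    f12 = cross_fractions(X1, y1, centroids(X2, y2))  # S1 cells -> S2 types

    # A link (a, b) exists if enough cells map across in either direction.
    links = {(a, b) for b, row in f21.items() for a, f in row.items() if f >= threshold}
    links |= {(a, b) for a, row in f12.items() for b, f in row.items() if f >= threshold}

    deg1 = defaultdict(int)  # number of S2 types linked to each S1 type
    deg2 = defaultdict(int)  # number of S1 types linked to each S2 type
    for a, b in links:
        deg1[a] += 1
        deg2[b] += 1

    result = []
    for a, b in sorted(links):
        if deg1[a] == 1 and deg2[b] == 1:
            relation = "same"
        elif deg1[a] > 1 and deg2[b] == 1:
            relation = "S2 subtype of S1"
        elif deg1[a] == 1 and deg2[b] > 1:
            relation = "S1 subtype of S2"
        else:
            relation = "unresolved"
        result.append((a, b, relation))
    return result


# Toy data: S1 annotates one broad "T" type; S2 splits it into "CD4" and "CD8".
X1 = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1), (50, 50), (51, 50)]
y1 = ["T"] * 8 + ["B"] * 2
X2 = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1), (50, 50), (51, 51)]
y2 = ["CD4"] * 4 + ["CD8"] * 4 + ["B2"] * 2

result = match_cell_types(X1, y1, X2, y2)
print(result)
# [('B', 'B2', 'same'), ('T', 'CD4', 'S2 subtype of S1'), ('T', 'CD8', 'S2 subtype of S1')]
```

The toy data mirrors the situation in the figure caption: two cell types from one study turn out to be subtypes of a single, broader cell type from another study. Checking both prediction directions is what separates a genuine one-to-one match from a split, because the broad type's cells divide between the two finer types.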

For details and the software, see the scArches GitHub page (treeArches is part of the scArches repository) and the publication by Michielsen et al. (2023).

Schematic overview of treeArches. (A) Pre-training of a latent representation using labeled public reference datasets. After integration, a cell-type hierarchy is created by matching the cell types of the different datasets. Here, for instance, cell types (CT) 1 and 2 from study (S) 2 are subtypes of CT1 from S1. (B) (Un)labeled query datasets can be added to the latent representation by applying architectural surgery. After integration, the cell-type hierarchy is updated with the labeled query datasets. Unlabeled query datasets can be annotated using the learned hierarchy.