Systematic tissue annotations of genomic samples by modeling unstructured metadata


ISMB 2020 went virtual this year due to the COVID-19 pandemic, but that didn’t stop the science from happening. I was fortunate to be selected to give a poster presentation on my current research. In our work, we create word embeddings from sample metadata and use them as features for training logistic regression classifiers. Our models predict tissue and cell type annotations from the UBERON ontology on the basis of text alone. Our approach outperforms two other classes of text-based annotation methods. While it does not outperform similarly tasked models trained on gene expression features, it can be applied to novel data types without retraining.
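To make the idea concrete, here is a minimal sketch of the general technique, not our actual pipeline. It assumes a tiny hand-made embedding table (`TOY_EMBEDDINGS`), averages word vectors to embed each metadata string, and trains a single binary logistic regression classifier for one hypothetical label ("liver") with plain gradient descent. The vectors, example texts, function names, and hyperparameters are all invented for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for this example.
# A real pipeline would load embeddings trained on a large text corpus.
TOY_EMBEDDINGS = {
    "liver": [0.9, 0.1, 0.0],
    "hepatocyte": [0.8, 0.2, 0.1],
    "hepatic": [0.85, 0.15, 0.05],
    "brain": [0.1, 0.9, 0.0],
    "cortex": [0.2, 0.8, 0.1],
    "neuron": [0.15, 0.85, 0.05],
    "biopsy": [0.4, 0.4, 0.9],
    "tissue": [0.5, 0.5, 0.8],
    "sample": [0.45, 0.45, 0.85],
}
DIM = 3


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with per-example gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_proba(text, weights, bias):
    """Probability that the metadata text describes the positive label."""
    x = embed(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training set for one label: 1 = liver, 0 = not liver.
train_texts = [
    "liver biopsy sample",
    "hepatocyte tissue",
    "brain cortex tissue",
    "neuron sample",
]
train_labels = [1, 1, 0, 0]

weights, bias = train_logistic_regression([embed(t) for t in train_texts], train_labels)

# Score unseen metadata strings built from words the embedding table knows.
liver_score = predict_proba("hepatic tissue biopsy", weights, bias)
brain_score = predict_proba("cortex neuron sample", weights, bias)
```

In this toy setup, the unseen liver-like text should score above 0.5 and the brain-like text below it. Covering many ontology terms would mean training one such classifier per term, which is one common way to handle multi-label problems and is again an assumption of this sketch.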

My poster gives an overview of this work, which we are currently drafting for publication. Download it here.