Systematic tissue annotations of genomic samples by modeling unstructured metadata


I was fortunate to be selected to give a poster presentation on my current research. In our work, we create word embeddings from sample metadata and use them as features for training logistic regression classifiers. Our models predict tissue and cell type labels from the UBERON ontology on the basis of text alone. Our approach outperforms two other classes of text-based annotation methods. While we do not outperform similarly tasked models trained on gene expression features, our approach can be applied to novel data types without retraining.

And yes, this is the same work I presented at ISMB 2020, but the poster constraints at this conference were a little looser, so I could fit more content on it. Also, some new analyses were done between ISMB and Genome Informatics, and I was excited to be able to include that work as well.
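For readers curious about the general shape of an "embeddings as features, logistic regression as classifier" setup, here is a minimal, hypothetical sketch. It is not the pipeline from our work. It makes several assumptions. The two-dimensional `EMBEDDINGS` table is a toy stand-in for trained word vectors. Averaging token vectors is just one simple way to turn a free-text metadata string into a fixed-length feature. One binary classifier is trained per ontology term (here a made-up "liver" label) using plain batch gradient descent. All names (`embed`, `train_logreg`, `predict_proba`) are illustrative.

```python
import math

# Toy word vectors (assumption: a real setup would use embeddings from a trained model).
EMBEDDINGS = {
    "liver": [1.0, 0.0],
    "hepatocyte": [0.9, 0.1],
    "blood": [0.0, 1.0],
    "pbmc": [0.1, 0.9],
}
DIM = 2


def embed(text):
    """Average the vectors of known tokens in a metadata string; zeros if none are known."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit one binary logistic regression by batch gradient descent on log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    """Probability that a sample carries the label this model was trained for."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training data: 1 = annotated with the "liver" term, 0 = not.
samples = ["liver biopsy", "hepatocyte culture", "whole blood", "pbmc isolate"]
labels = [1, 1, 0, 0]
liver_model = train_logreg([embed(s) for s in samples], labels)

print(round(predict_proba(liver_model, embed("liver tissue")), 3))
print(round(predict_proba(liver_model, embed("blood draw")), 3))
```

Because each ontology term gets its own classifier over a shared text representation, the same trained models can score metadata from any assay that has a text description. This is the property the paragraph above describes as working on novel data types without retraining.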

You can download my poster here.