Zheng Xia, Ph.D., originally set his sights on a career in engineering.
His leap into cancer biology came after an experience working on a machine-learning project at Houston Methodist, the academic health center in Texas, where each day he’d see sick patients and worried families in the hallways.
Though his training had nothing to do with biology or cancer, he says, “I felt it very worthy of my effort to go into this field.”
The year was 2007, and Xia had heard about rapid advances in gene-reading technologies, such as microarrays, able to measure the activity levels of thousands of genes in one go. The torrent of data from such experiments was overrunning traditional methods of biological analysis.
Xia was convinced that his expertise could help cancer researchers pinpoint answers. He was correct.
Now, Xia is an associate professor of biomedical engineering in the OHSU School of Medicine and a member of the OHSU Knight Cancer Institute. His lab develops bioinformatics tools, including machine-learning systems, that can interpret biomedical data sets beyond the scale of direct human comprehension. He and his collaborators have multiple publications in high-impact journals such as Nature, Nature Biotechnology and Clinical Cancer Research. He is a principal investigator on three and a co-investigator on nine major R01-level grants from the National Institutes of Health and other external funding sources.
Working hand-in-hand with biologists and clinicians has been key.
“For me, it’s very important to be guided by significant biological or clinical questions, but because my background is engineering, I don’t know the clinically important questions,” Xia says. “Through collaborations, I learn how to think about the problems, how biologists generate hypotheses. And while we answer the questions our collaborators are interested in, those questions motivate us to think of new tools to analyze the data from different angles.”
Working with Amy Moran, Ph.D., an associate professor of cell, developmental and cancer biology in the OHSU School of Medicine, Xia helped to glean more insights from single-cell analysis of RNA, the genetic messenger molecule transcribed from active genes. The research, led by Moran, revealed how androgens, the sex hormones that include testosterone, can limit the body’s response to cancer immunotherapy, a finding that may help make those therapies more consistently effective.
The project inspired Xia and members of his lab to develop a powerful computational tool called Scissor, which can zero in on the subpopulations of cells within a tumor that are driving important disease behaviors, such as the ability to resist anticancer treatments. The method expands the reach of single-cell analysis, which has been limited by small sample sizes, resulting in inadequate statistical power to answer some vital questions about tumors.
Scissor connects single-cell findings with clinical outcomes data available at the whole-tumor level in public databanks, such as The Cancer Genome Atlas. In a demonstration of Scissor’s usefulness, the researchers identified an aggressive cancer cell subpopulation in lung adenocarcinoma tumors — the most common type of lung cancer — that was associated with worse survival outcomes. They had single-cell data from just two cancer patients, but clinical outcomes data from 471 patient samples in The Cancer Genome Atlas. The ability to identify which cell subpopulations are responsible for drug response, tumor progression and spread of cancer could help reveal the mechanisms and point the way to better-targeted therapies.
More recently, Xia and colleagues developed PENCIL, a pioneering machine-learning model that can use single-cell gene activity data to not only pick out important subpopulations of cancer cells, but also reveal subpopulations of cells that are making a continuous transition between conditions, such as the cells that are transforming from normal to malignant.
PENCIL uses a strategy called “learning with rejection.” In traditional supervised machine learning, the model must assign every new sample one of the labels it was given during training. Learning with rejection gives machine-learning systems the freedom to decline low-confidence decisions by allowing them to say, “I do not know.”
“This is a very new idea,” Xia says. “I think we are the first to apply learning with rejection in biomedical research.”
By rejecting irrelevant cancer cells and focusing on the relevant ones, PENCIL can recognize gene signals missed by standard models and give more accurate predictions. The strategy, Xia says, “aligns with what Confucius once said: ‘It is true wisdom to say we do not know the answer to a question about which we are unsure, and to only respond to questions about which we have high confidence.’
“Everybody can make predictions with machine learning, but in medicine, it’s more about high-confidence predictions because our prediction may impact treatment decisions for our patients,” he continues. “If we predict this patient is responding to a specific drug, it must be high-confidence.”
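For readers curious how a reject option might look in practice, here is a minimal Python sketch of the general idea described above. It is not PENCIL's actual algorithm, which the article does not detail. It assumes some upstream classifier has already produced a probability for each label, and it simply withholds an answer when the top probability falls below a cutoff. The function name, the 0.8 threshold, the cell names and the probabilities are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional


def predict_with_rejection(
    class_probs: Dict[str, float], min_confidence: float = 0.8
) -> Optional[str]:
    """Return the most probable label, or None ("I do not know")
    when the model's top probability is below min_confidence."""
    label, confidence = max(class_probs.items(), key=lambda item: item[1])
    return label if confidence >= min_confidence else None


# Hypothetical per-cell probabilities from an assumed upstream classifier.
cells = {
    "cell_1": {"responder": 0.95, "non_responder": 0.05},
    "cell_2": {"responder": 0.55, "non_responder": 0.45},  # ambiguous cell
    "cell_3": {"responder": 0.10, "non_responder": 0.90},
}

predictions = {name: predict_with_rejection(p) for name, p in cells.items()}

for name, label in predictions.items():
    print(name, label if label is not None else "I do not know")
```

In this toy example the ambiguous second cell is rejected rather than forced into a category, so only the confident calls would feed into any downstream analysis. Raising the threshold trades coverage for confidence: fewer cells get a label, but the labels that remain are more trustworthy.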
oncoGPT
Now Xia is working on a project he’s given the nickname oncoGPT, after the famous text-generating artificial intelligence tool ChatGPT. Trained on a huge volume of publicly available texts, ChatGPT produces coherent and grammatically correct written content.
Xia’s oncoGPT will work in an analogous way by training on publicly available databases with gene expression data from many millions of cancer cells that can be associated with factors such as immune response, treatment resistance and patient survival. The project received a 2024 OHSU Faculty Excellence and Innovation Award sponsored by the Silver Family Innovation Fund.
“This will be a foundation model for facilitating our own clinical data analysis, which may have a limited sample size,” he says.
For example, the model could be used to help predict the most effective targeted therapy drug or combination of drugs for patients in the Knight Cancer Institute’s SMMART program, in which researchers have organized multiple technologies to analyze each person’s tumor in great detail and track how cancer cells evolve over time in response to treatment.
Given the breathtaking pace of advances in artificial intelligence, Xia feels both excited about the opportunities and anxious about keeping up. But, he says, “I feel there’s never been a better time to be doing big-data analysis and machine learning for translational cancer research — the DNA of the Knight Cancer Institute.”