Different diseases require different tests to determine whether a person has them. immuneML, a new open source machine learning platform, can potentially look for many diseases in a single blood sample.
“Once you know how it works for one disease, it can be very easy to make diagnostic tools for other types of diseases as well,” says Lonneke Scheffer.
“From a blood sample, we hope to be able to diagnose if a person has a disease or not. On the individual receptor level we want to see if that one specific receptor is specific to corona or to something else,” says Milena Pavlovic.
Scheffer and Pavlovic are Doctoral Research Fellows at the Department of Informatics at the University of Oslo. They are part of the Research Group for Biomedical Informatics, where they have developed immuneML.
Predicting whether receptors bind to corona
B and T cells in our immune system all have small receptors on the surface. There are millions of different receptors.
“The receptors have a certain 3D shape that makes them able to stick to different antigens,” Scheffer tells Titan.uio.no.
“If we analyse these receptors using machine learning, we hope to be able to say what each of those receptors is specific for; what specific disease, what specific virus or bacteria, even cancer and autoimmunity,” Pavlovic says.
To do this using machine learning, they need to translate the 3D-shaped receptors into a mathematical representation. The receptors are proteins, and all proteins have their own blueprint in our DNA.
“Then we are looking at a flat line of little letters. These DNA sequences are what we get in, which we can translate to protein sequences in the computer. To predict if the receptor binds to corona or not, you really just look at one piece of text, and you want to predict on the basis of this text, does it bind to corona or not,” Scheffer says.
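The translation step Scheffer describes, from DNA letters to a protein sequence, can be sketched in a few lines of Python. This is only an illustration using the standard genetic code, not immuneML's own code; the function name `translate` and the short example sequence are made up for this sketch.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string into one-letter amino acid codes.

    Reads the sequence three letters at a time from the first position.
    Trailing letters that do not fill a whole codon are ignored, and
    codons with unknown letters are written as "X".
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical fragment of a receptor gene, for illustration only.
    print(translate("TGTGCCAGCAGC"))
```

The resulting string of amino acid letters is the "piece of text" that a machine learning model then takes as input.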
A repertoire of immune receptors
Scheffer and Pavlovic are not just looking at individual receptors and to which antigens they may bind. They also want to analyse the whole collection of receptors someone has in their body, what is called the adaptive immune receptor repertoire (AIRR).
“What is interesting and unique about this AIRR data is that it can potentially work for a lot of different diseases. It is a generalized method.”
“These repertoires are extremely diverse. This is also the reason why they are difficult to analyse, because these repertoires have really large numbers of different immune receptors inside them, which are also quite different from person to person,” Scheffer says.
Their machine learning models can look for patterns in these repertoires and make their predictions.
“Basically, what the machine learning models do based on the representation, is to find patterns that appear in that representation, that will be useful to predict the task of interest,” Pavlovic says.
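To make the idea of a representation and a pattern concrete, here is a deliberately simple sketch. It represents each receptor sequence by its short overlapping fragments (k-mers) and scores each fragment by how much more often it appears in binding than in non-binding sequences. This is a toy stand-in for a real machine learning model and is not immuneML's API; all function names and all sequences below are invented for the example.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Represent a sequence as counts of its overlapping k-letter fragments."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def train_kmer_scores(sequences, labels, k: int = 3) -> Counter:
    """Score each k-mer: +1 per binding sequence containing it,
    -1 per non-binding sequence containing it."""
    scores = Counter()
    for sequence, is_binder in zip(sequences, labels):
        for kmer in set(kmer_counts(sequence, k)):
            scores[kmer] += 1 if is_binder else -1
    return scores


def predict(sequence: str, scores: Counter, k: int = 3) -> bool:
    """Predict 'binds' when the summed k-mer scores are positive."""
    total = sum(scores[kmer] * n for kmer, n in kmer_counts(sequence, k).items())
    return total > 0


# Made-up training data: the two "binders" share the fragment GQG.
TRAIN_SEQUENCES = ["CASSGQGAYEQYF", "CASRGQGNTEAFF", "CASSLDRTYEQYF", "CASSPTSNTEAFF"]
TRAIN_LABELS = [True, True, False, False]

if __name__ == "__main__":
    scores = train_kmer_scores(TRAIN_SEQUENCES, TRAIN_LABELS)
    print(scores.most_common(3))
    print(predict("CASSGQGDTQYF", scores))
    print(predict("CASSLDRNTQYF", scores))
```

Real datasets contain far more sequences and call for proper statistical models and validation, which is the kind of comparison the platform is meant to support.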
“We use this platform to learn patterns that bind to gluten, which is relevant for coeliac disease. If someone has a dataset on corona, they can use immuneML on that,” Scheffer says.
Open source platform
immuneML is an open source platform that anyone can use. Scheffer and Pavlovic have made tutorials for people who, unlike them, are not programmers.
“The point of immuneML is to have a workspace for someone who has this kind of immunology data and who wants to find out what kind of machine learning methods would work best on their dataset,” Scheffer says.
“We hope that it will encourage people to develop new tools that will also be open source and shared with the research community, so that it can improve our understanding of how the immune system actually recognizes the disease,” Pavlovic says.
Very promising so far
Our knowledge about immune receptors is increasing rapidly, but it is a fairly new field of research. It is around ten years behind the DNA analyses that map which parts of the genetic material are important for various diseases.
“The main limitation of a genetic test is that it can only inform on a person's risk for developing a disease,” says Professor Geir Kjetil Sandve.
“The immune receptors, on the other hand, show responses to already ongoing disease processes. They do not merely tell you about an increased risk of a disease. They can tell you that a given disease is already developing in your body, and that you will probably notice the symptoms in a few years,” Sandve tells Titan.uio.no.
He believes that immuneML could play an important role in the further development of the field where machine learners meet immunologists.
“Without immuneML, machine learning researchers around the world would spend a lot of time developing their own solutions to many of the same basic problems, wasting time and ending up with completely incompatible tools. If the field is to gain momentum, we must be able to effectively compare and integrate ideas across groups.”
“We have already seen that other research groups use immuneML, and several groups say that they want to have their own developments integrated with our platform. So far it seems very promising,” Sandve says.
Scientific article:
Pavlovic, Scheffer et al.: immuneML: an ecosystem for machine learning analysis of adaptive immune receptor repertoires, Nature Machine Intelligence, November 2021.
This article was first published at Titan, University of Oslo research news within science and technology.