The second year of our programme includes a 15-credit course on current trends in health informatics. It is essentially a research course that runs throughout the fall semester. This year, there are four tracks for us to choose from:
1. Consumer Health Informatics
2. ICT4D (Information and Communication Technologies for Development)
3. Visualization techniques (Virtual Reality, Augmented Reality)
4. Machine learning
The course is divided into a 5-credit individual task (a literature review) and a 10-credit group project (poster presentation + short paper).
Of course, I am highly biased towards the topic I chose, which is “Machine learning in Health Informatics”. We are supervised by Panagiotis Papapetrou, who is a professor at Stockholm University. He is affiliated with DSV, short for “Institutionen för data- och systemvetenskap” (the Department of Computer and Systems Sciences). (Quick reminder: Health Informatics is a joint programme by Karolinska Institutet and Stockholm University.)
So I’ll start by introducing what kind of research this topic can include and what the groups in track 4 have started to work on.
What is machine learning?
As a data scientist, I would like to uncover knowledge from data sources. You must have heard of “big data” by now, and we have a lot of data in the medical field. These data sources can include many things, e.g. medical images, EHR data, biological data, and even data from self-tracking patients (blood sugar levels, calorie intake etc.). By using machine learning techniques, I want to learn hidden patterns in such data. I want to make all the data I have usable and construct models that I can then use to make predictions. Given some input data, I might want a model to give me a simple answer, e.g. “Heart disease present” or “No signs of heart disease”. This is what computer scientists call a binary classification problem: there are only two possible outputs (0 or 1). Of course, the output I want from a model can be more complex than that.
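To make the “heart disease present / no signs of heart disease” idea a bit more concrete, here is a minimal sketch of a binary classifier in plain Python. It uses logistic regression trained with gradient descent, which is just one of many possible methods and not necessarily what any of our groups use. The eight data points, the two feature names (scaled age and scaled cholesterol), and the function names are all made up for illustration; this is not real patient data.

```python
import math

# Toy, invented data (NOT real patient data). Each row is
# (scaled age, scaled cholesterol); label 1 = "heart disease present".
X = [(0.2, 0.1), (0.3, 0.4), (0.1, 0.3), (0.4, 0.2),
     (0.8, 0.9), (0.7, 0.6), (0.9, 0.7), (0.6, 0.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Return the model's estimated probability of class 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            # Prediction error for this example
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def classify(weights, bias, features, threshold=0.5):
    """Turn the probability into one of the two answers (1 or 0)."""
    p = predict_proba(weights, bias, features)
    if p >= threshold:
        return "Heart disease present"
    return "No signs of heart disease"


weights, bias = train(X, y)

# A hypothetical new patient the model has not seen before
new_patient = (0.85, 0.75)
print(classify(weights, bias, new_patient),
      f"(estimated probability: {predict_proba(weights, bias, new_patient):.2f})")
```

The model itself only produces a probability between 0 and 1; the “simple answer” comes from comparing that probability to a threshold, here 0.5.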
What areas of machine learning are there?
One can subdivide the area of machine learning into different lines of research.
1. Supervised learning (methods: random forests, support vector machines, deep neural networks)
2. Unsupervised learning (methods: clustering, pattern recognition)
3. Reinforcement learning
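To illustrate how unsupervised learning differs from supervised learning, here is a minimal sketch of clustering in plain Python. No labels are given; the algorithm (a simple one-dimensional k-means) has to find the groups on its own. The blood sugar readings and the function names are invented for illustration, and k-means is only one of many clustering methods.

```python
def assign(values, centroids):
    """Put each value into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k=2, iterations=20):
    """Very small 1-D k-means; assumes k >= 2 and at least k values."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted values
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = assign(values, centroids)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # nothing moved, so the clustering is stable
        centroids = updated
    return centroids, assign(values, centroids)


# Invented fasting blood sugar readings in mmol/L (NOT real patient data)
readings = [4.8, 5.1, 5.4, 5.0, 9.2, 10.1, 8.7, 9.6]

centroids, clusters = kmeans_1d(readings, k=2)
for centre, members in zip(centroids, clusters):
    print(f"cluster around {centre:.2f}: {sorted(members)}")
```

Note that the algorithm never learns what the groups mean. Deciding whether a cluster corresponds to something clinically relevant is still up to a human.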
What are the projects that you chose to work on this semester?
This year we split up into three groups. My group will be working with image analysis and neural networks: we want to construct a model that takes a chest x-ray as input and then tells me, for example, “I have analyzed this x-ray image and I am 90% sure the patient has pneumonia”. The second group has decided to tackle the interpretability of machine learning models, looking in particular at a method called LIME (Local Interpretable Model-agnostic Explanations). The motivation is that many machine learning models today are seen as black boxes, and it is unclear how a model arrives at a prediction. The third group will focus on machine learning with ICD codes: they want to use ICD codes to predict the length of stay in an ICU. I will be back to present the outcomes of the groups at the end of the semester!
How can I get a first glimpse into what you’re working with?
If you have read this far and are still interested, then ask no more. Below are some scientific articles I can recommend as a starting point. In general, the field of machine learning in health informatics is huge. I always tell myself that basically anything is possible if you have good data to work with (and the ethical approval!).
1. Jonathan Rebane, Isak Karlsson, and Panagiotis Papapetrou, “An Investigation of Interpretable Deep Learning for Adverse Drug Event Prediction”. In the IEEE International Symposium on Computer-Based Medical Systems (CBMS), 2019.
2. Shen D, Wu G, Suk HI, “Deep Learning in Medical Image Analysis”. Annu Rev Biomed Eng. 2017;19:221–248.
3. T. Bai, S. Zhang, B. L. Egleston, and S. Vucetic, “Interpretable representation learning for healthcare via capturing disease progression through time”. In Proceedings of the ACM International Conference on Knowledge Discovery & Data Mining (KDD), 2018, pp. 43–51.
4. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier”. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.