Deep Learning is now ubiquitous in the machine learning world. In this blog post, we explore the use of Julia for deep learning experiments on IBM POWER8 + NVIDIA GPUs.
We shall demonstrate:
The ease of specifying deep neural network architectures in Julia and visualizing them. We use MXNet.jl, a Julia package for deep learning.
The ease of running Julia on Power Systems. We ran all our experiments on a PowerNV 8335-GCA, which has 160 CPU cores, and a Tesla K80 (dual) GPU accelerator. IBM and OSUOSL have generously provided us with the infrastructure for this analysis.
Deep neural networks have been around since the 1940s, but have only recently been deployed in research and analytics because of advances in computer technology and computational power. Neural networks have a wide range of applications in machine learning: vision, speech processing, and even self-driving cars. An interesting use case for neural networks is driving down the cost of medical diagnosis. Automated detection of diseases would be of immense help to doctors, especially in places around the world where access to healthcare is limited.
Diabetic retinopathy is an eye disease brought on by diabetes. There are over 126.2 million people in the world (as of 2010) with diabetic retinopathy, and this is expected to rise to over 191.2 million by 2030. According to the WHO in 2006, it accounted for 5% of world blindness.
Hence, early and automatic detection of diabetic retinopathy would be desirable. To that end, we took up an image classification problem using real clinical data. This data was provided to us by Drishti Care, a social enterprise that provides affordable eye care in India. Drishti Care CEO Kiran Anandampillai explains that “India is home to 62 million diabetics, many of whom live in rural areas with limited access to health facilities. Timely screening for changes in the retina can help get them to treatment and prevent vision loss. Julia Computing’s work using deep learning makes retinal screening an activity that can be performed by a trained technician using a low cost fundus camera.”
We obtained eye fundus images for a number of patients. The eyes affected by retinopathy are generally marked by inflamed veins and cotton spots. The following picture on the left is a normal fundus image whereas the one on the right is affected by diabetic retinopathy.
We built MXNet from source with CUDA and OpenCV. This was essential for training our networks on GPUs with cuDNN and for reading our image record files. We had to build GCC 4.8 from source so that our various libraries could compile and link without error, but once we did, we were ready to start working with the data.
We chose to run this experiment on an IBM Power System, which features high performance, large caches, high memory and I/O bandwidth, and tight integration with GPU accelerators. Its parallel, multi-threaded Power architecture is well adapted to ensuring that GPUs are used to their fullest potential.
The idea is to train a deep neural network to classify all these fundus images into infected and uninfected images. Along with the fundus images, we have at our disposal a number of training labels, identifying if the patient is infected or not.
We used MXNet.jl, a Julia package for deep learning. As a first step, it’s good to load a pretrained model that is known to be good at classifying images. So we decided to download and use the ImageNet model called Inception, with weights from its 39th epoch. On top of that, we specify a simple classifier.
    ## Extend model as we wish
    arch = mx.@chain mx.get_internals(inception)[:global_pool_output] =>
           mx.Flatten() =>
           mx.FullyConnected(num_hidden = 128) =>
           mx.Activation(act_type=:relu) =>
           mx.FullyConnected(num_hidden = 2) =>
           mx.WSoftmax(name = :softmax)
And now we train our model:
    mx.fit(
        model,
        optimizer,
        dp,
        n_epoch = 100,
        eval_data = test_data,
        callbacks = [
            mx.every_n_epoch(save_acc, 1, call_on_0=false),
            mx.do_checkpoint(prefix, save_epoch_0=true),
        ],
        eval_metric = mx.MultiMetric([mx.Accuracy(), WMultiACE(2)])
    )
One feature of the data is that it is highly imbalanced. For every 200 uninfected images, we have only 3 infected images. One way of approaching that scenario is to penalize the network heavily for every infected case it gets wrong. So we replaced the normal Softmax layer towards the end of the network with a weighted softmax. To check whether we are overfitting, we decided to have multiple performance metrics.
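To make the idea of penalizing mistakes on the rare class concrete, here is a minimal sketch of a class-weighted cross-entropy in plain Julia. It is not the implementation behind the `WSoftmax` layer or the `WMultiACE` metric used above, which this post does not show. The function names and the inverse-frequency weighting scheme are assumptions chosen for illustration.

```julia
# Numerically stable softmax over a vector of class scores.
function softmax(scores::AbstractVector{<:Real})
    shifted = scores .- maximum(scores)   # avoid overflow in exp
    exps = exp.(shifted)
    return exps ./ sum(exps)
end

# Class-weighted cross-entropy for a single example (hypothetical helper).
# `label` is a 1-based class index; `class_weights[label]` scales the loss.
function weighted_cross_entropy(scores::AbstractVector{<:Real},
                                label::Integer,
                                class_weights::AbstractVector{<:Real})
    probs = softmax(scores)
    return -class_weights[label] * log(probs[label])
end

# Assumed weighting scheme: inverse class frequency, using the 200:3 ratio
# from the text. Class 1 is uninfected, class 2 is infected.
counts = [200, 3]
class_weights = sum(counts) ./ (length(counts) .* counts)

# A prediction that favours "uninfected".
scores = [2.0, 0.0]
loss_if_uninfected = weighted_cross_entropy(scores, 1, class_weights)
loss_if_infected   = weighted_cross_entropy(scores, 2, class_weights)
```

With these assumed weights, the same confident "uninfected" prediction costs far more when the true label is infected, which is the behaviour the weighted softmax is meant to encourage.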
However, from our cross-entropy measures, we found that we were still overfitting. With fast training times on dual GPUs, we were able to train our model quickly and understand the drawbacks of our current approach.
Therefore we decided to employ a different approach.
The second way to deal with our imbalanced dataset is to generate smaller, more balanced datasets that contain roughly equal numbers of uninfected and infected images. We produced two datasets, one for training and another for cross validation, each of which had the same number of uninfected and infected patients.
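As a rough illustration of building such a balanced subset, the sketch below draws every infected example plus an equally sized random sample of uninfected ones. The function name `balanced_indices` and the label encoding (0 for uninfected, 1 for infected) are assumptions made for this example, not code from our pipeline.

```julia
using Random

# Return shuffled indices for a balanced subset: all examples of the rarer
# class plus an equally sized random draw from the more common class.
# Assumes labels are 0 (uninfected) or 1 (infected).
function balanced_indices(labels::AbstractVector{<:Integer};
                          rng = Random.default_rng())
    infected   = findall(==(1), labels)
    uninfected = findall(==(0), labels)
    n = min(length(infected), length(uninfected))
    chosen = vcat(shuffle(rng, infected)[1:n], shuffle(rng, uninfected)[1:n])
    return shuffle(rng, chosen)
end

# Toy label vector with the 200:3 ratio mentioned in the text.
labels = vcat(zeros(Int, 200), ones(Int, 3))
idx = balanced_indices(labels; rng = MersenneTwister(42))
```

Calling the function again with a fresh random state yields a different draw of uninfected examples, so it could also be invoked once per epoch.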
Additionally, we decided to shuffle our data. Every epoch, we resampled the uninfected images from the much larger pool of uninfected images in the training dataset, exposing the model to a range of uninfected images so that it could generalize well. We then did the same for the infected images. This was quite simple to implement in Julia: we simply had to overload a particular function and modify the data.
Most of these steps were done incrementally. Our Julia setup and environment made it easy for us to quickly change code and train models and incrementally add more tweaks and modifications to our models as well as our training methods.
We also augmented our data by adding low levels of Gaussian noise to random images from both the uninfected and the infected sets. Additionally, some images were randomly rotated by 180 degrees. Rotations suit this use case well because the important spatial features are preserved. This artificially expanded our training set.
The following code augments the infected images. In the following code segment, good images refer to uninfected images and bad images refer to infected images.
    function mx.eachbatch(p::ShuffleDataProvider)
        # Find positions of all good/bad images
        gidx = find(x -> x == 0, vec(p.label_array))
        bidx = find(x -> x == 1, vec(p.label_array))

        # Generate indices of good/bad images from global pool
        goodidx = rand(1:size(good["data"], 4), length(gidx))
        badidx = rand(1:size(bad["data"], 4), length(bidx))

        # Add random noise to bad images or flip them
        for i = 1:length(bidx)
            flipping = rand(Bool)
            noise = rand(Bool)
            b = bad["data"][:, :, :, badidx[i]]
            if noise
                for dim = 1:3
                    b[:, :, dim] = make_noise(b[:, :, dim], 0.2, 0)
                end
            end
            if flipping
                for dim = 1:3
                    b[:, :, dim] = flip(b[:, :, dim])
                end
            end
            p.data_array[:, bidx[i]] = vec(b)
        end
        p
    end
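The helpers `make_noise` and `flip` called above are not shown in this post. The following is a minimal sketch of what they might look like, under two assumptions: that the second and third arguments of `make_noise` are the standard deviation and mean of the Gaussian noise, and that `flip` performs the 180 degree rotation described earlier. The optional `rng` keyword is added here only to make the example reproducible.

```julia
using Random

# Hypothetical stand-in for `make_noise`: add Gaussian noise to one image
# channel. ASSUMPTION: `sigma` is the standard deviation, `mu` the mean.
function make_noise(channel::AbstractMatrix{<:Real}, sigma::Real, mu::Real;
                    rng = Random.default_rng())
    return channel .+ (mu .+ sigma .* randn(rng, size(channel)...))
end

# Hypothetical stand-in for `flip`: rotate one image channel by 180 degrees.
flip(channel::AbstractMatrix) = rot180(channel)

# Small 2x3 "channel" to show the effect of each helper.
channel = reshape(collect(1.0:6.0), 2, 3)
noisy   = make_noise(channel, 0.2, 0; rng = MersenneTwister(1))
flipped = flip(channel)
```

Both helpers return a new array of the same size, so the result can be assigned back into a channel slice as in the loop above. This sketch does not clamp the noisy values to a valid pixel range; whether that is needed depends on how the images are scaled.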
However, we found that while these measures stopped our model from overfitting, we could not obtain adequate performance. We explore the possible reason for this in the subsequent section.
The initial challenge we faced was that our data is imbalanced, so we experimented with penalizing incorrect decisions made by the classifier. We then tried generating a balanced (yet smaller) dataset, only to find that we were overfitting. To counter this, we applied shuffling and data augmentation. Even so, the model did not perform adequately.
Why is that so? Why is it that a model as deep as Inception wasn’t able to train effectively on our dataset?
The answer, we believe, lies in the data itself. On a randomized sample from the data, we found that there were two inherent problems with the data: firstly, there are highly blurred images with no features among both the healthy and the infected retinas.
Secondly, some healthy images show features that one would expect to find in the infected images! For instance, in some images the veins are somewhat puffed, and in others there are cotton spots. Below are some examples. While the picture on the left is undoubtedly infected, notice that the one on the right also has a few cotton spots and inflamed veins. So how does one differentiate? More importantly, how does our model differentiate?
So what do we do about this? For the training set, it would be helpful to have each image, rather than each patient, independently diagnosed as healthy or infected by a doctor, or by two or more doctors working independently. This would likely improve the model’s predictions.
Julia provides a distinct advantage at every stage for scientists engaged in machine learning and deep learning.
First, Julia is very efficient at preprocessing data. A very important first step in any machine learning experiment is to organize, clean up and preprocess large amounts of data. This was extremely efficient in our Julia environment, which is known to be orders of magnitude faster than comparable environments such as Python.
Second, Julia enables elegant code. Our models were chained together using Julia’s flexible syntax. Macros, metaprogramming and syntax familiar to users of any technical environment allow for easy-to-read code.
Third, Julia facilitates innovation. Since Julia is a first-class technical computing environment, we can easily deploy the models we create without changing any code. Julia hence solves the famous “two-language” problem by obviating the need for different languages for prototyping and production. This leads to significant productivity gains and shorter innovation cycles.
Thanks to these advantages, we were able to complete these experiments in a very short period of time compared with other technical computing environments.
We have demonstrated in this blog post how to write an image classifier based on deep neural networks in Julia and how easy it is to perform multiple experiments. Unfortunately, there are challenges with the dataset that require more fine-grained labelling. We have reached out to appropriate experts for assistance in this regard.
Readers who are interested in working with the dataset and possibly collaborating with us are invited to reach out via email at ranjan at juliacomputing.com to discuss access to the dataset.
I should thank a number of people for helping me with this work: Valentin Churavy and Pontus Stenetorp for guiding and mentoring me, and Viral Shah of Julia Computing. Thanks to IBM and OSUOSL too for providing the hardware, as well as Drishti Care for providing the data.