Have you ever wondered how the app Prisma manages to turn your photos into impressionist paintings? It uses, among
other things, a special type of algorithm called neural styles.
The neural styles algorithm combines the content of one
image with the style of another using deep neural networks. It was first
introduced in a well-known paper by L. A. Gatys
and colleagues at the University of Tübingen, Germany. In it, they demonstrate how one
can use a class of deep neural networks to extract features from any “style” image
and subsequently “apply” them to any “content” image.
The class of deep neural networks most powerful for image processing
tasks is the Convolutional Neural Network (CNN).
A CNN consists of a series of layers, which act as image filters. Each filter
extracts a feature from the input image. This series of layers forms a model, or network, that describes the transformation from the input image to the output features. The model used for this particular exercise was the VGG network, a popular model for object recognition tasks.
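To make the idea of a layer acting as an image filter concrete, here is a minimal sketch in plain Python. It is an illustration only, not the MXNet code used for this post: the function name `convolve2d` and the hand-written edge kernel are assumptions, and in a real CNN such as VGG the kernel weights are learned rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A tiny 4x4 grayscale "image": dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge kernel (illustrative; a trained network learns its weights).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
```

The resulting feature map is nonzero only in the column where the dark half meets the bright half, which is the sense in which a filter "extracts a feature".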
As mentioned earlier, we have two images: a style image, from which we extract
features, and a content image, on which these features are applied.
Here’s the content image, a photograph of the Taj Mahal in Agra, India.
Here’s the style image, one of the famous Water Lilies by the French impressionist, Claude Monet. We shall extract features from this particular image, and then apply them to our content image.
When we train our model on these two images, learning the style from the style image and applying it to the content image, we obtain the following output: the Taj Mahal drawn in the style of the lilies.
Notice that the structural features of the content image (or in other words, the
borders of the building) have been preserved. We find that a kind of texture has
been extracted from the style image and applied to the content image.
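The notion of "texture" can be made more precise. In the formulation of Gatys and colleagues, the style of an image at a given layer is summarized by the Gram matrix of that layer's feature maps, that is, the correlations between filter responses summed over all spatial positions. The sketch below, in plain Python, is an illustration under that assumption rather than the code used for this post; the names `gram_matrix` and `style_loss` are hypothetical, and the toy feature values are made up.

```python
def gram_matrix(features):
    """Return the C x C Gram matrix of C feature maps.

    `features` is a list of C feature maps, each flattened to a list of
    M values. Entry G[i][j] is the dot product of maps i and j.
    """
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]


def style_loss(generated, style):
    """Squared Gram-matrix difference for one layer, scaled by 1 / (4 N^2 M^2)."""
    n = len(generated)     # N: number of feature maps
    m = len(generated[0])  # M: spatial positions per feature map
    g = gram_matrix(generated)
    a = gram_matrix(style)
    squared_diff = sum(
        (g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return squared_diff / (4.0 * n ** 2 * m ** 2)


# Toy "style" features: 2 feature maps, 4 spatial positions each.
style = [[1, 2, 3, 4], [4, 3, 2, 1]]

# The same responses with the spatial positions shuffled identically in both maps.
shuffled = [[4, 1, 3, 2], [1, 4, 2, 3]]

# Features with different correlations between the two maps.
other = [[1, 1, 1, 1], [0, 0, 0, 0]]

same_texture = style_loss(shuffled, style)
different_texture = style_loss(other, style)
```

Here `same_texture` is zero even though the spatial layout changed, while `different_texture` is positive. Because the Gram matrix discards where a response occurred, it carries texture but not structure, which is consistent with the borders of the building coming from the content image instead.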
We can also load pretrained models. In fact, this is what most mobile apps do.
The picture we take with our camera is the content image. The app then takes this content
image and performs a single forward pass through the model, “applying” a texture to it.
We can see this with another content image. This is an artist’s impression of Kvothe,
one of my favorite fictional characters.
We have two pretrained models that provide two kinds of texture:
fire and frost. We pass the content image through each pretrained model
and obtain the following results, which show the original image with the two different styles applied.
The code for the above exercise can be found here.
We observe fast training times thanks in large part to optimized convolution
kernels on the GPU. Using the MXNet deep learning library, Julia makes it very easy
to perform these operations on a GPU. The following chart shows that the benefits of using the GPU are very large indeed!
We performed this exercise on an IBM PowerNV 8335-GCA server, which has 160 CPU
cores, and a Tesla K80 (dual) GPU accelerator.
It would be nice to make stylized videos, such as this.
This would involve passing every frame of a video
through a pretrained model, thereby generating a new video. We will write about that work in subsequent blog posts.
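As a rough sketch of that frame-by-frame idea, the plain Python below applies a per-frame function to a sequence of frames. Everything here is an assumption for illustration: `stylize_video` is a hypothetical helper, and `invert` is only a placeholder standing in for the forward pass of a pretrained style model.

```python
def stylize_video(frames, stylize_frame):
    """Lazily apply `stylize_frame` to each frame, preserving frame order."""
    for frame in frames:
        yield stylize_frame(frame)


def invert(frame):
    # Placeholder "style": invert 8-bit grayscale pixels. A real pipeline would
    # run one forward pass of the pretrained network here instead.
    return [[255 - pixel for pixel in row] for row in frame]


# Two tiny 2x2 grayscale frames standing in for a decoded video.
video = [
    [[0, 255], [128, 64]],
    [[10, 20], [30, 40]],
]

stylized = list(stylize_video(video, invert))
```

Using a generator means only one frame needs to be held in memory at a time. Note that this loop treats frames independently, so nothing in it enforces consistency between consecutive frames.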