Image Quilting and Neural Style Transfer

Neural Style Transfer

The natural successor to image quilting is using a neural network to transfer style onto an image's content. This method was introduced in the research paper "A Neural Algorithm of Artistic Style" by Gatys, Ecker, and Bethge.

The main idea behind this style transfer is to use a pre-trained network that is good at recognizing objects (in this case a VGG-11 network) to create a new image from two others: one that influences style (color, texture, etc.) and another that establishes content (the actual "things" the network detects). Both the style and content images are pushed through the VGG network, and their representations at certain layers (usually the first convolutional layer after an avg pool) are sampled. We sample the style and content images' representations throughout the network in order to compare them against the resulting image's representations. The style is taken at 5 convolutional layers and compared for loss using the SSD of the Gram matrices of the style image and result image. The Gram matrix captures an image's style features rather than its per-pixel content. Content loss is sampled at one layer of the convolutional network and is a direct SSD comparison with the result image.
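In PyTorch, the two losses boil down to something like this (a minimal sketch; gram_matrix, style_loss, and content_loss are my own names, and I leave out the per-layer normalization constants the paper uses):

    import torch

    def gram_matrix(features):
        # features: (1, channels, height, width) activations from one conv layer
        _, c, h, w = features.shape
        f = features.view(c, h * w)
        # Correlations between feature maps: captures texture/color statistics
        # while discarding the spatial arrangement of the content
        return f @ f.t()

    def style_loss(result_feats, style_feats):
        # SSD between the Gram matrices of the result and style activations
        return ((gram_matrix(result_feats) - gram_matrix(style_feats)) ** 2).sum()

    def content_loss(result_feats, content_feats):
        # Direct SSD between activations at the single content layer
        return ((result_feats - content_feats) ** 2).sum()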

To the left is a VGG-11 network. The paper uses a VGG-19 network, but I found this one works just as well despite its smaller size. In fact, it works faster because it is smaller! The 5 convolutional layers sampled are the very first, "conv1_1", and then each convolutional layer after a max pool (I changed max pooling to avg pooling in my implementation): "conv2_1", "conv3_1", "conv4_1", "conv5_1". The content layer to compare is a hyperparameter in my model, chosen from the "convX_1" layers listed above. The other hyperparameters are the weight of the style loss (the content loss weight was kept at 1) and the learning rate of the LBFGS optimizer optimizing the image (the default in PyTorch is 1, which I kept). I also did not use a set number of iterations, instead stopping the optimization once its gradient differs by less than 1e-4 between iterations.
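Concretely, setting up the network and tapping those layers looks roughly like this (a sketch; the indices are where torchvision's VGG-11 places those conv layers, and extract_features is a hypothetical helper of mine):

    import torch.nn as nn
    from torchvision import models

    # Pretrained VGG-11 feature stack, with every max pool swapped for avg pool
    vgg = models.vgg11(pretrained=True).features.eval()
    for i, layer in enumerate(vgg):
        if isinstance(layer, nn.MaxPool2d):
            vgg[i] = nn.AvgPool2d(kernel_size=2, stride=2)
    for p in vgg.parameters():
        p.requires_grad_(False)

    # conv1_1 plus the first conv after each pool: conv2_1 ... conv5_1
    STYLE_LAYERS = {0, 3, 6, 11, 16}
    CONTENT_LAYER = 11  # conv4_1; this choice is the content-layer hyperparameter

    def extract_features(image, layer_ids):
        # Run the image through the network, saving activations at layer_ids
        feats = {}
        x = image
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in layer_ids:
                feats[i] = x
        return feats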

The first method I tried was starting with a randomized image as the result and then optimizing on it. Unfortunately, I don't think this works very well for the images I'm going for, as seen in the grid of results below.
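In code, this first attempt looks roughly like the following sketch, reusing the helpers above and assuming content_img and style_img are already preprocessed image tensors; for simplicity I apply the 1e-4 threshold to the change in loss between iterations:

    # Precompute target activations for the content and style images
    with torch.no_grad():
        content_feats = extract_features(content_img, {CONTENT_LAYER})
        style_feats = extract_features(style_img, STYLE_LAYERS)

    # Start from random noise and optimize the pixels of the result directly
    result = torch.randn_like(content_img).requires_grad_(True)
    optimizer = torch.optim.LBFGS([result])  # default lr=1
    style_weight = 1e4  # swept over 10^2 ... 10^5 in the grids below

    def closure():
        optimizer.zero_grad()
        feats = extract_features(result, STYLE_LAYERS | {CONTENT_LAYER})
        loss = content_loss(feats[CONTENT_LAYER], content_feats[CONTENT_LAYER])
        for i in STYLE_LAYERS:
            loss = loss + style_weight * style_loss(feats[i], style_feats[i])
        loss.backward()
        return loss

    prev = float('inf')
    while True:
        loss = optimizer.step(closure).item()
        if abs(prev - loss) < 1e-4:  # no fixed iteration count
            break
        prev = loss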

[Figures: Content, Style, and Initialization images]

The following results are ordered by style weight on the x-axis (content weight is 1) and by the content convolutional layer compared on the y-axis.

[Result grid: style weights 10^2 through 10^5 across the columns; content layers conv1_1 through conv4_1 down the rows]

As you can see, the results are very muddy and don't come out as precise as I would like. Instead, I switched to starting with the content image and optimizing on it. This led to much more artistically interesting results, in my opinion.
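The only change for this variant is the starting image; in the sketch above, the initialization line becomes:

    # Initialize from a copy of the content image rather than random noise
    result = content_img.clone().requires_grad_(True)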

[Result grid: style weights 10^2 through 10^5 across the columns; content layers conv1_1 through conv5_1 down the rows]

As expected, the deeper the content layer, the less defined the shapes become, and as the style weight increases there is a very noticeable effect.

Results

Here is a selection of some of my favorite results.

An interesting thing to note in the image above is that the last image's skin was properly colorized to the peachy pale of human skin. I'm not sure why this happened, but possibly the VGG network matched the skin of Van Gogh to the skin of the man.

I also tried the sketch-feynman pair using this method. I think the result is more cohesive and recognizable but, ironically, I think image quilting does a better job of staying in the "style" of the sparse sketch.