Leo's Home page -- Github Page -- License: CC BY-SA 4.0
This notebook presents a study I did in 2018 on Multi-Resolution Convolutional Autoencoders, where the input image is downscaled to several resolutions, each resolution is fed to its own convolutional encoder, and the latent space is the combination of the latent spaces of those encoders.
This experiment was meant to be a first step towards exploring foveal-like perception, but I never built the agent to use it (too complex for me at that moment, given the knowledge, resources and time available).
The experiment was successful in the sense that I learned about image autoencoders and managed to create several working versions.
I leave this code available as-is; the only modifications are small adaptations to make it work with PyTorch 1.7, since a couple of things had been deprecated.
All the code is available at minibrain
Feel free to play with it if you want to.
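To make the idea concrete, here is a minimal sketch of a multi-resolution encoder: the same image is downscaled to several resolutions, each resolution goes through its own small convolutional branch, and the branch latents are concatenated. This is only an illustration with made-up layer sizes, not the actual MultiResCAE implementation used below.
import torch
from torch import nn
from torch.nn import functional as F

class MultiResEncoderSketch(nn.Module):
    def __init__(self, resolutions=(32, 24, 16), channels=3, latent_per_branch=32):
        super().__init__()
        self.resolutions = resolutions
        # one small convolutional encoder per resolution
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, 16, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # -> (N, 32, 1, 1)
                nn.Flatten(),
                nn.Linear(32, latent_per_branch),
            )
            for _ in resolutions
        ])

    def forward(self, x):
        latents = []
        for res, branch in zip(self.resolutions, self.branches):
            # downscale the input image to this branch's resolution
            xi = F.interpolate(x, size=(res, res), mode='bilinear', align_corners=False)
            latents.append(branch(xi))
        # the combined latent space is the concatenation of the branch latents
        return torch.cat(latents, dim=1)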
import torch
import torchvision
from torch import nn, optim
from torch.nn import functional as F
from torch.autograd import Variable
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, utils
from torchvision import datasets
from torchvision.utils import save_image
# import skimage
import math
# import io
# import requests
# from PIL import Image
import numpy as np
import pandas as pd
# import matplotlib.pyplot as plt
import sys
import os
import cae
from helpers import *
from helper_modules import *
from multi_res_cae import *
Jupyter has some tricks to automatically reload the modules when they are modified:
%load_ext autoreload
%autoreload
%aimport helpers, helper_modules, multi_res_cae
%aimport
Modules to reload: helper_modules helpers multi_res_cae Modules to skip:
The current notebook only contains the parameters I chose in the end, but many tests were done; some examples are left as comments because they give more information about what was tested.
# Hyper Parameters
# num_epochs = 5
# batch_size = 100
# learning_rate = 0.001
num_epochs = 20
batch_size = 128
learning_rate = 0.0001
%%time
#%time model = MultiFullCAE(in_img_shape=(32,32), full_image_resize=(24,24)).cuda()
model = MultiResCAE(in_img_shape=[32, 32], channels=3, conv_layer_feat=[16, 32, 64],
                    res_px=[[24, 24], [16, 16], [12, 12]], crop_sizes=[[32, 32], [24, 24], [12, 12]],
                    # conv_sizes=[(3, 5, 7), (3, 5, 7, 11), (3, 5, 7, 11)]  # this is too much I think
                    # conv_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5, 7]]  # test b
                    # conv_sizes=[[5, 7, 11], [3, 5, 7, 9], [1, 3, 5]]  # test c
                    conv_sizes=[[5, 7], [3, 5, 7], [1, 3, 5]]  # test d
                    ).cuda()
CPU times: user 1.64 s, sys: 516 ms, total: 2.15 s Wall time: 2.18 s
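A quick sanity check that could be run at this point (a hypothetical snippet; it assumes MultiResCAE returns a reconstruction with the same shape as its input):
dummy = torch.randn(4, 3, 32, 32).cuda()
with torch.no_grad():
    recon = model(dummy)
print(recon.shape)  # expected: torch.Size([4, 3, 32, 32])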
# model.parameters
%%time
criterion = nn.MSELoss()
#criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
CPU times: user 1.28 ms, sys: 0 ns, total: 1.28 ms Wall time: 1.29 ms
def to_img(x):
    # undo the [-1, 1] normalization and reshape the batch back to (N, 3, 32, 32)
    x = 0.5 * (x + 1)
    x = x.clamp(0, 1)
    x = x.view(x.size(0), 3, 32, 32)
    return x
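A quick check of to_img on a fake normalized batch (illustrative only):
fake = torch.empty(8, 3 * 32 * 32).uniform_(-1, 1)
imgs = to_img(fake)
print(imgs.shape, float(imgs.min()), float(imgs.max()))  # torch.Size([8, 3, 32, 32]), values within [0, 1]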
%%time
#transformation = monochrome_preprocess(32,32)
transformation = fullimage_preprocess(32,32)
#train_loader, test_loader = get_loaders(batch_size, transformation, dataset=datasets.CocoDetection)
train_loader, test_loader = get_loaders(batch_size, transformation)
Files already downloaded and verified CPU times: user 736 ms, sys: 179 ms, total: 914 ms Wall time: 921 ms
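fullimage_preprocess and get_loaders come from the helpers module, which is not shown in this notebook. Given the 32x32 RGB images and the "Files already downloaded and verified" message, the dataset is most likely CIFAR-10, and the helpers probably look roughly like the following sketch (the _sketch names are my own illustration, not the real helpers; they reuse transforms, datasets and DataLoader imported above):
def fullimage_preprocess_sketch(h=32, w=32):
    return transforms.Compose([
        transforms.Resize((h, w)),  # make sure every image is h x w
        transforms.ToTensor(),      # [0, 1]
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [-1, 1], matching to_img above
    ])

def get_loaders_sketch(batch_size, transformation, dataset=datasets.CIFAR10):
    train_set = dataset('./data', train=True, download=True, transform=transformation)
    test_set = dataset('./data', train=False, download=True, transform=transformation)
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader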
%%time
for epoch in range(num_epochs):
    for i, (img, labels) in enumerate(train_loader):
        img = Variable(img).cuda()
        # ===================forward=====================
        # print("encoding batch of images")
        output = model(img)
        # print("computing loss")
        loss = criterion(output, img)
        # ===================backward====================
        # print("Backward ")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # ===================log========================
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch+1, num_epochs, loss.data))
    if epoch % 4 == 0:
        pic = to_img(output.cpu().data)
        in_pic = to_img(img.cpu().data)
        save_image(pic, './mrcae_results/e_in-32x32_1-3-5_7-out_image_{}.png'.format(epoch))
        save_image(in_pic, './mrcae_results/e_in-32x32_1-3-5_7-in_image_{}.png'.format(epoch))
    # if loss.data[0] < 0.35:  # arbitrary number because I saw that it works well enough
    #     break
epoch [1/20], loss:0.5369
epoch [2/20], loss:0.5012
epoch [3/20], loss:0.3744
epoch [4/20], loss:0.3535
epoch [5/20], loss:0.4147
epoch [6/20], loss:0.4017
epoch [7/20], loss:0.3398
epoch [8/20], loss:0.3981
epoch [9/20], loss:0.3920
epoch [10/20], loss:0.3525
epoch [11/20], loss:0.3438
epoch [12/20], loss:0.3502
epoch [13/20], loss:0.3335
epoch [14/20], loss:0.3064
epoch [15/20], loss:0.3982
epoch [16/20], loss:0.3694
epoch [17/20], loss:0.3668
epoch [18/20], loss:0.3373
epoch [19/20], loss:0.3481
epoch [20/20], loss:0.3426
CPU times: user 12min 44s, sys: 3min 7s, total: 15min 51s Wall time: 15min 52s
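The test_loader created above is never used in this notebook; a hedged sketch of how the reconstruction error could be checked on the held-out set with the same model and criterion:
model.eval()
test_loss, n_batches = 0.0, 0
with torch.no_grad():
    for img, _ in test_loader:
        img = img.cuda()
        output = model(img)
        test_loss += criterion(output, img).item()
        n_batches += 1
print('mean test reconstruction loss: {:.4f}'.format(test_loss / n_batches))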
!ls mrcae_results
e_in-32x32_1-3-5_7-in_image_0.png e_in-32x32_1-3-5_7-out_image_0.png e_in-32x32_1-3-5_7-in_image_12.png e_in-32x32_1-3-5_7-out_image_12.png e_in-32x32_1-3-5_7-in_image_16.png e_in-32x32_1-3-5_7-out_image_16.png e_in-32x32_1-3-5_7-in_image_4.png e_in-32x32_1-3-5_7-out_image_4.png e_in-32x32_1-3-5_7-in_image_8.png e_in-32x32_1-3-5_7-out_image_8.png
#torch.save(model, "fmrcae_in-64x64_32x32_3-5-7-11.pth")
#torch.save(model, "mrcae_in-32x32_.pth")
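torch.save expects the object first and the path second; if saving today, storing the state_dict is usually preferred (a sketch, the filename is just an example):
# torch.save(model.state_dict(), 'mrcae_in-32x32_state.pth')
# model.load_state_dict(torch.load('mrcae_in-32x32_state.pth'))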
Input and output of the first epoch
Input and output of the last saved epoch
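The saved images can be viewed again from ./mrcae_results; a possible snippet using matplotlib and PIL (both commented out in the imports above), with the epoch-0 and epoch-16 files listed by the ls command above:
import matplotlib.pyplot as plt
from PIL import Image

for name in ['e_in-32x32_1-3-5_7-in_image_0.png', 'e_in-32x32_1-3-5_7-out_image_0.png',
             'e_in-32x32_1-3-5_7-in_image_16.png', 'e_in-32x32_1-3-5_7-out_image_16.png']:
    plt.figure()
    plt.title(name)
    plt.imshow(Image.open('./mrcae_results/' + name))
    plt.axis('off')
plt.show()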