Yuhang He's Blog

Some birds are not meant to be caged, their feathers are just too bright.

TensorFlow: Resizing Image by Keeping Aspect Ratio

Resizing an image while keeping its aspect ratio is very common in the vision community. The Python code for it is simple and intuitive:

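import cv2
# img_name is the path of the image to load; flag 1 reads it as a 3-channel color image
img_tmp = cv2.imread( img_name, 1 )
img_h, img_w, _ = img_tmp.shape
max_len = 1024
if max( img_h, img_w ) > max_len and img_h > img_w:
	# portrait: clamp the height and scale the width by the same ratio
	new_h = max_len
	new_w = int( (new_h * img_w )/float( img_h ) )
	img_tmp = cv2.resize( img_tmp, ( new_w, new_h ) )  # cv2.resize takes ( width, height )
elif max( img_h, img_w ) > max_len and img_w > img_h:
	# landscape: clamp the width and scale the height by the same ratio
	new_w = max_len
	new_h = int( (new_w * img_h )/float( img_w ) )
	img_tmp = cv2.resize( img_tmp, ( new_w, new_h ) )  # cv2.resize takes ( width, height )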
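
The size arithmetic in the two branches above can also be factored into one small helper. The sketch below is plain Python with no OpenCV dependency; the name fit_within is made up for illustration, and it uses integer division rather than the float division of the original. It also covers a square image larger than max_len, which the strict comparisons above leave untouched.

```python
def fit_within(img_h, img_w, max_len=1024):
    """Return (new_h, new_w) so that the longer side is at most max_len."""
    longest = max(img_h, img_w)
    if longest <= max_len:
        return img_h, img_w  # already small enough: keep the size as is
    # Scale both sides by max_len / longest using integer arithmetic,
    # so the longer side lands exactly on max_len.
    new_h = max(1, img_h * max_len // longest)
    new_w = max(1, img_w * max_len // longest)
    return new_h, new_w

print(fit_within(2048, 1024))  # (1024, 512)
print(fit_within(600, 800))    # (600, 800)
print(fit_within(3000, 3000))  # (1024, 1024)
```

The returned pair is ( height, width ), so it has to be swapped to ( width, height ) before being passed to cv2.resize.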

The if statement naturally expresses what we want to do. When it comes to TensorFlow, we are therefore tempted to achieve the same goal with the following code:

import tensorflow as tf
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide the GPUs so everything runs on the CPU
g = tf.Graph()
with g.as_default():
	# JPEG data is binary, so the file is opened in 'rb' mode
	img_raw_data = tf.gfile.FastGFile('1.jpg','rb').read()
	img_tmp = tf.image.decode_jpeg( img_raw_data )
	img_tmp = tf.image.convert_image_dtype( img_tmp, dtype = tf.uint8 )
	img_shape = tf.shape( img_tmp )
	img_h = img_shape[0]
	img_w = img_shape[1]
	max_len = 1024
	# img_h and img_w are tensors, so the Python if below is what raises the TypeError
	if max( img_h, img_w ) > max_len and img_h > img_w:
		new_h = max_len
		new_w = int( (new_h * img_w )/float( img_h ) )
		img_tmp = tf.image.resize_images( img_tmp, ( new_h, new_w ) )
	elif max( img_h, img_w ) > max_len and img_w > img_h:
		new_w = max_len
		new_h = int( (new_w * img_h )/float( img_w ) )
		img_tmp = tf.image.resize_images( img_tmp, ( new_h, new_w ) )
	img_resized = img_tmp
with tf.Session( graph = g ) as sess:
	init_op = tf.group( tf.global_variables_initializer(), tf.local_variables_initializer() )
	sess.run( init_op )
	img_resized_val = sess.run( img_resized )

Instead of running as expected, this code throws the following error:

TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
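
To see where such an error comes from, here is a toy sketch in plain Python. SymbolicValue is a made-up class, not TensorFlow's implementation; it only imitates the idea that a graph tensor names a future value and therefore cannot answer a Python truth test.

```python
class SymbolicValue:
    """Toy stand-in for a graph tensor: it names a future value but holds none."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        # A comparison yields another symbolic node, not True or False.
        return SymbolicValue("({} > {})".format(self.name, other))

    def __bool__(self):
        # Python's `if` calls bool(); there is no concrete value to inspect yet.
        raise TypeError("symbolic value {} has no Python truth value".format(self.name))

img_h = SymbolicValue("img_h")
pred = img_h > 1024      # fine: this only builds a symbolic comparison
try:
    if pred:             # fails: `if` needs a concrete bool
        pass
except TypeError as err:
    message = str(err)
print(message)
```

The comparison itself succeeds; it is the if statement, which asks for a concrete True or False, that fails.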

TensorFlow does not support a Python-style if on tensor comparisons, because a tensor has no concrete value while the graph is being built. Instead, we must turn to the tf.cond function to solve this problem. First, let's take a look at part of the official tf.cond doc:

cond(
	pred, # A scalar determining whether to return the result of true_fn or false_fn
	true_fn, # The callable to be executed if pred is true
	false_fn, # The callable to be executed if pred is false
)

With this in mind, tf.cond leaves us three things to do:

  • pred, a scalar boolean tensor that decides whether the image is resized along a given side. tf.greater() and tf.less() are the usual choices. To combine several predicates, use tf.logical_and().
  • true_fn, a callable. Note that the parameter is the function itself, not the return value of calling it.
  • false_fn, the same as true_fn.
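
The point about true_fn is easy to get wrong, so here is a small plain-Python sketch of the difference between passing a function and passing its return value. choose() is a hypothetical stand-in that copies only the calling convention of tf.cond; it does not build a graph, and all names in it are made up for illustration.

```python
calls = []  # records whenever the "resize" work actually runs

def expensive_resize():
    calls.append("resize")
    return "resized image"

def choose(pred, true_fn, false_fn):
    """Hypothetical stand-in that mirrors only tf.cond's calling convention."""
    if not callable(true_fn) or not callable(false_fn):
        raise TypeError("true_fn and false_fn must be callable")
    return true_fn() if pred else false_fn()

# Passing the function itself: nothing runs until choose() decides to call it.
result = choose(False, expensive_resize, lambda: "original image")
print(result, calls)

# Passing the return value: the work runs first, then the plain string is rejected.
try:
    choose(False, expensive_resize(), lambda: "original image")
except TypeError as err:
    error_text = str(err)
print(error_text, calls)
```

In the first call the resize work never runs, because the false branch is chosen. In the second call it runs before choose() is even entered, which is exactly what passing a return value instead of a callable means.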

We want true_fn() to return the resized image tensor and false_fn() to return the original image. The first branch of the if statement therefore converts as follows:

# Original Python branch:
#	if max( img_h, img_w ) > max_len and img_h > img_w:
#		new_h = max_len
#		new_w = int( (new_h * img_w )/float( img_h ) )
#		img_tmp = cv2.resize( img_tmp, ( new_w, new_h ) )
# converted to the following code
max_len = 1024
def resize_img_tensor_accord_h( img_tensor ):
	new_h = tf.constant( max_len, dtype = tf.int32 )
	# img_h and img_w are int32 tensors, so tf.div performs integer division here
	new_w = tf.cast( tf.div( tf.multiply( new_h, img_w ), img_h ), dtype = tf.int32 )
	resized = tf.image.resize_images( img_tensor, ( new_h, new_w ) )
	# resize_images returns float32; cast back so both tf.cond branches share one dtype
	resized = tf.cast( resized, img_tensor.dtype )
	return lambda : resized
img_tensor = tf.cond( pred = tf.logical_and( tf.greater( img_h, max_len ), tf.greater( img_h, img_w ) ),
                      true_fn = resize_img_tensor_accord_h( img_tensor ),
                      false_fn = lambda : img_tensor )

Similarly, the other branch of the if statement can be rewritten as:

# Original Python branch:
#	elif max( img_h, img_w ) > max_len and img_w > img_h:
#		new_w = max_len
#		new_h = int( (new_w * img_h )/float( img_w ) )
#		img_tmp = cv2.resize( img_tmp, ( new_w, new_h ) )
# converted to the following code
def resize_img_tensor_accord_w( img_tensor ):
	new_w = tf.constant( 1024, dtype = tf.int32 )
	# img_h and img_w are int32 tensors, so tf.div performs integer division here
	new_h = tf.cast( tf.div( tf.multiply( new_w, img_h ), img_w ), dtype = tf.int32 )
	resized = tf.image.resize_images( img_tensor, ( new_h, new_w ) )
	# resize_images returns float32; cast back so both tf.cond branches share one dtype
	resized = tf.cast( resized, img_tensor.dtype )
	return lambda : resized
img_tensor = tf.cond( pred = tf.logical_and( tf.greater( img_w, 1024 ), tf.greater( img_w, img_h ) ),
                      true_fn = resize_img_tensor_accord_w( img_tensor ),
                      false_fn = lambda : img_tensor )

Pay careful attention to the lambda: it guarantees that resize_img_tensor_accord_h() and resize_img_tensor_accord_w() return a function rather than a tensor. Otherwise tf.cond raises further errors, because true_fn and false_fn must be callable.
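
One caveat: the two functions above call tf.image.resize_images before tf.cond runs, so the resize ops are created outside the conditional, and in graph mode ops created outside true_fn and false_fn execute whichever branch is taken. Below is an alternative sketch, not the code of this post, that builds the resize inside the callable and handles both orientations with a single tf.cond. The name resize_keep_aspect is made up, and the code assumes a TensorFlow version that ships the tf.compat.v1 namespace (1.13 or later) plus NumPy for the dummy input.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF >= 1.13, where the compat.v1 namespace exists

def resize_keep_aspect(img, max_len=1024):
    """Shrink img (an H x W x C tensor) so its longer side equals max_len, if it is larger."""
    shape = tf.shape(img)
    img_h, img_w = shape[0], shape[1]
    longest = tf.maximum(img_h, img_w)

    def _resize():
        # Built inside the callable, so these ops belong to the true branch only.
        new_h = tf.maximum(img_h * max_len // longest, 1)
        new_w = tf.maximum(img_w * max_len // longest, 1)
        resized = tf1.image.resize_images(img, tf.stack([new_h, new_w]))
        # resize_images returns float32; cast back so both branches share one dtype.
        return tf.cast(resized, img.dtype)

    return tf.cond(longest > max_len, _resize, lambda: img)

g = tf.Graph()
with g.as_default():
    img_in = tf1.placeholder(tf.uint8, shape=[None, None, 3])
    img_out = resize_keep_aspect(img_in, max_len=8)  # tiny max_len keeps the demo cheap

with tf1.Session(graph=g) as sess:
    tall = sess.run(img_out, {img_in: np.zeros((16, 4, 3), np.uint8)})
    small = sess.run(img_out, {img_in: np.zeros((4, 6, 3), np.uint8)})
print(tall.shape, small.shape)
```

The placeholder and the zero-filled arrays stand in for a decoded JPEG; in the pipeline of this post, the output of tf.image.decode_jpeg would be passed to resize_keep_aspect instead.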

Farewell Word

TensorFlow's graph-building model differs considerably from ordinary Python control flow. Spending more time reading the official docs and sample code is well worth it!