einops repeat, rearrange, reduce: handling tensor dimensions elegantly

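I can never keep straight when to use dim=0 and when to use dim=1, and I always mix them up. While reading Vision Transformer code I noticed someone using einops, so I searched for it and found that it really is handy!
These two posts are worth reading side by side:
https://zhuanlan.zhihu.com/p/342675997
https://blog.csdn.net/weixin_43135178/article/details/118877384
If you have not installed it yet, first run pip install einops
Once it is installed, you can follow my code and run it step by step. I also worked from the second link and ran everything again myself to understand it better.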

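1. einops.rearrange(): rearranging dimensions

First, create image-like data for the later operations: a list of 32 arrays in "h w c" layout, which becomes a single 4-D array once stacked.
Stack along the first (batch) dimension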

import numpy as np
from einops import rearrange, repeat

# suppose we have a set of 32 images in "h w c" format (height-width-channel)
images = [np.random.randn(30, 40, 3) for _ in range(32)]
# stack along first (batch) axis, output is a single array
print(rearrange(images, 'b h w c -> b h w c').shape)
# Output
# (32, 30, 40, 3)

Concatenate along the height dimension

# concatenate images along height (vertical axis), 960 = 32 * 30
print(rearrange(images, 'b h w c -> (b h) w c').shape)
# Output
# (960, 40, 3)

Concatenate along the width dimension

# concatenate images along width (horizontal axis), 1280 = 32 * 40
print(rearrange(images, 'b h w c -> h (b w) c').shape)
# Output
# (30, 1280, 3)

Change the order of the dimensions, for example moving the channel dimension in front of height and width

# reorder axes to "b c h w" format for deep learning
print(rearrange(images, 'b h w c -> b c h w').shape)
# Output
# (32, 3, 30, 40)

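Split height and width into blocks and move the blocks into the batch or channel dimension

# (h1 h) (w1 w) splits height into h1 parts and width into w1 parts, so each piece has 1/h1 of the height and 1/w1 of the width.
# The outer factors h1 and w1 must come first inside the parentheses to get contiguous quadrants.
# split each image into 4 smaller (top-left, top-right, bottom-left, bottom-right), 128 = 32 * 2 * 2
print(rearrange(images, 'b (h1 h) (w1 w) c -> (b h1 w1) h w c', h1=2, w1=2).shape)
# Output
# (128, 15, 20, 3)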

# space-to-depth operation
print(rearrange(images, 'b (h h1) (w w1) c -> b h w (c h1 w1)', h1=2, w1=2).shape)
# Output
# (32, 15, 20, 12)

2. einops.repeat(): adding or enlarging dimensions

Expand a single-channel grayscale image along a new channel dimension

import numpy as np
from einops import rearrange, repeat, reduce

# a grayscale image (of shape height x width)
image = np.random.randn(30, 40)
# change it to RGB format by repeating in each channel
print(repeat(image, 'h w -> h w c', c=3).shape)
# Output
# (30, 40, 3)

Enlarge the height to 2 times the original

# repeat image 2 times along height (vertical axis)
print(repeat(image, 'h w -> (repeat h) w', repeat=2).shape)
# Output
# (60, 40)

Enlarge the width to 3 times the original

# repeat image 3 times along width
print(repeat(image, 'h w -> h (repeat w)', repeat=3).shape)
# Output
# (30, 120)

Expand each pixel into a 2x2 block (4 times as many pixels)

# convert each pixel to a small square 2x2. Upsample image by 2x
print(repeat(image, 'h w -> (h h2) (w w2)', h2=2, w2=2).shape)
# Output
# (60, 80)

Downsample first, then upsample

# pixelate image: first downsample by 2x (mean over each 2x2 block), then upsample back
downsampled = reduce(image, '(h h2) (w w2) -> h w', 'mean', h2=2, w2=2)
print(repeat(downsampled, 'h w -> (h h2) (w w2)', h2=2, w2=2).shape)
# Output
# (30, 40)

3. einops.reduce()

Reduce away one dimension

import numpy as np
from einops import rearrange, reduce

x = np.random.randn(100, 32, 64)
# perform max-reduction on the first axis
print(reduce(x, 't b c -> b c', 'max').shape)
# Output
# (32, 64)

# same as the previous call, but the longer axis names make the meaning clearer
print(reduce(x, 'time batch channel -> batch channel', 'max').shape)
# Output
# (32, 64)

Emulate max pooling

x = np.random.randn(10, 20, 30, 40)
# 2d max-pooling with kernel size = 2 * 2 for image processing
y1 = reduce(x, 'b c (h1 h2) (w1 w2) -> b c h1 w1', 'max', h2=2, w2=2)
print(y1.shape)
# Output
# (10, 20, 15, 20)

Global average pooling

# Global average pooling: average over all of h and w
print(reduce(x, 'b c h w -> b c', 'mean').shape)
# Output
# (10, 20)