
Deep Dive — Instance Segmentation (III): A Detailed Look at the Mask R-CNN Instance Segmentation Code and Training on Your Own Data — Related Networks, Data Processing, Utilities, etc.


Back to the main catalog

Back to the instance segmentation catalog

Previous chapter: Deep Dive — Instance Segmentation (II): A Detailed Look at the Mask R-CNN Instance Segmentation Code and Training on Your Own Data

 

Paper: Mask R-CNN

Authors' code: Mask R-CNN code

My refactored code: mask_rcnn_pro

 

This section walks through the related networks, data processing, and utility code used when training Mask R-CNN instance segmentation on your own data.

 

VI. Related Networks, Data Processing, and Utility Code

The code below is commented; read it carefully and it is not hard to follow. For the parts that are harder to grasp, step through them with a debugger to get a more concrete picture.

1. Utility files

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/13 22:57
# @Author   : WanDaoYi
# @FileName : image_utils.py
# ============================================
import numpy as np
import skimage.color
import skimage.io
import skimage.transform
from distutils.version import LooseVersion
from config import cfg


class ImageUtils(object):

    def __init__(self):
        self.mean_pixel = np.array(cfg.COMMON.MEAN_PIXEL)
        pass

    def parse_image_meta_graph(self, meta):
        """
        Parses a tensor that contains image attributes to its components.
        See compose_image_meta() for more details.
        :param meta: [batch, meta length] where meta length depends on NUM_CLASSES
        :return: Returns a dict of the parsed tensors.
        """
        image_id = meta[:, 0]
        original_image_shape = meta[:, 1:4]
        image_shape = meta[:, 4:7]
        window = meta[:, 7:11]  # (y1, x1, y2, x2) window of image in pixels
        scale = meta[:, 11]
        active_class_ids = meta[:, 12:]
        return {
            "image_id": image_id,
            "original_image_shape": original_image_shape,
            "image_shape": image_shape,
            "window": window,
            "scale": scale,
            "active_class_ids": active_class_ids,
        }
        pass

    def compose_image_meta(self, image_id, original_image_shape, image_shape,
                           window, scale, active_class_ids):
        """
        Takes attributes of an image and puts them in one 1D array.
        :param image_id: An int ID of the image. Useful for debugging.
        :param original_image_shape: [H, W, C] before resizing or padding.
        :param image_shape: [H, W, C] after resizing and padding
        :param window: (y1, x1, y2, x2) in pixels. The area of the image where the real
                       image is (excluding the padding)
        :param scale: The scaling factor applied to the original image (float32)
        :param active_class_ids: List of class_ids available in the dataset from which
                                 the image came. Useful if training on images from multiple
                                 datasets where not all classes are present in all datasets.
        :return:
        """
        meta = np.array([image_id] +  # size=1
                        list(original_image_shape) +  # size=3
                        list(image_shape) +  # size=3
                        list(window) +  # size=4 (y1, x1, y2, x2) in image coordinates
                        [scale] +  # size=1
                        list(active_class_ids)  # size=class_num
                        )
        return meta
        pass

    def load_image(self, image_path):
        """
        Load the specified image and return a [H, W, 3] Numpy array.
        :param image_path: image path
        :return:
        """
        # Load image
        image = skimage.io.imread(image_path)
        # If grayscale. Convert to RGB for consistency.
        if image.ndim != 3:
            image = skimage.color.gray2rgb(image)
        # If has an alpha channel, remove it for consistency
        if image.shape[-1] == 4:
            image = image[..., :3]
        return image
        pass

    def mold_image(self, images, mean_pixel):
        """
        Expects an RGB image (or array of images) and subtracts
        the mean pixel and converts it to float. Expects image
        colors in RGB order.
        :param images:
        :param mean_pixel:
        :return:
        """
        return images.astype(np.float32) - np.array(mean_pixel)
        pass

    def mode_input(self, images_info_list):
        """
        Takes a list of images and modifies them to the format expected
        as an input to the neural network.
        :param images_info_list: List of image matrices [height, width, depth].
                                 Images can have different sizes.
        :return: returns 3 Numpy matrices:
            molded_images_list: [N, h, w, 3]. Images resized and normalized.
            image_metas_list: [N, length of meta data]. Details about each image.
            windows_list: [N, (y1, x1, y2, x2)]. The portion of the image that has the
                          original image (padding excluded).
        """
        molded_images_list = []
        image_metas_list = []
        windows_list = []

        image_mi_dim = cfg.COMMON.IMAGE_MIN_DIM
        image_max_dim = cfg.COMMON.IMAGE_MAX_DIM
        image_min_scale = cfg.COMMON.IMAGE_MIN_SCALE
        image_resize_mode = cfg.COMMON.IMAGE_RESIZE_MODE

        for image_info in images_info_list:
            # resize image
            molded_image, window, scale, padding, crop = self.resize_image(image_info,
                                                                           min_dim=image_mi_dim,
                                                                           min_scale=image_min_scale,
                                                                           max_dim=image_max_dim,
                                                                           resize_mode=image_resize_mode)
            molded_image = self.mold_image(molded_image, self.mean_pixel)

            # Build image_meta
            image_meta = self.compose_image_meta(0, image_info.shape, molded_image.shape, window, scale,
                                                 np.zeros([cfg.COMMON.CLASS_NUM], dtype=np.int32))
            # Append
            molded_images_list.append(molded_image)
            image_metas_list.append(image_meta)
            windows_list.append(window)
            pass

        # Pack into arrays
        molded_images_list = np.stack(molded_images_list)
        image_metas_list = np.stack(image_metas_list)
        windows_list = np.stack(windows_list)

        return molded_images_list, image_metas_list, windows_list
        pass

    def resize(self, image, output_shape, order=1, resize_mode="constant", cval=0, clip=True,
               preserve_range=False, anti_aliasing=False, anti_aliasing_sigma=None):
        """
        A wrapper for Scikit-Image resize().
        Scikit-Image generates warnings on every call to resize() if it doesn't
        receive the right parameters. The right parameters depend on the version
        of skimage. This solves the problem by using different parameters per
        version. And it provides a central place to control resizing defaults.
        :param image:
        :param output_shape:
        :param order:
        :param resize_mode:
        :param cval:
        :param clip:
        :param preserve_range:
        :param anti_aliasing:
        :param anti_aliasing_sigma:
        :return:
        """
        if LooseVersion(skimage.__version__) >= LooseVersion("0.14"):
            # New in 0.14: anti_aliasing. Default it to False for backward
            # compatibility with skimage 0.13.
            return skimage.transform.resize(image, output_shape,
                                            order=order, mode=resize_mode, cval=cval, clip=clip,
                                            preserve_range=preserve_range, anti_aliasing=anti_aliasing,
                                            anti_aliasing_sigma=anti_aliasing_sigma)
        else:
            return skimage.transform.resize(image, output_shape,
                                            order=order, mode=resize_mode, cval=cval, clip=clip,
                                            preserve_range=preserve_range)
        pass

    def resize_image(self, image, min_dim=None, max_dim=None, min_scale=None, resize_mode="square"):
        """
        resize an image keeping the aspect ratio unchanged.
        :param image:
        :param min_dim: if provided, resize the image such that its smaller dimension == min_dim
        :param max_dim: if provided, ensures that the image longest side doesn't exceed this value.
        :param min_scale: if provided, ensure that the image is scaled up by at least
                          this percent even if min_dim doesn't require it.
        :param resize_mode: resizing mode.
            none: No resizing. Return the image unchanged.
            square: Resize and pad with zeros to get a square image
                    of size [max_dim, max_dim].
            pad64: Pads width and height with zeros to make them multiples of 64.
                   If min_dim or min_scale are provided, it scales the image up
                   before padding. max_dim is ignored in this mode.
                   The multiple of 64 is needed to ensure smooth scaling of feature
                   maps up and down the 6 levels of the FPN pyramid (2**6=64).
            crop: Picks random crops from the image. First, scales the image based
                  on min_dim and min_scale, then picks a random crop of
                  size min_dim x min_dim. Can be used in training only.
                  max_dim is not used in this mode.
        :return:
            image: the resized image
            window: (y1, x1, y2, x2). If max_dim is provided, padding might
                    be inserted in the returned image. If so, this window is the
                    coordinates of the image part of the full image (excluding
                    the padding). The x2, y2 pixels are not included.
            scale: The scale factor used to resize the image
            padding: Padding added to the image [(top, bottom), (left, right), (0, 0)]
        """
        # Keep track of image dtype and return results in the same dtype
        image_dtype = image.dtype

        # Default window (y1, x1, y2, x2) and default scale == 1.
        h, w = image.shape[:2]
        window = (0, 0, h, w)
        scale = 1
        padding = [(0, 0), (0, 0), (0, 0)]
        crop = None

        if resize_mode == "none":
            return image, window, scale, padding, crop
            pass

        # Scale?
        if min_dim:
            # Scale up but not down
            scale = max(1, min_dim / min(h, w))
            pass
        if min_scale and scale < min_scale:
            scale = min_scale
            pass

        # Does it exceed max dim?
        if max_dim and resize_mode == "square":
            image_max = max(h, w)
            if round(image_max * scale) > max_dim:
                scale = max_dim / image_max
                pass
            pass

        # Resize image using bilinear interpolation
        if scale != 1:
            image = self.resize(image, (round(h * scale), round(w * scale)), preserve_range=True)
            pass

        # Need padding or cropping?
        if resize_mode == "square":
            # Get new height and width
            h, w = image.shape[:2]
            top_pad = (max_dim - h) // 2
            bottom_pad = max_dim - h - top_pad
            left_pad = (max_dim - w) // 2
            right_pad = max_dim - w - left_pad
            padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
            image = np.pad(image, padding, mode='constant', constant_values=0)
            window = (top_pad, left_pad, h + top_pad, w + left_pad)
            pass
        elif resize_mode == "pad64":
            h, w = image.shape[:2]
            # Both sides must be divisible by 64
            assert min_dim % 64 == 0, "Minimum dimension must be a multiple of 64"
            # Height
            if h % 64 > 0:
                max_h = h - (h % 64) + 64
                top_pad = (max_h - h) // 2
                bottom_pad = max_h - h - top_pad
            else:
                top_pad = bottom_pad = 0
            # Width
            if w % 64 > 0:
                max_w = w - (w % 64) + 64
                left_pad = (max_w - w) // 2
                right_pad = max_w - w - left_pad
            else:
                left_pad = right_pad = 0
            padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
            image = np.pad(image, padding, mode='constant', constant_values=0)
            window = (top_pad, left_pad, h + top_pad, w + left_pad)
            pass
        elif resize_mode == "crop":
            # Pick a random crop
            h, w = image.shape[:2]
            y = np.random.randint(0, (h - min_dim))
            x = np.random.randint(0, (w - min_dim))
            crop = (y, x, min_dim, min_dim)
            image = image[y:y + min_dim, x:x + min_dim]
            window = (0, 0, min_dim, min_dim)
            pass
        else:
            raise Exception("Mode {} not supported".format(resize_mode))
            pass

        return image.astype(image_dtype), window, scale, padding, crop
        pass
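To make the image-meta layout above concrete, here is a minimal sketch (pure NumPy, with hypothetical shapes and an assumed class count of 81) that packs the attributes in the same order as compose_image_meta() and slices them back out the way parse_image_meta_graph() does:

import numpy as np

# Hypothetical values; the layout mirrors compose_image_meta() above.
image_id = 0
original_shape = (480, 640, 3)    # [H, W, C] before resizing
molded_shape = (1024, 1024, 3)    # [H, W, C] after a "square" resize
window = (128, 0, 896, 1024)      # (y1, x1, y2, x2) of the real image inside the padding
scale = 1.6                       # 1024 / 640
active_class_ids = np.zeros(81, dtype=np.int32)  # assumed 81 classes (COCO-style)

meta = np.array([image_id] + list(original_shape) + list(molded_shape) +
                list(window) + [scale] + list(active_class_ids))

# Slicing back out follows the same offsets as parse_image_meta_graph():
assert meta[0] == image_id
assert tuple(meta[1:4]) == original_shape
assert tuple(meta[4:7]) == molded_shape
assert tuple(meta[7:11]) == window
assert meta[11] == scale
print(meta.shape)  # (93,) == 12 + num_classes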

 

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/13 12:06
# @Author   : WanDaoYi
# @FileName : misc_utils.py
# ============================================
import math
import numpy as np
import tensorflow as tf
from utils.bbox_utils import BboxUtil
from config import cfg


class MiscUtils(object):

    def __init__(self):
        self.bbox_util = BboxUtil()
        pass

    def compute_backbone_shapes(self, image_shape, backbone_strides):
        """
        Computes the width and height of each stage of the backbone network
        :param image_shape: [h, w, c]
        :param backbone_strides: The strides of each layer of the FPN Pyramid.
                                 These values are based on a resNet101 backbone.
        :return: [N, (height, width)]. Where N is the number of stages
        """
        return np.array([[int(math.ceil(image_shape[0] / stride)),
                          int(math.ceil(image_shape[1] / stride))] for stride in backbone_strides])
        pass

    def batch_slice(self, inputs, graph_fn, batch_size, names=None):
        """
        Splits inputs into slices and feeds each slice to a copy of the given
        computation graph and then combines the results. It allows you to run a
        graph on a batch of inputs even if the graph is written to support one
        instance only.
        :param inputs: list of tensors. All must have the same first dimension length
        :param graph_fn: A function that returns a TF tensor that's part of a graph.
        :param batch_size: number of slices to divide the data into.
        :param names: If provided, assigns names to the resulting tensors.
        :return:
        """
        if not isinstance(inputs, list):
            inputs = [inputs]

        outputs = []
        for i in range(batch_size):
            inputs_slice = [x[i] for x in inputs]
            output_slice = graph_fn(*inputs_slice)
            if not isinstance(output_slice, (tuple, list)):
                output_slice = [output_slice]
            outputs.append(output_slice)

        # Change outputs from a list of slices where each is
        # a list of outputs to a list of outputs and each has
        # a list of slices
        outputs = list(zip(*outputs))

        if names is None:
            names = [None] * len(outputs)

        result = [tf.stack(o, axis=0, name=n)
                  for o, n in zip(outputs, names)]
        if len(result) == 1:
            result = result[0]

        return result
        pass

    def trim_zeros_graph(self, boxes, name='trim_zeros'):
        """
        Often boxes are represented with matrices of shape [N, 4] and
        are padded with zeros. This removes zero boxes.
        :param boxes: [N, 4] matrix of boxes.
        :param name:
        :return: non_zeros: [N] a 1D boolean mask identifying the rows to keep
        """
        non_zeros = tf.cast(tf.reduce_sum(tf.abs(boxes), axis=1), tf.bool)
        boxes = tf.boolean_mask(boxes, non_zeros, name=name)
        return boxes, non_zeros
        pass

    def detection_targets_graph(self, proposals, gt_class_ids, gt_boxes, gt_masks):
        """
        Generates detection targets for one image. Subsamples proposals and
        generates target class IDs, bounding box deltas, and masks for each.
        :param proposals: [POST_NMS_ROIS_TRAINING, (y1, x1, y2, x2)] in normalized coordinates.
                          Might be zero padded if there are not enough proposals.
        :param gt_class_ids: [MAX_GT_INSTANCES] int class IDs
        :param gt_boxes: [MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized coordinates.
        :param gt_masks: [height, width, MAX_GT_INSTANCES] of boolean type.
        :return: Target ROIs and corresponding class IDs, bounding box shifts, and masks.
            rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized coordinates
            class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs. Zero padded.
            deltas: [TRAIN_ROIS_PER_IMAGE, (dy, dx, log(dh), log(dw))]
            masks: [TRAIN_ROIS_PER_IMAGE, height, width]. Masks cropped to bbox
                   boundaries and resized to neural network output size.
        Note: Returned arrays might be zero padded if not enough target ROIs.
        """
        # Assertions
        asserts = [tf.Assert(tf.greater(tf.shape(proposals)[0], 0), [proposals], name="roi_assertion"), ]
        with tf.control_dependencies(asserts):
            proposals = tf.identity(proposals)
            pass

        # Remove zero padding
        proposals, _ = self.trim_zeros_graph(proposals, name="trim_proposals")
        gt_boxes, non_zeros = self.trim_zeros_graph(gt_boxes, name="trim_gt_boxes")
        gt_class_ids = tf.boolean_mask(gt_class_ids, non_zeros, name="trim_gt_class_ids")
        gt_masks = tf.gather(gt_masks, tf.where(non_zeros)[:, 0], axis=2, name="trim_gt_masks")

        # Handle COCO crowds
        # A crowd box in COCO is a bounding box around several instances. Exclude
        # them from training. A crowd box is given a negative class ID.
        crowd_ix = tf.where(gt_class_ids < 0)[:, 0]
        non_crowd_ix = tf.where(gt_class_ids > 0)[:, 0]
        crowd_boxes = tf.gather(gt_boxes, crowd_ix)
        gt_class_ids = tf.gather(gt_class_ids, non_crowd_ix)
        gt_boxes = tf.gather(gt_boxes, non_crowd_ix)
        gt_masks = tf.gather(gt_masks, non_crowd_ix, axis=2)

        # Compute overlaps matrix [proposals, gt_boxes]
        overlaps = self.bbox_util.overlaps_graph(proposals, gt_boxes)

        # Compute overlaps with crowd boxes [proposals, crowd_boxes]
        crowd_overlaps = self.bbox_util.overlaps_graph(proposals, crowd_boxes)
        crowd_iou_max = tf.reduce_max(crowd_overlaps, axis=1)
        no_crowd_bool = (crowd_iou_max < 0.001)

        # Determine positive and negative ROIs
        roi_iou_max = tf.reduce_max(overlaps, axis=1)
        # 1. Positive ROIs are those with >= 0.5 IoU with a GT box
        positive_roi_bool = (roi_iou_max >= 0.5)
        positive_indices = tf.where(positive_roi_bool)[:, 0]
        # 2. Negative ROIs are those with < 0.5 with every GT box. Skip crowds.
        negative_indices = tf.where(tf.logical_and(roi_iou_max < 0.5, no_crowd_bool))[:, 0]

        # Subsample ROIs. Aim for 33% positive
        # Positive ROIs
        positive_count = int(cfg.TRAIN.ROIS_PER_IMAGE * cfg.TRAIN.ROI_POSITIVE_RATIO)
        positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
        positive_count = tf.shape(positive_indices)[0]
        # Negative ROIs. Add enough to maintain positive:negative ratio.
        r = 1.0 / cfg.TRAIN.ROI_POSITIVE_RATIO
        negative_count = tf.cast(r * tf.cast(positive_count, tf.float32), tf.int32) - positive_count
        negative_indices = tf.random_shuffle(negative_indices)[:negative_count]
        # Gather selected ROIs
        positive_rois = tf.gather(proposals, positive_indices)
        negative_rois = tf.gather(proposals, negative_indices)

        # Assign positive ROIs to GT boxes.
        positive_overlaps = tf.gather(overlaps, positive_indices)
        roi_gt_box_assignment = tf.cond(tf.greater(tf.shape(positive_overlaps)[1], 0),
                                        true_fn=lambda: tf.argmax(positive_overlaps, axis=1),
                                        false_fn=lambda: tf.cast(tf.constant([]), tf.int64))
        roi_gt_boxes = tf.gather(gt_boxes, roi_gt_box_assignment)
        roi_gt_class_ids = tf.gather(gt_class_ids, roi_gt_box_assignment)

        # Compute bbox refinement for positive ROIs
        deltas = self.bbox_util.box_refinement_graph(positive_rois, roi_gt_boxes)
        deltas /= np.array(cfg.COMMON.BBOX_STD_DEV)

        # Assign positive ROIs to GT masks
        # Permute masks to [N, height, width, 1]
        transposed_masks = tf.expand_dims(tf.transpose(gt_masks, [2, 0, 1]), -1)
        # Pick the right mask for each ROI
        roi_masks = tf.gather(transposed_masks, roi_gt_box_assignment)

        # Compute mask targets
        boxes = positive_rois
        if cfg.TRAIN.USE_MINI_MASK:
            # Transform ROI coordinates from normalized image space
            # to normalized mini-mask space.
            y1, x1, y2, x2 = tf.split(positive_rois, 4, axis=1)
            gt_y1, gt_x1, gt_y2, gt_x2 = tf.split(roi_gt_boxes, 4, axis=1)
            gt_h = gt_y2 - gt_y1
            gt_w = gt_x2 - gt_x1
            y1 = (y1 - gt_y1) / gt_h
            x1 = (x1 - gt_x1) / gt_w
            y2 = (y2 - gt_y1) / gt_h
            x2 = (x2 - gt_x1) / gt_w
            boxes = tf.concat([y1, x1, y2, x2], 1)
        box_ids = tf.range(0, tf.shape(roi_masks)[0])
        masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32),
                                         boxes, box_ids,
                                         cfg.TRAIN.MASK_SHAPE)
        # Remove the extra dimension from masks.
        masks = tf.squeeze(masks, axis=3)

        # Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with
        # binary cross entropy loss.
        masks = tf.round(masks)

        # Append negative ROIs and pad bbox deltas and masks that
        # are not used for negative ROIs with zeros.
        rois = tf.concat([positive_rois, negative_rois], axis=0)
        N = tf.shape(negative_rois)[0]
        P = tf.maximum(cfg.TRAIN.ROIS_PER_IMAGE - tf.shape(rois)[0], 0)
        rois = tf.pad(rois, [(0, P), (0, 0)])
        # roi_gt_boxes = tf.pad(roi_gt_boxes, [(0, N + P), (0, 0)])
        roi_gt_class_ids = tf.pad(roi_gt_class_ids, [(0, N + P)])
        deltas = tf.pad(deltas, [(0, N + P), (0, 0)])
        masks = tf.pad(masks, [[0, N + P], (0, 0), (0, 0)])

        return rois, roi_gt_class_ids, deltas, masks
        pass
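As a quick sanity check of compute_backbone_shapes(), the sketch below (pure NumPy, assuming the usual ResNet/FPN strides of [4, 8, 16, 32, 64] and a 1024x1024 input) reproduces the per-level feature map sizes:

import math
import numpy as np

image_shape = (1024, 1024, 3)
backbone_strides = [4, 8, 16, 32, 64]   # assumed FPN strides for a ResNet backbone

backbone_shapes = np.array([[int(math.ceil(image_shape[0] / s)),
                             int(math.ceil(image_shape[1] / s))] for s in backbone_strides])
print(backbone_shapes)
# [[256 256]
#  [128 128]
#  [ 64  64]
#  [ 32  32]
#  [ 16  16]]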

 

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/01 00:22
# @Author   : WanDaoYi
# @FileName : mask_util.py
# ============================================
import warnings
import numpy as np
import scipy.ndimage
from utils.image_utils import ImageUtils
from pycocotools import mask as coco_mask_utils
from config import cfg


class MaskUtil(object):

    def __init__(self):
        self.coco_model_url = cfg.COMMON.COCO_MODEL_URL
        self.image_utils = ImageUtils()
        pass

    # Compute the IoU overlaps of two sets of masks
    def compute_overlaps_masks(self, masks1, masks2):
        """
        :param masks1: [Height, Width, instances]
        :param masks2: [Height, Width, instances]
        :return: IoU overlaps of the two sets of masks
        """
        # If either set of masks is empty, return an empty result
        mask_flag = masks1.shape[-1] == 0 or masks2.shape[-1] == 0
        if mask_flag:
            return np.zeros((masks1.shape[-1], masks2.shape[-1]))
            pass

        # Flatten the masks and compute their areas
        masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
        masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32)
        area1 = np.sum(masks1, axis=0)
        area2 = np.sum(masks2, axis=0)

        # intersections and union
        intersections = np.dot(masks1.T, masks2)
        union = area1[:, None] + area2[None, :] - intersections
        overlaps = intersections / union

        return overlaps
        pass

    def annotation_2_mask(self, annotation, height, width):
        """
        Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
        :param annotation: annotation info
        :param height: image info of height
        :param width: image info of width
        :return: binary mask (numpy 2D array)
        """
        segment = annotation['segmentation']
        if isinstance(segment, list):
            # polygon -- a single object might consist of multiple parts
            # we merge all parts into one mask rle code
            rles = coco_mask_utils.frPyObjects(segment, height, width)
            rle = coco_mask_utils.merge(rles)
            pass
        elif isinstance(segment['counts'], list):
            # uncompressed RLE
            rle = coco_mask_utils.frPyObjects(segment, height, width)
            pass
        else:
            # rle (already compressed; segment is the RLE dict itself)
            rle = segment
            pass
        mask = coco_mask_utils.decode(rle)
        return mask
        pass

    def load_mask(self, data, image_id):
        """
        Load instance masks for the given image.
        Different datasets use different ways to store masks. This
        function converts the different mask format to one format
        in the form of a bitmap [height, width, instances].
        :param data: The Dataset object to pick data from
        :param image_id: image id of image
        :return:
            masks: A bool array of shape [height, width, instance count] with
                   one mask per instance.
            class_ids: a 1D array of class IDs of the instance masks.
        """
        image_info = data.image_info_list[image_id]
        instance_masks = []
        class_ids = []
        annotations = data.image_info_list[image_id]["annotations"]

        # Build mask of shape [height, width, instance_count] and list
        # of class IDs that correspond to each channel of the mask.
        for annotation in annotations:
            class_id = data.class_from_source_map["coco.{}".format(annotation['category_id'])]
            if class_id:
                m = self.annotation_2_mask(annotation, image_info["height"], image_info["width"])

                # Some objects are so small that they're less than 1 pixel area
                # and end up rounded out. Skip those objects.
                if m.max() < 1:
                    continue
                    pass

                # Is it a crowd? If so, use a negative class ID.
                if annotation['iscrowd']:
                    # Use negative class ID for crowds
                    class_id *= -1
                    # For crowd masks, annToMask() sometimes returns a mask
                    # smaller than the given dimensions. If so, resize it.
                    if m.shape[0] != image_info["height"] or m.shape[1] != image_info["width"]:
                        m = np.ones([image_info["height"], image_info["width"]], dtype=bool)

                instance_masks.append(m)
                class_ids.append(class_id)
            pass

        mask = np.stack(instance_masks, axis=2).astype(np.bool)
        class_ids = np.array(class_ids, dtype=np.int32)
        return mask, class_ids
        pass

    def resize_mask(self, mask, scale, padding, crop=None):
        """
        resize a mask using the given scale and padding.
        Typically, you get the scale and padding from resize_image() to
        ensure both, the image and the mask, are resized consistently.
        :param mask:
        :param scale: mask scaling factor
        :param padding: Padding to add to the mask in the form
                        [(top, bottom), (left, right), (0, 0)]
        :param crop:
        :return:
        """
        # Suppress warning from scipy 0.13.0, the output shape of zoom() is
        # calculated with round() instead of int()
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
        if crop is not None:
            y, x, h, w = crop
            mask = mask[y:y + h, x:x + w]
        else:
            mask = np.pad(mask, padding, mode='constant', constant_values=0)
        return mask
        pass

    def minimize_mask(self, bbox, mask, mini_shape):
        """
        Resize masks to a smaller version to reduce memory load.
        Mini-masks can be resized back to image scale using expand_masks()
        :param bbox:
        :param mask:
        :param mini_shape:
        :return:
        """
        # cfg.TRAIN.MINI_MASK_SHAPE may come in as a list, so convert it to a tuple
        mini_shape = tuple(mini_shape)
        mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
        for i in range(mask.shape[-1]):
            # Pick slice and cast to bool in case load_mask() returned wrong dtype
            m = mask[:, :, i].astype(bool)
            y1, x1, y2, x2 = bbox[i][:4]
            m = m[y1:y2, x1:x2]
            if m.size == 0:
                raise Exception("Invalid bounding box with area of zero")
            # Resize with bilinear interpolation
            m = self.image_utils.resize(m, mini_shape)
            mini_mask[:, :, i] = np.around(m).astype(np.bool)
        return mini_mask
        pass

    def unmold_mask(self, mask, bbox, image_shape):
        """
        Converts a mask generated by the neural network to a format similar
        to its original shape.
        :param mask: [height, width] of type float. A small, typically 28x28 mask.
        :param bbox: [y1, x1, y2, x2]. The box to fit the mask in.
        :param image_shape:
        :return: return a binary mask with the same size as the original image.
        """
        threshold = 0.5
        y1, x1, y2, x2 = bbox
        mask = self.image_utils.resize(mask, (y2 - y1, x2 - x1))
        mask = np.where(mask >= threshold, 1, 0).astype(np.bool)

        # Put the mask in the right location.
        full_mask = np.zeros(image_shape[:2], dtype=np.bool)
        full_mask[y1:y2, x1:x2] = mask
        return full_mask
        pass
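The mask IoU used in compute_overlaps_masks() is just intersection over union on flattened binary masks. A minimal NumPy sketch with two hypothetical 4x4 single-instance masks:

import numpy as np

# Two instance masks, shaped [H, W, instances]
masks1 = np.zeros((4, 4, 1), dtype=bool)
masks2 = np.zeros((4, 4, 1), dtype=bool)
masks1[0:2, 0:2, 0] = True          # a 4-pixel square
masks2[1:3, 1:3, 0] = True          # a 4-pixel square shifted by (1, 1)

m1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
m2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32)
intersections = np.dot(m1.T, m2)                                     # 1 shared pixel
union = m1.sum(axis=0)[:, None] + m2.sum(axis=0)[None, :] - intersections
print(intersections / union)                                         # [[0.14285715]] == 1/7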

 

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/01 00:13
# @Author   : WanDaoYi
# @FileName : bbox_utils.py
# ============================================
import numpy as np
import tensorflow as tf
from utils.image_utils import ImageUtils
from utils.mask_util import MaskUtil
from config import cfg


class BboxUtil(object):

    def __init__(self):
        self.image_utils = ImageUtils()
        self.mask_util = MaskUtil()
        pass

    # Extract bounding boxes
    def extract_bboxes(self, mask):
        """
        :param mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
        :return: bbox array [num_instances, (y1, x1, y2, x2)]
        """
        # Number of class-agnostic instances; only foreground vs. background is
        # distinguished here, classes are assigned during detection
        num_instance = mask.shape[-1]
        # Initialize boxes
        boxes = np.zeros([num_instance, 4], dtype=np.int32)

        for i in range(num_instance):
            m = mask[:, :, i]
            # bounding box
            # x axis direction
            horizontal_indicies = np.where(np.any(m, axis=0))[0]
            # y axis direction
            vertical_indicies = np.where(np.any(m, axis=1))[0]

            if horizontal_indicies.shape[0]:
                x1, x2 = horizontal_indicies[[0, -1]]
                y1, y2 = vertical_indicies[[0, -1]]
                # x2 and y2 should not be part of the box. Increment by 1.
                # i.e. x2 and y2 are excluded from the box: with x1 = 1, x2 = 5, y1 = 1, y2 = 5
                # the enclosed area does not include the bottom-right corner (5, 5),
                # so add 1 to push the corner just outside the mask area
                x2 += 1
                y2 += 1
                pass
            else:
                # No mask for this instance. Might happen due to
                # resizing or cropping. Set bbox to zeros
                x1, x2, y1, y2 = 0, 0, 0, 0
                pass
            boxes[i] = np.array([y1, x1, y2, x2])
            pass

        return boxes.astype(np.int32)
        pass

    # Compute box IoU
    def compute_iou(self, box, boxes):
        """
        :param box: (y1, x1, y2, x2)
        :param boxes: [N, (y1, x1, y2, x2)]
        :return: iou
        """
        # Compute box areas
        # area = (x2 - x1) * (y2 - y1)
        box_area = (box[3] - box[1]) * (box[2] - box[0])
        boxes_area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        # Compute intersection areas
        x1 = np.maximum(box[1], boxes[:, 1])
        x2 = np.minimum(box[3], boxes[:, 3])
        y1 = np.maximum(box[0], boxes[:, 0])
        y2 = np.minimum(box[2], boxes[:, 2])
        intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)

        # Compute IoU
        union = box_area + boxes_area[:] - intersection[:]
        # iou = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)
        iou = intersection / union
        return iou
        pass

    # Compute IoU overlaps between two sets of boxes
    def compute_overlaps(self, boxes1, boxes2):
        """
        :param boxes1: [N, (y1, x1, y2, x2)]
        :param boxes2: [N, (y1, x1, y2, x2)]
        :return:
        """
        # Allocate the overlaps matrix
        overlaps = np.zeros((boxes1.shape[0], boxes2.shape[0]))
        for i in range(overlaps.shape[1]):
            box2 = boxes2[i]
            overlaps[:, i] = self.compute_iou(box2, boxes1)
            pass
        return overlaps
        pass

    def overlaps_graph(self, boxes1, boxes2):
        """
        Computes IoU overlaps between two sets of boxes.
        :param boxes1: [N, (y1, x1, y2, x2)].
        :param boxes2: [N, (y1, x1, y2, x2)].
        :return:
        """
        # 1. Tile boxes2 and repeat boxes1. This allows us to compare
        # every boxes1 against every boxes2 without loops.
        # TF doesn't have an equivalent to np.repeat() so simulate it
        # using tf.tile() and tf.reshape.
        b1 = tf.reshape(tf.tile(tf.expand_dims(boxes1, 1),
                                [1, 1, tf.shape(boxes2)[0]]), [-1, 4])
        b2 = tf.tile(boxes2, [tf.shape(boxes1)[0], 1])
        # 2. Compute intersections
        b1_y1, b1_x1, b1_y2, b1_x2 = tf.split(b1, 4, axis=1)
        b2_y1, b2_x1, b2_y2, b2_x2 = tf.split(b2, 4, axis=1)
        y1 = tf.maximum(b1_y1, b2_y1)
        x1 = tf.maximum(b1_x1, b2_x1)
        y2 = tf.minimum(b1_y2, b2_y2)
        x2 = tf.minimum(b1_x2, b2_x2)
        intersection = tf.maximum(x2 - x1, 0) * tf.maximum(y2 - y1, 0)
        # 3. Compute unions
        b1_area = (b1_y2 - b1_y1) * (b1_x2 - b1_x1)
        b2_area = (b2_y2 - b2_y1) * (b2_x2 - b2_x1)
        union = b1_area + b2_area - intersection
        # 4. Compute IoU and reshape to [boxes1, boxes2]
        iou = intersection / union
        overlaps = tf.reshape(iou, [tf.shape(boxes1)[0], tf.shape(boxes2)[0]])
        return overlaps
        pass

    # Non-maximum suppression
    def non_max_suppression(self, boxes, scores, threshold):
        """
        :param boxes: [N, (y1, x1, y2, x2)]. Note that (y2, x2) lies outside the box
        :param scores: box scores
        :param threshold: IoU threshold
        :return:
        """
        assert boxes.shape[0] > 0
        if boxes.dtype.kind != "f":
            boxes = boxes.astype(np.float32)
            pass

        # Get indices of boxes sorted by scores (highest first)
        ixs = scores.argsort()[::-1]

        pick = []
        while len(ixs) > 0:
            # Pick top box and add its index to the list
            i = ixs[0]
            pick.append(i)
            # Compute IoU of the picked box with the rest
            iou = self.compute_iou(boxes[i], boxes[ixs[1:]])
            # Identify boxes with IoU over the threshold. This
            # returns indices into ixs[1:], so add 1 to get
            # indices into ixs.
            remove_ixs = np.where(iou > threshold)[0] + 1
            # Remove indices of the picked and overlapped boxes.
            ixs = np.delete(ixs, remove_ixs)
            ixs = np.delete(ixs, 0)
        return np.array(pick, dtype=np.int32)
        pass

    # Box transform, i.e. bounding box regression:
    # tx = (x - xa) / wa, ty = (y - ya) / ha,
    # tw = log(w / wa), th = log(h / ha)
    def apply_box_deltas(self, boxes, deltas):
        """
        :param boxes: [N, (y1, x1, y2, x2)]. Note that (y2, x2) lies outside the box
        :param deltas: [N, (dy, dx, log(dh), log(dw))]
        :return:
        """
        boxes = boxes.astype(np.float32)
        # Convert to y, x, h, w
        height = boxes[:, 2] - boxes[:, 0]
        width = boxes[:, 3] - boxes[:, 1]
        center_y = boxes[:, 0] + 0.5 * height
        center_x = boxes[:, 1] + 0.5 * width
        # Apply deltas
        center_y += deltas[:, 0] * height
        center_x += deltas[:, 1] * width
        height *= np.exp(deltas[:, 2])
        width *= np.exp(deltas[:, 3])
        # Convert back to y1, x1, y2, x2
        y1 = center_y - 0.5 * height
        x1 = center_x - 0.5 * width
        y2 = y1 + height
        x2 = x1 + width
        return np.stack([y1, x1, y2, x2], axis=1)
        pass

    # TF graph converting boxes against ground truth, i.e. bounding box regression
    # (see the bounding box regression formulas)
    def box_refinement_graph(self, box, gt_box):
        """
        :param box: [N, (y1, x1, y2, x2)]
        :param gt_box: [N, (y1, x1, y2, x2)]
        :return:
        """
        box = tf.cast(box, tf.float32)
        gt_box = tf.cast(gt_box, tf.float32)

        height = box[:, 2] - box[:, 0]
        width = box[:, 3] - box[:, 1]
        center_y = box[:, 0] + 0.5 * height
        center_x = box[:, 1] + 0.5 * width

        gt_height = gt_box[:, 2] - gt_box[:, 0]
        gt_width = gt_box[:, 3] - gt_box[:, 1]
        gt_center_y = gt_box[:, 0] + 0.5 * gt_height
        gt_center_x = gt_box[:, 1] + 0.5 * gt_width

        dy = (gt_center_y - center_y) / height
        dx = (gt_center_x - center_x) / width
        dh = tf.log(gt_height / height)
        dw = tf.log(gt_width / width)

        result = tf.stack([dy, dx, dh, dw], axis=1)
        return result
        pass

    # Convert boxes against ground truth, i.e. bounding box regression
    # (see the bounding box regression formulas)
    def box_refinement(self, box, gt_box):
        """
        :param box: [N, (y1, x1, y2, x2)], (y2, x2) is assumed to lie outside the box
        :param gt_box: [N, (y1, x1, y2, x2)]
        :return:
        """
        box = box.astype(np.float32)
        gt_box = gt_box.astype(np.float32)

        height = box[:, 2] - box[:, 0]
        width = box[:, 3] - box[:, 1]
        center_y = box[:, 0] + 0.5 * height
        center_x = box[:, 1] + 0.5 * width

        gt_height = gt_box[:, 2] - gt_box[:, 0]
        gt_width = gt_box[:, 3] - gt_box[:, 1]
        gt_center_y = gt_box[:, 0] + 0.5 * gt_height
        gt_center_x = gt_box[:, 1] + 0.5 * gt_width

        dy = (gt_center_y - center_y) / height
        dx = (gt_center_x - center_x) / width
        dh = np.log(gt_height / height)
        dw = np.log(gt_width / width)

        return np.stack([dy, dx, dh, dw], axis=1)
        pass

    # Convert boxes from pixel coordinates to normalized coordinates
    def norm_boxes_graph(self, boxes, shape):
        """
        :param boxes: [..., (y1, x1, y2, x2)] in pixel coordinates
        :param shape: [..., (height, width)] in pixels
        :return: [..., (y1, x1, y2, x2)] in normalized coordinates
        Note: in pixel coordinates (y2, x2) is outside the box,
        but in normalized coordinates it's inside the box.
        """
        h, w = tf.split(tf.cast(shape, tf.float32), 2)
        scale = tf.concat([h, w, h, w], axis=-1) - tf.constant(1.0)
        shift = tf.constant([0., 0., 1., 1.])
        return tf.divide(boxes - shift, scale)
        pass

    def norm_boxes(self, boxes, shape):
        """
        Converts boxes from pixel coordinates to normalized coordinates.
        :param boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
        :param shape: [..., (height, width)] in pixels
        :return: [N, (y1, x1, y2, x2)] in normalized coordinates
        Note: In pixel coordinates (y2, x2) is outside the box.
        But in normalized coordinates it's inside the box.
        """
        h, w = shape
        scale = np.array([h - 1, w - 1, h - 1, w - 1])
        shift = np.array([0, 0, 1, 1])
        return np.divide((boxes - shift), scale).astype(np.float32)
        pass

    def denorm_boxes(self, boxes, shape):
        """
        Converts boxes from normalized coordinates to pixel coordinates.
        :param boxes: [N, (y1, x1, y2, x2)] in normalized coordinates
        :param shape: [..., (height, width)] in pixels
        :return: [N, (y1, x1, y2, x2)] in pixel coordinates
        Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
        coordinates it's inside the box.
        """
        h, w = shape
        scale = np.array([h - 1, w - 1, h - 1, w - 1])
        shift = np.array([0, 0, 1, 1])
        return np.around(np.multiply(boxes, scale) + shift).astype(np.int32)
        pass

    def apply_box_deltas_graph(self, boxes, deltas):
        """
        Applies the given deltas to the given boxes.
        :param boxes: [N, (y1, x1, y2, x2)] boxes to update
        :param deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
        :return:
        """
        # Convert to y, x, h, w
        height = boxes[:, 2] - boxes[:, 0]
        width = boxes[:, 3] - boxes[:, 1]
        center_y = boxes[:, 0] + 0.5 * height
        center_x = boxes[:, 1] + 0.5 * width
        # Apply deltas
        center_y += deltas[:, 0] * height
        center_x += deltas[:, 1] * width
        height *= tf.exp(deltas[:, 2])
        width *= tf.exp(deltas[:, 3])
        # Convert back to y1, x1, y2, x2
        y1 = center_y - 0.5 * height
        x1 = center_x - 0.5 * width
        y2 = y1 + height
        x2 = x1 + width
        result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
        return result
        pass

    def clip_boxes_graph(self, boxes, window):
        """
        :param boxes: [N, (y1, x1, y2, x2)]
        :param window: [4] in the form y1, x1, y2, x2
        :return:
        """
        # Split
        wy1, wx1, wy2, wx2 = tf.split(window, 4)
        y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
        # Clip
        y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
        x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
        y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
        x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
        clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
        clipped.set_shape((clipped.shape[0], 4))
        return clipped
        pass

    def load_image_gt(self, data, image_id, augmentation=None, use_mini_mask=False):
        """
        Load and return ground truth data for an image (image, mask, bounding boxes).
        :param data: The Dataset object to pick data from
        :param image_id: GT bounding boxes and masks for image id.
        :param augmentation: Optional. An imgaug (https://github.com/aleju/imgaug) augmentation.
                             For example, passing imgaug.augmenters.Fliplr(0.5) flips images
                             right/left 50% of the time.
        :param use_mini_mask: If False, returns full-size masks that are the same height
                              and width as the original image. These can be big, for example
                              1024x1024x100 (for 100 instances). Mini masks are smaller, typically,
                              224x224 and are generated by extracting the bounding box of the
                              object and resizing it to MINI_MASK_SHAPE.
        :return:
            image: [height, width, 3]
            shape: the original shape of the image before resizing and cropping.
            class_ids: [instance_count] Integer class IDs
            bbox: [instance_count, (y1, x1, y2, x2)]
            mask: [height, width, instance_count]. The height and width are those
                  of the image unless use_mini_mask is True, in which case they are
                  defined in MINI_MASK_SHAPE.
        """
        # Load image and mask
        image_path = data.image_info_list[image_id]["path"]
        image = self.image_utils.load_image(image_path)
        mask, class_ids = self.mask_util.load_mask(data, image_id)

        original_shape = image.shape
        image, window, scale, padding, crop = self.image_utils.resize_image(image,
                                                                            min_dim=cfg.COMMON.IMAGE_MIN_DIM,
                                                                            min_scale=cfg.COMMON.IMAGE_MIN_SCALE,
                                                                            max_dim=cfg.COMMON.IMAGE_MAX_DIM,
                                                                            resize_mode=cfg.COMMON.IMAGE_RESIZE_MODE)
        mask = self.mask_util.resize_mask(mask, scale, padding, crop)

        # Augmentation
        # This requires the imgaug lib (https://github.com/aleju/imgaug)
        if augmentation:
            import imgaug

            def hook(images, augmenter, parents, default):
                """Determines which augmenters to apply to masks."""
                return augmenter.__class__.__name__ in cfg.TRAIN.MASK_AUGMENTERS

            # Store shapes before augmentation to compare
            image_shape = image.shape
            mask_shape = mask.shape
            # Make augmenters deterministic to apply similarly to images and masks
            det = augmentation.to_deterministic()
            image = det.augment_image(image)
            # Change mask to np.uint8 because imgaug doesn't support np.bool
            mask = det.augment_image(mask.astype(np.uint8), hooks=imgaug.HooksImages(activator=hook))
            # Verify that shapes didn't change
            assert image.shape == image_shape, "Augmentation shouldn't change image size"
            assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
            # Change mask back to bool
            mask = mask.astype(np.bool)
            pass

        # Note that some boxes might be all zeros if the corresponding mask got cropped out.
        # and here is to filter them out
        _idx = np.sum(mask, axis=(0, 1)) > 0
        mask = mask[:, :, _idx]
        class_ids = class_ids[_idx]

        # Bounding boxes. Note that some boxes might be all zeros
        # if the corresponding mask got cropped out.
        # bbox: [num_instances, (y1, x1, y2, x2)]
        bbox = self.extract_bboxes(mask)

        # Active classes
        # Different datasets have different classes, so track the
        # classes supported in the dataset of this image.
        active_class_ids = np.zeros([data.class_num], dtype=np.int32)
        source_class_ids = data.source_class_ids[data.image_info_list[image_id]["source"]]
        active_class_ids[source_class_ids] = 1

        # Resize masks to smaller size to reduce memory usage
        if use_mini_mask:
            mask = self.mask_util.minimize_mask(bbox, mask, cfg.TRAIN.MINI_MASK_SHAPE)

        # Image meta data
        image_meta = self.image_utils.compose_image_meta(image_id, original_shape, image.shape,
                                                         window, scale, active_class_ids)

        return image, image_meta, class_ids, bbox, mask
        pass
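To see what the (dy, dx, log(dh), log(dw)) deltas produced by box_refinement() look like, here is a minimal NumPy sketch with one hypothetical box shifted by (10, 10) relative to its ground-truth box:

import numpy as np

box = np.array([[10., 10., 50., 90.]])      # (y1, x1, y2, x2): h = 40, w = 80
gt_box = np.array([[20., 20., 60., 100.]])  # same size, shifted by (10, 10)

h, w = box[:, 2] - box[:, 0], box[:, 3] - box[:, 1]
cy, cx = box[:, 0] + 0.5 * h, box[:, 1] + 0.5 * w
gt_h, gt_w = gt_box[:, 2] - gt_box[:, 0], gt_box[:, 3] - gt_box[:, 1]
gt_cy, gt_cx = gt_box[:, 0] + 0.5 * gt_h, gt_box[:, 1] + 0.5 * gt_w

deltas = np.stack([(gt_cy - cy) / h, (gt_cx - cx) / w,
                   np.log(gt_h / h), np.log(gt_w / w)], axis=1)
print(deltas)   # [[0.25  0.125 0.    0.   ]]

Feeding these deltas back through apply_box_deltas() recovers the ground-truth box, which is exactly the round trip the RPN and box head rely on.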

 

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/13 12:08
# @Author   : WanDaoYi
# @FileName : anchor_utils.py
# ============================================
import numpy as np
from utils.misc_utils import MiscUtils
from utils.bbox_utils import BboxUtil
from config import cfg


class AnchorUtils(object):

    def __init__(self):
        self.misc_utils = MiscUtils()
        self.bbox_utils = BboxUtil()

        # Cache anchors and reuse if image shape is the same
        self._anchor_cache = {}
        # self.anchors = None
        pass

    def get_anchors(self, image_shape):
        """
        :return: Returns anchor pyramid for the given image size
        """
        if tuple(image_shape) not in self._anchor_cache:
            # Generate Anchors
            anchor = self.generate_pyramid_anchors(image_shape)

            # Keep a copy of the latest anchors in pixel coordinates because
            # it's used in inspect_model notebooks.
            # TODO: Remove this after the notebook are refactored to not use it
            # self.anchors = anchor

            self._anchor_cache[tuple(image_shape)] = self.bbox_utils.norm_boxes(anchor, image_shape[:2])
            pass
        return self._anchor_cache[tuple(image_shape)]
        pass

    def generate_pyramid_anchors(self, image_shape):
        """
        Generate anchors at different levels of a feature pyramid.
        Each scale is associated with a level of the pyramid,
        but each ratio is used in all levels of the pyramid.
        :param image_shape: [h, w, c]
        :return: anchors: [N, (y1, x1, y2, x2)]
            All generated anchors in one array. Sorted with the same order of the given scales.
            So, anchors of scale[0] come first, then anchors of scale[1], and so on.
        """
        backbone_strides = cfg.COMMON.BACKBONE_STRIDES
        # [N, (height, width)]. Where N is the number of stages
        backbone_shape = self.misc_utils.compute_backbone_shapes(image_shape, backbone_strides)

        # Anchors
        # [anchor_count, (y1, x1, y2, x2)]
        anchors = []
        scales = cfg.COMMON.RPN_ANCHOR_SCALES
        scales_len = len(scales)

        for i in range(scales_len):
            anchor_box = self.generate_anchors(scales[i], backbone_shape[i], backbone_strides[i])
            anchors.append(anchor_box)
            pass

        return np.concatenate(anchors, axis=0)
        pass

    # generate anchor box
    def generate_anchors(self, scales, backbone_shape, backbone_strides):
        """
        :param scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
        :param backbone_shape: [height, width] spatial shape of the feature map
                               over which to generate anchors.
        :param backbone_strides: Stride of the feature map relative to the image in pixels.
        :return: anchor box: Convert to corner coordinates (y1, x1, y2, x2)
        """
        # 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
        ratios = cfg.COMMON.RPN_ANCHOR_RATIOS
        # Stride of anchors on the feature map. For example,
        # if the value is 2 then generate anchors for every other feature map pixel.
        anchor_stride = cfg.COMMON.RPN_ANCHOR_STRIDE

        # Get all combinations of scales and ratios
        scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
        scales = scales.flatten()
        ratios = ratios.flatten()

        # Enumerate heights and widths from scales and ratios
        heights = scales / np.sqrt(ratios)
        widths = scales * np.sqrt(ratios)

        # Enumerate shifts in feature space
        shifts_y = np.arange(0, backbone_shape[0], anchor_stride) * backbone_strides
        shifts_x = np.arange(0, backbone_shape[1], anchor_stride) * backbone_strides
        shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

        # Enumerate combinations of shifts, widths, and heights
        box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
        box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

        # Reshape to get a list of (y, x) and a list of (h, w)
        box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
        box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

        # Convert to corner coordinates (y1, x1, y2, x2)
        boxes = np.concatenate([box_centers - 0.5 * box_sizes, box_centers + 0.5 * box_sizes], axis=1)
        return boxes
        pass
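As a rough check on how many anchors the pyramid produces, the sketch below (assuming the common settings of five scales, three ratios, anchor stride 1, and the feature map shapes from the earlier backbone-shape example) counts anchors per level and in total:

rpn_anchor_scales = (32, 64, 128, 256, 512)     # assumed: one scale per pyramid level
rpn_anchor_ratios = [0.5, 1, 2]                 # assumed width/height ratios
backbone_shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
anchor_stride = 1

total = sum(len(rpn_anchor_ratios) * (h // anchor_stride) * (w // anchor_stride)
            for h, w in backbone_shapes)
print(total)    # 261888 anchors for a 1024 x 1024 input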

 

2. Common files

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/12 15:40
# @Author   : WanDaoYi
# @FileName : common.py
# ============================================
import numpy as np
import tensorflow as tf
import keras.backend as K
import keras.layers as KL
import keras.models as KM
import keras.engine as KE
from utils.misc_utils import MiscUtils
from utils.bbox_utils import BboxUtil
from utils.image_utils import ImageUtils
from config import cfg


def log2_graph(x):
    """Implementation of Log2. TF doesn't have a native implementation."""
    return tf.log(x) / tf.log(2.0)


def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_flag=True):
    """
    conv_block is the block that has a conv layer at shortcut
    :param input_tensor: input tensor
    :param kernel_size: default 3, the kernel size of middle conv layer at main path
    :param filters: list of integers, the nb_filters of 3 conv layer at main path
    :param stage: integer, current stage label, used for generating layer names
    :param block: 'a','b'..., current block label, used for generating layer names
    :param strides:
    :param use_bias: Boolean. To use or not use a bias in conv layers.
    :param train_flag: Boolean. Train or freeze Batch Norm layers
    :return:
    Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_flag)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_flag)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x


def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_flag=True):
    """
    The identity_block is the block that has no conv layer at shortcut
    :param input_tensor: input tensor
    :param kernel_size: default 3, the kernel size of middle conv layer at main path
    :param filters: list of integers, the nb_filters of 3 conv layer at main path
    :param stage: integer, current stage label, used for generating layer names
    :param block: 'a','b'..., current block label, used for generating layer names
    :param use_bias: Boolean. To use or not use a bias in conv layers.
    :param train_flag: Boolean. Train or freeze Batch Norm layers
    :return:
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_flag)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x


def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, class_num, train_flag=True):
    """
    Builds the computation graph of the mask head of Feature Pyramid Network.
    :param rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized coordinates.
    :param feature_maps: List of feature maps from different layers of the pyramid,
                         [P2, P3, P4, P5]. Each has a different resolution.
    :param image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    :param pool_size: The width of the square feature map generated from ROI Pooling.
    :param class_num: number of classes, which determines the depth of the results
    :param train_flag: Boolean. Train or freeze Batch Norm layers
    :return: Masks [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
    """
    # ROI Pooling
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size], name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Conv layers
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn1')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn2')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv3")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn3')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn4')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    x = KL.TimeDistributed(KL.Conv2D(class_num, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x


def rpn_graph(feature_map, anchors_per_location, rpn_anchor_stride):
    """
    Builds the computation graph of Region Proposal Network.
    :param feature_map: backbone features [batch, height, width, depth]
    :param anchors_per_location: number of anchors per pixel in the feature map
    :param rpn_anchor_stride: Controls the density of anchors.
                              Typically 1 (anchors for every pixel in the feature map),
                              or 2 (every other pixel).
    :return:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.
    """
    # TODO: check if stride of 2 causes alignment issues if the feature map
    # is not even.
    # Shared convolutional base of the RPN
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=rpn_anchor_stride, name='rpn_conv_shared')(feature_map)

    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)

    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation("softmax", name="rpn_class_xxx")(rpn_class_logits)

    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [x, y, log(w), log(h)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]
    pass


def build_rpn_model(rpn_anchor_stride, anchors_per_location, depth):
    """
    Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared weights.
    :param rpn_anchor_stride: Controls the density of anchors.
                              Typically 1 (anchors for every pixel in the feature map),
                              or 2 (every other pixel).
    :param anchors_per_location: number of anchors per pixel in the feature map
    :param depth: Depth of the backbone feature map.
    :return: Returns a Keras Model object. The model outputs, when called, are:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))]
                  Deltas to be applied to anchors.
    """
    input_feature_map = KL.Input(shape=[None, None, depth], name="input_rpn_feature_map")
    outputs = rpn_graph(input_feature_map, anchors_per_location, rpn_anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")
    pass


def fpn_classifier_graph(rois, feature_maps, image_meta, pool_size,
                         class_num, train_flag=True, fc_layers_size=1024):
    """
    Builds the computation graph of the feature pyramid network classifier
    and regressor heads.
    :param rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized coordinates.
    :param feature_maps: List of feature maps from different layers of the pyramid,
                         [P2, P3, P4, P5]. Each has a different resolution.
    :param image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    :param class_num: number of classes, which determines the depth of the results
    :param pool_size: The width of the square feature map generated from ROI Pooling.
    :param train_flag: Boolean. Train or freeze Batch Norm layers
    :param fc_layers_size: Size of the 2 FC layers
    :return:
        logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
        probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
        bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
                     Deltas to apply to proposal boxes
    """
    # ROI Pooling
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

    # Two 1024 FC layers (implemented with Conv2D for consistency)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_flag)
    x = KL.Activation('relu')(x)

    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)

    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(class_num),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(class_num * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], class_num, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
    pass


def build_rpn_targets(anchors, gt_class_ids, gt_boxes):
    """
    Given the anchors and GT boxes, compute overlaps and identify positive
    anchors and deltas to refine them to match their corresponding GT boxes.
    :param anchors: [num_anchors, (y1, x1, y2, x2)]
    :param gt_class_ids: [num_gt_boxes] Integer class IDs.
    :param gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)]
    :return:
        rpn_match: [N] (int32) matches between anchors and GT boxes.
                   1 = positive anchor, -1 = negative anchor, 0 = neutral
        rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.
    """
    # RPN Match: 1 = positive anchor, -1 = negative anchor, 0 = neutral
    rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
    anchor_per_image = cfg.TRAIN.ANCHORS_PER_IMAGE
    # RPN bounding boxes: [max anchors per image, (dy, dx, log(dh), log(dw))]
    rpn_bbox = np.zeros((anchor_per_image, 4))
    bbox_util = BboxUtil()

    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
    crowd_ix = np.where(gt_class_ids < 0)[0]
    if crowd_ix.shape[0] > 0:
        # Filter out crowds from ground truth class IDs and boxes
        non_crowd_ix = np.where(gt_class_ids > 0)[0]
        crowd_boxes = gt_boxes[crowd_ix]
        gt_class_ids = gt_class_ids[non_crowd_ix]
        gt_boxes = gt_boxes[non_crowd_ix]
        # Compute overlaps with crowd boxes [anchors, crowds]
        crowd_overlaps = bbox_util.compute_overlaps(anchors, crowd_boxes)
        crowd_iou_max = np.amax(crowd_overlaps, axis=1)
        no_crowd_bool = (crowd_iou_max < 0.001)
        pass
    else:
        # All anchors don't intersect a crowd
        no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)
        pass

    # Compute overlaps [num_anchors, num_gt_boxes]
    overlaps = bbox_util.compute_overlaps(anchors, gt_boxes)

    # Match anchors to GT Boxes
    # If an anchor overlaps a GT box with IoU >= 0.7 then it's positive.
    # If an anchor overlaps a GT box with IoU < 0.3 then it's negative.
    # Neutral anchors are those that don't match the conditions above,
    # and they don't influence the loss function.
    # However, don't keep any GT box unmatched (rare, but happens). Instead,
    # match it to the closest anchor (even if its max IoU is < 0.3).
    #
    # 1. Set negative anchors first. They get overwritten below if a GT box is
    # matched to them. Skip boxes in crowd areas.
    anchor_iou_argmax = np.argmax(overlaps, axis=1)
    anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
    rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
    # 2. Set an anchor for each GT box (regardless of IoU value).
    # If multiple anchors have the same IoU match all of them
    gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
    rpn_match[gt_iou_argmax] = 1
    # 3. Set anchors with high overlap as positive.
    rpn_match[anchor_iou_max >= 0.7] = 1

    # Subsample to balance positive and negative anchors
    # Don't let positives be more than half the anchors
    ids = np.where(rpn_match == 1)[0]
    extra = len(ids) - (anchor_per_image // 2)
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
    # Same for negative proposals
    ids = np.where(rpn_match == -1)[0]
    extra = len(ids) - (anchor_per_image - np.sum(rpn_match == 1))
    if extra > 0:
        # Rest the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
        pass

    # For positive anchors, compute shift and scale needed to transform them
    # to match the corresponding GT boxes.
    ids = np.where(rpn_match == 1)[0]
    ix = 0  # index into rpn_bbox
    # TODO: use box_refinement() rather than duplicating the code here
    for i, a in zip(ids, anchors[ids]):
        # Closest gt box (it might have IoU < 0.7)
        gt = gt_boxes[anchor_iou_argmax[i]]

        # Convert coordinates to center plus width/height.
        # GT Box
        gt_h = gt[2] - gt[0]
        gt_w = gt[3] - gt[1]
        gt_center_y = gt[0] + 0.5 * gt_h
        gt_center_x = gt[1] + 0.5 * gt_w
        # Anchor
        a_h = a[2] - a[0]
        a_w = a[3] - a[1]
        a_center_y = a[0] + 0.5 * a_h
        a_center_x = a[1] + 0.5 * a_w

        # Compute the bbox refinement that the RPN should predict.
        rpn_bbox[ix] = [(gt_center_y - a_center_y) / a_h,
                        (gt_center_x - a_center_x) / a_w,
                        np.log(gt_h / a_h),
                        np.log(gt_w / a_w),
                        ]
        # Normalize
        rpn_bbox_std_dev = np.array(cfg.COMMON.RPN_BBOX_STD_DEV)
        rpn_bbox[ix] /= rpn_bbox_std_dev
        ix += 1

    return rpn_match, rpn_bbox
    pass


def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """
    RPN anchor classifier loss.
    :param rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
                      -1=negative, 0=neutral anchor.
    :param rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
    :return:
    """
    # Squeeze last dim to simplify
    rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values.
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Positive and Negative anchors contribute to the loss,
    # but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))
    # Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross entropy loss
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss


def batch_pack_graph(x, counts, num_rows):
    """
    Picks different number of values from each row in x depending on the values in counts.
    :param x:
    :param counts:
    :param num_rows:
    :return:
    """
    outputs = []
    for i in range(num_rows):
        outputs.append(x[i, :counts[i]])
    return tf.concat(outputs, axis=0)


def smooth_l1_loss(y_true, y_pred):
    """
    Implements Smooth-L1 loss. y_true and y_pred are typically: [N, 4], but could be any shape.
    :param y_true:
    :param y_pred:
    :return:
    """
    diff = K.abs(y_true - y_pred)
    less_than_one = K.cast(K.less(diff, 1.0), "float32")
    loss = (less_than_one * 0.5 * diff ** 2) + (1 - less_than_one) * (diff - 0.5)
    return loss


def rpn_bbox_loss_graph(batch_size, target_bbox, rpn_match, rpn_bbox):
    """
    :param batch_size:
    :param target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
                        Uses 0 padding to fill in unused bbox deltas.
    :param rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
                      -1=negative, 0=neutral anchor.
    :param rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    :return: Return the RPN bounding box loss graph.
    """
    # Positive anchors contribute to the loss, but negative and
    # neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)
    indices = tf.where(K.equal(rpn_match, 1))
    # Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)
    # Trim target bounding box deltas to the same length as rpn_bbox.
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts, batch_size)
    loss = smooth_l1_loss(target_bbox, rpn_bbox)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss


def mrcnn_class_loss_graph(target_class_ids, pred_class_logits, active_class_ids):
    """
    Loss for the classifier head of Mask RCNN.
    :param target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
                             padding to fill in the array.
    :param pred_class_logits: [batch, num_rois, num_classes]
    :param active_class_ids: [batch, num_classes]. Has a value of 1 for
                             classes that are in the dataset of the image, and 0
                             for classes that are not in the dataset.
    :return:
    """
    # During model building, Keras calls this function with
    # target_class_ids of type float32. Unclear why. Cast it
    # to int to get around it.
    target_class_ids = tf.cast(target_class_ids, 'int64')

    # Find predictions of classes that are not in the dataset.
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # TODO: Update this line to work with batch > 1. Right now it assumes all
    #       images in a batch have the same active_class_ids
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)

    # Loss
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=target_class_ids, logits=pred_class_logits)

    # Erase losses of predictions of classes that are not in the active
    # classes of the image.
    loss = loss * pred_active

    # Compute loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
    loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
    return loss


def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """
    Loss for Mask R-CNN bounding box refinement.
    :param target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
    :param target_class_ids: [batch, num_rois]. Integer class IDs.
    :param pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
    :return:
    """
    # Reshape to merge batch and roi dimensions for simplicity.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))

    # Only positive ROIs contribute to the loss. And only
    # the right class_id of each ROI. Get their indices.
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)

    # Gather the deltas (predicted and true) that contribute to loss
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)

    # Smooth-L1 Loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss


def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """
    Mask binary cross-entropy loss for the masks head.
    :param target_masks: [batch, num_rois, height, width].
                         A float32 tensor of values 0 or 1. Uses zero padding to fill array.
    :param target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
    :param pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
                       with values from 0 to 1.
    :return:
    """
    # Reshape for simplicity. Merge first two dimensions into one.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute predicted masks to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])

    # Only positive ROIs contribute to the loss. And only
    # the class specific mask of each ROI.
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)

    # Gather the masks (predicted and true) that contribute to loss
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)

    # Compute binary cross entropy. If no positive ROIs, then return 0.
    # shape: [batch, roi, num_classes]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss


def refine_detections_graph(rois, probs, deltas, window):
    """
    Refine classified proposals and filter overlaps and return final detections.
    :param rois: [N, (y1, x1, y2, x2)] in normalized coordinates
    :param probs: [N, num_classes]. Class probabilities.
    :param deltas: [N, num_classes, (dy, dx, log(dh), log(dw))]. Class-specific bounding box deltas.
    :param window: (y1, x1, y2, x2) in normalized coordinates.
                   The part of the image that contains the image excluding the padding.
    :return: Returns detections shaped:
             [num_detections, (y1, x1, y2, x2, class_id, score)] where coordinates are normalized.
    """
    # Class IDs per ROI
    class_ids = tf.argmax(probs, axis=1, output_type=tf.int32)
    # Class probability of the top class of each ROI
    indices = tf.stack([tf.range(probs.shape[0]), class_ids], axis=1)
    class_scores = tf.gather_nd(probs, indices)
    # Class-specific bounding box deltas
    deltas_specific = tf.gather_nd(deltas, indices)
    # Apply bounding box deltas
    # Shape: [boxes, (y1, x1, y2, x2)] in normalized coordinates
    bbox_utils = BboxUtil()
    refined_rois = bbox_utils.apply_box_deltas_graph(rois, deltas_specific * cfg.COMMON.BBOX_STD_DEV)
    # Clip boxes to image window
    refined_rois = bbox_utils.clip_boxes_graph(refined_rois, window)

    # TODO: Filter out boxes with zero area

    # Filter out background boxes
    keep = tf.where(class_ids > 0)[:, 0]
    # Filter out low confidence boxes
    defection_min_confidence = cfg.COMMON.DETECTION_MIN_CONFIDENCE
    if defection_min_confidence:
        conf_keep = tf.where(class_scores >= defection_min_confidence)[:, 0]
        keep = tf.sets.set_intersection(tf.expand_dims(keep, 0), tf.expand_dims(conf_keep, 0))
        keep = tf.sparse_tensor_to_dense(keep)[0]

    # Apply per-class NMS
    # 1. Prepare variables
    pre_nms_class_ids = tf.gather(class_ids, keep)
    pre_nms_scores = tf.gather(class_scores, keep)
    pre_nms_rois = tf.gather(refined_rois, keep)
    unique_pre_nms_class_ids = tf.unique(pre_nms_class_ids)[0]

    def nms_keep_map(class_id):
        """Apply Non-Maximum Suppression on ROIs of the given class."""
        defection_max_instances = cfg.TEST.DETECTION_MAX_INSTANCES
        # Indices of ROIs of the given class
        ixs = tf.where(tf.equal(pre_nms_class_ids, class_id))[:, 0]
        # Apply NMS
        class_keep = tf.image.non_max_suppression(tf.gather(pre_nms_rois, ixs),
                                                  tf.gather(pre_nms_scores, ixs),
                                                  max_output_size=defection_max_instances,
                                                  iou_threshold=cfg.TEST.DETECTION_NMS_THRESHOLD)
        # Map indices
        class_keep = tf.gather(keep, tf.gather(ixs, class_keep))
        # Pad with -1 so returned tensors have the same shape
        gap = defection_max_instances - tf.shape(class_keep)[0]
        class_keep = tf.pad(class_keep, [(0, gap)],
                            mode='CONSTANT', constant_values=-1)
        # Set shape so map_fn() can infer result shape
        class_keep.set_shape([defection_max_instances])
        return class_keep

    # 2. Map over class IDs
    nms_keep = tf.map_fn(nms_keep_map, unique_pre_nms_class_ids,
                         dtype=tf.int64)
    # 3. Merge results into one list, and remove -1 padding
    nms_keep = tf.reshape(nms_keep, [-1])
    nms_keep = tf.gather(nms_keep, tf.where(nms_keep > -1)[:, 0])
    # 4.
Compute intersection between keep and nms_keepkeep = tf.sets.set_intersection(tf.expand_dims(keep, 0),tf.expand_dims(nms_keep, 0))keep = tf.sparse_tensor_to_dense(keep)[0]# Keep top detectionsroi_count = cfg.TEST.DETECTION_MAX_INSTANCESclass_scores_keep = tf.gather(class_scores, keep)num_keep = tf.minimum(tf.shape(class_scores_keep)[0], roi_count)top_ids = tf.nn.top_k(class_scores_keep, k=num_keep, sorted=True)[1]keep = tf.gather(keep, top_ids)# Arrange output as [N, (y1, x1, y2, x2, class_id, score)]# Coordinates are normalized.detections = tf.concat([tf.gather(refined_rois, keep),tf.to_float(tf.gather(class_ids, keep))[..., tf.newaxis],tf.gather(class_scores, keep)[..., tf.newaxis]], axis=1)# Pad with zeros if detections < DETECTION_MAX_INSTANCESgap = cfg.TEST.DETECTION_MAX_INSTANCES - tf.shape(detections)[0]detections = tf.pad(detections, [(0, gap), (0, 0)], "CONSTANT")return detectionsclass BatchNorm(KL.BatchNormalization):"""Extends the Keras BatchNormalization class to allow a central placeto make changes if needed.Batch normalization has a negative effect on training if batches are smallso this layer is often frozen (via setting in Config class) and functionsas linear layer."""def call(self, inputs, training=None):"""Note about training values:None: Train BN layers. This is the normal modeFalse: Freeze BN layers. Good when batch size is smallTrue: (don't use). Set layer in training mode even when making inferences"""return super(self.__class__, self).call(inputs, training=training)class ProposalLayer(KE.Layer):"""Receives anchor scores and selects a subset to pass as proposalsto the second stage. Filtering is done based on anchor scores andnon-max suppression to remove overlaps. It also applies boundingbox refinement deltas to anchors.Inputs:rpn_probs: [batch, num_anchors, (bg prob, fg prob)]rpn_bbox: [batch, num_anchors, (dy, dx, log(dh), log(dw))]anchors: [batch, num_anchors, (y1, x1, y2, x2)] anchors in normalized coordinatesReturns:Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]"""def __init__(self, proposal_count, nms_threshold, batch_size, **kwargs):super(ProposalLayer, self).__init__(**kwargs)self.proposal_count = proposal_countself.nms_threshold = nms_thresholdself.batch_size = batch_sizeself.misc_utils = MiscUtils()self.bbox_utils = BboxUtil()passdef call(self, inputs):"""这里的 call 方法,会被 __init__() 方法回调:param inputs::return:"""# Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]scores = inputs[0][:, :, 1]# Box deltas [batch, num_rois, 4]deltas = inputs[1]rpn_bbox_std_dev = np.array(cfg.COMMON.RPN_BBOX_STD_DEV)deltas = deltas * np.reshape(rpn_bbox_std_dev, [1, 1, 4])# Anchorsanchors = inputs[2]# Improve performance by trimming to top anchors by score# and doing the rest on the smaller subset.pre_nms_limit = tf.minimum(cfg.COMMON.PRE_NMS_LIMIT, tf.shape(anchors)[1])ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True, name="top_anchors").indicesscores = self.misc_utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),self.batch_size)deltas = self.misc_utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),self.batch_size)pre_nms_anchors = self.misc_utils.batch_slice([anchors, ix],lambda a, x: tf.gather(a, x),self.batch_size,names=["pre_nms_anchors"])# Apply deltas to anchors to get refined anchors.# [batch, N, (y1, x1, y2, x2)]boxes = self.misc_utils.batch_slice([pre_nms_anchors, deltas],lambda x, y: self.bbox_utils.apply_box_deltas_graph(x, y),self.batch_size,names=["refined_anchors"])# Clip to image boundaries. 
Since we're in normalized coordinates,# clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]window = np.array([0, 0, 1, 1], dtype=np.float32)boxes = self.misc_utils.batch_slice(boxes,lambda x: self.bbox_utils.clip_boxes_graph(x, window),self.batch_size,names=["refined_anchors_clipped"])# Filter out small boxes# According to Xinlei Chen's paper, this reduces detection accuracy# for small objects, so we're skipping it.# Non-max suppressiondef nms(boxes, scores):indices = tf.image.non_max_suppression(boxes, scores, self.proposal_count,self.nms_threshold, name="rpn_non_max_suppression")proposals = tf.gather(boxes, indices)# Pad if neededpadding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)proposals = tf.pad(proposals, [(0, padding), (0, 0)])return proposalsproposals = self.misc_utils.batch_slice([boxes, scores], nms, self.batch_size)return proposalsdef compute_output_shape(self, input_shape):return (None, self.proposal_count, 4)class DetectionTargetLayer(KE.Layer):"""Subsamples proposals and generates target box refinement, class_ids, and masks for each.Inputs:proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Mightbe zero padded if there are not enough proposals.gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalizedcoordinates.gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean typeReturns: Target ROIs and corresponding class IDs, bounding box shifts, and masks.rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalizedcoordinatestarget_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, (dy, dx, log(dh), log(dw)]target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]Masks cropped to bbox boundaries and resized to neuralnetwork output size.Note: Returned arrays might be zero padded if not enough target ROIs."""def __init__(self, batch_size, **kwargs):super(DetectionTargetLayer, self).__init__(**kwargs)self.batch_size = batch_sizeself.misc_utils = MiscUtils()self.rois_per_image = cfg.TRAIN.ROIS_PER_IMAGEself.mask_shape = cfg.TRAIN.MASK_SHAPEpassdef call(self, inputs):"""这里的 call 方法,会被 __init__() 方法回调:param inputs: 参数如下所示:return:"""proposals = inputs[0]gt_class_ids = inputs[1]gt_boxes = inputs[2]gt_masks = inputs[3]# Slice the batch and run a graph for each slice# TODO: Rename target_bbox to target_deltas for claritynames = ["rois", "target_class_ids", "target_bbox", "target_mask"]outputs = self.misc_utils.batch_slice([proposals, gt_class_ids, gt_boxes, gt_masks],lambda w, x, y, z: self.misc_utils.detection_targets_graph(w, x, y, z),self.batch_size, names=names)return outputsdef compute_output_shape(self, input_shape):return [(None, self.rois_per_image, 4),  # rois(None, self.rois_per_image),  # class_ids(None, self.rois_per_image, 4),  # deltas(None, self.rois_per_image, self.mask_shape[0], self.mask_shape[1])  # masks]def compute_mask(self, inputs, mask=None):return [None, None, None, None]class PyramidROIAlign(KE.Layer):"""Implements ROI Pooling on multiple levels of the feature pyramid.Inputs:- boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalizedcoordinates. Possibly padded with zeros if not enoughboxes to fill the array.- image_meta: [batch, (meta data)] Image details. 
See compose_image_meta()- feature_maps: List of feature maps from different levels of the pyramid.Each is [batch, height, width, channels]Output:Pooled regions in the shape: [batch, num_boxes, pool_height, pool_width, channels].The width and height are those specific in the pool_shape in the layerconstructor."""def __init__(self, pool_shape, **kwargs):super(PyramidROIAlign, self).__init__(**kwargs)self.pool_shape = tuple(pool_shape)self.image_utils = ImageUtils()def call(self, inputs):# Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coordsboxes = inputs[0]# Image meta# Holds details about the image. See compose_image_meta()image_meta = inputs[1]# Feature Maps. List of feature maps from different level of the# feature pyramid. Each is [batch, height, width, channels]feature_maps = inputs[2:]# Assign each ROI to a level in the pyramid based on the ROI area.y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)h = y2 - y1w = x2 - x1# Use shape of first image. Images in a batch must have the same size.image_shape = self.image_utils.parse_image_meta_graph(image_meta)['image_shape'][0]# Equation 1 in the Feature Pyramid Networks paper. Account for# the fact that our coordinates are normalized here.# e.g. a 224x224 ROI (in pixels) maps to P4image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))roi_level = tf.minimum(5, tf.maximum(2, 4 + tf.cast(tf.round(roi_level), tf.int32)))roi_level = tf.squeeze(roi_level, 2)# Loop through levels and apply ROI pooling to each. P2 to P5.pooled = []box_to_level = []for i, level in enumerate(range(2, 6)):ix = tf.where(tf.equal(roi_level, level))level_boxes = tf.gather_nd(boxes, ix)# Box indices for crop_and_resize.box_indices = tf.cast(ix[:, 0], tf.int32)# Keep track of which box is mapped to which levelbox_to_level.append(ix)# Stop gradient propogation to ROI proposalslevel_boxes = tf.stop_gradient(level_boxes)box_indices = tf.stop_gradient(box_indices)# Crop and Resize# From Mask R-CNN paper: "We sample four regular locations, so# that we can evaluate either max or average pooling. 
In fact,# interpolating only a single value at each bin center (without# pooling) is nearly as effective."## Here we use the simplified approach of a single value per bin,# which is how it's done in tf.crop_and_resize()# Result: [batch * num_boxes, pool_height, pool_width, channels]pooled.append(tf.image.crop_and_resize(feature_maps[i], level_boxes, box_indices, self.pool_shape,method="bilinear"))# Pack pooled features into one tensorpooled = tf.concat(pooled, axis=0)# Pack box_to_level mapping into one array and add another# column representing the order of pooled boxesbox_to_level = tf.concat(box_to_level, axis=0)box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],axis=1)# Rearrange pooled features to match the order of the original boxes# Sort box_to_level by batch then box index# TF doesn't have a way to sort by two columns, so merge them and sort.sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]ix = tf.nn.top_k(sorting_tensor, k=tf.shape(box_to_level)[0]).indices[::-1]ix = tf.gather(box_to_level[:, 2], ix)pooled = tf.gather(pooled, ix)# Re-add the batch dimensionshape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)pooled = tf.reshape(pooled, shape)return pooleddef compute_output_shape(self, input_shape):return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1],)class DetectionLayer(KE.Layer):"""Takes classified proposal boxes and their bounding box deltas andreturns the final detection boxes.Returns:[batch, num_detections, (y1, x1, y2, x2, class_id, class_score)] wherecoordinates are normalized."""def __init__(self, batch_size, **kwargs):super(DetectionLayer, self).__init__(**kwargs)self.batch_size = batch_sizeself.detection_max_instances = cfg.TEST.DETECTION_MAX_INSTANCESself.image_utils = ImageUtils()self.bbox_utils = BboxUtil()self.misc_utils = MiscUtils()def call(self, inputs):rois = inputs[0]mrcnn_class = inputs[1]mrcnn_bbox = inputs[2]image_meta = inputs[3]# Get windows of images in normalized coordinates. Windows are the area# in the image that excludes the padding.# Use the shape of the first image in the batch to normalize the window# because we know that all images get resized to the same size.m = self.image_utils.parse_image_meta_graph(image_meta)image_shape = m['image_shape'][0]window = self.bbox_utils.norm_boxes_graph(m['window'], image_shape[:2])# Run detection refinement graph on each item in the batchdetections_batch = self.misc_utils.batch_slice([rois, mrcnn_class, mrcnn_bbox, window],lambda x, y, w, z: refine_detections_graph(x, y, w, z),self.batch_size)# Reshape output# [batch, num_detections, (y1, x1, y2, x2, class_id, class_score)] in# normalized coordinatesreturn tf.reshape(detections_batch, [self.batch_size, self.detection_max_instances, 6])def compute_output_shape(self, input_shape):return (None, self.detection_max_instances, 6)
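The loss and target-building code above is easier to reason about with a tiny standalone check. The sketch below is a minimal pure-NumPy illustration (no TensorFlow needed) of the (dy, dx, log(dh), log(dw)) refinement that the RPN target builder encodes for a matched anchor, plus the same piecewise smooth-L1 penalty used in smooth_l1_loss. The anchor, box, and std-dev values here are made up for illustration and are not taken from the config.

import numpy as np


def encode_rpn_delta(anchor, gt_box, bbox_std_dev=(0.1, 0.1, 0.2, 0.2)):
    """Delta (dy, dx, log(dh), log(dw)) an anchor should predict for its GT box.
    Boxes are [y1, x1, y2, x2]; the std-dev here is only an illustrative value."""
    a_h, a_w = anchor[2] - anchor[0], anchor[3] - anchor[1]
    g_h, g_w = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
    a_cy, a_cx = anchor[0] + 0.5 * a_h, anchor[1] + 0.5 * a_w
    g_cy, g_cx = gt_box[0] + 0.5 * g_h, gt_box[1] + 0.5 * g_w
    delta = np.array([(g_cy - a_cy) / a_h,
                      (g_cx - a_cx) / a_w,
                      np.log(g_h / a_h),
                      np.log(g_w / a_w)])
    return delta / np.array(bbox_std_dev)


def smooth_l1(y_true, y_pred):
    """Same piecewise definition as smooth_l1_loss above, in NumPy."""
    diff = np.abs(y_true - y_pred)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)


anchor = np.array([10.0, 10.0, 50.0, 60.0])       # toy anchor, [y1, x1, y2, x2]
gt_box = np.array([12.0, 14.0, 48.0, 66.0])       # toy ground-truth box
target = encode_rpn_delta(anchor, gt_box)
pred = target + np.array([0.05, -0.02, 0.10, 0.0])   # pretend RPN output
print("target deltas :", target)
print("smooth-L1 loss:", smooth_l1(target, pred))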

 

3. Backbone network

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/12 15:40
# @Author   : WanDaoYi
# @FileName : backbone.py
# ============================================


import keras.layers as kl
from m_rcnn import common
from config import cfg


def resnet_graph(input_image, architecture, stage5=False):
    """
    ResNet backbone graph; nothing tricky here, just keep the filter and block counts straight.
    :param input_image: input image info
    :param architecture: Can be resNet50 or resNet101
    :param stage5: Boolean. If False, stage5 of the network is not created
    :return: [c1, c2, c3, c4, c5]
    """
    train_flag = cfg.COMMON.TRAIN_FLAG

    assert architecture in ["resNet50", "resNet101"]

    # Stage 1
    x = kl.ZeroPadding2D((3, 3))(input_image)
    x = kl.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = common.BatchNorm(name='bn_conv1')(x, training=train_flag)
    x = kl.Activation('relu')(x)
    c1 = x = kl.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # Stage 2
    x = common.conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_flag=train_flag)
    x = common.identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_flag=train_flag)
    c2 = x = common.identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_flag=train_flag)

    # Stage 3
    x = common.conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_flag=train_flag)
    x = common.identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_flag=train_flag)
    x = common.identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_flag=train_flag)
    c3 = x = common.identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_flag=train_flag)

    # Stage 4
    x = common.conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_flag=train_flag)
    block_count = {"resNet50": 5, "resNet101": 22}[architecture]
    for i in range(block_count):
        x = common.identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_flag=train_flag)
    c4 = x

    # Stage 5
    if stage5:
        x = common.conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_flag=train_flag)
        x = common.identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_flag=train_flag)
        c5 = common.identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_flag=train_flag)
    else:
        c5 = None

    return [c1, c2, c3, c4, c5]
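A quick sanity check on the backbone is the stride of each stage output: c1 and c2 come out at stride 4, then c3, c4, c5 at strides 8, 16 and 32, which is also why build() later requires the input side length to be divisible by 2**6 once the extra P6 level is added. The little helper below is purely illustrative and not part of the repo:

def backbone_output_sizes(image_size, strides=(4, 4, 8, 16, 32)):
    """Spatial size of [c1..c5] for a square input of side image_size."""
    return [(image_size // s, image_size // s) for s in strides]


for name, size in zip(["c1", "c2", "c3", "c4", "c5"], backbone_output_sizes(512)):
    print(name, size)
# c5 is 16x16 for a 512 input; P6 (stride 64) subsamples it once more,
# which is where the "divisible by 2 at least 6 times" constraint comes from.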

 

4. Converting the COCO data into network-model data

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/11 20:49
# @Author   : WanDaoYi
# @FileName : coco_dataset.py
# ============================================


from datetime import datetime
import os
import json
import itertools
import numpy as np
from collections import defaultdict
from config import cfgdef is_array_like(obj):return hasattr(obj, '__iter__') and hasattr(obj, '__len__')class CocoDataset(object):def __init__(self, annotation_path, image_file_path):""":param annotation_path: annotation json path, as ./instances_train2014.json:param image_file_path: image file path, as ./train2014"""# file pathself.annotation_path = annotation_pathself.image_file_path = image_file_path# dataset infoself.dataset = self.read_coco_json_data()# class infoself.categories_dict = self.categories_info()# image infoself.image_dict = self.images_info()# annotations info, image to annotations info, class to image infoself.annotations_dict, self.image_2_annotations, self.categories_2_image = self.annotations_info()self.image_info_list = []# Background is always the first classself.class_info_list = cfg.COMMON.DEFAULT_CLASS_INFO# 数据处理self.deal_data()# 类别数量self.class_num = len(self.class_info_list)# 类别 id listself.class_ids_list = np.arange(self.class_num)# 类别名字self.class_names_list = [self.clean_name(c["name"]) for c in self.class_info_list]# 图像数量self.images_num = len(self.image_info_list)# 图像 id listself._image_ids_list = np.arange(self.images_num)# Map sources to class_ids they supportself.sources = list(set([i['source'] for i in self.class_info_list]))self.source_class_ids = self.get_source_class_ids()# Mapping from source class and image IDs to internal IDsself.class_from_source_map = {"{}.{}".format(info['source'], info['id']): class_idfor info, class_id in zip(self.class_info_list, self.class_ids_list)}self.image_from_source_map = {"{}.{}".format(info['source'], info['id']): image_idfor info, image_id in zip(self.image_info_list, self.image_ids_list)}pass@propertydef image_ids_list(self):return self._image_ids_listdef get_source_class_ids(self):source_class_ids = {}# Loop over datasetfor source in self.sources:source_class_ids[source] = []# Find classes that belong to this datasetfor i, info in enumerate(self.class_info_list):# Include BG class in all datasetif i == 0 or source == info['source']:source_class_ids[source].append(i)passpasspassreturn source_class_idspass# load coco datadef read_coco_json_data(self):# str to jsonprint("json load.....")data_json = json.load(open(self.annotation_path, encoding="utf-8"))assert type(data_json) == dict, 'annotation file format {} not supported'.format(type(data_json))# json_key_list = [key_name for key_name in data_json]# print(json_key_list)return data_json# deal class infodef categories_info(self):categories_dict = dict()if "categories" in self.dataset:print("categories info...")categories_info = self.dataset["categories"]for categories in categories_info:categories_dict[categories["id"]] = categoriespass# categories_ids = [categories['id'] for categories in categories_info]# print(categories_ids)passreturn categories_dict# deal image infodef images_info(self):image_dict = dict()if "images" in self.dataset:print("images info...")image_info_list = self.dataset["images"]for image_info in image_info_list:image_dict[image_info["id"]] = image_infopasspassreturn image_dict# deal annotation info and image to annotation, class to imagedef annotations_info(self):annotations_dict = dict()image_2_annotations = defaultdict(list)categories_2_image = defaultdict(list)if "annotations" in self.dataset:print("annotations info...")annotations_list = self.dataset["annotations"]for annotations in annotations_list:annotations_dict[annotations["id"]] = annotationsimage_2_annotations[annotations["image_id"]].append(annotations)if "categories" in 
self.dataset:categories_2_image[annotations["category_id"]].append(annotations["image_id"])passpasspassreturn annotations_dict, image_2_annotations, categories_2_image# image ids listdef get_image_ids(self, image_ids=[], class_ids=[]):if len(image_ids) == 0 and len(class_ids) == 0:ids = self.image_dict.keys()else:ids = set(image_ids)for i, class_id in enumerate(class_ids):if i == 0 and len(ids) == 0:ids = set(self.categories_2_image[class_id])else:ids &= set(self.categories_2_image[class_id])return list(ids)pass# class idsdef get_class_ids(self):# class idscategories_ids = sorted([categories['id'] for categories in self.dataset['categories']])return categories_idspassdef get_annotation_ids(self, image_ids=[], class_ids=[], area_rang=[], is_crowd=False):""":param image_ids : (int array)  get annotation for given images:param class_ids: (int array)   get annotation for given classes:param area_rang: (float array) get annotation for given area range (e.g. [0 inf]):param is_crowd: (boolean)  get annotation for given crowd label (False or True):return: annotation_ids: (int array)    integer array of ann ids"""if len(image_ids) == len(class_ids) == len(area_rang) == 0:annotations = self.dataset['annotations']passelse:if len(image_ids) != 0:lists = [self.image_2_annotations[image_id] for image_id in image_ids ifimage_id in self.image_2_annotations]annotations = list(itertools.chain.from_iterable(lists))passelse:annotations = self.dataset['annotations']passannotations = annotations if len(class_ids) == 0 else [ann for ann in annotations if ann['category_id'] in class_ids]annotations = annotations if len(area_rang) == 0 else [ann for ann in annotations if ann['area'] > area_rang[0] and ann['area'] < area_rang[1]]passif is_crowd:annotation_ids = [annotation['id'] for annotation in annotations if annotation['iscrowd'] == is_crowd]passelse:annotation_ids = [annotation['id'] for annotation in annotations]passreturn annotation_idspassdef load_class_info(self, class_ids=[]):return [self.categories_dict[class_ids]]passdef load_annotation_info(self, annotation_ids=[]):if is_array_like(annotation_ids):return [self.annotations_dict[annotation_id] for annotation_id in annotation_ids]elif type(annotation_ids) == int:return [self.annotations_dict[annotation_ids]]pass# 增加类别信息def add_class(self, source, class_id, class_name):""":param source: 来源:param class_id: 类别的 id 号:param class_name: 类别名称:return:"""assert "." 
not in source, "Source name cannot contain a dot"# 判断类别是否已存在for info_map in self.class_info_list:class_info_flag = info_map["source"] == source and info_map["id"] == class_idif class_info_flag:# source.class_id combination already available, skipreturnpass# 添加新的类别信息info_map = {"source": source,"id": class_id,"name": class_name}self.class_info_list.append(info_map)pass# 添加图像信息def add_image(self, source, image_id, path, **kwargs):""":param source: 来源:param image_id: 图像 id:param path: 路径:param kwargs: 一个 map 超参:return:"""image_info_map = {"id": image_id, "source": source, "path": path}image_info_map.update(kwargs)self.image_info_list.append(image_info_map)pass# 对数据处理def deal_data(self):image_ids = []class_ids = self.get_class_ids()for class_id in class_ids:image_ids.extend(list(self.get_image_ids(class_ids=[class_id])))pass# Remove duplicatesimage_ids = list(set(image_ids))# Add classesfor i in class_ids:self.add_class("coco", i, self.load_class_info(i)[0]["name"])pass# Add imagesfor i in image_ids:self.add_image(source="coco", image_id=i,path=os.path.join(self.image_file_path, self.image_dict[i]['file_name']),width=self.image_dict[i]["width"],height=self.image_dict[i]["height"],annotations=self.load_annotation_info(self.get_annotation_ids(image_ids=[i], class_ids=class_ids, is_crowd=False)))pass# class name value cleandef clean_name(self, name):""":param name: name value:return:"""return ",".join(name.split(",")[: 1])passif __name__ == "__main__":# 代码开始时间start_time = datetime.now()print("开始时间: {}".format(start_time))anno_file_path = "G:/deep_learning_demo/data/instance_segmentation/annotations"train_anno_json_name = "instances_train2014.json"val_anno_json_name = "instances_minival2014.json"train_anno_json_data_path = os.path.join(anno_file_path, train_anno_json_name)val_anno_json_data_path = os.path.join(anno_file_path, val_anno_json_name)train_image_path = "G:/deep_learning_demo/data/instance_segmentation/train2014"val_image_path = "G:/deep_learning_demo/data/instance_segmentation/val2014"train_data = CocoDataset(train_anno_json_data_path, train_image_path)dataset = train_data.datasetdataset_key = [key for key in dataset]print("dataset_key: {}".format(dataset_key))print("dataset_type: {}".format(type(dataset)))print("info_type: {}".format(type(dataset["info"])))print("images_type: {}".format(type(dataset["images"])))print("licenses_type: {}".format(type(dataset["licenses"])))print("annotations_type: {}".format(type(dataset["annotations"])))print("categories_type: {}".format(type(dataset["categories"])))info_key = [key for key in dataset["info"]]print("info_key: {}".format(info_key))print("info: {}".format(dataset["info"]))print("licenses: {}".format(dataset["licenses"]))print("categories: {}".format(dataset["categories"]))print("images_0-1: {}".format(dataset["images"][: 2]))print("annotations_0-1: {}".format(dataset["annotations"][: 2]))print("It's over!")# 代码结束时间end_time = datetime.now()print("结束时间: {}, 训练模型耗时: {}".format(end_time, end_time - start_time))pass
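The index-building logic inside annotations_info() can be verified on a hand-written COCO-style fragment before pointing the class at the real instances_train2014.json. The snippet below rebuilds the same image-to-annotation and category-to-image maps from two made-up annotations; only the field names follow the COCO format, the IDs are invented:

from collections import defaultdict

# Minimal COCO-style fragment (invented IDs, real COCO field names).
toy_dataset = {
    "annotations": [
        {"id": 1, "image_id": 100, "category_id": 18, "iscrowd": 0},
        {"id": 2, "image_id": 100, "category_id": 1, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
}

annotations_dict = {}
image_2_annotations = defaultdict(list)
categories_2_image = defaultdict(list)

for ann in toy_dataset["annotations"]:
    annotations_dict[ann["id"]] = ann
    image_2_annotations[ann["image_id"]].append(ann)
    if "categories" in toy_dataset:
        categories_2_image[ann["category_id"]].append(ann["image_id"])

print(len(image_2_annotations[100]))     # 2 annotations on image 100
print(dict(categories_2_image))          # {18: [100], 1: [100]}

With the real annotation files, constructing CocoDataset(train_anno_json_data_path, train_image_path) as in the __main__ block above builds exactly these maps for the full dataset.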

 

5. The Mask R-CNN model

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/12 15:42
# @Author   : WanDaoYi
# @FileName : mask_rcnn.py
# ============================================


import h5py
import numpy as np
import tensorflow as tf
import keras.backend as k
import keras.layers as kl
import keras.models as km
from utils.bbox_utils import BboxUtil
from utils.anchor_utils import AnchorUtils
from utils.image_utils import ImageUtils
from utils.mask_util import MaskUtil
from m_rcnn import common
from m_rcnn import backbone
from config import cfg

# Conditional import to support versions of Keras before 2.2
try:
    from keras.engine import saving
except ImportError:# Keras before 2.2 used the 'topology' namespace.from keras.engine import topology as savingclass MaskRCNN(object):def __init__(self, train_flag=True):""":param train_flag: 是否为训练,训练为 True,测试为 False"""self.train_flag = train_flagself.bbox_util = BboxUtil()self.anchor_utils = AnchorUtils()self.image_utils = ImageUtils()self.mask_util = MaskUtil()# 模型 路径self.model_path = cfg.TRAIN.MODEL_PATH if self.train_flag else cfg.TEST.COCO_MODEL_PATH# batch sizeself.batch_size = cfg.TRAIN.BATCH_SIZE if self.train_flag else cfg.TEST.BATCH_SIZE# 模型保存路径self.save_model_path = cfg.TRAIN.SAVE_MODEL_PATHself.backbone = cfg.COMMON.BACKBONEself.backbone_strides = cfg.COMMON.BACKBONE_STRIDES# 输入图像self.image_shape = np.array(cfg.COMMON.IMAGE_SHAPE)# 用于构建特征金字塔的自顶向下层的大小self.top_down_pyramid_size = cfg.COMMON.TOP_DOWN_PYRAMID_SIZEself.rpn_anchor_stride = cfg.COMMON.RPN_ANCHOR_STRIDEself.rpn_anchor_ratios = cfg.COMMON.RPN_ANCHOR_RATIOSself.rpn_nms_threshold = cfg.COMMON.RPN_NMS_THRESHOLDself.class_num = cfg.COMMON.CLASS_NUMself.rois_per_image = cfg.TRAIN.ROIS_PER_IMAGEself.roi_positive_ratio = cfg.TRAIN.ROI_POSITIVE_RATIOself.keras_model = self.build()passdef build(self):# image shapeh, w, c = self.image_shape[:]print("image_shape: {}".format(self.image_shape))if h / 2 ** 6 != int(h / 2 ** 6) or w / 2 ** 6 != int(w / 2 ** 6):raise Exception("Image size must be dividable by 2 at least 6 times ""to avoid fractions when downscaling and upscaling.""For example, use 256, 320, 384, 448, 512, ... etc. ")# Inputsinput_image = kl.Input(shape=[None, None, c], name="input_image")input_image_meta = kl.Input(shape=[cfg.COMMON.IMAGE_META_SIZE], name="input_image_meta")# 训练if self.train_flag:# RPN GTinput_rpn_match = kl.Input(shape=[None, 1], name="input_rpn_match", dtype=tf.int32)input_rpn_bbox = kl.Input(shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)# Detection GT (class IDs, bounding boxes, and masks)# 1. GT Class IDs (zero padded)input_gt_class_ids = kl.Input(shape=[None], name="input_gt_class_ids", dtype=tf.int32)# 2. GT Boxes in pixels (zero padded)# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinatesinput_gt_boxes = kl.Input(shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)# Normalize coordinatesgt_boxes = kl.Lambda(lambda x: self.bbox_util.norm_boxes_graph(x, k.shape(input_image)[1:3]))(input_gt_boxes)# 3. 
GT Masks (zero padded)# [batch, height, width, MAX_GT_INSTANCES]if cfg.TRAIN.USE_MINI_MASK:min_h, min_w = cfg.TRAIN.MINI_MASK_SHAPE[:]input_gt_masks = kl.Input(shape=[min_h, min_w, None], name="input_gt_masks", dtype=bool)else:input_gt_masks = kl.Input(shape=[h, w, None], name="input_gt_masks", dtype=bool)pass# anchoranchors = self.anchor_utils.get_anchors(self.image_shape)# Duplicate across the batch dimension because Keras requires it# TODO: can this be optimized to avoid duplicating the anchors?anchors = np.broadcast_to(anchors, (self.batch_size,) + anchors.shape)# A hack to get around Keras's bad support for constantsanchors = kl.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)anchors = kl.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)passelse:# Anchors in normalized coordinatesanchors = kl.Input(shape=[None, 4], name="input_anchors")# 上面训练用到的参数,测试不需要,但是在 if else 里面定义一下,免得 undefinedinput_rpn_match = Noneinput_rpn_bbox = Noneinput_gt_class_ids = Nonegt_boxes = Noneinput_gt_boxes = Noneinput_gt_masks = Nonepass# Build the shared convolutional layers.# Bottom-up Layers# Returns a list of the last layers of each stage, 5 in total.# Don't create the thead (stage 5), so we pick the 4th item in the list._, c2, c3, c4, c5 = backbone.resnet_graph(input_image, self.backbone, stage5=True)# Top-down Layers# TODO: add assert to varify feature map sizes match what's in configp5 = kl.Conv2D(self.top_down_pyramid_size, (1, 1), name='fpn_c5p5')(c5)p4 = kl.Add(name="fpn_p4add")([kl.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(p5),kl.Conv2D(self.top_down_pyramid_size, (1, 1), name='fpn_c4p4')(c4)])p3 = kl.Add(name="fpn_p3add")([kl.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(p4),kl.Conv2D(self.top_down_pyramid_size, (1, 1), name='fpn_c3p3')(c3)])p2 = kl.Add(name="fpn_p2add")([kl.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(p3),kl.Conv2D(self.top_down_pyramid_size, (1, 1), name='fpn_c2p2')(c2)])# Attach 3x3 conv to all P layers to get the final feature maps.p2 = kl.Conv2D(self.top_down_pyramid_size, (3, 3), padding="SAME", name="fpn_p2")(p2)p3 = kl.Conv2D(self.top_down_pyramid_size, (3, 3), padding="SAME", name="fpn_p3")(p3)p4 = kl.Conv2D(self.top_down_pyramid_size, (3, 3), padding="SAME", name="fpn_p4")(p4)p5 = kl.Conv2D(self.top_down_pyramid_size, (3, 3), padding="SAME", name="fpn_p5")(p5)# P6 is used for the 5th anchor scale in RPN. Generated by# subsampling from P5 with stride of 2.p6 = kl.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(p5)# Note that P6 is used in RPN, but not in the classifier heads.rpn_feature_maps = [p2, p3, p4, p5, p6]mrcnn_feature_maps = [p2, p3, p4, p5]# RPN Modelrpn = common.build_rpn_model(self.rpn_anchor_stride, len(self.rpn_anchor_ratios), self.top_down_pyramid_size)# Loop through pyramid layerslayer_outputs = []  # list of listsfor p in rpn_feature_maps:layer_outputs.append(rpn([p]))pass# Concatenate layer outputs# Convert from list of lists of level outputs to list of lists# of outputs across levels.# e.g. 
[[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]outputs = list(zip(*layer_outputs))outputs = [kl.Concatenate(axis=1, name=n)(list(o)) for o, n in zip(outputs, output_names)]rpn_class_logits, rpn_class, rpn_bbox = outputs# Generate proposals# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates# and zero padded.proposal_count = cfg.TRAIN.POST_NMS_ROIS if self.train_flag else cfg.TEST.POST_NMS_ROISrpn_rois = common.ProposalLayer(proposal_count=proposal_count,nms_threshold=self.rpn_nms_threshold,batch_size=self.batch_size,name="ROI")([rpn_class, rpn_bbox, anchors])fc_layer_size = cfg.COMMON.FPN_CLASS_FC_LAYERS_SIZEpool_size = cfg.COMMON.POOL_SIZEmask_pool_size = cfg.COMMON.MASK_POOL_SIZEtrain_or_freeze = cfg.COMMON.TRAIN_FLAGif self.train_flag:# Class ID mask to mark class IDs supported by the dataset the image# came from.active_class_ids = kl.Lambda(lambda x: self.image_utils.parse_image_meta_graph(x)["active_class_ids"])(input_image_meta)if not cfg.TRAIN.USE_RPN_ROIS:# Ignore predicted ROIs and use ROIs provided as an input.input_rois = kl.Input(shape=[proposal_count, 4], name="input_roi", dtype=np.int32)# Normalize coordinatestarget_rois = kl.Lambda(lambda x: self.bbox_util.norm_boxes_graph(x, k.shape(input_image)[1:3]))(input_rois)else:target_rois = rpn_roisinput_rois = None# Generate detection targets# Subsamples proposals and generates target outputs for training# Note that proposal class IDs, gt_boxes, and gt_masks are zero# padded. Equally, returned rois and targets are zero padded.rois, target_class_ids, target_bbox, target_mask = \common.DetectionTargetLayer(self.batch_size, name="proposal_targets")([target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])# Network Heads# TODO: verify that this handles zero padded ROIsmrcnn_class_logits, mrcnn_class, mrcnn_bbox = common.fpn_classifier_graph(rois,mrcnn_feature_maps,input_image_meta,pool_size,self.class_num,train_flag=train_or_freeze,fc_layers_size=fc_layer_size)mrcnn_mask = common.build_fpn_mask_graph(rois, mrcnn_feature_maps,input_image_meta,mask_pool_size,self.class_num,train_flag=train_or_freeze)# TODO: clean up (use tf.identify if necessary)output_rois = kl.Lambda(lambda x: x * 1, name="output_rois")(rois)# Lossesrpn_class_loss = kl.Lambda(lambda x: common.rpn_class_loss_graph(*x), name="rpn_class_loss")([input_rpn_match, rpn_class_logits])rpn_bbox_loss = kl.Lambda(lambda x: common.rpn_bbox_loss_graph(self.batch_size, *x), name="rpn_bbox_loss")([input_rpn_bbox, input_rpn_match, rpn_bbox])class_loss = kl.Lambda(lambda x: common.mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")([target_class_ids, mrcnn_class_logits, active_class_ids])bbox_loss = kl.Lambda(lambda x: common.mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")([target_bbox, target_class_ids, mrcnn_bbox])mask_loss = kl.Lambda(lambda x: common.mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")([target_mask, target_class_ids, mrcnn_mask])# Modelinputs = [input_image, input_image_meta,input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]if not cfg.TRAIN.USE_RPN_ROIS:inputs.append(input_rois)outputs = [rpn_class_logits, rpn_class, rpn_bbox,mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,rpn_rois, output_rois,rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]model = km.Model(inputs, outputs, name='mask_rcnn')passelse:# Network Heads# Proposal classifier and BBox regressor headsmrcnn_class_logits, mrcnn_class, mrcnn_bbox = 
common.fpn_classifier_graph(rpn_rois,mrcnn_feature_maps,input_image_meta,pool_size,self.class_num,train_flag=train_or_freeze,fc_layers_size=fc_layer_size)# Detections# output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in# normalized coordinatesdetections = common.DetectionLayer(self.batch_size, name="mrcnn_detection")([rpn_rois,mrcnn_class,mrcnn_bbox,input_image_meta])# Create masks for detectionsdetection_boxes = kl.Lambda(lambda x: x[..., :4])(detections)mrcnn_mask = common.build_fpn_mask_graph(detection_boxes,mrcnn_feature_maps,input_image_meta,mask_pool_size,self.class_num,train_flag=train_or_freeze)model = km.Model([input_image, input_image_meta, anchors],[detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],name='mask_rcnn')pass# Add multi-GPU support. 多 GPU 操作gpu_count = cfg.COMMON.GPU_COUNTif gpu_count > 1:from m_rcnn.parallel_model import ParallelModelmodel = ParallelModel(model, gpu_count)return modelpassdef load_weights(self, model_path, by_name=False, exclude=None):"""Modified version of the corresponding Keras functionwith the addition of multi-GPU support and the abilityto exclude some layers from loading.:param model_path::param by_name::param exclude: list of layer names to exclude:return:"""if exclude:by_name = Truepassif h5py is None:raise ImportError('`load_weights` requires h5py.')passmodel_file = h5py.File(model_path, mode='r')if 'layer_names' not in model_file.attrs and 'model_weights' in model_file:model_file = model_file['model_weights']# In multi-GPU training, we wrap the model. Get layers# of the inner model because they have the weights.keras_model = self.keras_modellayers = keras_model.inner_model.layers if hasattr(keras_model, "inner_model") else keras_model.layersprint("layers: {}".format(layers))# Exclude some layersif exclude:layers = filter(lambda l: l.name not in exclude, layers)if by_name:saving.load_weights_from_hdf5_group_by_name(model_file, layers)else:saving.load_weights_from_hdf5_group(model_file, layers)if hasattr(model_file, 'close'):model_file.close()passdef generate_random_rois(self, image_shape, count, gt_boxes):"""Generates ROI proposals similar to what a region proposal networkwould generate.:param image_shape: [Height, Width, Depth]:param count: Number of ROIs to generate:param gt_boxes: [N, (y1, x1, y2, x2)] Ground truth boxes in pixels.:return:"""# placeholderrois = np.zeros((count, 4), dtype=np.int32)# Generate random ROIs around GT boxes (90% of count)rois_per_box = int(0.9 * count / gt_boxes.shape[0])for i in range(gt_boxes.shape[0]):gt_y1, gt_x1, gt_y2, gt_x2 = gt_boxes[i]h = gt_y2 - gt_y1w = gt_x2 - gt_x1# random boundariesr_y1 = max(gt_y1 - h, 0)r_y2 = min(gt_y2 + h, image_shape[0])r_x1 = max(gt_x1 - w, 0)r_x2 = min(gt_x2 + w, image_shape[1])# To avoid generating boxes with zero area, we generate double what# we need and filter out the extra. 
If we get fewer valid boxes# than we need, we loop and try again.while True:y1y2 = np.random.randint(r_y1, r_y2, (rois_per_box * 2, 2))x1x2 = np.random.randint(r_x1, r_x2, (rois_per_box * 2, 2))# Filter out zero area boxesthreshold = 1y1y2 = y1y2[np.abs(y1y2[:, 0] - y1y2[:, 1]) >= threshold][:rois_per_box]x1x2 = x1x2[np.abs(x1x2[:, 0] - x1x2[:, 1]) >= threshold][:rois_per_box]if y1y2.shape[0] == rois_per_box and x1x2.shape[0] == rois_per_box:break# Sort on axis 1 to ensure x1 <= x2 and y1 <= y2 and then reshape# into x1, y1, x2, y2 orderx1, x2 = np.split(np.sort(x1x2, axis=1), 2, axis=1)y1, y2 = np.split(np.sort(y1y2, axis=1), 2, axis=1)box_rois = np.hstack([y1, x1, y2, x2])rois[rois_per_box * i:rois_per_box * (i + 1)] = box_rois# Generate random ROIs anywhere in the image (10% of count)remaining_count = count - (rois_per_box * gt_boxes.shape[0])# To avoid generating boxes with zero area, we generate double what# we need and filter out the extra. If we get fewer valid boxes# than we need, we loop and try again.while True:y1y2 = np.random.randint(0, image_shape[0], (remaining_count * 2, 2))x1x2 = np.random.randint(0, image_shape[1], (remaining_count * 2, 2))# Filter out zero area boxesthreshold = 1y1y2 = y1y2[np.abs(y1y2[:, 0] - y1y2[:, 1]) >= threshold][:remaining_count]x1x2 = x1x2[np.abs(x1x2[:, 0] - x1x2[:, 1]) >= threshold][:remaining_count]if y1y2.shape[0] == remaining_count and x1x2.shape[0] == remaining_count:break# Sort on axis 1 to ensure x1 <= x2 and y1 <= y2 and then reshape# into x1, y1, x2, y2 orderx1, x2 = np.split(np.sort(x1x2, axis=1), 2, axis=1)y1, y2 = np.split(np.sort(y1y2, axis=1), 2, axis=1)global_rois = np.hstack([y1, x1, y2, x2])rois[-remaining_count:] = global_roisreturn roispassdef build_detection_targets(self, rpn_rois, gt_class_ids, gt_boxes, gt_masks):"""Generate targets for training Stage 2 classifier and mask heads.This is not used in normal training. It's useful for debugging or to trainthe Mask RCNN heads without using the RPN head.:param rpn_rois: [N, (y1, x1, y2, x2)] proposal boxes.:param gt_class_ids: [instance count] Integer class IDs:param gt_boxes: [instance count, (y1, x1, y2, x2)]:param gt_masks: [height, width, instance count] Ground truth masks. Can be fullsize or mini-masks.:return:rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)]class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs.bboxes: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (y, x, log(h), log(w))]. Class-specificbbox refinements.masks: [TRAIN_ROIS_PER_IMAGE, height, width, NUM_CLASSES). 
Class specific masks croppedto bbox boundaries and resized to neural network output size."""assert rpn_rois.shape[0] > 0assert gt_class_ids.dtype == np.int32, "Expected int but got {}".format(gt_class_ids.dtype)assert gt_boxes.dtype == np.int32, "Expected int but got {}".format(gt_boxes.dtype)assert gt_masks.dtype == np.bool_, "Expected bool but got {}".format(gt_masks.dtype)# It's common to add GT Boxes to ROIs but we don't do that here because# according to XinLei Chen's paper, it doesn't help.# Trim empty padding in gt_boxes and gt_masks partsinstance_ids = np.where(gt_class_ids > 0)[0]assert instance_ids.shape[0] > 0, "Image must contain instances."gt_class_ids = gt_class_ids[instance_ids]gt_boxes = gt_boxes[instance_ids]gt_masks = gt_masks[:, :, instance_ids]# Compute areas of ROIs and ground truth boxes.# rpn_roi_area = (rpn_rois[:, 2] - rpn_rois[:, 0]) * (rpn_rois[:, 3] - rpn_rois[:, 1])# gt_box_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])# Compute overlaps [rpn_rois, gt_boxes]overlaps = np.zeros((rpn_rois.shape[0], gt_boxes.shape[0]))for i in range(overlaps.shape[1]):gt = gt_boxes[i]overlaps[:, i] = self.bbox_util.compute_iou(gt, rpn_rois)pass# Assign ROIs to GT boxesrpn_roi_iou_argmax = np.argmax(overlaps, axis=1)rpn_roi_iou_max = overlaps[np.arange(overlaps.shape[0]), rpn_roi_iou_argmax]# GT box assigned to each ROIrpn_roi_gt_boxes = gt_boxes[rpn_roi_iou_argmax]rpn_roi_gt_class_ids = gt_class_ids[rpn_roi_iou_argmax]# Positive ROIs are those with >= 0.5 IoU with a GT box.fg_ids = np.where(rpn_roi_iou_max > 0.5)[0]# Negative ROIs are those with max IoU 0.1-0.5 (hard example mining)# TODO: To hard example mine or not to hard example mine, that's the question# bg_ids = np.where((rpn_roi_iou_max >= 0.1) & (rpn_roi_iou_max < 0.5))[0]bg_ids = np.where(rpn_roi_iou_max < 0.5)[0]# Subsample ROIs. Aim for 33% foreground.# FGfg_roi_count = int(self.rois_per_image * self.roi_positive_ratio)if fg_ids.shape[0] > fg_roi_count:keep_fg_ids = np.random.choice(fg_ids, fg_roi_count, replace=False)else:keep_fg_ids = fg_ids# BGremaining = self.rois_per_image - keep_fg_ids.shape[0]if bg_ids.shape[0] > remaining:keep_bg_ids = np.random.choice(bg_ids, remaining, replace=False)else:keep_bg_ids = bg_ids# Combine indices of ROIs to keepkeep = np.concatenate([keep_fg_ids, keep_bg_ids])# Need more?remaining = self.rois_per_image - keep.shape[0]if remaining > 0:# Looks like we don't have enough samples to maintain the desired# balance. Reduce requirements and fill in the rest. 
This is# likely different from the Mask RCNN paper.# There is a small chance we have neither fg nor bg samples.if keep.shape[0] == 0:# Pick bg regions with easier IoU thresholdbg_ids = np.where(rpn_roi_iou_max < 0.5)[0]assert bg_ids.shape[0] >= remainingkeep_bg_ids = np.random.choice(bg_ids, remaining, replace=False)assert keep_bg_ids.shape[0] == remainingkeep = np.concatenate([keep, keep_bg_ids])else:# Fill the rest with repeated bg rois.keep_extra_ids = np.random.choice(keep_bg_ids, remaining, replace=True)keep = np.concatenate([keep, keep_extra_ids])assert keep.shape[0] == self.rois_per_image, \"keep doesn't match ROI batch size {}, {}".format(keep.shape[0], self.rois_per_image)# Reset the gt boxes assigned to BG ROIs.rpn_roi_gt_boxes[keep_bg_ids, :] = 0rpn_roi_gt_class_ids[keep_bg_ids] = 0# For each kept ROI, assign a class_id, and for FG ROIs also add bbox refinement.rois = rpn_rois[keep]roi_gt_boxes = rpn_roi_gt_boxes[keep]roi_gt_class_ids = rpn_roi_gt_class_ids[keep]roi_gt_assignment = rpn_roi_iou_argmax[keep]# Class-aware bbox deltas. [y, x, log(h), log(w)]bboxes = np.zeros((self.rois_per_image, self.class_num, 4), dtype=np.float32)pos_ids = np.where(roi_gt_class_ids > 0)[0]bboxes[pos_ids, roi_gt_class_ids[pos_ids]] = self.bbox_util.box_refinement(rois[pos_ids],roi_gt_boxes[pos_ids, :4])# Normalize bbox refinementsbbox_std_dev = np.array(cfg.COMMON.BBOX_STD_DEV)bboxes /= bbox_std_dev# Generate class-specific target masksmasks = np.zeros((self.rois_per_image, self.image_shape[0], self.image_shape[1], self.class_num),dtype=np.float32)for i in pos_ids:class_id = roi_gt_class_ids[i]assert class_id > 0, "class id must be greater than 0"gt_id = roi_gt_assignment[i]class_mask = gt_masks[:, :, gt_id]if cfg.TRAIN.USE_MINI_MASK:# Create a mask placeholder, the size of the imageplaceholder = np.zeros(self.image_shape[:2], dtype=bool)# GT boxgt_y1, gt_x1, gt_y2, gt_x2 = gt_boxes[gt_id]gt_w = gt_x2 - gt_x1gt_h = gt_y2 - gt_y1# Resize mini mask to size of GT boxplaceholder[gt_y1:gt_y2, gt_x1:gt_x2] = \np.round(self.image_utils.resize(class_mask, (gt_h, gt_w))).astype(bool)# Place the mini batch in the placeholderclass_mask = placeholder# Pick part of the mask and resize ity1, x1, y2, x2 = rois[i].astype(np.int32)m = class_mask[y1:y2, x1:x2]mask = self.image_utils.resize(m, self.image_shape)masks[i, :, :, class_id] = maskreturn rois, roi_gt_class_ids, bboxes, maskspass# ############################################################################################## test# #############################################################################################def detect(self, images_info_list, verbose=0):"""Runs the detection pipeline.:param images_info_list: List of images, potentially of different sizes.:param verbose::return: a list of dicts, one dict per image. The dict contains:rois: [N, (y1, x1, y2, x2)] detection bounding boxesclass_ids: [N] int class IDsscores: [N] float probability scores for the class IDsmasks: [H, W, N] instance binary masks"""if verbose:print("processing {} image_info.".format(len(images_info_list)))for image_info in images_info_list:print("image_info: {}".format(image_info))passpass# Mold inputs to format expected by the neural networkmolded_images_list, image_metas_list, windows_list = self.image_utils.mode_input(images_info_list)# Validate image sizes# All images in a batch MUST be of the same sizeimage_shape = molded_images_list[0].shapefor g in molded_images_list[1:]:assert g.shape == image_shape, \"After resizing, all images must have the same size. 
Check IMAGE_RESIZE_MODE and image sizes."pass# Anchorsanchors = self.anchor_utils.get_anchors(image_shape)# Duplicate across the batch dimension because Keras requires it# TODO: can this be optimized to avoid duplicating the anchors?anchors = np.broadcast_to(anchors, (cfg.TEST.BATCH_SIZE,) + anchors.shape)if verbose:print("molded_images_list: ", molded_images_list)print("image_metas_list: ", image_metas_list)print("anchors: ", anchors)pass# Run object detectiondetections, _, _, mrcnn_mask, _, _, _ = \self.keras_model.predict([molded_images_list, image_metas_list, anchors], verbose=0)# Process detectionsresults_list = []for i, image_info in enumerate(images_info_list):molded_image_shape = molded_images_list[i].shapefinal_rois, final_class_ids, final_scores, final_masks = self.un_mold_detections(detections[i],mrcnn_mask[i],image_info.shape,molded_image_shape,windows_list[i])results_list.append({"rois": final_rois,"class_ids": final_class_ids,"scores": final_scores,"masks": final_masks,})return results_listpassdef un_mold_detections(self, detections, mrcnn_mask, original_image_shape,image_shape, window):"""Reformats the detections of one image from the format of the neuralnetwork output to a format suitable for use in the rest of theapplication.:param detections: [N, (y1, x1, y2, x2, class_id, score)] in normalized coordinates:param mrcnn_mask: [N, height, width, num_classes]:param original_image_shape: [H, W, C] Original image shape before resizing:param image_shape: [H, W, C] Shape of the image after resizing and padding:param window: [y1, x1, y2, x2] Pixel coordinates of box in the image where the realimage is excluding the padding.:return:boxes: [N, (y1, x1, y2, x2)] Bounding boxes in pixelsclass_ids: [N] Integer class IDs for each bounding boxscores: [N] Float probability scores of the class_idmasks: [height, width, num_instances] Instance masks"""# How many detections do we have?# Detections array is padded with zeros. Find the first class_id == 0.zero_ix = np.where(detections[:, 4] == 0)[0]n = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0]# Extract boxes, class_ids, scores, and class-specific masksboxes = detections[: n, :4]class_ids = detections[: n, 4].astype(np.int32)scores = detections[: n, 5]masks = mrcnn_mask[np.arange(n), :, :, class_ids]# Translate normalized coordinates in the resized image to pixel# coordinates in the original image before resizingwindow = self.bbox_util.norm_boxes(window, image_shape[:2])wy1, wx1, wy2, wx2 = windowshift = np.array([wy1, wx1, wy1, wx1])wh = wy2 - wy1  # window heightww = wx2 - wx1  # window widthscale = np.array([wh, ww, wh, ww])# Convert boxes to normalized coordinates on the windowboxes = np.divide(boxes - shift, scale)# Convert boxes to pixel coordinates on the original imageboxes = self.bbox_util.denorm_boxes(boxes, original_image_shape[:2])# Filter out detections with zero area. 
Happens in early training when# network weights are still randomexclude_ix = np.where((boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) <= 0)[0]if exclude_ix.shape[0] > 0:boxes = np.delete(boxes, exclude_ix, axis=0)class_ids = np.delete(class_ids, exclude_ix, axis=0)scores = np.delete(scores, exclude_ix, axis=0)masks = np.delete(masks, exclude_ix, axis=0)n = class_ids.shape[0]# Resize masks to original image size and set boundary threshold.full_masks = []for i in range(n):# Convert neural network mask to full size maskfull_mask = self.mask_util.unmold_mask(masks[i], boxes[i], original_image_shape)full_masks.append(full_mask)passfull_masks = np.stack(full_masks, axis=-1) if full_masks else np.empty(original_image_shape[:2] + (0,))return boxes, class_ids, scores, full_maskspass
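The least obvious step in un_mold_detections() is the window arithmetic that maps a box from normalized coordinates on the padded, resized image back to pixel coordinates on the original image. Here is a standalone NumPy sketch of just that step; the image sizes and the box are made-up values, and the one-pixel shift applied inside the repo's norm_boxes/denorm_boxes helpers is omitted for clarity:

import numpy as np


def unmold_boxes(boxes_norm, window_px, molded_shape, original_shape):
    """boxes_norm: [N, (y1, x1, y2, x2)] normalized on the molded image.
    window_px: (y1, x1, y2, x2) of the real image inside the padding."""
    h, w = molded_shape[:2]
    wy1, wx1, wy2, wx2 = np.array(window_px, dtype=np.float32) / np.array([h, w, h, w])
    shift = np.array([wy1, wx1, wy1, wx1])
    scale = np.array([wy2 - wy1, wx2 - wx1, wy2 - wy1, wx2 - wx1])
    # Re-normalize the boxes to the window, then scale up to original pixels.
    boxes = (boxes_norm - shift) / scale
    oh, ow = original_shape[:2]
    return np.around(boxes * np.array([oh, ow, oh, ow])).astype(np.int32)


# A 1024x768 original centered inside a 1024x1024 molded image
# (128 px of padding on the left and right).
boxes = np.array([[0.25, 0.30, 0.50, 0.60]])
print(unmold_boxes(boxes, window_px=(0, 128, 1024, 896),
                   molded_shape=(1024, 1024), original_shape=(1024, 768)))
# -> roughly [[256 179 512 486]]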

 

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/05/15 12:44
# @Author   : WanDaoYi
# @FileName : parallel_model.py
# ============================================


import tensorflow as tf
import keras.backend as K
import keras.layers as KL
import keras.models as KMclass ParallelModel(KM.Model):"""Subclasses the standard Keras Model and adds multi-GPU support.It works by creating a copy of the model on each GPU. Then it slicesthe inputs and sends a slice to each copy of the model, and thenmerges the outputs together and applies the loss on the combined outputs."""def __init__(self, keras_model, gpu_count):"""Class constructor.:param keras_model: The Keras model to parallelize:param gpu_count: gpu 个数,当 gpu 个数 大于 1 时,调用这个对象,启用多 GPU 训练"""self.inner_model = keras_modelself.gpu_count = gpu_countmerged_outputs = self.make_parallel()super(ParallelModel, self).__init__(inputs=self.inner_model.inputs,outputs=merged_outputs)def __getattribute__(self, attrname):"""Redirect loading and saving methods to the inner model. That's where the weights are stored:param attrname::return:"""if 'load' in attrname or 'save' in attrname:return getattr(self.inner_model, attrname)return super(ParallelModel, self).__getattribute__(attrname)def summary(self, *args, **kwargs):"""Override summary() to display summaries of both, the wrapper and inner models.:param args::param kwargs::return:"""super(ParallelModel, self).summary(*args, **kwargs)self.inner_model.summary(*args, **kwargs)def make_parallel(self):"""Creates a new wrapper model that consists of multiple replicas ofthe original model placed on different GPUs.:return:"""# Slice inputs. Slice inputs on the CPU to avoid sending a copy# of the full inputs to all GPUs. Saves on bandwidth and memory.input_slices = {name: tf.split(x, self.gpu_count)for name, x in zip(self.inner_model.input_names, self.inner_model.inputs)}output_names = self.inner_model.output_namesoutputs_all = []for i in range(len(self.inner_model.outputs)):outputs_all.append([])# Run the model call() on each GPU to place the ops therefor i in range(self.gpu_count):with tf.device('/gpu:%d' % i):with tf.name_scope('tower_%d' % i):# Run a slice of inputs through this replicazipped_inputs = zip(self.inner_model.input_names,self.inner_model.inputs)inputs = [KL.Lambda(lambda s: input_slices[name][i],output_shape=lambda s: (None,) + s[1:])(tensor)for name, tensor in zipped_inputs]# Create the model replica and get the outputsoutputs = self.inner_model(inputs)if not isinstance(outputs, list):outputs = [outputs]# Save the outputs for merging back together laterfor l, o in enumerate(outputs):outputs_all[l].append(o)# Merge outputs on CPUwith tf.device('/cpu:0'):merged = []for outputs, name in zip(outputs_all, output_names):# Concatenate or average outputs?# Outputs usually have a batch dimension and we concatenate# across it. If they don't, then the output is likely a loss# or a metric value that gets averaged across the batch.# Keras expects losses and metrics to be scalars.if K.int_shape(outputs[0]) == ():# Averagem = KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)(outputs)else:# Concatenatem = KL.Concatenate(axis=0, name=name)(outputs)merged.append(m)return merged
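For completeness, a minimal usage sketch for the wrapper above, assuming two physical GPUs, the repo on the PYTHONPATH, and a toy Keras model (none of this comes from the config; mask_rcnn.py applies the same pattern automatically when cfg.COMMON.GPU_COUNT > 1):

import numpy as np
import keras.layers as KL
import keras.models as KM
from m_rcnn.parallel_model import ParallelModel

GPU_COUNT = 2   # illustrative; the repo reads this from cfg.COMMON.GPU_COUNT

# A toy single-input / single-output model standing in for the real network.
inputs = KL.Input(shape=(32,), name="toy_input")
hidden = KL.Dense(16, activation="relu")(inputs)
outputs = KL.Dense(1, name="toy_output")(hidden)
model = KM.Model(inputs, outputs, name="toy_model")

if GPU_COUNT > 1:
    # Inputs are split across GPUs inside make_parallel(); the weights live on
    # the wrapped inner_model, which is also where save/load are redirected.
    model = ParallelModel(model, GPU_COUNT)

model.compile(optimizer="sgd", loss="mse")
x = np.random.rand(8, 32).astype(np.float32)   # batch size divisible by GPU_COUNT
y = np.random.rand(8, 1).astype(np.float32)
model.train_on_batch(x, y)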

That's all for this part.

 

      

 
