
RetinaFace PyTorch training and testing, plus converting the PyTorch model to ONNX and then NCNN for C++ inference


First of all, thanks to the authors for open-sourcing their work:
RetinaFace PyTorch: https://github.com/biubug6/Pytorch_Retinaface
PyTorch to ONNX to NCNN C++ inference: https://github.com/biubug6/Face-Detector-1MB-with-landmark

RetinaFace Pytorch

For training and evaluation, just follow the instructions in the GitHub project. It provides the RetinaFace structure (with ResNet and MobileNet backbones) plus two lighter variants: version-slim (network backbone simplification, slightly faster) and version-RFB (with the modified RFB module, higher precision). The basic idea is similar to YOLOv3: boxes are predicted on feature maps at multiple scales.
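To make the multi-scale anchor idea concrete, here is a minimal sketch of how SSD/RetinaFace-style anchors tile the feature maps. The steps and min_sizes defaults below mirror cfg_re50 from the repo, but the helper itself is illustrative, not the project's PriorBox implementation:

import numpy as np

def make_anchors(im_height, im_width,
                 steps=(8, 16, 32),  # feature-map strides, as in cfg_re50
                 min_sizes=((16, 32), (64, 128), (256, 512))):  # anchor sizes per level
    """One (cx, cy, w, h) anchor per min_size per feature-map cell."""
    anchors = []
    for step, sizes in zip(steps, min_sizes):
        fh = int(np.ceil(im_height / step))  # feature-map height at this stride
        fw = int(np.ceil(im_width / step))
        for y in range(fh):
            for x in range(fw):
                for s in sizes:
                    anchors.append([(x + 0.5) * step / im_width,   # normalized cx
                                    (y + 0.5) * step / im_height,  # normalized cy
                                    s / im_width, s / im_height])
    return np.array(anchors, dtype=np.float32)

print(make_anchors(640, 640).shape)  # (16800, 4): 2 anchors per cell over 3 scales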
Below is inference code for single images and for video; on a GTX 1050, inference takes about 30 ms per frame.

import os
import sys
import argparse
import glob
import time

import cv2
import numpy as np
import torch
import torch.backends.cudnn as cudnn

from data.config import cfg_mnet, cfg_re50
from layers.functions.prior_box import PriorBox
from models.retinaface import RetinaFace
from utils.box_utils import decode
from utils.nms.py_cpu_nms import py_cpu_nms

force_cpu = False
if force_cpu:
    device = torch.device('cpu')
else:
    device = torch.cuda.current_device()  # a device index; accepted by .to()

parser = argparse.ArgumentParser(description='Retinaface')
parser.add_argument('-m', '--trained_model', default='./weights/Resnet50_Final.pth',
                    type=str, help='Trained state_dict file path to open')
parser.add_argument('--origin_size', default=True, type=str,
                    help='Whether use origin image size to evaluate')
parser.add_argument('--img_folder', default='./images/',
                    type=str, help='dataset path')
parser.add_argument('--confidence_threshold', default=0.02,
                    type=float, help='confidence_threshold')
parser.add_argument('--top_k', default=5000, type=int, help='top_k')
parser.add_argument('--nms_threshold', default=0.3,
                    type=float, help='nms_threshold')
parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k')
parser.add_argument('-s', '--show_image', action="store_true",
                    default=True, help='show detection results')
parser.add_argument('--vis_thres', default=0.3, type=float,
                    help='visualization_threshold')
args = parser.parse_args()


def check_keys(model, pretrained_state_dict):
    ckpt_keys = set(pretrained_state_dict.keys())
    model_keys = set(model.state_dict().keys())
    used_pretrained_keys = model_keys & ckpt_keys
    unused_pretrained_keys = ckpt_keys - model_keys
    missing_keys = model_keys - ckpt_keys
    print('Missing keys:{}'.format(len(missing_keys)))
    print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
    print('Used keys:{}'.format(len(used_pretrained_keys)))
    assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
    return True


def remove_prefix(state_dict, prefix):
    '''Old-style models store all parameter names with a shared "module." prefix.'''
    print('remove prefix \'{}\''.format(prefix))
    def f(x): return x.split(prefix, 1)[-1] if x.startswith(prefix) else x
    return {f(key): value for key, value in state_dict.items()}


def load_model(model, pretrained_path):
    print('Loading pretrained model from {}'.format(pretrained_path))
    pretrained_dict = torch.load(
        pretrained_path,
        map_location=lambda storage, loc: storage if force_cpu else storage.cuda(device))
    if "state_dict" in pretrained_dict.keys():
        pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.')
    else:
        pretrained_dict = remove_prefix(pretrained_dict, 'module.')
    check_keys(model, pretrained_dict)
    model.load_state_dict(pretrained_dict, strict=False)
    return model


def detect_vis(net, img_raw):
    img = np.float32(img_raw)

    # testing scale
    target_size = 1600
    max_size = 2150
    im_shape = img.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    resize = float(target_size) / float(im_size_min)
    # prevent bigger axis from being more than max_size:
    if np.round(resize * im_size_max) > max_size:
        resize = float(max_size) / float(im_size_max)
    if args.origin_size:
        resize = 1
    if resize != 1:
        img = cv2.resize(img, None, None, fx=resize,
                         fy=resize, interpolation=cv2.INTER_LINEAR)

    im_height, im_width, _ = img.shape
    scale = torch.Tensor([img.shape[1], img.shape[0],
                          img.shape[1], img.shape[0]])
    img -= (104, 117, 123)        # BGR mean subtraction
    img = img.transpose(2, 0, 1)  # HWC -> CHW
    img = torch.from_numpy(img).unsqueeze(0)
    img = img.to(device)
    scale = scale.to(device)

    tic = time.time()
    loc, conf, landms = net(img)  # forward pass
    print('net forward time: {}'.format(time.time() - tic))

    priorbox = PriorBox(cfg_re50, image_size=(im_height, im_width))
    priors = priorbox.forward()
    priors = priors.to(device)
    prior_data = priors.data
    boxes = decode(loc.data.squeeze(0), prior_data, cfg_re50['variance'])
    boxes = boxes * scale / resize
    boxes = boxes.cpu().numpy()
    scores = conf.squeeze(0).data.cpu().numpy()[:, 1]

    # ignore low scores
    inds = np.where(scores > args.confidence_threshold)[0]
    boxes = boxes[inds]
    scores = scores[inds]

    # keep top-K before NMS
    order = scores.argsort()[::-1][:args.top_k]
    boxes = boxes[order]
    scores = scores[order]

    # do NMS
    dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
    keep = py_cpu_nms(dets, args.nms_threshold)
    dets = dets[keep, :]

    # keep top-K after NMS
    dets = dets[:args.keep_top_k, :]

    # show image
    if args.show_image:
        for b in dets:
            if b[4] < args.vis_thres:
                continue
            text = "{:.4f}".format(b[4])
            b = list(map(int, b))
            cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 2)
            cx = b[0]
            cy = b[1] + 12
            cv2.putText(img_raw, text, (cx, cy),
                        cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255))
        cv2.imshow("res", img_raw)


def test_video(net):
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, img = cap.read()
        if not ret:
            break
        detect_vis(net, img)
        k = cv2.waitKey(1)
        if k == ord('a') or k == ord('A'):
            cv2.imwrite('test.jpg', img)
        if k == ord('q') or k == ord('Q'):
            cap.release()


def test_pic(net):
    img_folder = args.img_folder
    all_imgs = glob.glob(os.path.join(img_folder, '*.jpg'))
    # testing begin
    for i, img_f in enumerate(all_imgs):
        img_raw = cv2.imread(img_f, cv2.IMREAD_COLOR)
        img = np.float32(img_raw)

        # testing scale (same preprocessing as detect_vis)
        target_size = 1600
        max_size = 2150
        im_shape = img.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        resize = float(target_size) / float(im_size_min)
        # prevent bigger axis from being more than max_size:
        if np.round(resize * im_size_max) > max_size:
            resize = float(max_size) / float(im_size_max)
        if args.origin_size:
            resize = 1
        if resize != 1:
            img = cv2.resize(img, None, None, fx=resize,
                             fy=resize, interpolation=cv2.INTER_LINEAR)

        im_height, im_width, _ = img.shape
        scale = torch.Tensor([img.shape[1], img.shape[0],
                              img.shape[1], img.shape[0]])
        img -= (104, 117, 123)
        img = img.transpose(2, 0, 1)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.to(device)
        scale = scale.to(device)
        print('input tensor shape: {}'.format(img.size()))

        tic = time.time()
        loc, conf, landms = net(img)  # forward pass
        print('net forward time: {}'.format(time.time() - tic))

        priorbox = PriorBox(cfg_re50, image_size=(im_height, im_width))
        priors = priorbox.forward()
        priors = priors.to(device)
        prior_data = priors.data
        boxes = decode(loc.data.squeeze(0), prior_data, cfg_re50['variance'])
        boxes = boxes * scale / resize
        boxes = boxes.cpu().numpy()
        scores = conf.squeeze(0).data.cpu().numpy()[:, 1]

        # ignore low scores
        inds = np.where(scores > args.confidence_threshold)[0]
        boxes = boxes[inds]
        scores = scores[inds]

        # keep top-K before NMS
        order = scores.argsort()[::-1][:args.top_k]
        boxes = boxes[order]
        scores = scores[order]

        # do NMS
        dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
        keep = py_cpu_nms(dets, args.nms_threshold)
        dets = dets[keep, :]

        # keep top-K after NMS
        dets = dets[:args.keep_top_k, :]

        # draw and save the result
        if args.show_image:
            for b in dets:
                if b[4] < args.vis_thres:
                    continue
                text = "{:.4f}".format(b[4])
                b = list(map(int, b))
                cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 2)
                cx = b[0]
                cy = b[1] + 12
                cv2.putText(img_raw, text, (cx, cy),
                            cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255))
            print(img_f.split("/")[-1][7:])
            cv2.imwrite("./result/" + img_f.split("/")[-1][7:], img_raw)


if __name__ == '__main__':
    torch.set_grad_enabled(False)
    # net and model
    net = RetinaFace(cfg=cfg_re50, phase='test')
    net = load_model(net, args.trained_model)
    net.eval()
    print('Finished loading model!')
    cudnn.benchmark = True
    net = net.to(device)
    test_video(net)
    # test_pic(net)
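For reference, the decode() call above inverts the standard SSD box encoding: anchor centers are shifted by the first two regression channels and sizes are rescaled exponentially by the last two, weighted by the variances ([0.1, 0.2] in cfg_re50). A minimal NumPy equivalent, for illustration only (the repo's own version lives in utils.box_utils):

import numpy as np

def decode_np(loc, priors, variances=(0.1, 0.2)):
    """loc: (N, 4) predicted offsets; priors: (N, 4) anchors in (cx, cy, w, h) form."""
    cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]  # shift centers
    wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])            # rescale sizes
    return np.concatenate([cxcy - wh / 2, cxcy + wh / 2], axis=1)     # to (x1, y1, x2, y2)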

PyTorch to ONNX to NCNN C++ inference

Environment: Ubuntu 16.04 or Windows 10; PyTorch 1.2 (the author says 1.1.0+ is enough, but ONNX export has a bug in 1.1.0); protobuf; OpenCV; NCNN.

Ubuntu:

Installing protobuf, OpenCV, and NCNN on Ubuntu is fairly straightforward; here are some reliable guides:
protobuf installation: https://blog.csdn.net/u010918487/article/details/82947157
OpenCV installation: https://www.jianshu.com/p/f646448da265
NCNN installation: https://yyingbiu.github.io/2019/08/21/linux-xia-bian-yi-an-zhuang-ncnn/
Run the following command to convert a trained PyTorch model to ONNX format:

python convert_to_onnx.py --trained_model weight_file --network mobile0.25 or slim or RFB
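At its core, convert_to_onnx.py is a wrapper around torch.onnx.export. The sketch below shows the essential call; the dummy input size, tensor names, and opset are illustrative assumptions and may differ from the repo's actual script:

import torch

# net: the loaded, eval()-mode detector (see the loading code above)
dummy = torch.randn(1, 3, 240, 320)  # assumed input size; adjust to your model
torch.onnx.export(net, dummy, "face.onnx",
                  input_names=["input"],
                  output_names=["loc", "conf", "landms"],
                  opset_version=9)  # an opset supported by PyTorch 1.2-era exporters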

The ONNX graph exported by PyTorch contains some redundant operators, so we simplify it with onnx-simplifier. Run the commands below, after which the conversion works cleanly; the resulting face_sim.onnx is the final ONNX file.

pip install onnx-simplifier
python -m onnxsim face.onnx face_sim.onnx 
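The same simplification can also be driven from Python, which makes it easy to assert that the simplified graph still validates (using onnx-simplifier's simplify API):

import onnx
from onnxsim import simplify

model = onnx.load("face.onnx")
model_simp, ok = simplify(model)  # folds constants and strips redundant ops
assert ok, "simplified ONNX model failed validation"
onnx.save(model_simp, "face_sim.onnx")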

After that, just follow the steps in the GitHub repository.
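Optionally, before starting the NCNN conversion, it is worth a quick sanity check that face_sim.onnx still loads and runs, e.g. with onnxruntime (an extra step, not part of the repo's flow):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("face_sim.onnx")
inp = sess.get_inputs()[0]
# replace dynamic (non-int) dimensions with 1 to build a dummy input
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.rand(*shape).astype(np.float32)
outputs = sess.run(None, {inp.name: x})
print([o.shape for o in outputs])  # expect loc / conf / landms outputs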

Windows 10

I had already installed CMake, OpenCV, and VS2015, so I won't cover those here.
Here is a reliable guide for installing protobuf and NCNN on Windows:
https://blog.csdn.net/heiheiya/article/details/100519584
A few things to watch out for:
1. Use protobuf 3.4.0. I initially installed 3.6.1, and NCNN's converter tools (e.g. onnx2ncnn) failed to build because of it.

2. I built protobuf and NCNN from the command line. Do not use a plain cmd window for this; use the Visual C++ 2015 x64 Native Build Tools Command Prompt, otherwise the build environment is wrong and compilation fails.

3. Configure OpenCV before building NCNN: CMake may fail to find the OpenCV path on its own, so add the following to the corresponding CMakeLists.txt:

set(OpenCV_DIR D:/mysoftware/opencv/build)
include_directories(${OpenCV_DIR}/include)

4. You can either edit the relevant paths in CMakeLists.txt ahead of time, or run the commands below and then open the generated VS2015 .sln project in the build directory, configuring the include and lib paths inside Visual Studio. Make sure the NCNN include path points at your own build, not the author's copy from GitHub, otherwise you will hit redefinition errors.

mkdir build
cd build
cmake ..