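Author | facebookresearch   Compiled by | Flin   Source | Github
Benchmarks
Here we benchmark the training speed of Mask R-CNN in Detectron2 against several other popular open-source Mask R-CNN implementations.
Settings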
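- Hardware: 8 NVIDIA V100s with NVLink.
- Software: Python 3.7, CUDA 10.0, cuDNN 7.6.4, PyTorch 1.3.0 (link: download.pytorch.org/whl/nightly…), TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820.
- Model: an end-to-end R-50-FPN Mask-RCNN model, using the same hyperparameters as the Detectron baseline config (github.com/facebookres…).
- Metrics: we use the average throughput over iterations 100-500, which skips the GPU warmup time. Note that for R-CNN-style models, throughput typically changes during training because it depends on the model's predictions. This metric is therefore not directly comparable with the "train speed" reported in the model zoo, which is the average speed over the entire training run.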
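The metric described above can be sketched in a few lines. This is only an illustration, not the measurement code used for the benchmark: the function name `average_throughput`, the list of per-iteration timings, and the batch size of 16 images per iteration are all assumptions made for the example.

```python
from typing import Sequence


def average_throughput(
    iter_seconds: Sequence[float],
    images_per_iter: int,
    start: int = 100,
    stop: int = 500,
) -> float:
    """Return images/second averaged over iterations [start, stop).

    iter_seconds[i] is the wall-clock duration of iteration i. Iterations
    before `start` are skipped so that GPU warmup does not skew the result.
    """
    window = iter_seconds[start:stop]
    if not window:
        raise ValueError("no iterations fall inside the measurement window")
    total_images = images_per_iter * len(window)
    return total_images / sum(window)


# Hypothetical timings: 100 slow warmup iterations, then 400 steady ones.
timings = [1.0] * 100 + [0.25] * 400
# Assumed batch size of 16 images per iteration (not stated in the text).
throughput = average_throughput(timings, images_per_iter=16)
print(f"{throughput:.1f} img/s")
```

Because the slow warmup iterations fall outside the window, they have no effect on the reported number; only the steady-state iterations count.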
Main Results
Implementation | Throughput (img/s) |
Detectron2 | 59 |
maskrcnn-benchmark | 51 |
tensorpack | 50 |
mmdetection | 41 |
simpledet | 39 |
Detectron | 19 |
matterport/Mask_RCNN | 14 |
- Detectron2:github.com/facebookres…
- maskrcnn-benchmark:github.com/facebookres…
- tensorpack:github.com/tensorpack/…
- mmdetection:github.com/open-mmlab/…
- simpledet:github.com/TuSimple/si…
- Detectron:github.com/facebookres…
- matterport/Mask_RCNN:github.com/matterport/…
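To read the table as relative speed rather than absolute throughput, the figures can be divided by a reference value. The sketch below does that with the numbers copied from the table; the helper name `relative_speed` is hypothetical, and the ratios inherit every caveat of the metric itself.

```python
# Throughput figures (img/s) copied from the results table.
throughput = {
    "Detectron2": 59,
    "maskrcnn-benchmark": 51,
    "tensorpack": 50,
    "mmdetection": 41,
    "simpledet": 39,
    "Detectron": 19,
    "matterport/Mask_RCNN": 14,
}


def relative_speed(results: dict, reference: str = "Detectron2") -> dict:
    """Map each implementation to reference throughput / its own throughput."""
    ref = results[reference]
    return {name: ref / value for name, value in results.items()}


ratios = relative_speed(throughput)
for name, ratio in ratios.items():
    # A ratio of 2.0 means the reference trains twice as fast.
    print(f"{name:22s} {ratio:.2f}x")
```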
Details for each implementation:
- Detectron2: run
python tools/train_net.py --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8
- maskrcnn-benchmark: use commit 0ce8f6f and apply
sed -i 's/torch.uint8/torch.bool/g' **/*.py
to make it compatible with newer PyTorch. Then run
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
The speed we observed is faster than the one reported in its model zoo, likely because of different software versions.
- tensorpack: at commit caafda, run
export TF_CUDNN_USE_AUTOTUNE=0
then run
mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz
- mmdetection: at commit 4d9a5f, apply the following diff, then run
./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8
The speed we observed is faster than the one reported in its model zoo, likely because of different software versions.
(diff to make it use the same hyperparameters)
diff --git i/configs/mask_rcnn_r50_fpn_1x.py w/configs/mask_rcnn_r50_fpn_1x.py
index 04f6d22..ed721f2 100644
--- i/configs/mask_rcnn_r50_fpn_1x.py
+++ w/configs/mask_rcnn_r50_fpn_1x.py
@@ -1,14 +1,15 @@
 # model settings
 model = dict(
     type='MaskRCNN',
-    pretrained='torchvision://resnet50',
+    pretrained='open-mmlab://resnet50_caffe',
     backbone=dict(
         type='ResNet',
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        style='pytorch'),
+        norm_cfg=dict(type="BN", requires_grad=False),
+        style='caffe'),
     neck=dict(
         type='FPN',
         in_channels=[256, 512, 1024, 2048],
@@ -115,7 +116,7 @@ test_cfg = dict(
 dataset_type = 'CocoDataset'
 data_root = 'data/coco/'
 img_norm_cfg = dict(
-    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+    mean=[123.675, 116.28, 103.53], std=[1.0, 1.0, 1.0], to_rgb=False)
 train_pipeline = [
     dict(type='LoadImageFromFile'),
     dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
- SimpleDet: at commit 9187a1, run
python detection_train.py --config config/mask_r50v1_fpn_1x.py
- Detectron: run
python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml
Note that many of its ops run on CPUs, which limits its performance.
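- matterport/Mask_RCNN: at commit 3deaec, apply the following diff, run
export TF_CUDNN_USE_AUTOTUNE=0
then run
python coco.py train --dataset=/data/coco/ --model=imagenet
Note that many small details in this implementation may differ from Detectron's standards.
(diff to make it use the same hyperparameters)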
diff --git i/mrcnn/model.py w/mrcnn/model.py
index 62cb2b0..61d7779 100644
--- i/mrcnn/model.py
+++ w/mrcnn/model.py
@@ -2367,8 +2367,8 @@ class MaskRCNN():
             epochs=epochs,
             steps_per_epoch=self.config.STEPS_PER_EPOCH,
             callbacks=callbacks,
-            validation_data=val_generator,
-            validation_steps=self.config.VALIDATION_STEPS,
+            #validation_data=val_generator,
+            #validation_steps=self.config.VALIDATION_STEPS,
             max_queue_size=100,
             workers=workers,
             use_multiprocessing=True,
diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
index d2bf53b..060172a 100644
--- i/mrcnn/parallel_model.py
+++ w/mrcnn/parallel_model.py
@@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
         keras_model: The Keras model to parallelize
         gpu_count: Number of GPUs. Must be > 1
         """
+        super().__init__()
         self.inner_model = keras_model
         self.gpu_count = gpu_count
         merged_outputs = self.make_parallel()
diff --git i/samples/coco/coco.py w/samples/coco/coco.py
index 5d172b5..239ed75 100644
--- i/samples/coco/coco.py
+++ w/samples/coco/coco.py
@@ -81,7 +81,10 @@ class CocoConfig(Config):
     IMAGES_PER_GPU = 2

     # Uncomment to train on 8 GPUs (default is 1)
-    # GPU_COUNT = 8
+    GPU_COUNT = 8
+    BACKBONE = "resnet50"
+    STEPS_PER_EPOCH = 50
+    TRAIN_ROIS_PER_IMAGE = 512

     # Number of classes (including background)
     NUM_CLASSES = 1 + 80  # COCO has 80 classes
@@ -496,29 +499,10 @@ if __name__ == '__main__':
         # *** This training schedule is an example. Update to your needs ***

         # Training - Stage 1
-        print("Training network heads")
         model.train(dataset_train, dataset_val,
                     learning_rate=config.LEARNING_RATE,
                     epochs=40,
-                    layers='heads',
-                    augmentation=augmentation)
-
-        # Training - Stage 2
-        # Finetune layers from ResNet stage 4 and up
-        print("Fine tune Resnet stage 4 and up")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE,
-                    epochs=120,
-                    layers='4+',
-                    augmentation=augmentation)
-
-        # Training - Stage 3
-        # Fine tune all layers
-        print("Fine tune all layers")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE / 10,
-                    epochs=160,
-                    layers='all',
+                    layers='3+',
                     augmentation=augmentation)

     elif args.command == "evaluate":
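The `sed` step listed for maskrcnn-benchmark rewrites every occurrence of `torch.uint8` to `torch.bool` in the Python sources. For readers without `sed`, or without a shell that expands `**` recursively, a rough Python equivalent might look like the sketch below. The function name `replace_in_python_files` is hypothetical. Unlike the shell command, it always recurses into all subdirectories and matches the text literally rather than as a regular expression.

```python
from pathlib import Path


def replace_in_python_files(
    root: str,
    old: str = "torch.uint8",
    new: str = "torch.bool",
) -> int:
    """Replace `old` with `new` in every .py file under `root`.

    The match is a literal substring match. Returns the number of files
    that were modified; files without a match are left untouched.
    """
    changed = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

Calling `replace_in_python_files(".")` from the repository root would modify files in place, so it is worth running on a clean checkout where the changes can be reviewed or reverted.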
Original link: detectron2.readthedocs.io/notes/bench…