FCOS Core Code Reading Notes

fcos_core/modeling/rpn/fcos/fcos.py

This file mainly contains the FCOS network structure, including its three losses: loss_cls, loss_reg, and loss_centerness.

[Figure: FCOS architecture]

One of the key functions is compute_locations:

def compute_locations(self, features):
    locations = []
    for level, feature in enumerate(features):
        h, w = feature.size()[-2:]
        locations_per_level = self.compute_locations_per_level(
            h, w, self.fpn_strides[level],
            feature.device
        )
        locations.append(locations_per_level)
    return locations

def compute_locations_per_level(self, h, w, stride, device):
    shifts_x = torch.arange(
        0, w * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shifts_y = torch.arange(
        0, h * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2
    return locations

For the five FPN feature maps P3, P4, P5, P6 and P7, this computes where each feature-map point maps back to on the input image, i.e. it generates a 2D grid (meshgrid).

Adding stride // 2 in the last step compensates for the rounding introduced by downsampling, so that the mapped point on the input image lies as close as possible to the center of the receptive field of location (x, y).

The resulting locations is a list containing, for each of the 5 feature levels, the coordinates of all its points mapped back to the input image.
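
As a quick sanity check, here is a minimal standalone sketch (not part of the FCOS codebase) that reproduces compute_locations_per_level for a hypothetical 2x2 feature map with stride 8; the printed coordinates show the stride // 2 offset:

import torch

def compute_locations_per_level(h, w, stride, device="cpu"):
    # Same logic as the fcos.py method above, reproduced here as a standalone demo.
    shifts_x = torch.arange(0, w * stride, step=stride, dtype=torch.float32, device=device)
    shifts_y = torch.arange(0, h * stride, step=stride, dtype=torch.float32, device=device)
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    return torch.stack((shift_x.reshape(-1), shift_y.reshape(-1)), dim=1) + stride // 2

# A 2x2 feature map at stride 8 maps to these image-plane points:
print(compute_locations_per_level(2, 2, 8))
# tensor([[ 4.,  4.],
#         [12.,  4.],
#         [ 4., 12.],
#         [12., 12.]])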


How to use 10,582 trainaug images on DeeplabV3 code?

You know what I mean if you have experience training segmentation models on the Pascal VOC dataset. The dataset only provides 1,464 pixel-level image annotations for training, yet every paper trains on 10,582 images, a split usually called trainaug. The additional annotations come from SBD, but their annotation format differs from Pascal VOC's. Fortunately someone has already made a converted version, SegmentationClassAug.

The DeeplabV3 code does not ship the SBD annotations, for reasons we can understand. So I wrote a simple script to handle this.

To use the 10,582 trainaug images with the DeeplabV3 code, you just need to follow these steps:

1. Create a script named convert_voc2012_aug.sh.

#!/bin/bash
# Exit immediately if a command exits with a non-zero status.
set -e

CURRENT_DIR=$(pwd)
WORK_DIR="./pascal_voc_seg"
mkdir -p ${WORK_DIR}

cd ${WORK_DIR}
tar -xf "../VOCtrainval_11-May-2012.tar"
cp "../trainaug.txt" "./VOCdevkit/VOC2012/ImageSets/Segmentation"
unzip "../SegmentationClassAug.zip" -d "./VOCdevkit/VOC2012"
rm -r "./VOCdevkit/VOC2012/__MACOSX"

cd ${CURRENT_DIR}

# Root path for PASCAL VOC 2012 dataset.
PASCAL_ROOT="${WORK_DIR}/VOCdevkit/VOC2012"

# Remove the colormap in the ground truth annotations.
SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassAug"
SEMANTIC_SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassAugRaw"

echo "Removing the color map in ground truth annotations..."
python ./remove_gt_colormap.py \
  --original_gt_folder="${SEG_FOLDER}" \
  --output_dir="${SEMANTIC_SEG_FOLDER}"

# Build TFRecords of the dataset.
# First, create output directory for storing TFRecords.
OUTPUT_DIR="${WORK_DIR}/tfrecord"
mkdir -p "${OUTPUT_DIR}"

IMAGE_FOLDER="${PASCAL_ROOT}/JPEGImages"
LIST_FOLDER="${PASCAL_ROOT}/ImageSets/Segmentation"

echo "Converting PASCAL VOC 2012 dataset..."
python ./build_voc2012_data.py \
  --image_folder="${IMAGE_FOLDER}" \
  --semantic_segmentation_folder="${SEMANTIC_SEG_FOLDER}" \
  --list_folder="${LIST_FOLDER}" \
  --image_format="jpg" \
  --output_dir="${OUTPUT_DIR}"

2. Create a txt file named trainaug.txt with this content.
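
The trainaug.txt linked above is simply the list of the 10,582 training image IDs. In case the linked file becomes unavailable, here is a minimal sketch of one way to regenerate it (my own sketch, not part of the DeepLab code), assuming SegmentationClassAug and the VOC ImageSets have already been unpacked with the layout produced by the script above:

import os

voc_root = "./pascal_voc_seg/VOCdevkit/VOC2012"  # layout created by convert_voc2012_aug.sh
aug_dir = os.path.join(voc_root, "SegmentationClassAug")
val_list = os.path.join(voc_root, "ImageSets/Segmentation/val.txt")

# trainaug = all IDs that have an augmented annotation, minus the 1,449 val IDs.
with open(val_list) as f:
    val_ids = {line.strip() for line in f if line.strip()}

aug_ids = sorted(os.path.splitext(name)[0]
                 for name in os.listdir(aug_dir) if name.endswith(".png"))
trainaug_ids = [i for i in aug_ids if i not in val_ids]  # expect 10,582 IDs

with open("trainaug.txt", "w") as f:
    f.write("\n".join(trainaug_ids) + "\n")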

3. Download Pascal VOC dataset and SegmentationClassAug annotations.

4. Put all of them (convert_voc2012_aug.sh, trainaug.txt, VOCtrainval_11-May-2012.tar, SegmentationClassAug.zip) into the research/deeplab/datasets folder.

5. Execute convert_voc2012_aug.sh (give it execute permission) in research/deeplab/datasets.

6. Change the code in research/deeplab/datasets/segmentation_dataset.py from:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)

to:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'trainaug': 10582,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)

7. Don’t forget to change the train_split parameter in research/deeplab/train.py to trainaug.
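
Instead of editing the default inside train.py, you can also pass the split as a command-line flag. A hedged example follows; the flag names are those defined in research/deeplab/train.py, but the dataset and log directories are placeholders for your own paths:

python research/deeplab/train.py \
  --logtostderr \
  --train_split="trainaug" \
  --dataset="pascal_voc_seg" \
  --dataset_dir="./pascal_voc_seg/tfrecord" \
  --train_logdir="./train_logdir"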

LeetCode - Interleaving Strings Solution Notes

Question

Given s1, s2, s3, find whether s3 is formed by the interleaving of s1 and s2.

For example,
Given:
s1 = “aabcc”,
s2 = “dbbca”,

When s3 = “aadbbcbcac”, return true.
When s3 = “aadbbbaccc”, return false.

My first idea was backtracking. The process can be viewed as traversing a binary tree: matching a character from s1 means taking the left child, and matching a character from s2 means taking the right child. When the current characters of s1 and s2 both match s3, there are two choices; I simply take the one from s1 first and record the current indices of s1 and s2, so that when a later match fails I can go back to the most recently saved indices and take the s2 branch instead, i.e. backtrack. Unfortunately it exceeded the time limit. The time complexity should be exponential, roughly O(2^(m+n)), where m and n are the lengths of s1 and s2.
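
For reference, here is a minimal Python sketch of the backtracking idea described above (written recursively rather than with an explicit index stack, so it is not my original submission):

def is_interleave(s1, s2, s3):
    # Recursive backtracking sketch of the approach above.
    if len(s1) + len(s2) != len(s3):
        return False

    def dfs(i, j):
        k = i + j
        if k == len(s3):
            return True
        # Try the next character of s1 first (the "left child") ...
        if i < len(s1) and s1[i] == s3[k] and dfs(i + 1, j):
            return True
        # ... otherwise backtrack and take it from s2 (the "right child").
        if j < len(s2) and s2[j] == s3[k] and dfs(i, j + 1):
            return True
        return False

    return dfs(0, 0)

print(is_interleave("aabcc", "dbbca", "aadbbcbcac"))  # True
print(is_interleave("aabcc", "dbbca", "aadbbbaccc"))  # False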


Fixing High Memory Usage and Disk I/O Caused by Datasets

Disk I/O and memory usage on my Ubuntu machine were surprisingly high. I couldn't stand it any longer and decided it had to be fixed.

The two biggest memory consumers were chrome and gvfsd-metadata: nothing can be done about the former's 1.6 GB, but the latter was somehow using 1.8 GB. I looked it up on Wikipedia:

gvfsd-metadata is a daemon acting as a write serialiser to the internal gvfs metadata storage. It is autostarted by GIO clients when they make metadata changes. Read operations are done by client-side GIO code directly, and don’t require the daemon to be running. The gvfs metadata capabilities are used by the GNOME Files file manager, for example.

Although I still don't really understand what it is, it was clearly the culprit. It probably got into this state because I had opened the folders of several large datasets, such as Pascal VOC, ShapeNet and Pascal3D (it would be even scarier with ImageNet). According to this thread, it can also cause 100% CPU usage. The thread gives a temporary fix:

rm -rf ~/.local/share/gvfs-metadata
pkill gvfsd-metadata

If the problem keeps coming back, simply remove the execute permission of gvfsd-metadata as described here: sudo chmod -x /usr/lib/gvfs/gvfsd-metadata

In addition, iowait reached about 27%, even though all I had opened was sublime-text. The cause was again the datasets: several sublime_text --crawl processes kept reading the disk to index the files. Since these datasets mostly consist of images, following this thread, just add the following to the sublime-text settings:

"folder_exclude_patterns": [".svn", ".git", ".hg", "CVS", "node_modules/*"],
"binary_file_patterns": ["*.mat","*.jpg", "*.jpeg", "*.png", "*.gif", "*.ttf", "*.tga", "*.dds", "*.ico", "*.eot", "*.pdf", "*.swf", "*.jar", "*.zip"],

With that, the problem is solved.
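
For context, these two keys can go into the user settings file (Preferences > Settings, i.e. Preferences.sublime-settings); a minimal sketch of that file would be:

{
    "folder_exclude_patterns": [".svn", ".git", ".hg", "CVS", "node_modules/*"],
    "binary_file_patterns": ["*.mat", "*.jpg", "*.jpeg", "*.png", "*.gif", "*.ttf", "*.tga",
                             "*.dds", "*.ico", "*.eot", "*.pdf", "*.swf", "*.jar", "*.zip"]
}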

That said, this still doesn't help for datasets like ShapeNet, so it's best to avoid opening folders that contain datasets with sublime-text in the first place (a need that is actually not uncommon, since symlinks make it convenient).

How to make linemod and KinectV2 work with ROS Indigo?

I'm using Ubuntu 14.04.5 with ROS Indigo, and I want to make ork work with linemod, a fairly simple need. But when packages are not well maintained (especially in ROS), you sometimes have to investigate the problem yourself and even contribute code to the project…

Following the installation guide to install ork is very simple; don't forget to install couchdb. Building from source is the only choice now, since you have to modify the code to make it work as you wish.

In my case, the tabletop method works well with KinectV1, and even with KinectV2 (though only the hd resolution config worked). However, linemod caused a huge memory leak, nearly 1 GB/s, and it didn't publish the /recognized_object_array topic. At first I thought the problem was in linemod, but it turned out to be in ork_renderer. A thread in the ORK Google Group said:

Currently LINEMOD uses ork_renderer for its training phase. ork_renderer uses either GLUT or osmesa to generate synthetic images of the training data. It seems that the ork_renderer in your computer is linked to osmesa.

Fortunately we can simply change CMakeLists.txt to use GLUT: change option(USE_GLUT "Use GLUT instead of OSMesa" OFF) to option(USE_GLUT "Use GLUT instead of OSMesa" ON).

Update: I now use the version from JimmyDaSilva instead of the official wg-perception one.

But the current linemod version still has some problems related to assimp_devel; it seems the developer is working on it, so you have to revert linemod to the previous version (35aebd).

So I created a repo here to make the whole thing work. When linemod is training it shows an assimp window, which is empty in my case; that is not a serious problem, and linemod works anyway with KinectV1. It does not work with KinectV2, though, because KinectV2 has an unusual resolution that triggers an OpenCV error in linearMemoryPyramid. Fortunately, once again an awesome guy has worked it out, and he also fixed many other issues. Since I need KinectV2 for my work, I followed his modifications and successfully made it work on KinectV2 with the QHD resolution. If you want to use the SD resolution, set T={2,4} in linemod_detect.cpp and renderer_width: 512, renderer_height: 424 in training.ork, as JimmyDaSilva said.
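
For reference, the SD-resolution change in training.ork amounts to the two lines below; the surrounding structure of the file is omitted here, so treat this as a sketch rather than the full config:

# in training.ork (rest of the file unchanged)
renderer_width: 512
renderer_height: 424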

This ork repo integrates all of these changes, just to make it easier to work on ork, maybe for my future self.

Tips:

  • If you have trained with linemod before, you'd better delete the whole object_recognition database in CouchDB first.
  • Using coke.stl is simpler; using coke.obj with a texture gives a better result.
  • When training linemod, make sure you are in the folder that contains the obj, mtl and image files.