Supplementary Materials for Perspective-Guided Convolution Networks for Crowd Counting

To verify the effectiveness of our single-column architecture, we compare it with two kinds of feature concatenation, i.e., single- and multi-feature concatenation. Let $\Theta_k$ denote the parameters of the $i$-th convolution layer with kernel size $k$ in our network, and let $F^{i-1}(X)$ denote the feature produced by the $(i-1)$-th convolution layer for input $X$. Then $\Theta_k(F^{i-1}(X))$ is the feature generated by $\Theta_k$ given $F^{i-1}(X)$ as input. The single-feature concatenation ($k \times k$) is the concatenation of $\Theta_k(F^{i-1}(X))$ and $F^{i-1}(X)$, whereas the multi-feature concatenation ($1 \otimes K$) is the concatenation of $\Theta_1(F^{i-1}(X)), \Theta_3(F^{i-1}(X)), \cdots, \Theta_K(F^{i-1}(X))$ and $F^{i-1}(X)$. The comparisons are conducted on ShanghaiTech Part A, and the results are shown in Table 1. Our PGC block noticeably outperforms both the single- and multi-feature concatenation strategies, which indicates its advantage over conventional feature concatenation with fixed kernel sizes. Besides, Tables 2, 3, 4, 5, 6