SANet++: Enhanced Scale Aggregation with Densely Connected Feature Fusion for Crowd Counting

Crowd counting has gained considerable attention recently but remains challenging, mainly due to large scale variations. In this paper, we present SANet++, a novel architecture that generates high-quality density maps and performs accurate counting. SANet++ obtains an enhanced multi-scale representation through densely connected feature fusion between branches. This design avoids information redundancy while exploiting complementary features at different scales. In addition, we introduce a novel Bulk loss that incorporates the spatial correlation within a whole patch. This global structural supervision forces the network to learn the interactions between pixels without limitations on region size. Extensive experiments on three major datasets show that SANet++ outperforms state-of-the-art crowd counting approaches.