Advanced intelligent embedded systems perform cognitive tasks with highly efficient vector-processing units for deep neural network (DNN) inference and other vector-based signal processing under a limited power budget. SRAM-based compute-in-memory (CIM) achieves high energy efficiency for vector-matrix multiplications, offers <1ns read/write speed, and eliminates a large fraction of the repeated memory accesses. However, prior SRAM CIM macros require a large area for compute circuits (either ADCs for analog CIM [1-4] or CMOS static logic for all-digital CIM [5-6]), support only limited CIM functions, and use fixed vector-processing dimensions, which leads to a low spatial-utilization rate when deploying DNNs (Fig. 11.7.1).
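To make the utilization issue concrete, the following Python sketch (not part of the referenced designs; the 64x64 macro dimension and the tiling scheme are assumptions for illustration) estimates the spatial-utilization rate when a DNN layer's weight matrix is mapped onto CIM macros with fixed vector-processing dimensions.

```python
# Illustrative sketch: spatial utilization of fixed-dimension CIM macros.
# Macro dimensions (64x64) and simple tiling are assumptions, not taken
# from the cited works.
import math

def spatial_utilization(layer_rows, layer_cols, macro_rows=64, macro_cols=64):
    """Fraction of allocated CIM storage that holds actual layer weights."""
    tiles_r = math.ceil(layer_rows / macro_rows)   # macros needed along rows
    tiles_c = math.ceil(layer_cols / macro_cols)   # macros needed along columns
    used = layer_rows * layer_cols                 # weight entries stored
    allocated = tiles_r * tiles_c * macro_rows * macro_cols
    return used / allocated

# Example: a 3x3 conv layer with 3 input and 16 output channels unrolls to a
# 27x16 weight matrix, filling only a corner of one hypothetical 64x64 macro.
print(f"{spatial_utilization(27, 16):.1%}")  # ~10.5% of the macro is utilized
```

Under these assumptions, small or irregularly shaped layers leave most of a fixed-dimension macro idle, which is the low spatial-utilization problem noted above.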