Growing applications are employing finite automata as their basic computational model. These applications match tens to thousands of patterns on a large amount of data, which brings great challenges to conventional processors. Hardware-based solutions have achieved high throughputs automata processing. However, they are too heavy to be integrated into small chips. Besides, they have to rely on DRAMs or other high capacity memories to store their underlying automata models. We focus on building a more lightweight automata processor, which can store the whole automata model into SRAMs with limited size and run independently. We propose LAP, a lightweight automata processor. Extremely high storage efficiency is achieved in LAP, leveraging a novel automata model (ADFA) and efficient packing algorithms. Besides, we exploit software-hardware co-design to achieve faster processing speed. We observe that ADFA's traversal algorithm is parallelizable. Thus, we propose novel hardware instructions to parallel the additional memory accesses in ADFA model and hide their access overhead. LAP is organized into a four-stage pipeline and prototyped into Xilinx Artix-7 FPGA at 263 MHz frequency. Evaluations show that LAP achieves extremely high storage efficiency, exceeding IBM's RegX and Micron's AP by 8×. Besides, LAP achieves significant improvements in processing speed ranging from 32% to 91% compared with previous lightweight implementations. As a result, a low-power CPU equipped with five LAP cores can achieve 9.5 Gbps processing throughput matching 400 patterns simultaneously.