Character information extraction based on CRFsuite

By applying the Conditional Random Fields based on discriminant undirected graph to character information extraction, this paper proposes an automation character information extraction method based on CRFsuite. Through learning the known domain, this method extracts the feature leading words, position and means from the character information in the Internet to build up a character parameter. By using CRFsuite as a model, the method adopts it to data from the Internet, matches character information and builds up the structured character information database. The method proposed by this paper demonstrates the feasibility of the implement of automation extraction of character information in the mass Internet data, and provides an effective way to facilitate character information tracking and looking-up.