A novel approach for Web page modeling in personal information extraction