Analyzing Fine-grained Hypertext Features for Enhanced Crawling and Topic Distillation