Intelligent web crawler for file safety inspection

The Internet grows continuously as users of all kinds add content and information. Without proper storage and indexing, this content can easily be lost in the sea of information the Internet houses. Hence, an automated program known as a web crawler is used to index content added to the Internet. With proper configuration, a web crawler can serve purposes beyond web indexing, including downloading files from the web. Millions, if not billions, of files are uploaded to the Internet, and for most of the sites hosting these files there is no direct indication of whether a file is safe and free of malicious code. Therefore, this paper presents the construction of a web crawler that crawls all pages within a given website domain and downloads every downloadable file linked from those pages, for the purpose of file safety inspection.
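To make the described behavior concrete, the following is a minimal sketch of such a domain-bounded crawler. The paper does not specify an implementation language or libraries; this sketch assumes Python with the third-party `requests` and `beautifulsoup4` packages, and the extension list, page limit, and download directory are illustrative choices rather than the paper's actual parameters.

```python
import os
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Illustrative set of extensions treated as downloadable files (an assumption,
# not the paper's definitive list).
FILE_EXTENSIONS = {".exe", ".zip", ".pdf", ".doc", ".docx", ".apk"}
DOWNLOAD_DIR = "downloads"  # downloaded files are stored here for inspection


def crawl_domain(start_url: str, max_pages: int = 100) -> None:
    """Breadth-first crawl of pages within the start URL's domain,
    downloading any linked file whose extension looks downloadable."""
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    os.makedirs(DOWNLOAD_DIR, exist_ok=True)

    while queue and max_pages > 0:
        url = queue.popleft()
        max_pages -= 1
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip unreachable pages

        # Only HTML pages are parsed for further links.
        if "text/html" not in response.headers.get("Content-Type", ""):
            continue

        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            parsed = urlparse(link)
            if parsed.netloc != domain or link in seen:
                continue  # stay within the target domain, avoid revisits
            seen.add(link)
            if os.path.splitext(parsed.path)[1].lower() in FILE_EXTENSIONS:
                download_file(link)  # candidate file for safety inspection
            else:
                queue.append(link)  # another page to crawl


def download_file(url: str) -> None:
    """Stream a file into the download directory for offline inspection."""
    name = os.path.basename(urlparse(url).path) or "unnamed"
    try:
        with requests.get(url, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            with open(os.path.join(DOWNLOAD_DIR, name), "wb") as fh:
                for chunk in resp.iter_content(chunk_size=8192):
                    fh.write(chunk)
    except requests.RequestException:
        pass  # skip files that fail to download


if __name__ == "__main__":
    crawl_domain("https://example.com")  # hypothetical seed URL
```

The breadth-first queue bounded by the seed URL's domain reflects the stated design of visiting all pages within a single website; downloaded files are simply written to disk so that a separate safety-inspection step can examine them.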