Species Identification using DNA Barcode Sequences through Supervised Learning Methods

A DNA barcode is a short sequence taken from an organism's DNA that can be used as a marker for species identification. The technique is analogous to the way a supermarket barcode scanner reads the black and white stripes of a Universal Product Code to look up product details; in a similar manner, a DNA barcode is used to assign a specimen to its species. Before DNA barcoding, specimens were assigned to species by examining morphological features such as size, shape and color. In most cases professional taxonomists were required for this procedure, because even trained technicians were unable to make reliable identifications. Various machine learning algorithms have been implemented and analyzed for species identification. The results show that accuracy of up to 100% can be achieved by supervised machine learning algorithms on both synthetic and empirical datasets. Moreover, in addition to higher accuracy, some supervised learning methods also provide stable performance across different types of datasets.