Sampling: Knowing Whole from Its Part

Sampling is a well-established statistical technique that selects a part from a whole in order to make inferences about the whole. In data mining, it can be employed to overcome problems caused by the high dimensionality of attributes as well as by large volumes of data. This chapter summarizes the basic ideas, assumptions, considerations, advantages, and limitations of sampling; categorizes representative sampling methods by their features; and provides a preliminary guideline on how to choose a suitable sampling method. We hope this helps users form an overall picture of sampling methods and apply them in data mining.
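
As a concrete illustration of inferring a property of the whole from a part, the following is a minimal sketch of simple random sampling without replacement, used to estimate a population mean. The synthetic population, the function name `estimate_mean_from_sample`, the sample size, and the seeds are all assumptions made for this illustration; they are not taken from the chapter.

```python
import random
import statistics


def estimate_mean_from_sample(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Hypothetical helper for illustration: draws `sample_size` items
    without replacement and returns their arithmetic mean.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


# Synthetic "whole": 100,000 values drawn from a normal distribution
# (assumed mean 50, standard deviation 10; purely illustrative data).
data_rng = random.Random(42)
population = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]

true_mean = statistics.mean(population)
estimate = estimate_mean_from_sample(population, sample_size=1_000)

print(f"population mean: {true_mean:.3f}")
print(f"sample estimate: {estimate:.3f}")
```

Here the estimate is computed from 1% of the data. How close it lands to the population mean depends on the sample size and on the variability of the data, which is one of the considerations that any choice of sampling method has to weigh.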