Data Understanding, Representation, and Visualization

This chapter introduces the concepts of understanding, representing, and visualizing data. These are essential steps to take before one starts to build a machine learning model or an artificially intelligent application. Although these concepts might appear trivial, they quickly become non-trivial and difficult when the dimensionality of the data exceeds three, because such data can no longer be plotted directly. This chapter introduces dimensionality reduction techniques such as principal component analysis and linear discriminant analysis for the purpose of better visualizing high-dimensional data. Better visualization gives the user insight into the distribution of the data and the relationships of the features with each other and with the output. These insights are valuable when making choices in the subsequent stages of the machine learning pipeline.