Selection of the Suitable Neighborhood Size Based on Bayesian Information Criterion

To select a suitable neighborhood size for manifold learning algorithms efficiently, this paper presents a new method based on the Bayesian Information Criterion (BIC). Because a manifold is locally Euclidean, the Principal Component Analysis (PCA) reconstruction errors of neighborhoods without shortcut edges remain small, whereas those of neighborhoods with shortcut edges are comparatively large. Consequently, the PCA reconstruction errors of all neighborhoods fall into two clusters when the neighborhood size is unsuitable and into a single cluster when it is suitable, and BIC can detect which case holds. Specifically, if the BIC value of the two-cluster solution is larger than that of the one-cluster solution, the errors form two clusters and the neighborhood size is unsuitable; otherwise, the neighborhood size is suitable. The method only requires running PCA and computing BIC, both of which have relatively low time complexity, and it does not run the time-consuming manifold learning algorithm itself, as methods based on residual variance do. It is therefore much more efficient than those methods. Experimental results verify the effectiveness of the method.
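
The decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the abstract does not specify the clustering model or the exact BIC formula, so the code assumes each cluster of errors is Gaussian, splits the errors with an exact one-dimensional 2-means, and uses BIC = log-likelihood - (p/2) log n, a convention in which a larger value is better, consistent with the comparison stated above. The function names (`pca_reconstruction_errors`, `bic_one_vs_two`, `neighborhood_size_is_suitable`), the variance floor, and the brute-force neighbor search are all hypothetical choices made for this example.

```python
import numpy as np


def pca_reconstruction_errors(X, k, d):
    """Mean squared PCA reconstruction error of each point's neighborhood."""
    X = np.asarray(X, dtype=float)
    # Brute-force pairwise squared distances (assumed adequate for small data).
    sq = np.sum(X**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Take the k + 1 closest points: the point itself plus k neighbors.
    idx = np.argsort(dist2, axis=1)[:, : k + 1]
    errors = np.empty(len(X))
    for i, nbrs in enumerate(idx):
        patch = X[nbrs] - X[nbrs].mean(axis=0)
        s = np.linalg.svd(patch, compute_uv=False)
        # Energy left outside the top-d principal directions.
        errors[i] = np.sum(s[d:] ** 2) / len(nbrs)
    return errors


def gaussian_loglik(x, weight, var_floor=1e-12):
    """Log-likelihood of x under its own fitted Gaussian and a mixing weight."""
    var = max(np.var(x), var_floor)  # floor is an assumption to avoid log(0)
    ss = np.sum((x - np.mean(x)) ** 2)
    return (len(x) * np.log(weight)
            - 0.5 * len(x) * np.log(2 * np.pi * var)
            - ss / (2 * var))


def bic_one_vs_two(errors):
    """Return (BIC of one-cluster model, BIC of two-cluster model)."""
    e = np.sort(np.asarray(errors, dtype=float))
    n = len(e)
    # One Gaussian: 2 free parameters (mean, variance).
    bic1 = gaussian_loglik(e, 1.0) - 0.5 * 2 * np.log(n)
    # Exact 1-D 2-means: try every split of the sorted errors and keep the
    # one with the smallest within-cluster sum of squares.
    wcss = [np.var(e[:m]) * m + np.var(e[m:]) * (n - m) for m in range(1, n)]
    m = 1 + int(np.argmin(wcss))
    ll2 = gaussian_loglik(e[:m], m / n) + gaussian_loglik(e[m:], (n - m) / n)
    # Two Gaussians: 5 free parameters (2 means, 2 variances, 1 weight).
    bic2 = ll2 - 0.5 * 5 * np.log(n)
    return bic1, bic2


def neighborhood_size_is_suitable(X, k, d):
    """Suitable unless the two-cluster BIC exceeds the one-cluster BIC."""
    bic1, bic2 = bic_one_vs_two(pca_reconstruction_errors(X, k, d))
    return bool(bic2 <= bic1)
```

A call such as `neighborhood_size_is_suitable(X, k=8, d=2)` returns `True` when the one-cluster model is preferred for that neighborhood size. Only per-neighborhood PCA and two BIC values are computed, so no manifold learning algorithm is run inside the check.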