A Biclustering Method for Heterogeneous and Temporal Medical Data
We address the problem of biclustering heterogeneous data, that is, data of various types (binary, numeric, symbolic, temporal). We propose a new method, HBC-t (Heterogeneous BiClustering for temporal data), designed to extract biclusters from heterogeneous, temporal, large-scale, sparse data matrices. HBC-t builds on HBC, using similar mechanisms while adding support for temporal data. The goal of this method is to handle Electronic Health Records (EHR) data gathered by hospitals on patients, stays, medical acts, diagnoses, prescriptions, etc., and to provide valuable insights into these data. Temporal data account for the majority of the data available for this study, as they do in EHR data in general, where medical events are timestamped; it is therefore crucial to have an algorithm that supports this type of data. The proposed algorithm takes advantage of data sparsity and uses a constructive greedy heuristic to build a large number of possibly overlapping biclusters. HBC-t compares favorably with several other biclustering algorithms on numeric and temporal data. Experiments on full-scale real-life data sets further confirm its scalability and efficiency.
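The abstract says only that the algorithm exploits sparsity and grows many possibly overlapping biclusters with a constructive greedy heuristic; it does not specify the actual procedure. The following is a minimal sketch of that general idea, **not HBC-t itself**, under loudly stated assumptions. It handles binary data only (no numeric, symbolic, or temporal columns). It stores just the non-empty cells of the matrix and uses a column-to-rows inverted index so that only rows sharing a column with the current bicluster are examined. It grows each bicluster from a seed row while the number of covered cells increases, a hypothetical growth criterion. The function name `greedy_biclusters`, its parameters, and the toy patient/code matrix are illustrative inventions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A bicluster is a (rows, columns) pair in which every row has every column set.
Bicluster = Tuple[FrozenSet[str], FrozenSet[str]]


def greedy_biclusters(
    matrix: Dict[str, Set[str]], min_rows: int = 2, min_cols: int = 2
) -> List[Bicluster]:
    """Hypothetical sketch: grow one bicluster per seed row, greedily.

    `matrix` keeps only the non-empty cells of a sparse binary matrix:
    each row id maps to the set of column ids whose value is 1.
    """
    # Inverted index column -> rows: candidate rows are limited to those
    # sharing at least one column with the current bicluster (sparsity).
    col_index: Dict[str, Set[str]] = {}
    for row, cols in matrix.items():
        for col in cols:
            col_index.setdefault(col, set()).add(row)

    found: Set[Bicluster] = set()
    for seed, seed_cols in matrix.items():
        rows, cols = {seed}, set(seed_cols)
        while True:
            candidates = set().union(*(col_index[c] for c in cols)) - rows
            best_row, best_cols, best_area = None, set(), len(rows) * len(cols)
            for cand in sorted(candidates):  # sorted for deterministic ties
                shared = cols & matrix[cand]
                area = (len(rows) + 1) * len(shared)  # cells covered if added
                if len(shared) >= min_cols and area > best_area:
                    best_row, best_cols, best_area = cand, shared, area
            if best_row is None:
                break  # no candidate row increases the covered area
            rows.add(best_row)
            cols = best_cols
        if len(rows) >= min_rows and len(cols) >= min_cols:
            found.add((frozenset(rows), frozenset(cols)))

    # Biclusters grown from different seeds may overlap; exact duplicates merge.
    # Largest area first, ties broken by sorted row ids.
    return sorted(found, key=lambda b: (-len(b[0]) * len(b[1]), sorted(b[0])))


if __name__ == "__main__":
    # Hypothetical toy data: patients mapped to the medical codes they carry.
    toy = {
        "p1": {"diabetes", "insulin", "hba1c"},
        "p2": {"diabetes", "insulin", "hba1c", "flu"},
        "p3": {"diabetes", "insulin"},
        "p4": {"flu", "cough"},
        "p5": {"flu", "cough", "fever"},
    }
    for rows, cols in greedy_biclusters(toy):
        print(sorted(rows), sorted(cols))
```

On the toy matrix this yields three biclusters, two of which overlap on patients `p1` and `p2`, illustrating the "possibly overlapping" property. Seeding from every row is what produces many candidate biclusters cheaply. A real method for heterogeneous and temporal EHR data would need type-aware similarity and scoring in place of the plain set intersection used here.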