Design of a Data Storage and Management System Framework for Internet of Things Big Data

The Internet of Things (IoT) is a network technology that has developed rapidly in recent years. Persistent storage of IoT big data, together with statistical analysis of the stored data, enables better management of IoT systems and reduces the cost of IoT applications. However, IoT big data is massive in scale, and traditional data storage technologies and management systems struggle to meet practical needs; in addition, the need to locate data quickly calls for a more efficient storage architecture. On this basis, this paper designs the DMFS system architecture and file-writing process on top of the Hadoop Distributed File System (HDFS) and builds a distributed file system for massive small files. An aggregate-query data system is designed on the basis of probability-oriented data OLAP, covering the query types and their implementation. Building on this work, an HDFS dynamic replica management strategy based on fragile storage is proposed.

Introduction

In recent years, the number of wireless sensor components in IoT systems has kept increasing, and the data tasks performed on these components have become more diverse. With the development of modern Internet systems, the data space of Internet big data will reach an even larger scale. Extracting valuable data from IoT systems to improve management efficiency, office efficiency, and daily life is an important direction for the development of modern science and technology. Statistical analysis of the data stored in an IoT system requires a new system architecture to manage the entire information space, new storage and computing technologies to carry out the statistics and analysis, and real-time response of service information, all of which place more stringent requirements on data storage technologies and management systems. This paper uses distributed design principles to build a data storage and management system framework for IoT big data, with the aim of improving system operating efficiency.

Design of a Distributed File System for Massive Small Files

Considering the well-known problems HDFS has with small files, and in order to change the current situation of writing large numbers of small files, this study adopts two designs: "write cache" and "cluster write". The former first writes files into memory and relies on cluster writing to improve file write throughput; the latter clusters small files from different sensors into large files and writes these synthesized files into system memory. The system architecture of Sensor FS is shown in Figure 1. DMFS adopts the "top" deployment method and can be installed on the master-node and storage-node servers of the system.

Figure 1 System architecture design of Sensor FS

The DMFS system architecture is shown in Figure 2. It consists of a master node and several storage nodes. The master node includes a write scheduling module and a sensor clustering module: the former mainly receives sensors' write requests and monitors clients' request commands; the latter mainly sends the write-request data and receives the returned information. Each storage node includes a write buffer and a write-merge module, which is mainly responsible for cooperating with the master node to fully cluster the sensor files. A minimal sketch of this write path is given below.
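As a rough illustration of this architecture, the following Python sketch simulates the write path described above: a master node schedules each write to the storage node with the most free buffer space, each storage node caches small files in memory (the "write cache"), and a cluster write merges them into one large file once a size threshold is reached. All class names and the MERGE_THRESHOLD parameter are illustrative assumptions, not the paper's actual implementation; the final flush writes to a local file where DMFS would target HDFS.

```python
"""Minimal sketch of the DMFS write path, under assumed names and parameters."""
import io
import os

MERGE_THRESHOLD = 64 * 1024  # assumed: flush once 64 KiB of small files accumulate


class StorageNode:
    """Buffers small sensor files in memory and merges them into one large file."""

    def __init__(self, node_id: str, capacity: int):
        self.node_id = node_id
        self.capacity = capacity
        self.buffer: list[tuple[str, bytes]] = []  # (file name, payload)
        self.buffered_bytes = 0

    def write_small_file(self, name: str, payload: bytes) -> None:
        # "Write cache": the small file lands in memory first, not on disk.
        self.buffer.append((name, payload))
        self.buffered_bytes += len(payload)
        if self.buffered_bytes >= MERGE_THRESHOLD:
            self.cluster_write()

    def cluster_write(self) -> None:
        # "Cluster write": concatenate the buffered small files into one
        # large sequential write, recording (name, offset, length) for lookup,
        # then clear the in-memory buffer.
        merged = io.BytesIO()
        index = []
        for name, payload in self.buffer:
            index.append((name, merged.tell(), len(payload)))
            merged.write(payload)
        with open(f"{self.node_id}.merged", "ab") as f:
            f.write(merged.getvalue())
        print(f"{self.node_id}: merged {len(index)} small files "
              f"({self.buffered_bytes} bytes)")
        self.buffer.clear()
        self.buffered_bytes = 0


class MasterNode:
    """Write scheduling: direct each write to the node with the most free space."""

    def __init__(self, nodes: list[StorageNode]):
        self.nodes = nodes

    def schedule(self) -> StorageNode:
        return max(self.nodes, key=lambda n: n.capacity - n.buffered_bytes)


if __name__ == "__main__":
    master = MasterNode([StorageNode("node-a", 10**6), StorageNode("node-b", 10**6)])
    for i in range(2000):
        node = master.schedule()                 # step 1: write scheduling
        node.write_small_file(f"sensor-{i % 8}/reading-{i}.dat",
                              os.urandom(100))   # step 2: small-file write
```

Merging many small files into one large sequential write is the standard remedy for HDFS's per-file NameNode metadata overhead, which is the bottleneck the "cluster write" design targets.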
Figure 2 DMFS system architecture diagram

The workflow of DMFS is as follows. First, write scheduling: when a sensor establishes a connection with DMFS for the first time, it connects to the write scheduling module of the manager, which returns information such as the primary storage address to the sensor according to memory capacity. Second, file writing: the sensor establishes a connection with the primary storage, which receives the large numbers of small files over TCP; after completing the write operation, it returns a write-success command to the sensor, and the corresponding data is also transmitted to each storage node. Third, cluster writing: when the total volume of files in the system reaches the threshold, the DMFS system initializes the files to be stored and computes the per-sensor accumulation; the corresponding management module summarizes the calculation results and completes the initialization of the dependency graph, whose result determines the merged write position of the cluster.

Aggregated Query Data System Architecture and Dynamic Replica Management

The traditional way to represent the fact table is the PWS (possible worlds semantics) model, but the probabilistic data scale under this model may be smaller than the actual number, which distorts the aggregated values. On this basis, this study redesigns the query framework for aggregated data and builds a data cube oriented to probabilistic data (see Table 1); a minimal numerical sketch of probability-aware aggregation follows.
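To make the aggregation issue concrete, the sketch below computes expected COUNT and SUM over a probabilistic fact table in which each tuple carries an existence probability, using linearity of expectation. The (region, value, probability) schema and function names are hypothetical and do not reproduce the layout of the paper's Table 1 data cube.

```python
"""Minimal sketch of probability-oriented aggregation over an assumed
probabilistic fact table; schema and names are illustrative only."""
from collections import defaultdict

# Hypothetical fact table: (dimension key, measure value, existence probability)
facts = [
    ("east", 10.0, 0.9),
    ("east", 20.0, 0.5),
    ("west", 15.0, 0.8),
]

def expected_aggregates(rows):
    """Expected COUNT and SUM per dimension key, by linearity of expectation:
    E[COUNT] = sum(p_i), E[SUM] = sum(p_i * v_i)."""
    cube = defaultdict(lambda: {"count": 0.0, "sum": 0.0})
    for key, value, prob in rows:
        cube[key]["count"] += prob
        cube[key]["sum"] += prob * value
    return dict(cube)

print(expected_aggregates(facts))
# {'east': {'count': 1.4, 'sum': 19.0}, 'west': {'count': 0.8, 'sum': 12.0}}
```

Because every tuple contributes in proportion to its probability, low-probability tuples are neither dropped nor counted at full weight, which is the kind of understatement of the probabilistic data scale attributed to the PWS fact-table representation above.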