Application-storage discovery

Discovering how applications depend on data and storage is a key prerequisite for many storage optimization tasks, such as assigning data to storage tiers, consolidating and virtualizing storage, and handling unused data. In practice, however, these dependencies are rarely known. Discovering them is difficult because virtualization occurs at several levels and because discovery methods must be non-intrusive. As a result, many optimization tasks are performed, if at all, without full knowledge of application-to-storage dependencies. This paper presents a non-intrusive application-to-storage discovery method; although it builds on our prior work, the storage discovery described here is entirely new. We applied this method in two production enterprise environments comprising about 323 servers, and we show how the discovered data enables three optimization tasks. First, we relate application criticality to storage tiers. Second, we find unused storage devices and show how this information, combined with storage consolidation, can yield power savings of up to two orders of magnitude. Third, we identify opportunities for database storage optimization.
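To illustrate why discovered dependencies make it possible to find unused storage devices, here is a minimal sketch. It is not the paper's method. It assumes the discovered data has already been reduced to a hypothetical graph: each application maps to the storage objects it uses (file systems, databases), each logical layer maps to the layer beneath it (logical volumes, LUNs), and a separate inventory lists all physical devices. A device that no application can reach through this graph is only a candidate for being unused, because discovery may have missed some dependencies. All names and sample data below are invented for illustration.

```python
from collections import deque

def reachable_devices(app_deps, layer_edges, devices):
    """Breadth-first walk from every application's storage objects down
    through the (hypothetical) layer graph; return inventoried devices hit."""
    seen = set()
    queue = deque(node for deps in app_deps.values() for node in deps)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(layer_edges.get(node, ()))  # follow logical -> physical edges
    return seen & devices

def unused_devices(app_deps, layer_edges, devices):
    """Devices in the inventory that no discovered application depends on."""
    return devices - reachable_devices(app_deps, layer_edges, devices)

# Hypothetical discovery output, for illustration only.
app_deps = {"payroll": ["fs:/data/payroll"], "crm": ["db:crm"]}
layer_edges = {
    "fs:/data/payroll": ["lv:vg01/lv_pay"],
    "lv:vg01/lv_pay": ["lun:A1"],
    "db:crm": ["fs:/db/crm"],
    "fs:/db/crm": ["lun:A2"],
}
devices = {"lun:A1", "lun:A2", "lun:A3", "lun:B7"}

print(sorted(unused_devices(app_deps, layer_edges, devices)))  # ['lun:A3', 'lun:B7']
```

The traversal handles arbitrarily deep virtualization stacks because it follows edges regardless of layer type. A real deployment would also need to handle devices shared across servers and dependencies that discovery could not observe.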
