Use the Force to Combat Dark Data

It's not surprising to anyone that the amount of data we generate, keep, maintain, and administer is growing, and growing fast. In fact, it is growing so fast, and there is so much of it being stored, that we have begun giving it names: Big, Meta, Structured, Semi-Structured, Unstructured, and Dark are just some of the most popular.

Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). They state that 'similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets and organizations often retain dark data for compliance purposes only.' Storing and securing this data typically incurs greater expense (and sometimes greater risk) than the value it returns.

So how can we shed some light on this expensive, risk-prone information? How much dark data do we have sitting on our storage assets or in our virtual machine infrastructure? How much of that data is dormant and hasn't been reviewed or touched in over a year? How much are we spending on backing up, protecting, and replicating this data between sites and data centers?

Shining Some Light on Dormant Data

One of my favorite screens in the DataGravity UI is File Analytics. It is my favorite for many reasons, but primarily because it provides a great place to start finding answers and gathering insights about the data I am tasked with storing and maintaining.

[Screenshot: the File Analytics screen in the DataGravity UI]

In the lower-left quadrant of the File Analytics screen is a section entitled Dormant Data. This section, like every part of the File Analytics UI, gives me a quick glance at the demographics of my data and, if desired, lets me drill down for more detail. By doing so I can quickly produce a full list of the data at a granular, file level, which can then be searched, exported, or drilled into further.

Drilling into the dormant data that hasn't been accessed in over 9 months lets me not only see the data itself, but also search it for attributes such as owners, file sizes, file types, readers, and writers. I can also export all of this information.

[Screenshot: file-level list of data dormant for over 9 months]


In only a few clicks, I now have a better understanding of a number of things and answers to some very real questions: How much dormant data do I have on a per-share or per-VM basis?  Who owns most of that data? Perhaps they are no longer with the company. Do I have any possible liabilities lurking around like PII information such as SSN, Credit Card #'s, etc.? Do I really need to keep all of this or is there an opportunity to clean house and better utilize my storage assets.