And So the Journey Continues...

Just over two years ago I made a professional change that was a catalyst for my career. I joined the proverbial "Dark Side" of IT, working at my first OEM as a solutions architect. My years at DataGravity were simply amazing. I contributed to a product launch, developed a channel business, built a customer base, and was always the internal advocate of the customer. All were invaluable experiences that have forever shaped me.

I have heard it said, "The person that you will be in five years is based on the books you read and the people you surround yourself with today." If this is in fact true, then my five years look bright - and it is not because I am good at reading. The people that I have had the fortune of surrounding myself and developing friendships with over the last couple of years have, without question, been the most incredible part of the journey.

As my time with DataGravity began to wind down, I drew on inspiration gathered from a couple of episodes of one of my favorite podcasts, The Geek Whisperers. In particular, I drew on Scott Lowe's advice to never underestimate the power of your own community, and on Kenneth Hui's genuine vulnerability and his absolute trust in and reliance on his network when he called it quits without the next job lined up. Thank you both, and of course the Geek Whisperers crew, for providing such timely inspiration, because when the rubber met the road, the community did not disappoint.

In fact, things can happen very FAST...sometimes TOO FAST, and before I knew it my head was swimming, combing over what the next career move would be. Drawing on good advice from several friends, I quickly realized that I needed to take a break before jumping back into the fast lane. I was fortunate enough to enjoy some much-needed time at home, as well as take a few trips to see some old friends. A key highlight during this time off was observing Andy Banta in his natural habitat - an absolute bucket list item for those who have not had the experience. Trust me.

And so the journey continues. I am very excited to announce that I have begun working at River Point Technology with a focus on emerging technology, predictive analytics, and cloud enablement for a strong and ever-growing customer base.

During this time, not a resume was transferred nor a suit put on - yet there was plenty of opportunity.  I don't say that to boast but rather to draw upon and confirm that my greatest professional assets are the very people I meet, help, and get help from on a daily basis.  Thank you all.

Securing Files Containing Sensitive Data

Every day I see more and more sensitive information being saved in places where security is wide open. Credit cards, intellectual property, social security numbers, private certificates - you name it, I have seen it. So I wanted to build on a series of recent posts to demonstrate how PowerShell can be used to help secure sensitive files. In this workflow we will identify sensitive files using the DataGravity Discovery system, secure them with PowerShell, and validate our updates.


  1. Identify files containing sensitive data with DataGravity and export the file names (CSV format).
  2. Run the ChangeFilePermissions.ps1 PowerShell script
  3. Validate Permission Changes and restore original permissions if required.

Identify Sensitive Data

In earlier posts I have highlighted easy ways to find sensitive data using the DataGravity search and dynamic tagging. One example of such sensitive data is social security numbers lurking in unsecured files. The simple search below returns a list of such files residing on a public share.

We can export this information to a CSV file and use it as an input parameter in the next step.


The full ChangeFilePermissions.ps1 script is available on my PowerShell repo on GitHub. Let's look at an example of how to run it:

ChangeFilePermissions.ps1 -ShareFilePath "\\CorporateDrive\Public" -csvFilePath "c:\temp\public.csv" -SensitiveTag "SS" -logFile "C:\Temp\FilesPermissionChanges.log"

Script parameters:

-ShareFilePath is the path to the share where the files containing sensitive data live. In our example it is the public share.

-csvFilePath is the path to the exported CSV listing all files, including those that contain sensitive information. This is an export from a DataGravity search.

-SensitiveTag is the sensitive tag (or tags) to look for when selecting which files to secure (e.g. SS, CC, Email Address).

-logFile is an optional path to a log file that records which files have been secured.
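To make the role of -SensitiveTag concrete, here is a minimal sketch of how rows from the exported CSV could be filtered by tag. It is not the code from ChangeFilePermissions.ps1. The function name Select-SensitiveFile and the CSV column names Path and Tags are assumptions made for illustration, so check the header row of your own export.

```powershell
# Minimal sketch, not the actual ChangeFilePermissions.ps1 logic.
# ASSUMPTION: each CSV row has a 'Path' column and a 'Tags' column in which
# multiple tags are separated by ';' or ','. Both names are hypothetical.
function Select-SensitiveFile {
    param(
        [Parameter(Mandatory)][object[]]$Rows,          # rows from Import-Csv
        [Parameter(Mandatory)][string[]]$SensitiveTag   # e.g. 'SS', 'CC'
    )
    foreach ($row in $Rows) {
        # Split the tag cell into individual tags and drop empty entries.
        $tags = @($row.Tags -split '\s*[;,]\s*' | Where-Object { $_ })
        # Emit the row if any of its tags was requested (case-insensitive).
        if ($tags | Where-Object { $SensitiveTag -contains $_ }) { $row }
    }
}

# Usage (path taken from the example command above):
#   $rows = Import-Csv -LiteralPath 'c:\temp\public.csv'
#   Select-SensitiveFile -Rows $rows -SensitiveTag 'SS'
```

Filtering first and changing permissions second keeps the risky step small: you can review the selected rows before anything is modified.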

Securing Files with Sensitive Data

It is very important to emphasize that when automating changes to security permissions we must BE CAREFUL and be sure to have our UNDO button handy. Remember that just as quickly as you can automate a process, you can have a royal mess on your hands. Check out my UNDO button later in the post.

In the example below, we are running the ChangeFilePermissions.ps1 script against the public folder to deny access to all files that contain social security numbers. The script can be modified to include other sensitive tags or a combination of tags.
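In that spirit, a dry run is a cheap safeguard. The sketch below shows one way a deny rule could be added to a single file using the standard Get-Acl and Set-Acl cmdlets, with -WhatIf support so the change can be previewed first. It is an illustration only: the function name Deny-FileRead, the default identity 'Everyone', and the log line format are my assumptions here, and the script on GitHub may work differently. It requires Windows, since the ACL cmdlets are Windows-only.

```powershell
# Minimal sketch under stated assumptions, not the published script.
# ASSUMPTION: denying 'Read' to 'Everyone' is the desired lockdown; adjust
# the identity and rights for your environment. Windows only.
function Deny-FileRead {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$Identity = 'Everyone',
        [string]$LogFile                     # optional activity log
    )
    # Read the current ACL and add an explicit deny rule to it in memory.
    $acl  = Get-Acl -LiteralPath $Path
    $rule = New-Object System.Security.AccessControl.FileSystemAccessRule($Identity, 'Read', 'Deny')
    $acl.AddAccessRule($rule)

    # Only write the ACL back when not running with -WhatIf.
    if ($PSCmdlet.ShouldProcess($Path, "Deny Read to $Identity")) {
        Set-Acl -LiteralPath $Path -AclObject $acl
        if ($LogFile) {
            Add-Content -LiteralPath $LogFile -Value "$(Get-Date -Format s) DENY Read $Identity $Path"
        }
    }
}

# Preview first, then apply:
#   Deny-FileRead -Path '\\CorporateDrive\Public\file.xlsx' -WhatIf
```

Keep in mind that an explicit deny rule takes precedence over allow rules, so a deny for a broad identity also locks out administrators who are members of it.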

We secured 30 files on the share by changing their security access.  This is validated by looking at the activity timeline within DataGravity, which confirms that the 'set ACL' operation was performed on the files.


We can also validate that the security of the files was updated by attempting to access one of them and verifying that we no longer have permission to view it. This is confirmed by: i) the output log from the script, and ii) the security tab of the file properties.

The UNDO Button

I personally always like to have a back-out plan when making a large number of changes - after all, who doesn't like an UNDO button? DataGravity's Discovery Points work very well as my UNDO button, and therefore I recommend creating a manual one before running the script.

This gives us the ability to restore any or all of the modified files to their original security settings.  You can see that it is easy to view previous versions for any file and restore if needed, including the original permissions.

I hope you find this walkthrough and script valuable in making your environment more secure.

Finding Duplicate Files with PowerShell

Let's explore a script that leverages DataGravity's file fingerprints to identify the top 10 duplicate files on a given department share or virtual machine.

The Workflow

  1. Export fingerprints and file names to a file list (CSV format)
  2. Run the FindDuplicateFiles.ps1 PowerShell script
  3. List the top 10 duplicate files and the space they are consuming

Files and Fingerprints

DataGravity makes it easy to identify files and their unique SHA-1 fingerprints on a share or virtual machine (VMware or Hyper-V).  In this example we are going to gather the file names and fingerprints in the Sales department share.

The Script:

FindDuplicateFiles.ps1 -csvFilePath "c:\temp\sales.csv" -top 10

Script parameters:

-csvFilePath is the path to the CSV file we downloaded in the first step which contains a list of the files and file fingerprints.  This is an export from DataGravity's Search.

-top is an optional parameter that, if specified, limits the output to that number of top duplicate files.
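The core idea behind such a script can be sketched in a few lines: group the CSV rows by fingerprint, keep the groups with more than one file, and rank them by the space the extra copies consume. This is an illustrative sketch rather than the FindDuplicateFiles.ps1 source. The function name Get-TopDuplicate and the column names Name, Fingerprint and Size (in bytes) are assumptions, so match them to the header row of your export.

```powershell
# Minimal sketch, not the actual FindDuplicateFiles.ps1 source.
# ASSUMPTION: rows expose 'Name', 'Fingerprint' and 'Size' (bytes) columns;
# these column names are hypothetical.
function Get-TopDuplicate {
    param(
        [Parameter(Mandatory)][object[]]$Rows,   # rows from Import-Csv
        [int]$Top = 10
    )
    $Rows |
        Group-Object -Property Fingerprint |
        Where-Object { $_.Count -gt 1 } |        # identical content, 2+ files
        ForEach-Object {
            # Files with the same fingerprint have the same size.
            $size = [long]$_.Group[0].Size
            [pscustomobject]@{
                Fingerprint = $_.Name
                Copies      = $_.Count
                WastedBytes = $size * ($_.Count - 1)   # every copy but one
                Files       = @($_.Group.Name)
            }
        } |
        Sort-Object -Property WastedBytes -Descending |
        Select-Object -First $Top
}

# Usage (path taken from the example command above):
#   Get-TopDuplicate -Rows (Import-Csv -LiteralPath 'c:\temp\sales.csv') -Top 10
```

Ranking by wasted bytes rather than by copy count surfaces a handful of large duplicated presentations ahead of hundreds of tiny duplicated text files. Sort on Copies instead if the count matters more to you.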

Listing and Validating Duplicates

Let's run the script to return the top 10 duplicate files and their file sizes.

These can of course be validated, as the example below shows the duplicate files consuming the most space.

The full PowerShell script is listed below, and available on my PowerShell repo on GitHub.

Finding Duplicate Files using DataGravity Fingerprints

I love it when community feedback brings an idea to life. I have had the benefit of seeing this firsthand many times since joining the DataGravity family, first as an Alpha customer and for the last two years as a Solutions Architect. The most recent example centers on the topic of duplicate files and stems from a conversation at Tech Field Day Extra - VMworld 2014. Several of the delegates were discussing the reality of just how many duplicate files exist within a given file system, and how valuable it would be to identify them to provide space and performance savings. In the words of Hans De Leenheer - 'That is 101, finding what is duplicate'.

Imagine for a minute how many duplicate copies of the exact same file live on a department share, virtual machine, or home directory. Copies of office templates, time reporting spreadsheets, company-wide memos, or department PowerPoints. All the exact same files, saved to different locations by different people on the storage system. Howard Marks proposed a use case to find just how many copies of the same marketing PowerPoint have been saved.

File Fingerprinting

DataGravity now creates a file fingerprint for every supported file. A SHA-1 cryptographic hash value of the file provides the file's "fingerprint" as a 40-character hexadecimal value. The SHA-1 value is derived from the file's contents, which allows inspection with far more accuracy than looking only at simple file metadata such as file name and size.

The file fingerprint is unique to the contents of a file to allow the following:

  • Locate a file on any mount point / share / VM based on its unique content.
  • Find all files with identical content, even if the files have different names or reside in different locations.
  • Ensure that a file has not changed over time, by viewing the file fingerprint from different DiscoveryPoints.
  • Ensure that a file containing specific content, as identified by the file SHA-1 value, does not reside on the DataGravity Discovery system.
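To see what such a fingerprint looks like outside of DataGravity, the built-in Get-FileHash cmdlet can compute a SHA-1 hash for any local file. This is only a sketch for experimenting: the helper name Get-Fingerprint is hypothetical, and I am assuming a plain SHA-1 of the file bytes, which may or may not match what the product computes internally.

```powershell
# Minimal sketch: SHA-1 of a file's contents as a 40-character hex string.
# ASSUMPTION: a plain SHA-1 over the file bytes; the helper name is
# hypothetical and not part of any DataGravity tooling.
function Get-Fingerprint {
    param([Parameter(Mandatory)][string]$Path)
    # Get-FileHash returns upper-case hex; normalize to lower case.
    (Get-FileHash -LiteralPath $Path -Algorithm SHA1).Hash.ToLowerInvariant()
}

# Two files are duplicates when their fingerprints match, whatever their
# names or locations:
#   (Get-Fingerprint 'C:\a\deck.pptx') -eq (Get-Fingerprint 'C:\b\copy of deck.pptx')
```

Because the hash depends only on the bytes in the file, renaming or moving a copy does not change its fingerprint, while changing a single character does.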

Finding Duplicates

Finding duplicate files that share the same fingerprint is built into DataGravity's search and discovery. Let's search for all duplicates of the recent marketing presentation using the file's fingerprint.

It is easy to see that indeed there are duplicates of the presentation being saved by multiple people, to multiple locations.  In fact some of these files appear to be copied by the same user into different directories on their home share, but are the EXACT SAME file.

Using the preview function from the search confirms our duplicates.


There is a growing number of examples of how file fingerprinting is useful, many of which I will continue to share here on the blog.  Identifying duplicate files is one of my favorite uses of the feature, mostly because of how useful it is, but also because it demonstrates how DataGravity listens and incorporates feedback to enhance the product.