Securing Files Containing Sensitive Data

Everyday I see more and more sensitive information being saved in places where security is wide open.  Credit Cards, Intellectual Property, Social Security Numbers, Private Certificates - you name it - I have seen it.  So I wanted to build on a series of recent posts to demonstrate how PowerShell can be used to help secure sensitive files. In this workflow we will identify sensitive files using the DataGravity Discovery system, secure them with PowerShell, and validate our updates.


  1. Identify files containing sensitive data with DataGravity and export files names. (CSV format)
  2. Run the ChangeFilePermissions.ps1 PowerShell script
  3. Validate Permission Changes and restore original permissions if required.

Identify Sensitive DAta

In earlier posts I have highlighted easy ways to find sensitive data using the DataGravity search and dynamic tagging.  An example of these sensitive tags are social security numbers lurking in unsecured files.  The simple search below returns this list of files residing on a public share. 

We can export this information out to a CSV file and use it as an input parameter in then next step.


The full ChangePermissions PowerShell script is available on my Powershell repo on GitHub.  Let's look at an example of how to run it:

ChangeFilePermissions.ps1 -ShareFilePath "\\CorporateDrive\Public" -csvFilePath "c:\temp\public.csv" -SensitiveTag "SS" -logFile "C:\Temp\FilesPermissionChanges.log

Script parameters:

-ShareFilePath is the path to share where the files containing sensitive data live.  In our example it is the public share.

-csvFilePath is the path to the exported CSV listing all files, including those that contain sensitive information. This is an export from a DataGravity search.

-SensitiveTag the sensitive tag(s) to look for when selecting which files to secure (Ex. SS, CC, Email Address, etc.)

-logFile is an optional location for where we want to log the activity of what files have been secured.

Securing Files with Sensitive Data

It is very important to place emphasis on the fact that when dealing with automation and the modification of security permissions for anything we must BE CAREFUL and be sure to have our UNDO button handy.  Remember that just as fast as you can automate a process, you can equally as fast have a royal mess on your hands.  Check out my UNDO button later in the post.

In the example below, we are running the ChangeFilePermissions.ps1 script against the public folder to deny access to all files that containing social security numbers.  The script can be modified to include other sensitive tags or a combination of tags.

We secured 30 files on the share by changing their security access.  This is validated by looking at the activity timeline within DataGravity, which confirms that the 'set ACL' operation was performed on the files.


We can also validate the security of the files was updated by attempting to access one of the files and verifying that we no longer have permission to view the file.  This is validated by: i) output log from the script ii) security tab of the file properties.

The UNDO Button

I personally always like to have a back out plan when making large amounts of changes - after all who doesn't like and UNDO button? DataGravity's Discovery Points work very well as my UNDO button and therefore I recommend creating a manual one before running the script.

This gives us the ability to restore any or all of the modified files to their original security settings.  You can see that it is easy to view previous versions for any file and restore if needed, including the original permissions.

I hope you find this walkthrough and script valuable to making your environment more secure.

Finding Duplicate Files with PowerShell

Let's explore a script that leverages DataGravity's file fingerprints to identify the top 10 duplicate files on a given department share or virtual machine.

The Workflow

  1. Export fingerprints and file names to File List (CSV format)
  2. Run the FindDuplicateFiles.ps1 powershell script
  3. List the Top 10 duplicate files and space they are consuming

Files and Fingerprints

DataGravity makes it easy to identify files and their unique SHA-1 fingerprints on a share or virtual machine (VMware or Hyper-V).  In this example we are going to gather the file names and fingerprints in the Sales department share.

The Script:

FindDuplicateFiles.ps1 -csvFilePath "c:\temp\sales.csv" -top 10

Script parameters:

-csvFilePath is the path to the CSV file we downloaded in the first step which contains a list of the files and file fingerprints.  This is an export from DataGravity's Search.

-top optional parameter that if specified will show the top number of duplicate files

Listing and Validating Duplicates

Let's run the script to return the top 10 duplicate files, and their file size.

These can of course be validated as the example below returns duplicate files consuming the most space.

The full powershell script is listed below, and available on my Powershell repo on GitHub.

Finding Duplicate Files using DataGravity FingerPrints

I love it when community feedback brings an idea to life.  I have had the benefit of seeing this first hand many times since joining the DataGravity family first as an Alpha customer, and for the last two years as a Solutions Architect.  The most recent example centers on the topic of duplicate files and stems from a conversation at Tech Field Day Extra - VMWorld 2014.  Several of the delegates were discussing the reality of just how many duplicate files exist within a given file system and how valuable it would be to be able to identify those to provide space and performance savings.  In the words of Hans De Leenheer - 'That is 101, finding what is duplicate'.

Imagine if you will for a minute how many duplicate copies of the exact same file live on a department share, virtual machine or home directory.  Copies of office templates, time reporting spreadsheets, company wide memos, or department powerpoints.  All the exact same files saved to different locations, by different people on the storage system. Howard Marks proposed a use case to find just how many copies of the same marketing powerpoint have been saved.

File Fingerprinting

DataGravity now creates a file fingerprint for every supported file.  A SHA-1 cryptographic hash value of the file provides the file's "fingerprint" as a 40 character hexadecimal value.  Each file has a unique SHA-1 value associated with file contents allowing inspection with far more accuracy then being only able to look at simple file meta-data such as file name and size.

The file fingerprint is unique to the contents of a file to allow the following:

  • Locate a file on any mount point / share / VM based on its unique content.
  • Find all files with identical content, even if the files have different names or reside in different locations.
  • Ensure that a file has not changed over time, by viewing the file fingerprint from different DiscoveryPoints.
  • Ensure that a file containing specific content, as identified by the file SHA-1 value, does not reside on the DataGravity Discovery system.

Finding Duplicates

Finding duplicate files all with the same unique fingerprint is extended to DataGravity's search and discovery. Let's search for all duplicates of the recent marketing presentation using the file's fingerprint.

It is easy to see that indeed there are duplicates of the presentation being saved by multiple people, to multiple locations.  In fact some of these files appear to be copied by the same user into different directories on their home share, but are the EXACT SAME file.

Using the preview function from the search confirms our duplicates.


There is a growing number of examples of how file fingerprinting is useful, many of which I will continue to share here on the blog.  Identifying duplicate files is one of my favorite uses of the feature, mostly because of how useful it is, but also because it demonstrates how DataGravity listens and incorporates feedback to enhance the product.

Deleting Dormant Data with Powershell

One of my favorite forms of managing data is to DELETE it.  One of my favorite ways to delete things is with SPEED and CONFIDENCE.

I have been quoted as saying that "DELETE is the best form of de-duplication" - in fact it is 100% dedupe. Some of the best data to DELETE is the stuff that no one is using: dormant data.  So putting my automation hat on, let's explore a script that helps DELETE things quickly but still provides us with the ability to UNDO using DataGravity File Analytics for Dormant Data.

The workflow:

  1. Export Dormant Data to CSV File List
  2. Run the ArchiveDormantData.ps1 powershell script
  3. Optionally create an archive txt file to notify it has been deleted
  4. Validate space savings and recover individual files if required.  

Identify Dormant Data:

DataGravity makes it easy to identify and download a list of all the dormant data.  In this example we are going to grab anything that hasn't been updated, read or touched within a year or more on the Marketing share.

The script:

ArchiveDormantData.ps1 -ShareFilePath "\\CorporateDrive\Marketing" -csvFilePath "c:\temp\Marketing.csv" -logFile "C:\Temp\DormantDataDelete.log" -ArchiveStub

Script parameters:

-ShareFilePath is the path to the data where the dormant data lives to be deleted.  In our example it is the Marketing share.

-csvFilePath is the path to the CSV file we downloaded in the first step which contains a list of the files to be deleted.  This is an export from the DataGravity's Dormant Data.

-logFile is an optional location for where we want to log the activity of what has been removed.

-ArchiveStub optional parameter that if specified will create a TXT stub in the place of the deleted file

Validate and Recover if necessary (The undo button)

If you get anything wrong or delete the wrong thing, it is always handy to have an UNDO button.  There are several ways to do that using backup/recovery tools, and in this case since we are already using DataGravity, we can crate a manual discovery point to changes and restore any files if required.

The full powershell script is listed below, and available on my Powershell repo on GitHub.  Big thanks to Will Urban for the heavy lifting on this one.  Happy DELETING.