The Metadata Threat

There is a lot of attention paid these days to the end result of an attack.  The media and bloggers like myself tend to use the sensational impacts of a data breach to get the security message across.  It isn’t safe enough out there to go strolling through the internet with no clothes on.  Pack some bullet-proof pants.

One way to protect yourself from a data breach is to eliminate metadata from your personal and business documents.  What the heck is metadata, you may ask, and why is it a threat?  Let’s find out…

When you create any kind of document, it will typically contain some hidden data elements.  When you fire up Microsoft Word or a similar package, save your draft, and later re-open it to edit, refine and re-save it, you update that hidden metadata.  Text and even comments that you have deleted or changed are not always completely removed when you hit delete.  Many of your changes can remain hidden within the file, where they can be recovered and read with the right tools.  The document also quietly stores a number of “attributes” in special fields for tracking purposes: the original author, the last 10 saves, the creation and edit dates, the storage location, and so on.  Office documents often contain the complete path to the folder in which the file was located during edits and saves, which can reveal the Windows logon name, project names and server names, along with the operating system and software version used.  In some cases, even information on printers and internally used domain names is available.  All of this is metadata.
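
To make this concrete, here is a minimal sketch of how little effort it takes to read some of these fields.  It assumes the newer zip-based Office formats (.docx, .xlsx, .pptx), where document properties usually live in the docProps/core.xml and docProps/app.xml parts; the older binary DOC format stores its metadata differently and is not covered.  The function name is my own invention, not part of any tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Parts of a zip-based Office file that normally hold document properties.
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def read_ooxml_properties(path):
    """Return {property name: value} for the property parts of an OOXML file.

    `path` may be a filename or a file-like object. Hypothetical helper for
    illustration only; it does not look at tracked changes or comments.
    """
    found = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part in PROPERTY_PARTS:
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for element in root.iter():
                text = (element.text or "").strip()
                if text:
                    # ElementTree prefixes tags with "{namespace}"; drop it.
                    found[element.tag.split("}")[-1]] = text
    return found
```

Calling read_ooxml_properties("report.docx") on a file of your own (the filename here is just a placeholder) will typically show entries such as creator, lastModifiedBy, created, modified and Application, which is exactly the kind of detail described above.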

Some file formats are more revealing than others.  Testing shows that PowerPoint files retain more information than PDF files, partly because some metadata is discarded during format conversion.  The PDF format is generally considered a more “permanent” format than DOC, but contrary to popular belief, converting a document to PDF does not necessarily remove all of its metadata.  Metadata from photographs embedded in documents can be very revealing, even if the image is masked or blacked out in the document.  EXIF data usually contains a thumbnail of the original photograph, and that thumbnail often fails to reflect changes made to the image afterwards.  Deliberately obscured areas of a photograph may be clearly visible in the thumbnail.
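
If you want a quick way to check whether a JPEG is carrying an EXIF block at all, the rough sketch below walks the segment markers at the start of the file and pulls out any EXIF payload.  The thumbnail check is only a heuristic of my own: an EXIF thumbnail is normally a small JPEG in its own right, so the code simply looks for a JPEG start marker inside the EXIF data rather than parsing the EXIF structure properly.  Both function names are made up for this example, and unusual files may not be handled.

```python
import struct


def exif_segments(jpeg_bytes):
    """Yield the payload of each EXIF (APP1) segment in a JPEG byte string."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            break  # not at a marker; give up rather than guess
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            break  # start of scan: compressed image data follows
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if length < 2:
            break  # malformed segment length
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            yield payload
        pos += 2 + length


def has_embedded_thumbnail(jpeg_bytes):
    """Heuristic only: look for a JPEG start marker inside the EXIF data."""
    return any(b"\xff\xd8\xff" in segment for segment in exif_segments(jpeg_bytes))
```

A True result is a hint that the file deserves a closer look with a proper EXIF tool before you publish it; a False result does not prove the image is clean.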

These bits of information can aid an attacker in discovery efforts when they are trying to learn about your environment or preparing a targeted attack plan.  The information contained in a single document may not be enough to build a targeted attack strategy; however, the more intelligence an attacker has, the more likely an attack is to succeed.  With enough pieces of the puzzle, it is possible to see a picture from which points of interest can be identified.  This information is useful for carrying out targeted technical or social engineering attacks: it allows attackers to assess the potential vulnerability of a system and to target a specific user with an exploit for their specific platform or software version.

The best way to protect yourself from this kind of reconnaissance is to remove metadata from your shared or published files as completely as possible, or to fill the metadata spaces with decoy data.  Microsoft has published instructions for manual metadata removal, but I like to use third-party tools to automate and validate these efforts.
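
As a rough illustration of what an automated cleanup step might look like, the sketch below copies a zip-based Office file and swaps the core properties part for an empty one.  This is my own simplified example, not Microsoft's procedure and not how any particular third-party tool works.  It assumes the properties live in docProps/core.xml, and it deliberately does nothing about tracked changes, comments, docProps/app.xml, or EXIF data inside embedded images, so treat it as a starting point only.

```python
import zipfile

# An empty core properties part: no author, dates, or revision history.
EMPTY_CORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
    'package/2006/metadata/core-properties"/>'
)


def blank_core_properties(src, dst):
    """Copy an OOXML file from src to dst with docProps/core.xml emptied.

    Hypothetical helper: src and dst may be filenames or file-like objects.
    Every other part of the package is copied unchanged.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "docProps/core.xml":
                data = EMPTY_CORE.encode("utf-8")
            # Writing by name gives each entry a fresh timestamp as well.
            zout.writestr(item.filename, data)
```

Whatever method you use, open the cleaned copy and inspect it again before sharing it; validation matters as much as removal.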

More Metadata Removal Information: