The Office Document Metadata You Forget to Remove Before Sharing

A DOCX, XLSX, or PPTX is a ZIP of XML that quietly carries author names, edit history, comments, and tracked changes. Inspect and clear them before you share.

Published By Li Lei
#office #metadata #privacy #docx #security

You finish a proposal in Word, export nothing, and email the .docx straight to a client. The document looks clean. The metadata stapled to it is not. Inside that one file sit the names of the people who created and last saved it, a count of how many times it was saved, the company it was authored under, and sometimes comments and tracked changes you thought were deleted. None of that shows up on the printed page, but all of it travels with the file.

The reason is structural, and once you see it you can never unsee it.

A DOCX Is a ZIP File Wearing a Costume

A modern .docx, .xlsx, or .pptx is not a single opaque blob. It is an Open XML package, which is a fancy way of saying it is a ZIP archive with a specific folder layout. Rename report.docx to report.zip, double-click it, and you can browse the parts directly: word/document.xml holds the text, word/media/ holds images, docProps/core.xml holds the core document properties, and [Content_Types].xml declares the content type of every part in the package.
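The same browsing can be done without renaming anything. The following is a minimal sketch using Python's standard zipfile module; the function name list_parts is made up for illustration, and the sizes it returns are uncompressed sizes, which may differ from what a file manager shows.

```python
import zipfile


def list_parts(path_or_file):
    """Return (part_name, uncompressed_size_in_bytes) for each part in the package."""
    # ZipFile accepts a path or an open binary file object, so no renaming is needed.
    with zipfile.ZipFile(path_or_file) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]
```

Calling list_parts("report.docx") on a typical Word file should return entries such as word/document.xml and docProps/core.xml, in the order they are stored in the archive.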

That docProps/core.xml part is where the surprises start. It records fields like:

  • creator and lastModifiedBy (author names)
  • revision (roughly how many save cycles the file went through)
  • created and modified timestamps

A second part, docProps/app.xml, holds the extended properties:

  • Company (often the org you copy-pasted a template from)
  • Manager and Template (manager names and the template the file was built on)
  • Application and AppVersion (application name and version)
  • document statistics such as page count, word count, and total editing time

And if anyone left comments or used tracked changes, those sit in word/comments.xml and in inline <w:ins> / <w:del> markup inside word/document.xml. Switching the view to hide markup only toggles the display; the revisions are still in the file. Accepting all changes does not delete comment threads either, and old revision markup can survive a careless export. The XML carries author, company, revision count, comments, and tracked changes, and every one of those can leak the moment you send the file.
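To see what those two property parts actually say, you can read them straight out of the ZIP. This is a rough sketch with the Python standard library, not how any particular inspector tool works; read_properties is a hypothetical helper name, and the sketch simply collects every leaf element that has text, keyed by its tag name with the namespace stripped.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_properties(path_or_file):
    """Collect leaf elements of docProps/core.xml and docProps/app.xml as {name: text}."""
    found = {}
    with zipfile.ZipFile(path_or_file) as package:
        names = set(package.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # either part may be missing from a package
            root = ET.fromstring(package.read(part))
            for element in root.iter():
                text = (element.text or "").strip()
                if len(element) == 0 and text:
                    # ElementTree tags look like "{namespace}creator"; keep the local name.
                    local_name = element.tag.rsplit("}", 1)[-1]
                    found[local_name] = text
    return found
```

Repeated leaf names (for example the nested entries under TitlesOfParts in app.xml) overwrite each other in this sketch, which is acceptable for a quick look at creator, lastModifiedBy, revision, and Company but not for a full dump.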

For spreadsheets it can be worse. An .xlsx may hold an xl/externalLinks/ folder pointing at network paths from the machine it was built on, hidden sheets, and defined names referencing data you never meant to ship.
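A quick way to check a workbook for those three things is sketched below. It assumes the workbook part sits at the conventional path xl/workbook.xml (strictly, the package relationships decide that), and inspect_workbook is an illustrative name, not part of any tool mentioned here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML elements in xl/workbook.xml.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def inspect_workbook(path_or_file):
    """Report external-link parts, hidden sheets, and defined names in an .xlsx."""
    with zipfile.ZipFile(path_or_file) as package:
        external = [
            name for name in package.namelist()
            if name.startswith("xl/externalLinks/")
        ]
        # Assumption: the workbook part lives at the conventional location.
        root = ET.fromstring(package.read("xl/workbook.xml"))
    hidden = [
        sheet.get("name")
        for sheet in root.iter(MAIN_NS + "sheet")
        if sheet.get("state") in ("hidden", "veryHidden")
    ]
    defined = [d.get("name") for d in root.iter(MAIN_NS + "definedName")]
    return {
        "external_link_parts": external,
        "hidden_sheets": hidden,
        "defined_names": defined,
    }
```

The sketch only lists names. To see where an external link actually points, you would still need to open the listed parts and their relationship files.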

What Actually Leaks, in Practice

The classic failure mode is the "we resubmitted the same template" leak. A company sends a bid, and the creator field still says the name of a rival firm because the file started life as their proposal. Newsrooms have outed document sources this way. Legal teams have accidentally disclosed opposing counsel's names left in lastModifiedBy. None of it required hacking. It required opening a ZIP.

Comments and tracked changes are the other big one. A draft contract with a margin note like "we can probably drop to 8% if they push back" is the kind of sentence that ends a negotiation badly. If that note lives in comments.xml and you send the file before clearing it, the other side can read your floor.
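Reading that note takes very little code, which is the point. A minimal sketch follows, assuming the comments live in word/comments.xml as described above; read_comments is a hypothetical helper, and it ignores newer comment parts that some Word versions also write.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements and attributes in word/*.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_comments(path_or_file):
    """Return a list of (author, text) pairs from word/comments.xml, if present."""
    with zipfile.ZipFile(path_or_file) as package:
        if "word/comments.xml" not in package.namelist():
            return []
        root = ET.fromstring(package.read("word/comments.xml"))
    comments = []
    for comment in root.iter(W + "comment"):
        # A comment's text is spread over one or more w:t runs; join them.
        text = "".join(run.text or "" for run in comment.iter(W + "t"))
        comments.append((comment.get(W + "author"), text))
    return comments
```

If this returns anything for a file you are about to send, the recipient can read the same thing.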

A Worked Example: Finding the Author and Tracked Changes Before You Send

Say you are about to email q3-proposal.docx to a client. Run it through the Office Document Inspector, which reads the package as an Open XML ZIP locally and prints a Markdown report. A realistic report for a file with leftovers looks like this:

# Office Document Inspector report
Detected package type: Word (DOCX)

Parts:
- word/document.xml            142 KB
- word/comments.xml            6 KB      ← comments present
- word/media/image1.png        1.4 MB
- docProps/core.xml            2 KB
- docProps/app.xml             1 KB

Flags:
- Comments part present (word/comments.xml)
- Custom XML detected (customXml/)
- Tracked-change markup found in document.xml

The two lines that matter are the comments part and the tracked-change markup. To read the actual author, open docProps/core.xml:

<dc:creator>Acme Legal Drafting Team</dc:creator>
<cp:lastModifiedBy>j.zhang@partner-firm.com</cp:lastModifiedBy>
<cp:revision>14</cp:revision>

There is your problem in three lines: the file was created by a different team, last edited by someone at a partner firm, and is on its fourteenth revision. Now you know to do three things before sending: in Word, run File → Info → Check for Issues → Inspect Document, remove document properties and personal information, and accept or reject every tracked change with the markup actually turned on. Re-run the inspector. When comments.xml is gone, no tracked-change markup is flagged, and core.xml names only who you expect, the file is ready to send.
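The re-inspection step can also be scripted. The sketch below is an assumption-laden stand-in, not the Office Document Inspector's own logic: leftover_flags is an invented name, and it checks only for a comments part and for <w:ins> / <w:del> elements, not for other revision markup such as moves or formatting changes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements and attributes in word/*.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def leftover_flags(path_or_file):
    """Return human-readable flags for comments and w:ins / w:del markup in a .docx."""
    flags = []
    with zipfile.ZipFile(path_or_file) as package:
        names = set(package.namelist())
        if "word/comments.xml" in names:
            flags.append("comments part present")
        if "word/document.xml" in names:
            root = ET.fromstring(package.read("word/document.xml"))
            # Tracked insertions and deletions carry the reviser in w:author.
            authors = sorted({
                element.get(W + "author", "unknown")
                for kind in ("ins", "del")
                for element in root.iter(W + kind)
            })
            if authors:
                flags.append("tracked changes by: " + ", ".join(authors))
    return flags
```

An empty list from leftover_flags("q3-proposal.docx") covers only these two checks; the document properties still need a separate look.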

My Own Embarrassing Near-Miss

I learned this the unglamorous way. I was packaging a pricing deck as a .pptx to send to a partner, and out of habit I unzipped it first to check why it was 40 MB. The size came from a stock photo nobody had compressed. But while I was in there, I noticed ppt/slideLayouts/ still referenced an internal-only layout master, and docProps/core.xml listed a colleague who had left the company a year earlier as the last editor. Neither was catastrophic, but the second one would have looked odd to the partner and the first leaked our internal template naming. I cleared both, re-exported, and shipped a deck that said nothing it shouldn't. Now I never send an Office file outside the company without looking inside the ZIP first.

Inspection Stays on Your Machine

The thing that makes this practical is that you never have to hand a sensitive draft to a third-party website to check it. The Office Document Inspector parses the package directory entirely in your browser. The bytes of your contract, your salary spreadsheet, or your unreleased deck never leave your machine. The report it produces lists the internal filenames inside the package, so treat the report itself as sensitive, but the document content is never uploaded anywhere.

That local-only design is the whole point. Auditing a file for leaks should not itself create a new place for the file to leak.

A Pre-Send Checklist

Before any Office file goes to someone outside your team:

  1. Inspect the package and read the flags. Comments, tracked changes, custom XML, and external links are the ones to act on.
  2. Open docProps/core.xml and confirm creator and lastModifiedBy say what you want them to say, then check Company in docProps/app.xml.
  3. Use the built-in Document Inspector in Word, Excel, or PowerPoint to strip properties and personal data.
  4. Resolve tracked changes with markup visible, then delete comment threads.
  5. Re-export and re-inspect. A clean report is your sign-off.

If you are also deduplicating or verifying a batch of files you are about to send, the File Hash Calculator pairs well with this workflow: inspect for metadata first, then hash the cleaned file so you can prove later which exact version you shared.
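If you prefer to hash from a script, a short sketch follows. It assumes SHA-256 is an acceptable digest for your record keeping (the article does not say which algorithm the File Hash Calculator uses), and sha256_of_file is an illustrative name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hash the file only after cleaning it: stripping metadata rewrites the package, so the digest of the original draft will not match the version you actually sent.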

Metadata leaks are not exotic attacks. They are the default behavior of a file format that bundles your edit history with your words. The fix is not paranoia. It is a thirty-second habit of looking inside the ZIP before you hit send.


Made by Toolora · Updated 2026-06-13