Playing Hide-and-Seek with Metadata

July 17, 2015 by Charlie Greenberg

We often go on at length about master data, reference data and hierarchies. After all, we discover and analyze these datasets, avoid duplicating them, insist on governance, and advocate their re-use, adoption and enterprise sharing. But lately there has been a national preoccupation with metadata, with its use and even its creation viewed as potentially dangerous and threatening to our civil liberties.

Even as the Office of Personnel Management recovers from an alleged Chinese master data cyber-attack impacting at least 18 million unique Social Security numbers, it's the manipulation of metadata (or "meta-dahta," the hold-out pronunciation of choice among U.S. cable newscasters and analysts) that outrages and frightens us most. Is metadata getting a bum rap?


Prior to the recent revision of the Patriot Act stipulating phone records can no longer be stored by the NSA, the Electronic Frontier Foundation excoriated the Obama administration with this: "If the President's administration really welcomes a robust debate on the government's surveillance power, it needs to start being honest about the invasiveness of collecting your metadata."

The article refers to a telephonic form of system-generated metadata: call durations, locations and the phone numbers of call recipients. While mining this kind of metadata gives an incomplete picture, smart forensics and a reverse telephone directory can contextually assemble a highly revealing personal portrait.

It's not that metadata, once collected, is inherently more disturbing or personal than master or reference data. (Master data in the wrong hands is certainly just as destructive.) Rather, it's about the potentially insidious way system metadata or sensor data is generated. As customers, citizens and patients, we consciously and agreeably volunteer our master data—the data about ourselves, our assets and our locations. During an e-commerce transaction, for example, we're offered (or should be offered) options to control the invasive use of our personal data: no, you cannot share my customer information with third parties, and so on.

But while cell phones, E-ZPass® and GPS have made our lives easier and more productive, they've also encouraged us to passively and probably unknowingly share personal information, while at the same time relinquishing control over how that information is actually mined, used or analyzed. Of course, there's still the Fourth Amendment, which governs and regulates the conditions under which this kind of system metadata can be leveraged. But who wants to worry about that?

Happily however, there are many more mission-critical but decidedly less-controversial forms of system or technical metadata. For example, the kind of system metadata we typically view about the capacity of a disk drive, the contents of an MP3 or even a data model (such as captions, columns and field length). This kind of system metadata is managed as an asset, rather than being governed for improved accuracy or standardization.

Business metadata, however, while providing user-friendly and practical instructions about other business data and its usage (e.g., master data) must be governed. Business metadata is pervasive, consumed by external users and customers, and often assembled in data dictionaries or glossaries supporting cross-industry standards or business processes, such as clinical testing for life sciences.

Unlike sensor or system metadata, business metadata is subject to the same data-quality challenges that afflict master data, reference data and hierarchies. Driven by day-to-day business processes, metadata creation and management are often based on loose or even ad hoc data standards, resulting in the propagation of redundant, inaccurate and misleading information. Since business metadata is often realized as a series of terms and definitions, a lack of governance and integrity can compromise its use cases and functions.

Consider the following fictional examples categorizing government employees. Here, terms can be duplicated but with different definitions:

Term: Retired Employees (age >65 years) vs. Term: Retired Employees (age 65-79 years)

Or, one precise definition might be linked with multiple terms:

Term: Hispanic Employees (30-58 years) vs. Term: Latino Employees (30-58 years)
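As a rough illustration (not any particular MDM product's API), both defects can be caught programmatically: the same term carrying conflicting definitions, and one definition registered under several terms. The function name and sample glossary entries below are invented for this sketch:

```python
from collections import defaultdict

def audit_glossary(entries):
    """Flag two common business-metadata defects: one term carrying
    conflicting definitions, and one definition shared by several terms."""
    defs_by_term = defaultdict(set)
    terms_by_def = defaultdict(set)
    for term, definition in entries:
        defs_by_term[term.strip().lower()].add(definition)
        terms_by_def[definition.strip().lower()].add(term)
    conflicting = {t: d for t, d in defs_by_term.items() if len(d) > 1}
    shared = {d: t for d, t in terms_by_def.items() if len(t) > 1}
    return conflicting, shared

entries = [
    ("Retired Employees", "age >65 years"),
    ("Retired Employees", "age 65-79 years"),
    ("Hispanic Employees", "30-58 years"),
    ("Latino Employees", "30-58 years"),
]
conflicting, shared = audit_glossary(entries)
# "Retired Employees" surfaces with two definitions; "30-58 years"
# surfaces as one definition attached to two different terms.
```

A real registry would of course apply richer normalization than lowercasing, but even this simple pass exposes both classes of defect in the examples above.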

But the absence of data governance standards sometimes pales against an organization's inability to locate or inventory previously created metadata assets. This is a result of unregulated data creation in general (think: spreadsheets), which continuously undermines data standardization and, in turn, knowledge sharing and re-use. In a life sciences company, for example, poor metadata management encumbers re-use: it not only makes it difficult to share similar clinical test results between departments but ultimately leads to unnecessary testing and substantial delays in delivering new drug products.

Consequently, before the president was accused of hiding our metadata, he was actually seeking to share government-generated metadata openly among government agencies. On Dec. 8, 2009, the White House "issued an unprecedented Open Government Directive requiring federal agencies to take immediate, specific steps to achieve key milestones in transparency, participation, and collaboration." In other words, it made inter-agency metadata available for sharing, re-use and public consumption.

The ISO/11179 Metadata Registry Standard

Long before the president's directive, however, government agencies were actively creating data dictionaries to govern and manage metadata terms and definitions in support of documentation registries. The U.S. Federal Aviation Administration has organized metadata terms and definitions around aviation forecasts, commercial space data and passengers & cargo, to name just a few of its defined and managed terminologies. In fact, the FAA has made public thousands of data items, terms, definitions and taxonomies at the Federal Data Registry (www.fdr.gov), where the granularity of the agency's painstakingly administered aviation metadata is catalogued.

But with Metadata Registry (MDR) governance of terms and definitions, especially those with cross-agency commonality, comes the need to adopt an agency-wide data management standard. The International Organization for Standardization (ISO) created the 11179 standard not only to help organizations (or government agencies) achieve internal consistency in metadata creation but also to establish a means of supporting data interchange between those organizations.

In addition to providing a comprehensive data model or template for common organizational attributes, the ISO/11179 standard is structured as six parts that methodically classify data elements and attributes, while maintaining a strict data governance framework for administering and approving new elements:

Part 1 – Framework – Contains an overview of the standard and describes the basic concepts.

Part 2 – Classification – Describes how to manage a classification scheme in a metadata registry.

Part 3 – Registry meta-model and basic attributes – Provides the basic conceptual model for a metadata registry, including the basic attributes and relationships.

Part 4 – Formulation of data definitions – Rules and guidelines for forming quality definitions for data elements and their components.

Part 5 – Naming and identification principles – Describes how to form conventions for naming data elements and their components.

Part 6 – Registration – Specifies the roles and requirements for the registration process.
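To make the six parts concrete, here is a minimal toy sketch of a registry item and its approval workflow. The class names, status values and sample element are assumptions for illustration only; ISO/11179 Part 6 defines its own registration life-cycle states, and a real registry metamodel (Part 3) is far richer:

```python
from dataclasses import dataclass

# Illustrative life-cycle statuses; ISO/11179 Part 6 defines its own
# registration states -- these names are assumptions, not the standard's.
STATUSES = ("candidate", "recorded", "qualified", "standard", "retired")

@dataclass
class DataElement:
    """A minimal registry item: a name (Part 5), a definition (Part 4),
    a classification (Part 2) and a registration status (Part 6)."""
    name: str
    definition: str
    classification: str
    status: str = "candidate"

class Registry:
    """A toy metadata registry enforcing unique names and a
    step-by-step approval workflow."""
    def __init__(self):
        self._items = {}

    def register(self, element):
        # Naming convention check is reduced to case-insensitive uniqueness.
        key = element.name.strip().lower()
        if key in self._items:
            raise ValueError(f"'{element.name}' is already registered")
        self._items[key] = element

    def promote(self, name):
        """Advance an element one step through the approval workflow."""
        element = self._items[name.strip().lower()]
        index = STATUSES.index(element.status)
        if index < len(STATUSES) - 1:
            element.status = STATUSES[index + 1]
        return element.status

registry = Registry()
registry.register(DataElement(
    name="Flight Number",
    definition="Identifier assigned to a scheduled flight",
    classification="aviation",
))
status = registry.promote("Flight Number")  # "candidate" -> "recorded"
```

The point of the sketch is the shape, not the detail: every new element enters as a governed, uniquely named item and advances only through an explicit approval step.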

Once an organization has embraced the ISO/11179 standard (now in its third edition), it needs to automate and orchestrate the metadata governance process as thoroughly as possible.

Consequently, a flexible Master Data Management (MDM) solution architecture (as in the case of webMethods OneData) provides most of what's necessary to support a metadata registry. As with MDM, users will probably favor tools that provide the ISO/11179 standard as an out-of-the-box template or model.

When considering 11179-compliant Metadata Registry tools, there are six key functionalities that should be present out-of-the-box:

  • ISO/11179 template, with the flexibility to extend or change the model as necessary
  • A metadata hub supporting merging and matching in order to identify redundancy
  • Stewardship controls providing role-based security and permission levels for changing and administering metadata
  • A workflow/approval process that delegates responsibility for controlling and governing the creation and updating of metadata
  • Configurable, accessible views of metadata that mask the technical complexity of the 11179 standard for business users
  • Enterprise integration allowing the acquisition and distribution of metadata across all relevant systems
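The merging-and-matching functionality in the second bullet can be sketched with simple string similarity: normalize the term names, then pair up any two that score above a threshold as candidates for steward review. The function, threshold and sample terms below are illustrative assumptions, not any product's actual matching engine:

```python
from difflib import SequenceMatcher

def find_redundant_terms(terms, threshold=0.85):
    """Pair up term names whose normalized similarity meets the
    threshold -- candidates for a data steward to review and merge."""
    normalized = [(t, t.strip().lower().replace("_", " ")) for t in terms]
    candidates = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(
                None, normalized[i][1], normalized[j][1]).ratio()
            if ratio >= threshold:
                candidates.append((normalized[i][0], normalized[j][0]))
    return candidates

pairs = find_redundant_terms(
    ["Retired Employees", "retired_employees", "Commercial Space Data"]
)
# The first two names normalize to the same string and are paired
# for steward review; the third stands alone.
```

Production hubs use far more sophisticated matching (phonetics, synonyms, definition comparison), but the workflow is the same: the tool surfaces likely duplicates, and the stewardship and approval controls in the remaining bullets decide what gets merged.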

Additional information regarding Software AG's Metadata Registry tool is available from Software AG.

Charlie Greenberg

Charlie Greenberg is Software AG's Sr. Global Product Marketing Manager for webMethods OneData MDM and has supported the OneData product since March of 2008. He has been a speaker and panelist at events sponsored by DAMA, IDMA, Data Management Forum, FIMA and the MDM Institute, and his writings on MDM can be viewed in "Database Trends & Analysis," "Sand Hill," "Dashboard Insights" and Software AG's "Reality Check."