PRGX



Perspectives

Discovering the Value of Unstructured Data

Orrin Edenfield, Business Information Analyst, Jr


Gartner research estimates that 80% of business is conducted on unstructured data. That figure may be surprising, given how many resources enterprises devote to understanding and analyzing the vast amount of structured data they hold. There is no shortage of solutions for analyzing unstructured data; the real goal is to find the pieces of unstructured data that are relevant to the structured dataset.

Catalog the Data:

The first step in analyzing unstructured data is to catalog it. Any data element or document that is not stored in a structured database is a prime candidate for an unstructured data catalog. Unstructured data includes word processing documents, spreadsheets, report files, communications data, and even email. Two important considerations when cataloging unstructured data are its age and its source. Just about everyone in an enterprise generates unstructured data, but experience shows that certain departments are more likely than others to produce valuable unstructured data that affects the structured dataset. There is a balance to strike, however: many organizations find themselves coping with vast amounts of seemingly valuable unstructured data that is far too old to be relevant to an up-to-the-minute OLTP (online transaction processing) dataset.
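As a rough illustration of what a first-pass catalog might record, the Python sketch below walks a file share and captures each file's type, source, and age. It is a minimal sketch under stated assumptions, not a recommended tool: `CatalogEntry`, `build_catalog`, and `flag_stale` are hypothetical names, and the code assumes that each file's top-level folder identifies the department it came from and that last-modified time is an acceptable proxy for the data's age.

```python
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CatalogEntry:
    path: str        # path relative to the catalog root
    kind: str        # lower-cased file extension, e.g. ".docx" or ".eml"
    source: str      # ASSUMPTION: top-level folder stands in for the department
    age_days: float  # days since the file was last modified


def build_catalog(root, now=None):
    """Walk a file share and record type, source, and age for every file (hypothetical helper)."""
    now = time.time() if now is None else now
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Files sitting directly under the root have no department folder.
        source = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        age_days = (now - path.stat().st_mtime) / 86400
        entries.append(CatalogEntry(str(rel), path.suffix.lower(), source, age_days))
    return entries


def flag_stale(entries, max_age_days):
    """Return entries older than the cutoff: candidates that may be too old for a current OLTP dataset."""
    return [e for e in entries if e.age_days > max_age_days]
```

A reviewer could then group the stale entries by `source` to decide, department by department, which older material is still worth keeping.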

Consolidate the Data:

Consolidation is the next key step in any unstructured data analysis approach. While some documents may share a similar structure (email, for instance), the relative value of each document may not be known until it is properly managed. An organization may have terabytes of similarly formatted reports spanning multiple years. Even if that data could be stored in a structured format, the time and resources needed to review and evaluate it for value (if that is even possible) may outweigh the benefits. Rather than trying to force the reports into a structure, it is best to treat all unstructured and semi-structured data the same way and consolidate it into one management system.
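A minimal sketch of "treat everything the same" consolidation follows, using Python's built-in `sqlite3` module as a stand-in for a real management system. Every item, whether an email or a multi-year report, goes into one table with the same few columns instead of a format-specific schema. The table layout and the `open_store`, `ingest`, and `counts_by_kind` helpers are hypothetical choices made for illustration only.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single document store; ":memory:" keeps the sketch self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id      INTEGER PRIMARY KEY,
               kind    TEXT NOT NULL,  -- e.g. 'email', 'report', 'spreadsheet'
               source  TEXT NOT NULL,  -- originating department or system
               created TEXT NOT NULL,  -- ISO-8601 date string
               body    TEXT NOT NULL   -- raw or extracted text, stored as-is
           )"""
    )
    return conn


def ingest(conn, kind, source, created, body):
    """Store any document the same way, without a format-specific schema (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO documents (kind, source, created, body) VALUES (?, ?, ?, ?)",
        (kind, source, created, body),
    )
    conn.commit()
    return cur.lastrowid


def counts_by_kind(conn):
    """Summarize what the consolidated store holds, one row per document kind."""
    return conn.execute(
        "SELECT kind, COUNT(*) FROM documents GROUP BY kind ORDER BY kind"
    ).fetchall()
```

Because the table makes no assumptions about document format, adding a new kind of content is just another `ingest` call; any deeper structure can be extracted later, once a document's value is known.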

Manage the Data:

Unstructured data must be properly managed. Fortunately, there are many tools and approaches available in the market to manage unstructured datasets. Many of these tools combine a searchable database of business metadata with versioning and workflow capabilities. An important goal is to categorize the data and reduce the dataset to a size that is manageable with the resources available. While there is great value in the unstructured dataset, there is also a large amount of data that is not relevant to structured transactions. The better the management system, the better the chances of finding all the relevant data.
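To make the idea of a searchable metadata layer concrete, the sketch below builds a small in-memory index from business-metadata terms to document ids and narrows results by date, one simple way to reduce a large collection to a manageable, relevant subset. `MetadataIndex` is a hypothetical class written for illustration; it assumes documents have already been tagged with terms and a creation date, and it omits the versioning and workflow features that commercial tools provide.

```python
from collections import defaultdict
from datetime import date


class MetadataIndex:
    """Hypothetical minimal index mapping business-metadata terms to document ids."""

    def __init__(self):
        self._terms = defaultdict(set)  # term -> ids of documents tagged with it
        self._created = {}              # id -> creation date

    def add(self, doc_id: int, terms, created: date) -> None:
        # Terms are case-insensitive so "Vendor" and "vendor" match.
        for term in terms:
            self._terms[term.lower()].add(doc_id)
        self._created[doc_id] = created

    def search(self, *terms, since: date = None) -> set:
        """Return ids tagged with every term, optionally limited to documents created on or after `since`."""
        if not terms:
            return set()
        # Intersect the id sets for each term; .get() avoids adding empty keys.
        hits = set.intersection(*(self._terms.get(t.lower(), set()) for t in terms))
        if since is not None:
            hits = {d for d in hits if self._created[d] >= since}
        return hits
```

For example, `index.search("vendor", "invoice", since=date(2020, 1, 1))` would return only recent documents tagged with both terms, shrinking the review set before anyone opens a file.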

Structured data is the lifeblood of an enterprise; it is where defined facts live and do the job of running the business. Unstructured data often works behind the scenes to help make the structured data what it is. True value is created when the unstructured dataset exposes new insights behind the structured dataset.