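Before classifying distributed storage systems, it helps to understand the structure of the data they store. Every distributed storage system is built to store data of one or more structural kinds. Depending on whether the data has structure, it falls into two categories: structured data and unstructured data. Structured data in turn includes a special subclass, semi-structured data. In what follows, "structured data" means all structured data other than semi-structured data.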
The data in those neat columns and rows is what’s referred to as structured data.
Structured data is comprised of clearly defined data types whose pattern makes them easily searchable.
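To make "clearly defined data types" and "easily searchable" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows are hypothetical and chosen only for illustration; the point is that every row follows the same declared schema, so a query can target a field by name.

```python
import sqlite3

# Hypothetical schema: each column has a declared name and type.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "alice", 19.5), (2, "bob", 7.0), (3, "alice", 3.25)],
)

# Because the structure is known in advance, searching by field is direct.
rows = conn.execute(
    "SELECT id, amount FROM orders WHERE customer = ? ORDER BY id", ("alice",)
).fetchall()
total = sum(amount for _, amount in rows)
print(rows, total)
```

The query works without inspecting any individual row first, which is exactly what a pre-defined data model buys.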
Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.
Gartner defines unstructured data as content that does not conform to a specific, pre-defined data model. It tends to be the human-generated and people-oriented content that does not fit neatly into database tables. Within the enterprise unstructured content takes many forms, chief amongst which are business documents (reports, presentations, spreadsheets and the like), email and web content.
Unstructured data is essentially everything else. Unstructured data has internal structure but is not structured via pre-defined data models or schema. It may be textual or non-textual, and human- or machine-generated.
Unstructured data is heterogeneous and variable in nature and comes in many formats, including text, document, image, video, and more. Unstructured data is growing faster than structured data. According to a 2011 IDC study, it will account for 90 percent of all data created in the next decade. As a new, relatively untapped source of insight, unstructured data analytics can reveal important interrelationships that were previously difficult or impossible to determine.
Typical human-generated unstructured data includes:
Text files: Word processing, spreadsheets, presentations, email, logs.
Email: Email has some internal structure thanks to its metadata, and we sometimes refer to it as semi-structured. However, its message field is unstructured and traditional analytics tools cannot parse it.
Social Media: Data from Facebook, Twitter, LinkedIn.
Website: YouTube, Instagram, photo sharing sites.
Mobile data: Text messages, locations.
Communications: Chat, IM, phone recordings, collaboration software.
Media: MP3, digital photos, audio and video files.
Business applications: MS Office documents, productivity applications.
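The email entry in the list above notes that a message has structured metadata but an unstructured body. A small sketch with Python's standard email package shows the split; the message text and addresses are made up for illustration.

```python
from email import message_from_string

# Hypothetical message: headers, a blank line, then free-form text.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "Date: Mon, 06 Jan 2020 10:00:00 +0000\n"
    "\n"
    "Hi Bob, the numbers look good overall, but shipping costs jumped.\n"
)
msg = message_from_string(raw)

# The metadata can be read field by field, like columns in a record.
headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# The body is just a string; nothing in the format says what it means.
body = msg.get_payload()
print(headers)
print(body)
```

The headers can be filtered or indexed directly, while getting anything out of the body requires text analysis of some kind.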
Typical machine-generated unstructured data includes:
Satellite imagery: Weather data, land forms, military movements.
Scientific data: Oil and gas exploration, space exploration, imagery, atmospheric data.
Digital surveillance: Surveillance photos and video.
Sensor data: Traffic, weather, oceanographic sensors.
Semi-structured data is a form of structured data that does not conform with the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as self-describing structure.
Such information is typically “semistructured” in that there is some structure in the documents but not exactly a formal structure such as that imposed by a database schema or an XML DTD.
http://infolab.stanford.edu/~maluf/papers/ideas05.pdf Semi-structured Data Management in the Enterprise: A Nimble, High-Throughput, and Scalable Approach
https://www.w3schools.com/xml/xml_dtd_intro.asp XML DTD intro
Semi-structured data does not have the same level of organization and predictability as structured data. The data does not reside in fixed fields or records, but it does contain elements that can separate the data into various hierarchies.
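As a rough illustration of "self-describing" data, the sketch below parses a small XML fragment with Python's standard xml.etree.ElementTree module. The element names and values are hypothetical. No schema or DTD is supplied; the tags themselves mark the fields and the hierarchy, and the two records do not carry the same set of fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both records are <person>, but their fields differ.
doc = """
<people>
  <person><name>alice</name><email>alice@example.com</email></person>
  <person><name>bob</name><phone type="mobile">555-0100</phone></person>
</people>
""".strip()

root = ET.fromstring(doc)

# Each record describes itself: field names are recovered from the tags.
records = [
    {child.tag: child.text for child in person}
    for person in root.findall("person")
]
fields = [sorted(record) for record in records]
print(records)
print(fields)
```

A relational table would need a fixed set of columns for both people, whereas here each record simply carries whatever tagged fields it has.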