TechTalk Daily
March 29, 2018 by Daniel W. Rasmus
The debate over the relationship between data, information, knowledge and wisdom continues to evolve as new forms of representation emerge. My personal perspective runs contrary to common IT and philosophical definitions. For knowledge management, and for many IT applications, I find that my definitions provide a clarity that conflated definitions of data and information, knowledge and wisdom do not. This discussion restricts itself to technology, though extensions into biology will become increasingly important.
Outside the presence of knowledge, everything is data. Raw data, and summaries of data such as charts and graphs that are commonly called information, are not information if they are not actively informing anything.
Data in the absence of analysis or another form of engagement proves meaningless. Information requires transformation and awareness.
Data is passive. Bits, data stored in database rows, e-mail, spreadsheets, word processing documents and images all form collections that promise the potential for “information.” No matter how well organized those collections are, data that is not being viewed remains inert.
Inert data does not imply a lack of value. Personal interpretation or market forces ultimately determine the value of any data. If data creates value when it is used, it holds that value whether it is active or not. Even highly valuable data, however, only manifests that value while it is being used.
Most data is incomplete. From an online store’s transaction data, much may be gleaned about user preferences and buying trends, but transactions continue to occur. No inference will ever be informed by the very latest information: as soon as an extract is taken, new records create more data.
The only data that may be considered complete is a model for which some arbitrary definition of completeness exists, such as a computer emulator. When the emulator runs, the data being interpreted represents a complete model of the target system. No additional information would improve that data, so it can be considered complete.
Most data is historical. Beyond data that can be considered complete, all data is historical. There is no such thing as real-time data. Even data delivered in a continuous stream arrives at a process aged by pico-, nano- or microseconds, and newer data already sits behind it in the stream, more recent than the data currently being interpreted.
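To make that concrete, here is a minimal sketch (the stream, the records and the produced_at field are invented for illustration, not tied to any particular platform) showing that every record has already aged by the time a consumer interprets it:

```python
import time

def handle(event: dict) -> None:
    """Report how old a streamed record already is when it is interpreted."""
    # "produced_at" is a hypothetical field: the epoch time at which the
    # producer emitted the record.
    age_seconds = time.time() - event["produced_at"]
    # Even on a fast local pipeline the age is never zero: by the time the
    # record is read, it is already historical.
    print(f"record '{event['payload']}' arrived {age_seconds * 1e6:.1f} microseconds old")

# Simulated stream: each record is stamped the moment it is produced.
for payload in ("order-1", "order-2", "order-3"):
    handle({"payload": payload, "produced_at": time.time()})
```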
Data may be copied and exist in two states at the same time. If data is being used to offer navigation guidance, for example, while a copy of that data sits unused on a server, the instances of that data are inert and active at the same time. The same holds for datasets pulled into a computer’s working memory as input to an application or algorithm, or for a person actively reasoning over the data. With current computing architectures, many copies of the same data may exist simultaneously in inert and actively used states. The primary source is rarely employed for anything beyond serving as the active repository for additional accumulation or individual record changes, which renders it perpetually inert.
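A minimal sketch of the same idea, assuming nothing more than a small file of invented sensor readings: the copy written to disk sits inert while a second copy, loaded into working memory, is actively used by a computation.

```python
import csv
import statistics
import tempfile
from pathlib import Path

# Write a tiny dataset to disk; this copy sits inert in storage.
source = Path(tempfile.gettempdir()) / "readings.csv"
source.write_text("sensor,value\n"
                  "a,10\n"
                  "a,14\n"
                  "b,9\n")

# Load a second copy into working memory; only this copy is "active".
with source.open() as handle:
    rows = list(csv.DictReader(handle))

values = [int(row["value"]) for row in rows]
print("mean reading:", statistics.mean(values))
# The file on disk is untouched by the computation above: the same data
# now exists in an inert state and an active state at the same time.
```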
Stored procedures act as knowledge when they execute; otherwise, they too are data.
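By way of a rough analogy (ordinary Python standing in here for a database’s stored-procedure language), a routine’s source text is inert data until it is compiled and executed:

```python
# The procedure body is just text sitting in a variable: inert data.
procedure_source = """
def apply_discount(total, rate):
    '''Return the total after applying a percentage discount.'''
    return total * (1 - rate)
"""

# Nothing informs anything yet; the string could as easily live in a
# database catalog or a file. Executing it is what turns it into
# behavior that can inform a decision.
namespace = {}
exec(procedure_source, namespace)             # compile and run the definition
print(namespace["apply_discount"](100, 0.2))  # 80.0
```

A stored procedure sitting unexecuted in a database catalog is in the same position as the string above.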
Data should not be trusted without context. Data that modeled the toy market ahead of the demise of Toys’R Us, for instance, can no longer predict the activity of that market, because no data yet exists about consumer behavior, supply chains or other related activities and processes in a post-Toys’R Us world. New models will need to be built as data arrives to inform them. Data, therefore, requires context: data stewards should clearly mark toy market data as pre- or post-Toys’R Us.
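One way to act on that advice, sketched with an invented cutoff date and field names purely for illustration, is to stamp each record with the market regime it belongs to so that downstream models cannot silently mix the two:

```python
from datetime import date

# Hypothetical cutoff separating the two market regimes.
CUTOFF = date(2018, 6, 29)

def add_context(record: dict) -> dict:
    """Label a toy-market record as belonging to the pre- or post-cutoff regime."""
    regime = "pre-cutoff" if record["observed_on"] < CUTOFF else "post-cutoff"
    return {**record, "market_regime": regime}

sales = [
    {"sku": "blocks", "observed_on": date(2017, 11, 3), "units": 120},
    {"sku": "blocks", "observed_on": date(2019, 2, 14), "units": 45},
]
for row in map(add_context, sales):
    print(row["sku"], row["market_regime"], row["units"])
```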
There is no data from the future. All predictions are guesses.
Big Data modifies none of the above attributes of data. Big Data simply applies a label to large collections of data. Websites, social media platforms, and transaction systems constantly collect data, and they do so in a structured way that makes that data easy to use for queries, reports, and visualizations. These massive collections have become known as Big Data.
Nothing in Big Data implies structures or attributes that distinguish it from other data. Machine learning (ML) benefits from Big Data because ML algorithms require substantial training sets to reach reasonable accuracy in recognizing patterns. The data remains relatively narrow in each instance of machine learning, however, and therefore does not contribute to any synthesis of cognitive abilities by computers. More data about one topic does not increase the likelihood of a computer making inferences about another topic.
Daniel W. Rasmus, the author of Listening to the Future, is a strategist and industry analyst who has helped clients put their future in context. Rasmus uses scenarios to analyze trends in society, technology, economics, the environment, and politics in order to discover implications used to develop and refine products, services and experiences. He leverages this work and methodology for content development, workshops and for professional development.