IT Management: Drilling for data

Big data is poised to overwhelm the ability of modern businesses to analyze and manage it, but it brings opportunities as well as challenges.

Rob Sobers

The phenomenon of human-generated big data encompasses the petabytes and exabytes of structured and unstructured data generated by today’s enterprises. The big question about big data remains: Is this going to be another oil rush with a few winners and many losers, or will it enrich us all?

Human-generated content includes all the files and e-mails we create every day. There are presentations, word processing documents, spreadsheets, audio files and other documents we generate hour by hour. These are the files that take up the vast majority of digital storage space in most organizations. You have to keep them for significant amounts of time, and they have huge amounts of metadata associated with them.

Human-generated content is enormous, and its metadata is even bigger. Metadata is the information about a file—who created the file and when, what type of file it is, what folder it’s stored in, who has been reading it and who has access. The content and metadata together make up the universe of human-generated big data.
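To make the idea of file metadata concrete, here is a minimal Python sketch that reads a few of the attributes described above (file type, folder, size, timestamps, owner and permissions) for a single file. The function name `describe_file` and the field names are illustrative assumptions, not part of any product mentioned in this article. Standard file system calls do not reveal who has been reading a file; that kind of history generally requires separate audit logging.

```python
import stat
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Return a small dict of file system metadata for one file.

    Hypothetical helper for illustration only. It reports what the
    operating system stores about the file, not its access history.
    """
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        "folder": str(p.parent),
        "type": p.suffix.lower() or "(none)",
        "size_bytes": info.st_size,
        # Timestamps are converted to ISO 8601 strings in UTC.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc).isoformat(),
        # Numeric owner ID; this is always 0 on Windows.
        "owner_id": info.st_uid,
        # Permission string in the style of "ls -l", e.g. "-rw-r--r--".
        "permissions": stat.filemode(info.st_mode),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "budget.xlsx"
        sample.write_bytes(b"placeholder")
        for key, value in describe_file(sample).items():
            print(f"{key}: {value}")
```

Even this small record is several fields per file. Multiplied across millions of files, and extended with access-control lists and activity logs, it suggests why the metadata can outgrow the content itself.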

Data avalanche

The problem is that most large organizations are not yet equipped with the tools to exploit human-generated big data. A recent survey of more than 1,000 Internet experts and other Internet users, published by the Pew Research Center and the Imagining the Internet Center at Elon University, concluded that the world might not be ready to properly handle and understand big data.

These experts have come to the conclusion that the huge quantities of data—which they term “digital exhaust”—that will be created by the year 2020 could very well enhance productivity, improve organizational transparency and expand the frontier of the “knowable future.” However, they are also concerned about who has access to this information, who controls that access, and whether government or corporate entities will use this information wisely.

According to the survey: “Human and machine analysis of big data could improve social, political and economic intelligence by 2020. The rise of what is known as big data will facilitate things like real-time forecasting of events; the development of ‘inferential software’ that assesses data patterns to project outcomes; and the creation of algorithms for advanced correlations that enable new understanding of the world.”

Of those surveyed, 39 percent of the Internet experts agreed with the counter-argument to the benefits of big data. This countering viewpoint posits: “Human and machine analysis of big data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of big data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want.”

One of the study’s participants was entrepreneur Bryan Trogdon. “Big data is the new oil,” he says. “The companies, governments and organizations that are able to mine this resource will have an enormous advantage over those that don’t. With speed, agility, and innovation determining the winners and losers, big data lets us move from a mindset of ‘measure twice, cut once’ to one of ‘place small bets fast.’”

Another survey respondent, Jeff Jarvis, a professor and blogger, says: “Media and regulators are demonizing big data and its supposed threat to privacy. Such moral panics have occurred often thanks to changes in technology. But the moral of the story remains: There is value to be found in this data, value in our newfound ability to share.

“Google’s founders have urged government regulators not to require them to quickly delete searches because, in their patterns and anomalies, they’ve found the ability to track the outbreak of the flu before health officials could and they believe that by similarly tracking a pandemic, millions of lives could be saved,” Jarvis continues. “Demonizing data, big or small, is demonizing knowledge, and that is never wise.”

Sean Mead is director of analytics at Mead, Mead & Clark, Interbrand. “Large, publicly available data sets, easier tools, wider distribution of analytics skills, and early stage artificial intelligence software will lead to a burst of economic activity and increased productivity comparable to that of the Internet and PC revolutions of the mid- to late-1990s,” Mead says. “Social movements will arise to free up access to large data repositories, to restrict the development and use of AI, and to ‘liberate’ AI.”

Beyond analysis

These are interesting arguments, and they do start to get to the heart of the matter. Our data sets have grown beyond our ability to analyze and process them without sophisticated automation. We have to rely on technology to analyze and cope with this enormous wave of content and metadata.

Analyzing human-generated big data has enormous potential. Furthermore, harnessing the power of metadata has become essential to manage and protect human-generated content. File shares, e-mails and intranets have made it so easy for business users to save and share files that most organizations now have more human-generated content than they can sustainably manage and protect using small-data thinking.

Many businesses face real problems because they can no longer answer questions they could answer 15 years ago on smaller, static data sets. These questions include: Where does critical data reside? Who has access to it? Who should have access to it? As a consequence, only half the data that should be protected is protected, according to estimates from industry researcher IDC.

The problem is compounded by cloud-based file sharing. These services create yet another growing store of human-generated content requiring management and protection. And cloud content lies outside corporate infrastructure, with different controls and management processes, adding further layers of complexity.

David Weinberger of Harvard University’s Berkman Center says, “We’re just beginning to understand the range of problems big data can solve, even though it means acknowledging that we’re less unpredictable, free, madcap creatures than we’d like to think.” If harnessing the power of human-generated big data can make data protection and management less unpredictable, free and madcap, organizations will be grateful.

The concept of human-generated big data will certainly pose an equal measure of challenges and opportunities for businesses over the next few years.

Rob Sobers is a designer, Web developer and technical strategist for Varonis Systems. He writes a popular blog on software development and security at accidentalhacker.com and is coauthor of the e-book, “Learn Ruby the Hard Way” (ruby.learncodethehardway.org, 2011). A 12-year technology industry veteran, Sobers held positions in software engineering, design and professional services prior to joining Varonis.
