Organizations make use of Big Data, but are they leveraging the data properly? What common problems do companies run into with Big Data services? And what are the solutions to those problems?
- Big data problem No. 1: Loose Integration
- Big data problem No. 2: Undefined goals
- Big data problem No. 3: The skills gap
- Big data problem No. 4: The tech generation gap
Big data problem No. 1: Loose Integration
Because organizations are working in silos, or creating data lakes that are really just data swamps, they’re only scratching the surface of what they could accomplish, said Alan Morrison, a senior research fellow with PwC. “They don’t understand all the relationships in data that need to be mined or inferred and made explicit so machines can adequately interpret that data. They need to create a knowledge graph layer so that machines can interpret all the instance data that’s mapped underneath. Otherwise, you’ve just got a data lake that’s a data swamp,” he said.
Big data problem No. 2: Undefined goals
You would think most people undertaking a big data project would have a goal in mind, but a surprising number don’t. They just launch the project with the goal as an afterthought.
“You have to scope the problem well. People think they can connect structured and unstructured data and get the insight you need. You must define the problem well upfront. What’s the insight you want to get? It’s having a clear definition of the problem and defining it well upfront,” said Ray Christopher, product marketing manager with Talend, a data-integration software company.
Joshua Greenbaum, a principal analyst at Enterprise Application Consulting, said part of what has bedeviled both big data and data warehousing projects is that the main guiding criterion is typically the accumulation of large amounts of data, not the solving of discrete business problems.
“If you pull together large amounts of data you get a data dump. I call it a sanitary landfill. Dumps are not a good place to find solutions,” Greenbaum said. “I always tell clients to decide what discrete business problem needs to be solved first and go with that, and then look at the quality of data available and solve the data problem once the business problem has been identified.”
“Why do most big data projects fail? For starters, most big data project leaders lack vision,” said PwC’s Morrison. “Enterprises are confused about big data. Most just think about numerical data or black box NLP and recognition engines that do simple text mining and other kinds of pattern recognition.”
Big data problem No. 3: The skills gap
Too often, companies think the in-house skills they have built for data warehousing will translate to big data, when that is not the case. For starters, data warehousing and big data handle data in opposite ways: data warehousing uses schema on write, which means the data is cleaned, processed, structured, and organized before it ever goes into the data warehouse.
In big data, data is accumulated first and schema on read is applied, meaning the data is structured and processed only when it is read. So if data processing runs in the opposite direction from one methodology to the other, you can bet the skills and tools differ as well. And that’s just one example.
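The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not how any particular warehouse or Hadoop tool works: the event records, field names, and function names are all hypothetical, and plain lists stand in for the warehouse table and the data lake.

```python
import json

# Hypothetical raw events; the field names are illustrative only.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "ts": "2020-01-05"}',
    '{"user": "b2", "amount": "oops", "ts": "2020-01-06"}',
    '{"user": "c3", "amount": "5.00"}',
]


def parse_strict(line):
    """Apply the full schema: every field present, amount numeric."""
    rec = json.loads(line)
    return {
        "user": str(rec["user"]),
        "amount": float(rec["amount"]),
        "ts": str(rec["ts"]),
    }


def load_schema_on_write(lines):
    """Warehouse style: validate before storing; bad rows never land."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(parse_strict(line))
        except (KeyError, ValueError):
            rejected.append(line)
    return table, rejected


def load_schema_on_read(lines):
    """Data lake style: store the raw lines untouched."""
    return list(lines)


def total_amount_on_read(lake):
    """Apply structure at query time, and only for the field this query needs."""
    total = 0.0
    for line in lake:
        try:
            total += float(json.loads(line)["amount"])
        except (KeyError, ValueError):
            continue  # rows that cannot be interpreted are skipped by this query
    return total


table, rejected = load_schema_on_write(RAW_EVENTS)
lake = load_schema_on_read(RAW_EVENTS)
print(len(table), len(rejected), len(lake))
```

In this sketch the warehouse loader keeps only the one fully valid row and rejects the other two up front, while the lake keeps all three raw lines and leaves each query to decide what it can use. The cleanup work has not gone away; it has moved from load time to read time, which is why the skills and tooling differ.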
“Skills are always going to be a challenge. If we’re talking about big data 30 years from now, there will still be a challenge,” Gartner analyst Nick Heudecker said. “A lot of people hang their hat on Hadoop. My clients are challenged by finding Hadoop resources. Spark is a little better because that stack is smaller and easier to train up. Hadoop is dozens of software components.”
Big data problem No. 4: The tech generation gap
Big data projects frequently draw on older data silos and try to merge them with new data sources, such as sensors, web traffic, or social media. That’s not entirely the fault of the enterprise, which collected that data before the idea of big data analytics existed, but it is a problem nonetheless.
“Almost the biggest skill missing is the skill to understand how to blend these two stakeholders to get them to work together to solve complex problems,” consultant Greenbaum said. “Data silos can be a barrier to big data projects because there is no standard anything. So, when they start to look at planning, they find these systems have not been implemented with any fashion that this data would be reused,” he said.
“With different architectures, you need to do processing differently,” said Talend’s Christopher. “Tech skills and architecture differences were a common reason why you can’t take current tools for an on-premises data warehouse and integrate it with a big data project—because those technologies will become too costly to process new data. So, you need Hadoop and Spark, and you need to learn new languages.”
Big data solution No. 1: Plan ahead
It’s an old cliché but applicable here: If you fail to plan, plan to fail. “Successful companies are the ones who have an outcome,” Gartner’s Heudecker said. “Pick something small and achievable and new. Don’t take legacy use case because you get limitations.”
“They need to think about the data first and model their organizations in a machine-readable way so the data serves that organization,” PwC’s Morrison said.
Big data solution No. 2: Work together
All too often, stakeholders are left out of big data projects—the very people who would use the results. If all the stakeholders collaborate, they can overcome many roadblocks, Heudecker said. “If the skilled people are working together and working with the business side to deliver an actionable outcome, that can help,” he said.
Heudecker noted that the companies succeeding in big data invest heavily in the necessary skills. He sees this most in data-driven businesses, such as financial services firms, Uber, Lyft, and Netflix, where the company’s fortunes depend on having good, actionable data.
“Make it a team sport to help curate and collect data and cleanse it. Doing that can increase the integrity of the data as well,” Talend’s Christopher said.
Big data solution No. 3: Focus
People seem to have the mindset that a big data project needs to be massive and ambitious. As with anything you are learning for the first time, the best way to succeed is to start small, then gradually expand in ambition and scope.
“They should very narrowly define what they are doing,” Heudecker said. “They should pick a problem domain and own it, like fraud detection, micro segmenting customers, or figuring out what new product to introduce in a Millennial marketplace.”
“At the end of the day, you have to ask the insight you want or the business process to be digitized,” said Christopher. “You don’t just throw technology at a business problem; you have to define it upfront. The data lake is a necessity, but you don’t want to collect data if it’s not going to be used by anyone in the business.”
In many cases, that also means not overestimating the complexity of your own company. “In every company I’ve ever analyzed, there are only a few hundred key relationships and concepts that the entire organization runs on. Once you understand them, you realize all these million distinctions are just small variations of those few hundred important points,” PwC’s Morrison said. “You notice that many of those variations aren’t variations at all. They’re the same things with different names, different labels, or different structures,” he added.
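Morrison’s point about the same things carrying different names can be illustrated with a small sketch. The alias table below is entirely hypothetical (the concepts, labels, and function names are invented for illustration); a real effort would build this mapping from the organization’s own systems, and would likely need more than simple string matching.

```python
# Hypothetical alias table: each key concept and the labels that
# different systems might use for it. All names are invented.
CANONICAL = {
    "customer": {"customer", "client", "cust", "account holder"},
    "order": {"order", "purchase", "sales order"},
}

# Invert the table so each alias points at its concept.
ALIAS_TO_CONCEPT = {
    alias: concept
    for concept, aliases in CANONICAL.items()
    for alias in aliases
}


def canonicalize(label):
    """Normalize a field label and look up the concept it refers to."""
    key = label.strip().lower().replace("_", " ")
    return ALIAS_TO_CONCEPT.get(key)


def group_fields(field_names):
    """Group field names by concept; unknown labels go under 'unmapped'."""
    groups = {}
    for name in field_names:
        concept = canonicalize(name) or "unmapped"
        groups.setdefault(concept, []).append(name)
    return groups


print(group_fields(["Client", "cust", "Sales_Order", "widget"]))
```

Here four differently labeled fields collapse into two known concepts plus one unmapped leftover, which is the kind of reduction the quote describes, at toy scale.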
Big data solution No. 4: Jettison the legacy
While you may want to use those terabytes of data collected and stored in your data warehouse, the fact is you might be better served just focusing on newly gathered data in storage systems designed for big data and designed to be un-siloed.
“I would advise not necessarily being beholden to an existing technology infrastructure just because your company has a license for it,” consultant Greenbaum said. “Often, new complex problems may require new complex solutions. Falling back on old tools that have been around the corporation for a decade isn’t the right way to go. Many companies use old tools, and it kills the project.”
Morrison noted, “Enterprises need to stop getting their feet tangled in their underwear and just jettison the legacy architecture that creates more silos.” He also said they need to stop expecting vendors to solve their complex system problems for them. “For decades, many seem to assume they can buy their way out of a big data problem. Any big data problem is a systemic problem. When it comes to any complex systems change, you must build your way out,” he said.