In 2013, Judith Hurwitz and other market experts heralded the beginning of the Big Data era, observing that “big data allows organizations to store, manage and manipulate large amounts of data at the right speed and at the right time to obtain the right information.”
They were candid that Big Data did not represent a single technology; it was instead a heterogeneous set of data management technologies with roots in several previous technological transformations.
The question now is: Where is Big Data today? And what does it take to mature your application?
To be fair, recent analyst polls have found that big data has yet to deliver great business results. Despite all the hype, most corporate employees still don’t have easy access to information to get their jobs done. The problem remains centered on getting the right information to the right people at the right time as the number of information sources, uses and users increases.
Data warehouses vs. data lakes vs. data fabrics
To house all this data, storage and management systems have emerged, such as the data warehouse, the data lake and the data fabric. “Organizations will need some form of all three,” says former CIO Tim McBreen. “But a data fabric will be required as an umbrella for all data integration, management and governance across the enterprise at the solution and platform levels. Cohesion across the enterprise is essential.”
“It is often not feasible to centralize the data,” adds CIO Carrie Shumaker. “Instead, the analysis is prototyped using services to access disparate data sources, and then, if it’s successful and business needs dictate, centralization is done later.”
Hurwitz analyst Dan Kirsch sees a connection between the data decentralization trend and the data fabric. “We have seen the data fabric approach growing in popularity because it is unrealistic to have a central repository where all your data can be up to date, governed and clean,” he shares. “For this reason, data fabrics must allow for heterogeneous data locations. I believe that a data fabric approach helps with the challenge of shared responsibility: each team is responsible for its own data and then connects it instead of dumping it into a data lake. AWS may say that a data lake is the only path to analytics success. And, of course, they want organizations to put all their data in the AWS cloud.”
Former Gartner vice president of data and analytics Nick Heudecker agrees and maintains that all these trends are important. “Each concept serves different users and use cases,” he says: data warehouses for high-throughput, repeatable analytics; data lakes for question development and experimentation; data mesh for distributed data consumption with governance oversight. To avoid confusion, Gartner considers data lakes and data meshes to be equivalent concepts.
Centralize your Big Data strategy on a single platform
Experts embrace dual strategies but stick to a single platform for each. Former CIO McBreen says he likes to have “two strategies. One strategy is for production and the other is for analytics. Each has its own central platform and support for multiple data repositories. Then there is an ETL platform (real-time, near-real-time, batch) between the two central cores.”
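McBreen’s split between a production core and an analytics core, with batch ETL in between, can be sketched in a few lines. This is a minimal illustrative sketch only: the in-memory SQLite databases and the `orders`/`sales_by_region` schema are hypothetical stand-ins for the two central platforms, not anyone’s actual architecture.

```python
import sqlite3

# Stand-ins for the two "central cores": a production (OLTP) store
# and an analytics store. Tables and columns are invented for illustration.
oltp = sqlite3.connect(":memory:")
olap = sqlite3.connect(":memory:")

oltp.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "EMEA", 120.0), (2, "EMEA", 80.0), (3, "APAC", 200.0)])

olap.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")

def batch_etl():
    """Extract from the production core, transform (aggregate), load to analytics."""
    rows = oltp.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
    olap.execute("DELETE FROM sales_by_region")  # idempotent reload
    olap.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    olap.commit()

batch_etl()
print(olap.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# → [('APAC', 200.0), ('EMEA', 200.0)]
```

A real ETL platform adds scheduling, incremental loads and error handling; the point here is only the shape of the flow between the two cores.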
But which vendor provides most of these services? “I haven’t seen any yet that I think is good enough on its own to be the complete platform,” laments McBreen.
Shumaker agrees, joking: “Do multiple data repositories usually include some spreadsheets?” For this reason, CIO Deb Gildersleeve says, “In many ways it’s less about centralizing data and more about integrating it. How can you integrate all your data so that you can visualize it and connect it to your other systems, whether on premises or in the cloud?”
“Centralizing all your data creates cost, governance and security headaches,” shares Kirsch. “Data is locked in line-of-business applications, on premises, and within cloud ecosystems. Connecting to the data where it resides helps eliminate risk and increase the speed of insight.”
“I don’t think this is a single-vendor story,” agrees Heudecker. “Some provide query capabilities, but no one has yet nailed the governance story. The ‘big’ in big data makes moving things a challenge. Multiple platforms are the norm. If you’re lucky, you can normalize around tools and skills.”
A data fabric, therefore, is a data management concept to achieve flexible, reusable, and augmented data integration pipelines, services, and semantics, in support of various operational and analytical use cases delivered across multiple implementations and orchestration platforms.
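One way to read that definition, and Kirsch’s “connect to the data where it resides,” is that a fabric queries data in place rather than copying it into one repository. The sketch below is a deliberately tiny, hypothetical illustration of that query-in-place idea: two in-memory SQLite databases stand in for separately owned line-of-business systems, and `crm`, `billing` and their tables are invented names, not any real fabric product’s API.

```python
import sqlite3

# Hypothetical stand-ins for data left where it resides: each source is
# owned and governed by its own team, not dumped into a central lake.
crm = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")

crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 500.0), (1, 250.0), (2, 99.0)])

def federated_totals():
    """Join across the two sources at query time instead of copying the data."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id")
    return {names[cid]: total for cid, total in totals}

print(federated_totals())
# → {'Acme': 750.0, 'Globex': 99.0}
```

Each team keeps responsibility for its own store; the integration layer composes results on demand, which is the essence of the fabric approach the experts describe.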
Ensure compliance with privacy and data governance standards
To govern data effectively, companies must have a clear idea of the data they have. Organizations must “understand what types of data are in their data lake or data fabric,” says Kirsch. “If PII is involved in a specific application or a new endeavor, companies must assign an executive to oversee the appropriate use of personal data. That executive can also help address the question of what is possible with the data versus what is appropriate.”
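Kirsch’s point, that organizations need to know which data sets involve PII and who oversees their use, can be made concrete with a toy access check. Everything here (the catalog entries, the `can_access` rule) is a hypothetical sketch of the idea, not an actual governance implementation.

```python
# A minimal, hypothetical data catalog: before a data set is used, the
# organization records what it contains and who is accountable for it.
CATALOG = {
    "web_clickstream":   {"contains_pii": False, "owner": "analytics"},
    "customer_profiles": {"contains_pii": True,  "owner": "privacy-exec"},
}

def can_access(dataset: str, purpose_approved: bool) -> bool:
    """PII data sets require a named owner and an approved purpose."""
    meta = CATALOG.get(dataset)
    if meta is None:
        return False  # unknown data is ungoverned data
    if meta["contains_pii"]:
        return meta["owner"] is not None and purpose_approved
    return True

print(can_access("customer_profiles", purpose_approved=False))
# → False
```

The rule separates “what is possible” (the data exists) from “what is appropriate” (a named owner has approved the purpose), which is the distinction the executive is there to enforce.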
Data stewards play a critical role in governance. It is not surprising, then, that McBreen says it is important to define “stewards whose entire job is to access and manage corrections to information in its initial source. They rotate out of the business teams and the KPIs are in place. We review monthly and adjust as necessary.”
“It’s important to define stewards up front and know how to communicate with them along the way,” says Gildersleeve. “It’s also important to get feedback from stewards on UX design.” Shumaker adds that she likes that “data stewards approve the high-level design. Depending on the type of data, there is mandatory access and compliance training to gain access to any data set, and for more specialized data sets there may be additional training.”
Impact of the cloud on Big Data strategy?
“The cloud is becoming another form of computing and storage rather than a separate environment,” Kirsch insists. “Cloud visibility and management are important; be aware that the cloud is a quick way to spend a budget. In many cases, there is no reason to move some applications to the cloud. But being able to instantly do proofs of concept and experimentation in the cloud is huge: acquiring GPUs in the cloud, for example, versus buying physical infrastructure.”
Gildersleeve agrees, saying that “the cloud enables organizations to try new things, as well as add and remove computing power as needed without having to wait for physical work to get done.”
Where are data processes maturing?
Processes require a clearly defined base of terms. For Gildersleeve, “starting in transactional systems is essential. If the data starts out badly, a lot of time is spent debugging and improving it.” Shumaker agrees, saying “it’s not sexy, but organizations must agree on the definitions of data that is shared and maintained.”
For this reason, Kirsch suggests that it is time to “change data processes by adopting practices like DataOps. These will be important for data-driven organizations. It won’t happen overnight; businesses are still struggling with DevOps. Data literacy is also critical to success. Business school students shouldn’t get their MBA without some understanding of data.”
Heudecker doesn’t disagree: “You need the greatest maturity in areas that make it easy to share context around data, so things like data literacy. DataOps can help with resilience, but it is still an overwhelmingly technical practice.”
Clearly, Big Data is in what analysts call the “trough of disillusionment.” While data-driven businesses will be winners in the long run, there is work to be done.
Winners must implement the data governance needed so that data is fit for the task and protected. They also need to improve their data processes; together, DataOps and data governance can help. To do this, the data winners will create what Jeanne Ross and Martin Mocker call “operational and digital backbones.”