By Terry Ray, Chief Product Strategist at Imperva
With more data being collected by companies than ever before, securing it is no small task. There are many factors to consider: are the environment and the data vulnerable to cyber threats, and who has access to the data? There’s also the issue of compliance. Big data deployments are subject to the same compliance mandates and require the same protection against breaches as traditional databases and their associated applications and infrastructure.
Typical security hygiene practices for data security are still applicable for big data environments. The issue is figuring out how to achieve security and compliance for big data environments given the unique challenges they present. Much of the challenge of securing big data stems from the nature of the data itself. Enormous volumes of data require security solutions built to handle them. This means incredibly scalable solutions that can handle, at a minimum, an order of magnitude more than traditional data environments require.
Additionally, your security solutions must be able to keep up with big data speeds. You’ll need to focus on data parsing and collection throughput, the degree of automation that is available, and the ability to deliver real-time visibility of policy violations and other events. Mixing multiple sources and types of data with different access permissions compounds classification and policy-setting challenges, and it also elevates the need for robust audit capabilities.
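To make the classification problem concrete, here is a minimal Python sketch of one possible policy: a dataset built from several sources inherits the most restrictive label among them. The `Sensitivity` levels, the function names, and the "most restrictive wins" rule are all illustrative assumptions, not something a particular product or mandate prescribes.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels; higher means more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def combined_sensitivity(source_labels):
    """Return the label for data derived from several sources.

    Assumed rule: the combined data is as sensitive as its most
    sensitive input.
    """
    labels = list(source_labels)
    if not labels:
        raise ValueError("at least one source label is required")
    return max(labels)


def can_read(user_clearance, source_labels):
    """Allow access only if the user's clearance covers every source."""
    return user_clearance >= combined_sensitivity(source_labels)
```

Under this assumed rule, joining public web logs with confidential customer records yields a confidential dataset, so a user cleared only for internal data would be denied access to the combined result.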
The multiplicity of big data environments is what makes big data difficult to secure, not necessarily the associated infrastructure and technology. For example, the open source Hadoop framework has different layers of the stack serving a variety of purposes, from distributed storage at the bottom, to table and schema management, distributed programming, and querying/interface options at the middle tiers, and a wide range of management tools along the top. There is no single logical point of entry or resource to guard, but many different ones, each with an independent lifecycle.
Often big data environments will use multiple technologies for data storage and retrieval. For example, it’s not uncommon for an implementation to include relational stores and query tools to support analytical workloads, non-relational technologies (also known as NoSQL technologies) for real-time, interactive workloads, or both.
Many big data environments include multiple instances or versions of the same core building blocks, but from different vendors, such as different Hadoop distributions and NoSQL offerings. This means a greater amount of diversity and complexity to be addressed by security tools and staff. Big data deployments typically have a multitude of geographically distributed data stores and, therefore, numerous physical nodes requiring protection. This inherently increases the potential for inconsistent security policies and practices, suggesting the need for solutions that feature strong, centralized administration capabilities.
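As a rough illustration of what centralized administration has to catch, the Python sketch below compares each node's reported settings against a single baseline and lists where they diverge. The setting names, node names, and the `find_policy_drift` helper are hypothetical. A real deployment would gather these values through each platform's own management interface.

```python
def find_policy_drift(baseline, node_policies):
    """Return, per node, the settings that differ from the baseline.

    The result maps node name -> {setting: (expected, actual)}.
    A setting missing on a node is reported with actual=None.
    """
    drift = {}
    for node, policy in node_policies.items():
        diffs = {}
        for setting, expected in baseline.items():
            actual = policy.get(setting)
            if actual != expected:
                diffs[setting] = (expected, actual)
        if diffs:
            drift[node] = diffs
    return drift


# Hypothetical baseline and node reports, for illustration only.
baseline = {"encryption_at_rest": True, "audit_logging": True, "auth": "kerberos"}
node_policies = {
    "hadoop-us-1": {"encryption_at_rest": True, "audit_logging": True, "auth": "kerberos"},
    "nosql-eu-1": {"encryption_at_rest": True, "audit_logging": False, "auth": "kerberos"},
    "hadoop-ap-1": {"encryption_at_rest": True, "auth": "simple"},
}

for node, diffs in find_policy_drift(baseline, node_policies).items():
    print(node, diffs)
```

Running a comparison like this from one place, rather than auditing each data store separately, is the kind of consistency check that becomes hard to do by hand as the number of nodes grows.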
There’s also the challenge presented by the lack of security knowledge and understanding in the people working most closely with the data: data scientists and developers. Data scientists, with their skills and experience working with structured and unstructured data to deliver new insights, don’t necessarily think about the security of the data. It’s not surprising given that new technologies have encouraged data scientists to view big data as a giant sandbox where they are the owners and can decide how the data will be used.
While most development projects rely on access to non-sensitive test data instead of live production data, big data application development by its nature often falls outside of the more secure processes set up within IT. And with higher access privileges than many others in the organization, developers also present a greater security risk, whether by accident or through malicious intent.
The number and breadth of data breaches continue to grow unabated, with the Identity Theft Resource Center reporting a 40% increase in data breaches in 2016. Therefore, it is crucial that everyone from the CIO on down understands and prioritizes implementing better security for big data. After all, the last thing you want is your company in the headlines for the latest data breach.