Cybersecurity: A Big Data Problem

by Rob Carey

Posted in Technical | October 20, 2022 | 2 min read

Information technology has been at the heart of governments around the world, enabling them to deliver vital citizen services, such as healthcare, transportation, employment, and national security. All of these functions rest on technology and share a valuable commodity: data.

Data is produced and consumed in ever-increasing amounts and therefore must be protected. After all, we believe everything that we see on our computer screens to be true, don’t we? When we consider that there are bad actors around the world who seek to disrupt the very technology and data that serve the people, cybersecurity becomes a global problem.

To put the risk into perspective, in 2020, “The number of cybersecurity incident reports by federal agencies in the United States alone was over 30,000, approximately an 8% increase from the previous year,” according to Statista.

Government networks are managed by CIOs and CISOs, with the CDO (the newest CXO position) shaping policies to handle data in support of government missions. Most CISOs have a rather standard set of cybersecurity tools that handle identity management, encryption, edge device log data management, vulnerability scanning, deep packet inspection, network security monitoring and intrusion detection, and of course, antivirus. These tools are used to analyze a plethora of network data. Typically, CISOs inherit the tools their predecessors left them and achieve generally the same results.

As I said in my recent interview on the FedScoop Daily Podcast, cybersecurity has been done essentially the same way for the past 30 years. More notably, progress and success in defensive cyber have been both slow and evolutionary over this time. Bad actors only have to be right one time, and the defenders need to be right all the time, in real time, so doing something “different” is a must. AI and machine learning (ML) are technologies that show promise for automating malware disposition functions and freeing humans to perform higher-level functions, moving past signature tracking as the only way to begin to get ahead of malicious cyber threats.

Much work has been done here, but much remains, as no single technology is a silver bullet. Still, AI and ML technologies are potentially game-changing. Big data platforms (BDPs) such as Cloudera Data Platform (CDP) can easily consume, store, manage, and analyze very large amounts of data, such as log files, application status, and containers. They can also correlate expected activity against actual activity and trust in near real time and, in doing so, support zero-trust architectures. BDPs can also hold data for longer periods of time and examine it to enable pattern correlation.
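To make the idea of correlating expected activity against actual activity a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the `Event` type and the `build_baseline` and `flag_unexpected` functions are hypothetical names invented for this example, not part of CDP or any other product, and a real platform would work on streaming data at far larger scale with far richer models.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A simplified log record (hypothetical shape, for illustration only)."""
    user: str
    action: str


def build_baseline(history):
    """Count how often each (user, action) pair appears in historical logs."""
    return Counter((e.user, e.action) for e in history)


def flag_unexpected(baseline, recent, min_seen=1):
    """Return recent events whose (user, action) pair was seen fewer than
    min_seen times in the baseline, i.e. activity that was not expected."""
    return [e for e in recent if baseline[(e.user, e.action)] < min_seen]


# Historical activity stands in for "expected" behavior.
history = [
    Event("alice", "login"),
    Event("alice", "read_report"),
    Event("bob", "login"),
]

# Recent activity stands in for "actual" behavior.
recent = [
    Event("alice", "login"),
    Event("bob", "export_database"),
]

unexpected = flag_unexpected(build_baseline(history), recent)
print(unexpected)
```

In this toy data set, bob has never exported a database before, so that event is the only one flagged. Holding data for longer periods, as BDPs can, would correspond here to building the baseline from a longer history.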

Cybersecurity is a big data problem. Understanding activity in real time is what cybersecurity is all about, from endpoint files to identity management digital handshakes to container executions to event detections. Doing essentially the same thing and expecting different outcomes probably won’t work.

Learn more about the intersection of cybersecurity and big data at my fireside chat at the MeriTalk Cyber Central on October 27 in Washington, DC. I look forward to seeing you there.

Rob Carey

President, Cloudera Government Solutions
