When the news broke that the National Security Agency had been collecting and analyzing email and phone records from major service providers, many Americans were shocked. They decried what they saw as an intrusion into their privacy, believing the government had overstepped its bounds by effectively spying on citizens and collecting information without cause or authorization.
As a result, big data and data mining have become hot-button issues. Media coverage has homed in on the idea that every move a person makes, whether on a personal computer or a mobile device, is tracked by government officials. The popular perception is that nothing is private: every click you make or message you send is subject to scrutiny, and if you do something the government perceives as suspicious, there will be consequences.
While big data does rely on collecting millions of terabytes of information every day, few people understand the benefits of the process or how it actually affects them. In fact, big data analysis is vital to Internet security and to addressing threats ranging from hackers to malware.
How Big Data Works
Each day, millions of terabytes of data are shared over networks around the world. Emails, instant messages, Web browsing: every online activity leaves behind an electronic trail. There is so much data that, by some experts' estimates, two days' worth of data in today's connected world equals all the conversations in human history.
Although hacktivists and other cybercriminals continue to develop new and more insidious ways of attacking computer networks and interrupting the flow of data, their activity generally follows defined patterns or creates anomalies in the usual creation and consumption of data. Those patterns and anomalies are what analysts look for when using big data for security purposes.
For example, online activity at a particular company is generally consistent: employees follow similar routines each day and visit the same websites. So when a group of employees suddenly starts visiting new websites, that is an anomaly in the data. The anomaly could simply mean that employees are working on a project that requires a particular site. Or it could be something more nefarious, such as a botnet redirecting machines to a malicious site. Without big data analysis to establish a “baseline” of normal behavior, the new pattern may go unnoticed until it’s too late.
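To make the baseline idea concrete, here is a minimal Python sketch, assuming proxy logs can be reduced to hypothetical (employee, domain) pairs. It treats every domain seen during a baseline period as normal and flags new domains that several distinct employees visit on the same day. The `min_users` threshold of 3 is an arbitrary illustration, not an industry value, and real systems model far richer behavior than a set of domains.

```python
from collections import defaultdict

# A hypothetical log record: (employee_id, domain), e.g. extracted from proxy logs.
Visit = tuple[str, str]


def build_baseline(history: list[Visit]) -> set[str]:
    """Return the set of domains seen during the baseline period."""
    return {domain for _, domain in history}


def find_anomalies(baseline: set[str], today: list[Visit], min_users: int = 3) -> dict[str, set[str]]:
    """Flag domains absent from the baseline that at least `min_users` distinct employees visited."""
    visitors: dict[str, set[str]] = defaultdict(set)
    for employee, domain in today:
        if domain not in baseline:
            visitors[domain].add(employee)
    return {domain: users for domain, users in visitors.items() if len(users) >= min_users}


if __name__ == "__main__":
    history = [("alice", "intranet.example"), ("bob", "mail.example"), ("carol", "intranet.example")]
    today = [
        ("alice", "intranet.example"),
        ("alice", "unknown-site.example"),
        ("bob", "unknown-site.example"),
        ("carol", "unknown-site.example"),
        ("dave", "project-tool.example"),  # only one visitor, so not flagged
    ]
    baseline = build_baseline(history)
    for domain, users in find_anomalies(baseline, today).items():
        print(domain, sorted(users))  # unknown-site.example ['alice', 'bob', 'carol']
```

A flagged domain is only a lead for an analyst; as the paragraph above notes, it may turn out to be a legitimate project site.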
In fact, Internet security experts note that big data has given them an edge on a problem that has plagued them for years: staying a step ahead of criminals. Instead of waiting until malware has taken hold before developing virus definitions and blocking infections, security teams can now act earlier. By identifying patterns and comparing them against previously known threats, they can stop harmful security breaches before they happen.
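Comparing observed behavior against previously known threats can also be sketched in Python. This is an assumption-laden illustration: the pattern names and event types are hypothetical, and each known threat is modeled simply as an ordered sequence of events. Because the check counts how many leading steps of a pattern have already occurred, an alert can fire partway through an attack rather than after its final step.

```python
# Hypothetical catalog of known threat patterns: each is an ordered list of event
# types drawn from past incidents. Names are illustrative, not from a real feed.
KNOWN_PATTERNS: dict[str, list[str]] = {
    "macro-dropper": ["attachment_opened", "macro_executed", "outbound_to_new_domain"],
    "credential-stuffing": ["login_failed", "login_failed", "login_failed", "login_success"],
}


def steps_matched(events: list[str], pattern: list[str]) -> int:
    """Count how many leading steps of `pattern` occur, in order, within `events`."""
    remaining = iter(events)
    count = 0
    for step in pattern:
        # `step in remaining` consumes the iterator up to the first match,
        # so later steps must appear after earlier ones (gaps are allowed).
        if step in remaining:
            count += 1
        else:
            break
    return count


def early_warnings(events: list[str], min_steps: int = 2) -> dict[str, int]:
    """Return known patterns whose first `min_steps` or more steps have been observed."""
    warnings = {}
    for name, pattern in KNOWN_PATTERNS.items():
        matched = steps_matched(events, pattern)
        if matched >= min_steps:
            warnings[name] = matched
    return warnings


if __name__ == "__main__":
    observed = ["attachment_opened", "file_saved", "macro_executed"]
    print(early_warnings(observed))  # {'macro-dropper': 2}
```

Here the warning fires after two of three steps, before the outbound connection that would complete the known pattern.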
Protecting Your Privacy
How do you collect all of that data without compromising personal privacy? Some argue that collecting such data, even for noble purposes like keeping networks safe, creates a slippery slope: usage patterns could lead to incorrect assumptions about people, and those assumptions could lead to punitive or discriminatory actions.
However, security firms that use big data are actively developing ways to protect privacy while still delivering the protections consumers want. Protocols safeguard data and user privacy both at the endpoint, where the data is collected, and during transmission, storage and analysis. Personally identifiable information, such as names and birth dates, is generally scrubbed before analysis. Once the data has been analyzed, it is typically destroyed; most companies do not store data indefinitely, since the threat landscape changes daily.
Another comfort for people concerned about privacy is that the sheer volume of data makes pinpointing one person’s activities exceedingly difficult. Unlike surveillance programs that target an individual or group, big data in the security realm is about finding patterns at a large scale, not finding out what a particular person is doing. Attributing data to a single person in this environment is like finding one particular grain of sand on a beach: possible, but unlikely.
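Scrubbing and time-limited retention might look something like the following Python sketch. It assumes records arrive as dictionaries with hypothetical field names (`user_id`, `name`, `birth_date`, `email`, `timestamp`); which fields count as personally identifiable, and how long data is kept, are policy decisions, so the values here are placeholders. Replacing an ID with a keyed hash (HMAC-SHA-256 from Python's standard library) is pseudonymization rather than full anonymization: records from the same user still share a token, and anyone holding the key can recompute it from a known ID.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

# Fields treated as directly identifying in this sketch (a placeholder policy).
PII_FIELDS = {"name", "birth_date", "email"}


def scrub(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the user ID for a keyed-hash pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS and k != "user_id"}
    if "user_id" in record:
        digest = hmac.new(secret_key, record["user_id"].encode("utf-8"), hashlib.sha256)
        cleaned["user_token"] = digest.hexdigest()[:16]  # truncated for readability
    return cleaned


def purge_expired(records: list[dict], max_age: timedelta, now: datetime) -> list[dict]:
    """Keep only records whose timestamp falls inside the retention window."""
    return [r for r in records if now - r["timestamp"] <= max_age]


if __name__ == "__main__":
    key = b"example-key-rotate-in-practice"  # hypothetical; a real key would be managed securely
    now = datetime.now(timezone.utc)
    raw = [
        {"user_id": "u123", "name": "Jane Doe", "birth_date": "1980-01-01",
         "domain": "mail.example", "timestamp": now - timedelta(hours=2)},
        {"user_id": "u456", "email": "x@example.com",
         "domain": "intranet.example", "timestamp": now - timedelta(days=45)},
    ]
    kept = purge_expired([scrub(r, key) for r in raw], max_age=timedelta(days=30), now=now)
    for record in kept:
        print(record["domain"], record["user_token"])  # only the recent record survives
```

The 30-day window is arbitrary; the point is that analysis runs on scrubbed records and old ones are dropped on a schedule rather than kept indefinitely.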
The issue of privacy and data collection is not going away any time soon, and it’s reasonable to expect that the debate will lead to new laws and restrictions regarding how different entities can collect and use data. For now, feel confident that security professionals are leaving no stone unturned when it comes to finding and stopping threats to your network.