One facet of my job is Big Data, which, for those unaware, consists of computationally analyzing enormous data sets to reveal patterns, trends, and associations. In other words, it is about inferring human behavior from collected data and using it to make predictions.
To make sense of such complex data, which can run into the billions, you sometimes need to extract a subset of subjects and “manually” explore their data to create a coherent story, and later use that logic to build an automated process.
I remember one day I was tracking the behavior of a randomly selected individual. I used the connection points of his cellphone, which ranged across different locations in his area of residence, between his workplace and what I believed was his house. In his daily itinerary, there was a place he visited after work that caught my attention because of its eccentric name. When I Googled it, I realized the place was a massage parlor.
After that realization, different questions came to mind: was he married? If so, did his wife know about his visits to that place? Maybe not, and maybe this was his little secret.
So, the questions here are: did users overreact? Or were they right to be worried about a company registering, and probably exploiting, their digital footprint for commercial gain?
The first thing we need to understand is that we don’t own the internet. We don’t own the websites we visit, nor do we own our social media profiles. Unless you have your own website and pay for your own hosting (server) to publish your comments (like I do for this blog), your posts on Facebook are not really yours. All those companies are private, and they own the infrastructure and intellectual property (code) on which they build the social network. Internet companies can’t commercialize your content; at most they might use it for self-advertising. But they may track your behavior and use it for commercial gain, as I mentioned.
Is this intrinsically bad? I don’t think so. Most of these features came to life to increase engagement for businesses, and they can simplify our lives. Let’s take the example of behavioral retargeting, which most of you have probably experienced in the shape of an Amazon banner on Facebook displaying the exact same product you googled. Many of you may find it “scary” that our behavior is “surveilled,” while others, like me, use it as a practical reminder of the gift I need to buy that is now at a discounted price.
But in the end, as in the example I shared at the beginning, we might be concerned about somebody “spying” on our little secrets. Should we be concerned about somebody looking at what we do on the internet?
I believe that most people are unaware of the massive scale of the data recorded on the internet and of the computational resources needed to “spy” on everybody. To illustrate this, I would like to use an analogy.
I really enjoy seafood, and if I ask you to catch a fish for me, you might be successful. You can sit at the end of a quay with a fishing rod and wait to see what takes the bait. You might return to me with a tilapia hanging from the hook, happy about having fulfilled your task. But what if I ask for a yellow-striped fish instead? Or for a lanternfish, which is a deep-sea fish? The result of your fishing trip might not be as satisfactory as the previous one.
The data collected on the internet is as vast as the ocean. Billions of fish inhabit the immense body of water that turns darker as we go deeper. Fishermen, like marketers, understand how the different species congregate, in which regions of the sea, at what depths, and what currents they follow. This is the equivalent of having a profile of a segment of internauts, classified just like fish, by age group, preferences, and other demographics. Using the information collected, marketers can “predict” the behavior of that segment. This type of study is simpler because it requires fewer resources. It’s simpler to follow the movement of a school of fish than that of an individual fish.
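For the curious, here is a minimal sketch in Python of what following the school rather than the fish can look like. Everything in it is an assumption made up for illustration: the records, the age groups, the "minutes online" measure, and the function name `profile_segments`. It is not the tooling I use at work, only a toy version of the idea that a segment profile is built by grouping users and discarding who they are individually.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, invented records: (user_id, age_group, minutes_online_per_day)
records = [
    ("u1", "18-24", 190),
    ("u2", "18-24", 210),
    ("u3", "25-34", 120),
    ("u4", "25-34", 140),
    ("u5", "35-44", 90),
]


def profile_segments(rows):
    """Group users by age group and return the average minutes per segment."""
    minutes_by_segment = defaultdict(list)
    for _user_id, age_group, minutes in rows:
        # The user id is dropped here: only the segment survives.
        minutes_by_segment[age_group].append(minutes)
    return {segment: mean(values) for segment, values in minutes_by_segment.items()}


segment_profile = profile_segments(records)
print(segment_profile)
```

Notice that the result has one entry per segment, no matter how many users fall into it. The individual fish disappears into the school.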
In real-life work like the example I shared, we usually analyze and make predictions based on the behavior of millions, or at least thousands, of users for the results to be meaningful. Yes, I can go into the deep, dark waters of the data sets and analyze a single individual, just like scientists attach trackers to the dorsal fins of dolphins to study their movement across the ocean. We can track one, or maybe a few, not billions.
In summary, even if our behavior on the internet is exposed, the crowd provides anonymity, and you might become a target only if, like a dolphin, you show yourself above the water’s surface at the same moment somebody is looking at that specific spot of the endless ocean.
What do I want to convey with this? In terms of cybersecurity, the critical aspect is not the fact that you leave a digital footprint behind, but the type of footprint you leave. The type of content you share on the internet might put you at risk not from the fishing company that casts its net blindly over the sea to commercialize your information, but rather from the predator inside the sea that lurks close to you. Facebook will not envy you for the pictures you share of your brand-new car, your newborn baby, your costly house, or your expensive trips, but somebody else might.
Be wary of the type of internet persona you build for yourself, but don’t worry about being spied on by a government or corporation… unless you truly have something to hide? Otherwise, people like me, who study web behavior, will not care if you visit a massage parlor after work. Trust me. We are just more numbers swimming in the endless binary ocean.
M. Ch. Landa