Making Data Work for Us
A pragmatic view of privacy should encourage data collection that benefits users and innovators alike.
The art of selling has always been in knowing the customer. When markets were smaller and more local, sellers knew buyers by name, knew their kids, knew their hobbies, knew when they’d just bought a new car. The multinational corporation is still trying to catch up, collecting and studying the scraps of data that users make available online.
For privacy advocates, such data collection poses endless dangers, the specific contours of which are never made quite clear. Are “targeted ads” manipulative? Maybe. But what would it even mean for an ad to be untargeted? Should it be illegal for truck companies to spend most of their television advertising dollars on NFL games because they know that football fans are more likely to buy their products? Football fans, at least, appear unbothered.
In the industry, placing ads where particular audiences are most likely to see them is called “contextual” advertising and can be distinguished from “behavioral” advertising that is targeted on the basis of individual characteristics—say, demographic data or web-browsing history. Here, consumers don’t seem to know what they want. In surveys, they say that they value their privacy; but in experiments (and in real-world settings), they trade it away for small tangible benefits. In one study, the vast majority of participants were willing to reveal their monthly income to a video rental store in exchange for a one-euro discount on a DVD. (Without the discount, about half still shared this private information in exchange for no benefit.) Another study found that most subjects would happily sell their personal information for just 25 cents, and almost all of them waived their right to shield their information.
As venture capitalist Benedict Evans has observed, “We don’t want irrelevant ads or ads that are too relevant. We don’t want anyone to know what we bought but we want the advertiser to know we already bought that. And we refuse (mostly) to pay but we don’t really want ads anyway. Our feelings about online ads are pretty unresolved.”
Unfortunately, there is no free lunch here. Targeting ads with behavioral data increases revenue for platforms and publishers alike (both Google and the New York Times sell targeted advertising)—doubling or tripling it, according to a literature review by marketing professor Garrett Johnson. Without targeting, advertising would be less efficient, and companies that rely on advertising revenue would be forced to raise prices for consumers and reduce investment in their platforms.
Some activists argue that the trade-off is worthwhile and that subscription-based business models would be preferable to advertising-based ones. That may be their preference, but there is no evidence that a critical mass of users agrees. Nor is it clear how such a transition could be brought about, even if policymakers attempted to mandate it.
Markets in Data
One leading idea for giving users greater control of their data and obstructing expropriation and manipulation by companies is to grant them formal “ownership” and allow them to sell it. Before we can talk about the pros and cons of data markets—and how policy might need to change to make them better—we need to know where they are and what they are. Where is the market for data? Can you go to a website or download an app that has “data” for sale? And once you find the market, what’s for sale there?
Immediately, a number of problems become apparent. First, while people often say that “data is the new oil,” it’s a terrible analogy and leads policymakers to view data within a flawed commodity-like framework. Data is much closer to a public good—one person using it does not preclude someone else from using it, too (“non-rival”); and stopping someone from using it is hard (“non-excludable”). Data can be used over and over without being diminished (it’s just 1s and 0s!), and once it’s shared publicly, it’s difficult to prevent it from being shared with others in unauthorized ways. If anything, then, we are likely underinvesting in data collection. Building the necessary infrastructure to collect and process data for profitable use is very expensive, and companies know that they won’t be able to realize all the gains from doing so.
Second, people seeking to maintain their own privacy must rely on others not to share information about them. When someone else shares personal data that includes them or can be linked to them (say, a group picture or an email), then they have lost some measure of privacy. Remember that the Facebook–Cambridge Analytica scandal was a scandal because Facebook users opted to share data about their Facebook friends with the third-party app (not just data about themselves). As MIT professor Daron Acemoglu and his colleagues show in a recent paper, when others share data about someone, that person has less reason to protect his own privacy. He becomes more willing to share additional personal data, too—because at the margin, it doesn’t make much difference.
Third, and perhaps most significantly, data has little value on the open market. When most people see the word “data” next to the word “market,” they think of “Big Tech.” If anyone is participating in—and profiting enormously from—data markets, surely it must be Facebook and Google (and, to a lesser extent, Microsoft, Amazon, and Apple).
But as these companies constantly point out, they don’t sell personal data; they sell targeted advertising. If a small business wants to show an ad to moms between the ages of 30 and 40, with a household income above $100,000, living in the suburbs of Cincinnati, the tech giants are more than happy to help. But the data about those individuals never leaves the companies’ hands. The data—along with platforms that users want to spend time on—is their competitive advantage, and they have no intention of sharing it.
Yes, some companies known as “data brokers” sell personal data directly, and some tech companies are buyers of that data, but few consumers have even heard of these brokers—and they are small potatoes compared with the major platforms. So when people talk about “data markets,” what they really mean is “platforms that offer you services (often at no monetary cost) in exchange for your time and personal data.”
In fact, notwithstanding the oft-cited claim that a family of four’s personal data could be worth $20,000 in annual income in the near future, personal data is nearly worthless today. The Financial Times offers this analysis:
- “General information about a person, such as their age, gender and location is worth a mere $0.0005 per person, or $0.50 per 1,000 people.”
- “A person who is shopping for a car, a financial product or a vacation is more valuable to companies eager to pitch those goods. Auto buyers, for instance, are worth about $0.0021 a pop, or $2.11 per 1,000 people.”
- “Knowing that a woman is expecting a baby and is in her second trimester of pregnancy, for instance, sends the price tag for that information about her to $0.11.”
- “For $0.26 per person, buyers can access lists of people with specific health conditions or taking certain prescriptions.”
- “[T]he sum total for most individuals often is less than a dollar.”
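To make the FT's point concrete, the short Python sketch below adds up the per-person prices quoted above. It assumes, purely for illustration, that a single person falls into every category at once (general demographics, car shopper, expecting a baby, and a listed health condition). The FT does not make this assumption. Even in that generous case, the combined figure stays well under a dollar.

```python
# Per-person prices quoted by the Financial Times (US dollars).
# Assumption for illustration only: one person belongs to every category.
PRICES_PER_PERSON = {
    "general_info": 0.0005,    # age, gender, location
    "auto_buyer": 0.0021,      # shopping for a car
    "pregnancy": 0.11,         # second trimester of pregnancy
    "health_condition": 0.26,  # specific condition or prescription
}


def per_thousand(price_per_person: float) -> float:
    """Convert a per-person price into a price per 1,000 people."""
    return price_per_person * 1000


def combined_value(prices: dict[str, float]) -> float:
    """Sum the per-person prices across all categories."""
    return sum(prices.values())


total = combined_value(PRICES_PER_PERSON)
print(f"General info per 1,000 people: ${per_thousand(PRICES_PER_PERSON['general_info']):.2f}")
print(f"Combined per-person value: ${total:.4f}")
```

The sum comes to roughly 37 cents, consistent with the FT's observation that the total for most individuals is less than a dollar.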
So why are Facebook and Google worth hundreds of billions of dollars if personal data, a key input to their business models, is relatively worthless? It’s because, as Stratechery’s Ben Thompson explains, these companies are actually “data factories”:
Facebook quite clearly isn’t an industrial site (although it operates multiple data centers with lots of buildings and machinery), but it most certainly processes data from its raw form to something uniquely valuable both to Facebook’s products (and by extension its users and content suppliers) and also advertisers (and again, all of this analysis applies to Google as well)…. Data comes in from anywhere, and value—also in the form of data—flows out, transformed by the data factory.
Because they are factories, the entire value of the output cannot be ascribed to users’ personal data. Such a “data theory of value” is as much a fallacy as the labor theory of value. Data is but one input, alongside highly skilled labor (machine-learning engineers aren’t cheap!) and data centers. This is why “data dividend” schemes don’t add up. Facebook’s average revenue per user in the U.S. and Canada in 2020 was $163.86; giving each user 20% of that total would amount to only a few dollars per month.
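The data-dividend arithmetic can be checked in a few lines. This is a minimal sketch that assumes, as the text implies, that the $163.86 figure is annual revenue per user. The 20% payout share is the hypothetical from the text, not an actual proposal's parameter.

```python
ARPU_US_CANADA_2020 = 163.86  # annual revenue per user, US dollars (figure cited in the text)
DIVIDEND_SHARE = 0.20         # hypothetical share of revenue paid out to users


def monthly_dividend(annual_arpu: float, share: float) -> float:
    """Per-user monthly payout if `share` of annual revenue per user were paid out."""
    return annual_arpu * share / 12


annual_payout = ARPU_US_CANADA_2020 * DIVIDEND_SHARE
payout = monthly_dividend(ARPU_US_CANADA_2020, DIVIDEND_SHARE)
print(f"Annual payout: ${annual_payout:.2f}, monthly payout: ${payout:.2f}")
```

Under these assumptions, a user would receive about $32.77 a year, or roughly $2.73 a month.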
Privacy as an Instrumental Right
One way to sort through this mess is to start thinking of privacy as an instrumental right—one “meant to achieve certain social goals in fairness, safety, and autonomy,” as law professor Jane Bambauer put it last year, “not an end in itself.” This commonsense approach lacks the rhetorical flair of our modern-day Patrick Henrys who insist: “Give me privacy, or give me death.” But it better comports with most people’s values and with a realistic assessment of the trade-offs that they face.
Thinking of privacy as an instrumental right—as something that helps us get more of what we want in terms of other values—makes some of the problems around data markets more tractable. We could see that it is in everyone’s interest to make data more alienable—easier to buy and sell—which would, in turn, make our data more valuable. Policymakers should look for ways to subsidize the creation of publicly available data sets, facilitate data exchange, and implement strong privacy protections that limit third-party sharing of personal information.
Unfortunately, recent privacy laws such as the General Data Protection Regulation (GDPR) in the EU are going in the wrong direction, framing privacy as an inalienable right that must never be traded away, even if doing so would make individuals better off. An unconventional coalition of liberals and conservatives skeptical of technological progress and worried about the power of Big Tech seems eager to follow. Doing so would stifle innovation, reduce investment, and harm consumers, while delivering no tangible benefit to anyone.