All-Knowing Algorithms: A Primer

What happens to personal data as the digital age deepens their quality, widens their availability, and creates new uses for them?

Businesses have always sought to collect data on their customers and operations to improve their products and increase their profits. In 1956, International Business Machines (IBM) introduced the first magnetic hard disk, which weighed more than a ton and could store 5 megabytes of data; its worldwide sales eventually exceeded 1,000 units.

Today, Google and Facebook store nearly 5,000 megabytes of data on the typical 20-year-old Internet user. Data points range from search history and online shopping carts to face, voice, location, career, hobbies, relationships, and finances. Many online services are designed specifically to facilitate the gathering of such data: Google, for instance, extracts information from every calendar entry a user logs to learn more about that user. These data then allow for precise targeting of services and advertisements. Facebook, for example, allows advertisers to reach users of a particular education level who are in long-distance relationships and listen to the radio.

By allowing companies to market their products and services to specific customers, the collection and analysis of personal data provide the economic basis for much of the digital economy. Individual targeting doubles the effectiveness of online advertising, and advertisers have flocked to the format. Total digital marketing spending increased from $26 billion in 2010 to $140 billion in 2020, while print spending fell from $122 billion to $14 billion. Google and Facebook alone generated $230 billion in advertising sales in 2020, accounting for 80% and 98% of their respective revenue. As a result, these services and many others are effectively free to users.

User data also allow for the tailoring of products and experiences, including the coverage appearing in the news feeds where a majority of Americans get their news and the recommendations made for further reading and viewing. Advertisements and special offers are eerily relevant—the retail chain Target infamously used shopping data to identify women who were pregnant and send coupons for prenatal supplies. Dating sites helpfully identify potential life partners.

On the one hand, consumers generally seem to appreciate the quality and convenience that companies provide using their data, as well as the free services that targeted advertising makes possible. They consent to all manner of agreements and allow their apps to access their location, and many do not exercise the controls available to them. On the other hand, they do not necessarily understand what they are agreeing to or how their data will be used. Controversies emerge from time to time when companies employ practices that consumers consider invasive or manipulative.

One concern is how companies use data. For instance, customer profiles permit more effective customer service, as well as prioritization among customers. Colleges track whether prospective students read informational emails and factor those data into a student’s “demonstrated interest,” which, in turn, influences admissions decisions. Life-insurance companies set personalized rates by analyzing their customers’ social media posts. In 2014, the Department of Transportation approved a proposal to allow airlines and travel agents to collect customer data, such as a customer’s age and ZIP code, to offer “more agile pricing,” a euphemism for charging some customers more than others. The travel agency Orbitz found that MacBook users have a higher price tolerance and showed them more expensive hotel options.

Another concern is that collected data quickly become intermingled. In the $200 billion data brokerage industry, firms assemble data from disparate sources to create individual profiles available for purchase. When a customer makes an online purchase, the customer’s name, email address, physical address, and phone number are often sold to data brokers. Cell-phone companies sell customer location data, allowing third parties to track a user’s location at any moment. Many apps sell location data to advertisers. A 2014 Federal Trade Commission (FTC) report found that one data broker had amassed 3,000 data points on nearly every U.S. consumer.

Data brokers face few constraints on how they gather, combine, or distribute their information, and their customers include the U.S. government and foreign governments, which purchase data on American consumers. Even HIPAA, the federal medical privacy law, limits only the sale of medical data that include a person’s name and home address. Companies can buy and sell data on medicine purchases, hospital records, and insurance claims, including records that list a patient’s ZIP code, age, and gender, details that can be specific enough for companies to match those records to individuals.

Issues for Policymakers

While over 80% of both Republicans and Democrats believe that data privacy should be a federal priority, action thus far has occurred mostly at the state level. Under a 2008 law, Illinois restricts the collection of biometric data, including scans of people’s faces, voices, or typing rhythm. Private entities must inform individuals before collecting biometric data, and companies may not profit from it. The California Consumer Privacy Act (CCPA) creates several consumer rights, including the right to know that personal data are being sold and to whom, the right to opt out of such sale, and the right to demand that a business delete personal information; it also prohibits businesses from discriminating against consumers who exercise those rights.

The first question for policymakers concerns the requirements for user consent. When must users be notified before their data are collected, used, or transmitted? What form must this notification take, and what consent is required? How do users grant or withdraw their consent? And can withdrawal of consent include a requirement that businesses destroy data already gathered? One proposal that has attracted significant attention is to grant individuals an explicit property right in their data, requiring companies to pay for acquisition and use. The contours of such a framework would still depend on answers to these questions.

Independent of decisions made by individuals, policymakers must also consider what (if any) constraints to impose on business practices. Regardless of whether a customer consents to the collection and use of data, policymakers might identify ways in which those data should not be combined, analyzed, or retained. Alternatively, businesses might remain free to handle data however they wish but face limits on how they can act on it, for instance in setting prices or denying service.

Finally, policymakers will have to decide how government itself can use data. China has garnered enormous attention for its “social credit score” system, which uses data gathered from monitoring countless human interactions and behaviors to award and withdraw state-controlled privileges. While Americans are obviously not contemplating such a system today, recent policy debates have highlighted the difficult decisions that await. For instance, calls in the aftermath of the January 6, 2021, Capitol riot to bar participants from flying raised the specter of using data gathered from personal devices to limit free movement absent any due process. The concept of a “vaccine passport” has been controversial partly because it would condition access to public spaces on willingness to engage in a behavior deemed socially desirable.

These cases seem obvious to some: of course we should punish people whose devices show that they were inside the Capitol on January 6, and of course we should keep people who might transmit COVID-19 out of large crowds. But in an era of technocratic “nudging,” the temptation will always exist to go further. Why not have surveillance cameras on every corner to deter crime, send healthy-recipe texts to people who consistently overspend on fatty foods, or reduce unemployment benefits for people who play video games all day? More aggressive use of data will almost always appear to deliver more efficient outcomes, which markets and the state will both pursue. To the extent that other values matter, they will have to be asserted through the political process.
