Missing Data and Anti-Discrimination Laws

In February, the Massachusetts Attorney General’s Office published an analysis concluding that drivers living in the state’s predominantly minority communities are charged higher auto insurance premiums than similar drivers living in majority white communities. In fact, the study found that experienced drivers with good driving records in the largest minority population areas pay more for auto insurance than those with recent accidents living in areas with the smallest percentage of minorities. These findings are the latest in a series of reports expressing concern that the formulas and factors insurers use to underwrite risk produce a significant disparate impact on minorities.

Most states, including Massachusetts, prohibit “unfair discrimination” in insurance pricing, meaning that insurers may not treat consumers differently except on the basis of actuarial risk. But despite observing that racial disparities exist in the cost of auto insurance, the Massachusetts Attorney General’s Office could not determine whether insurers unfairly discriminated because it lacked the requisite data on insurers’ losses and payouts.

Even though regulators must compare the price of premiums with the covered risk in order to assess whether unfair discrimination is afoot, most states do not collect individual insurers’ payouts, claims, or even premiums data—at least not at the granular level necessary to understand whether price disparities can be explained by differences in the cost of providing insurance. Insurers argue, and some state courts and regulators appear to agree, that requiring insurers to share disaggregated policy information would amount to forcing them to divulge trade secrets. The resulting regulatory regime is one in which regulators are tasked with identifying and preventing unlawfully discriminatory insurance practices but are poorly equipped to do so.

Missing data is not limited to insurance anti-discrimination regimes. It hampers enforcement of anti-discrimination laws whenever individuals have trouble knowing if they were treated differently from other similarly situated people. For instance, a panoply of federal and state laws prohibit paying men and women differently for the same work. But according to a 2014 memo from former President Obama, enforcement of these laws has been severely “impeded by a lack of sufficiently robust and reliable data on employee compensation, including data by sex and race.” The Equal Credit Opportunity Act prohibits credit discrimination based on race, but it simultaneously forbids lenders (except for mortgage lenders) from collecting race-based data. As a result, regulators and consumer advocates investigating non-mortgage credit discrimination cases have had to rely on imperfect proxies for race, such as borrowers’ last names and addresses. These workarounds have, in turn, left the results of the investigations open to intense criticism. And the Batson doctrine, which protects against discrimination in the jury-selection process, allows courts to take into account voir dire patterns exhibited by prosecutorial offices as a whole. Yet, as Professor Andrew Crespo has noted in the Harvard Law Review, this type of evidence is almost never presented in practice because litigants rarely have access to the necessary office-wide data. Most Batson litigation instead relies on evidence about jury strikes in a single case, an approach widely derided as a farce when it comes to detecting Batson violations.

Regulators’ experience with the Home Mortgage Disclosure Act (HMDA) demonstrates just how valuable data collection can be when it comes to identifying discriminatory practices and regulatory gaps. HMDA itself does not establish any anti-discrimination mandates or prohibitions; instead, it serves as a companion to substantive fair lending laws by requiring most mortgage lenders to collect, report, and publicly disclose loan-level data about home mortgage originations and purchases. Armed with HMDA data, regulators have substantially improved their capability to recognize practices that violate substantive fair lending laws, to target their investigations, and to ensure accountability. HMDA data enables regulators to detect broad patterns in loan decisions and use statistical techniques to determine whether non-prohibited factors, such as borrowers’ income, can explain those patterns. More important, while HMDA data is too limited by itself to prove illegal discrimination, it allows regulators to screen and identify lenders that warrant closer review.

Finally, because most HMDA data is publicly available, it enables market intermediaries to hold regulators as well as regulated businesses accountable. Journalists at the Atlanta Journal-Constitution famously used the data to call attention to racial disparities in home mortgage lending activity in the Atlanta area, which eventually led the U.S. Department of Justice to file its first case alleging a pattern or practice of mortgage discrimination, brought against the Decatur Federal Savings and Loan Association in 1992.

Unfortunately, despite the historical success of HMDA, mortgage discrimination enforcement may soon face the problem of missing data as well. Congress is currently considering the Economic Growth, Regulatory Relief, and Consumer Protection Act (S. 2155), which would exempt lenders that issue fewer than 500 home mortgage loans annually — an estimated 85% of lenders — from critical aspects of HMDA’s reporting requirements. Notably, the exempt lenders would no longer have to report data concerning borrowers’ creditworthiness and property value, data that regulators need to determine if loan price disparities are explainable by differences in the borrowers’ risk profiles. The absence of this data prior to the 2008 financial crisis prevented regulators from effectively policing discriminatory lenders. Were the current version of S. 2155 to pass, regulators would once again be constrained in their ability to detect misconduct and toxic lending practices.

The challenges of identifying and understanding discriminatory practices without sufficient data are getting worse over time; the value of information-enhancing laws like HMDA and the harm from information-restricting laws like S. 2155 are likewise becoming more pronounced. Data-mining technologies are increasingly sophisticated in extracting our personal and behavioral information, and businesses incorporate this information into complex decision-making algorithms. For example, some insurers now use information about a consumer’s shopping behavior to predict the likelihood that the consumer will look around for insurance from a competitor, and vary the premium offered to that consumer accordingly — a practice commonly known as “price optimization.” While price optimization almost certainly discriminates unfairly, regulators paid attention only after consumer groups brought it to light. Without detailed data on insurance premiums and payouts, regulators cannot monitor — indeed they cannot see — the real implications of this emerging practice.

When we charge regulators with enforcing anti-discrimination laws, we should ensure that they have the tools to fulfill that responsibility effectively. Data collection is a necessary first step.
