A commercial marketing database lets you peek inside



There is an old entry in my backlog of blog ideas that went along the lines of "try to get a copy of what data-mining marketing firms have on file about you". Nothing ever came of it until I saw Ed Felten's post about one of those companies offering just that, with a spiffy site to boot, which really surprised me.

A quick primer

Almost everyone knows that companies track purchasing habits and customer preferences and correlate them with socio-demographic information to sell you more stuff. The who and the what are much less widely known, though.

Most often these collectors are specialized businesses that sell subscriptions to their databases, which they feed from multiple sources; in the U.S. especially, these include private as well as public sources. At the time I jotted down a few notes on the topic, ChoicePoint was one of them.

Most of these companies do not stop at providing marketing data to business and government (yes, the U.S. government is not prohibited from using private databases, even if it is not allowed to create such databases on citizens in the first place). They often have a close relationship to data based on or provided by credit scoring agencies such as Altegrity, TransUnion or ISU, which Hoovers lists as direct competitors of ChoicePoint. The differences between their aggregate datasets are probably small; the impact of the results on individuals often is not. ChoicePoint was later acquired by LexisNexis, which primarily focuses on publication databases, and some divisions by Acxiom. This shows the further agglomeration of different types of databases within these data brokers.

Why aboutthedata.com?

If I had been asked a week ago, I would have concluded that such data brokers have no interest in providing a site such as aboutthedata.com. It seems intuitive that they would want to collect as much data as possible on their subjects without giving them a reason to share less. Ideally, they would not be noticed at all.

This could be an initiative by Acxiom to improve the public perception of data mining firms such as itself. It could be an attempt to preempt negative coverage of data brokers amid the continuing public discourse on surveillance et al. Or it could just be an attempt to further improve their data.

Let's try it out

Since I spent several years in the U.S., I figured that they should have at least something on me. I entered my personal data from my last residence and was surprised to find that, in the categories the site shows, they had basically nothing except an inferred marital status and income bracket. Both matched, but both could also have been extrapolated from the address and age I entered to register. Ed Felten was more successful, but apparently also underwhelmed by the level of detail shown.

My speculations on why this is so fall into two basic groups:

1. They are only showing a minimal set

It's possible that Acxiom is only providing its basic result data and keeping any further (and creepier) analysis results to itself and its clients, but since that's pure speculation I'm going to ignore this avenue for now.

Also, I have generally become skeptical of the predictive power of big data in shaping or determining customer behavior, which is, in my opinion, often at best on par with a trained sales representative.

2. I didn't provide enough data

Two of the primary categories provided by Acxiom are basically public databases of vehicle and house ownership, neither of which applied to me at the time. Also, I generally did not sign up for loyalty cards. It's possible, and likely, that Acxiom either really does not have much more information on me or is unable to merge incomplete datasets about me into a consistent record.

The latter case highlights the main problem with aboutthedata.com: I can be pretty certain that I'm not seeing the full list of entries Acxiom has on me, simply because a human operator would have to make a judgment call on whether a fragment refers to the same canonical person or not. They are unlikely to be able to merge a significant number of entries manually, which means there have to be significant numbers of entries in their database that they cannot bring together in this web application by algorithm alone. In many cases they cannot ask me "is this you as well?" without accidentally disclosing data from someone else, so they have to err on the side of caution, more so than is probably necessary for most of their customers.
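To make the merging problem a bit more concrete, here is a minimal sketch of threshold-based record matching in Python. It is purely illustrative and says nothing about how Acxiom actually works: the field names, the two thresholds and the three outcome labels are my own assumptions, and the similarity measure is just the generic string comparison from Python's standard library.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, picked only for illustration.
AUTO_MERGE = 0.90    # at or above this, an algorithm could merge on its own
NEEDS_REVIEW = 0.60  # between the two, a human would have to decide

# Hypothetical fields a data fragment might carry.
FIELDS = ("name", "street", "city")


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None if either value is missing."""
    if not a or not b:
        return None  # a missing field is no evidence either way
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2):
    """Average the similarity over the fields present in both fragments."""
    scores = []
    for field in FIELDS:
        score = field_similarity(r1.get(field), r2.get(field))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def classify(r1, r2):
    """Sort a pair of fragments into merge, review or separate."""
    score = record_similarity(r1, r2)
    if score >= AUTO_MERGE:
        return "merge"
    if score >= NEEDS_REVIEW:
        return "review"
    return "separate"


if __name__ == "__main__":
    full = {"name": "John Smith", "street": "12 Main St", "city": "Princeton"}
    fragment = {"name": "J. Smith"}  # e.g. a record with no address attached
    print(classify(full, fragment))
```

Everything that lands in the middle band is exactly the kind of fragment that needs a judgment call. A site showing data to the public can only safely display the top band, which would explain a sparse profile.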

Thus, I'm still left wondering what the site is supposed to accomplish. Calm me? Improve the paper spam selection? Not sure.