January 20, 2004

New horizons in the war on you. Calpundit links to this report of the Bush Administration using traditionally-confidential census data in their forthcoming airline-passenger screening program, and remarks:
Black helicopter conspiracy theorists have been screeching for years that census information isn’t really private, and of course the census folks have responded by swearing on stacks of Bibles that yes it is private. Every bit of it. Absolutely.

But they were lying. Someone needs to be fired if this report turns out to be true.

To quote Teresa Nielsen Hayden: “I deeply resent the way this administration makes me feel like a nutbar conspiracy theorist.”

UPDATE: Calpundit has further information that suggests that what happened here was less dire than it might appear. So does Alex R, in the comments at Calpundit and here. Good to know.

Teresa’s point is still germane, though. [03:54 PM]

Stefan Jones ::: (view all by) ::: January 20, 2004, 04:11 PM:

It really, really, REALLY ticks me off that *NASA* is involved in this crap.

Isn't there some government spook outfit, like the Ministry of Peace and Freedom, that could do this?

So that, like, space exploration is not forever after associated with sinister spy programs run by ex-admirals and twitchy math post-docs?

Claude Muncey ::: (view all by) ::: January 20, 2004, 04:23 PM:

Excuse me for a moment while I overreact . . .

Let's see here:

Census detail data that legally is supposed to be strictly confidential for 72 years -- yeah, let's load that stuff up and play with it.

Gun purchase and ownership records, which were collected for the specific purpose of being able to look up who purchased what gun -- no, too sensitive, we can't even look at that data to see if terrorists have bought guns or if some gunshops are selling habitually to criminals.

Look, I have used Census data for a variety of purposes over the years and you damm well rely on it being accurate for some purposes. (Counting homeless and/or undocumented persons is not one of those things.) It does not take much distrust of the Census to destroy the utility of that information.

Damm, I had that tinfoil hat around here somewhere . . .

Xopher ::: (view all by) ::: January 20, 2004, 05:52 PM:

Well, unless heads roll and buildings burn about this, I'm pitching my 2010 Census form in the recycle bin.

But wait, if Bush gets reelected, I'll be in a work camp, won't I? They won't NEED me to cooperate with a Census...

Alex R ::: (view all by) ::: January 20, 2004, 06:02 PM:

(Note to Electrolite readers: I also posted this comment at Calpundit...)

This report is almost certainly bad reporting by the Washington Times. (The fact that this report appeared there should have given a clue!)

Executive summary: The NASA report uses **publicly available** Census data, not private data that can track individuals.

I went to the relevant page on EPIC's website, and found a reference to this NASA study.

The study (which *did* use the improperly released NWA data, by the way) had this to say about Census data:

This data set contains the responses from the 1990 decennial Census in the United States. The data has information on both households and individuals. ..... Our data comes from the 5% State public use microdata samples and we used the short variable list [24].

Reference [24] is to the Minnesota Population Center Integrated Public Use Microdata Series.

Here's what the Minnesota Population Center has to say about their dataset:

Most population data - especially historical census data - have traditionally been available only in aggregated tabular form. The IPUMS is microdata, which means that it provides information about individual persons and households. This makes it possible for researchers to create tabulations tailored to their particular questions. Since the IPUMS includes nearly all the detail originally recorded by the census enumerations, users can construct a great variety of tabulations interrelating any desired set of variables. The flexibility offered by microdata is particularly important for historical research because the aggregate tabulations produced by the Census Bureau are often not comparable across time, and until recently the subject coverage of census publications was limited.

Microdata do pose some limitations, however. For the period since 1940 census microdata are subject to strict confidentiality measures that limit their usefulness for some applications. The IPUMS samples for these years include no names, addresses or other potentially identifying information. To further ensure that no individuals can be identified, the Census Bureau limits the detail on place of residence, place of work, very high incomes, and several other variables. Most important, the microdata records for the period since 1940 identify no geographic areas with fewer than 100,000 inhabitants (250,000 in 1960 and 1970). Therefore the IPUMS is inappropriate for research that requires the identification of specific small geographic areas in those census years.
(Emphasis added)

So as far as I can tell, there is no evidence that the Census department released inappropriate data to NASA for this project. It may be that their confidentiality restrictions are insufficient -- see in particular the study's description of outliers in the census dataset. But this dataset is available to anyone who wants it, including you and me, and has at least been somewhat trimmed to avoid privacy violations.

Chris Sandvick ::: (view all by) ::: January 20, 2004, 06:29 PM:

Seeing administration conspiracies in every news story, thinking you are going to reveal the "truth" about George W. to the public, that the United States is sliding into one party autocracy, and relating with Joe Hill? Folks, you ARE conspiracy theorists.

Patrick Nielsen Hayden ::: (view all by) ::: January 20, 2004, 06:44 PM:

As it happens, actual conspiracies happen quite frequently in real life; which is to say, powerful people tend to act to preserve and extend their power, and not being fools, they often cooperate in secret to this end. As no less distinguished a conservative philosopher as Adam Smith observed, "People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public." Likewise, Daniel Davies observed that many things are very poorly described by the fashionable insistence that "if we have to choose between explaining something as a cock-up or a conspiracy, we should choose cock-up every time." As Davies notes, most changes of government in Africa between 1960 and 1988 were coups d'etat; which is to say, conspiracies.

Of course, Daniel Davies is an intellectual and a skeptic, as was Adam Smith; both of them would be handwaved out of the room by the modern power-worshippers who have kidnapped the term "conservative."

However, Alex R is right that any information from the Moon-owned Washington Times should be treated with suspicion, and the post on which we are commenting has been suitably amended.

Teresa Nielsen Hayden ::: (view all by) ::: January 20, 2004, 06:48 PM:

Alex, I hope you're right, because what Claude said is absolutely true: The minute people start thinking they can't trust the confidentiality of census data, we can kiss its accuracy goodbye.

Personally, I don't like the idea of microdata samples being available, even when they're "subject to strict confidentiality measures". I know that without that identifying information, I'm indistinguishable from every other narcoleptic ex-Mormon editor who for the last quarter-century has been married to another editor; but it worries me all the same.

Chris Sandvick, you are in violation of the "you folks" rule, and are currently held suspended between leniency and judgement like a spider in Jonathan Edwards' hands.

Claude Muncey ::: (view all by) ::: January 20, 2004, 08:28 PM:

I feel a little better. Maybe. Sort of.

The study in question (skip down to page 45) was part of a workshop last May on data mining and counterterrorism work. This is similar to some of the work that I have done with datamarts and data mining in the commercial sphere.

The problem the two researchers were addressing is how to figure out how unusual one individual or case is compared to a very large database in real time. (This has obvious applications to a system like CAPPS II.) The current CAPPS system, like many similar profiling systems, is reported to use a fairly straightforward scoring and weighting system. You look at a number of factors for a particular passenger, and based (we hope) on experience and analysis, assign weights to particular characteristics. A 30 year old unmarried non-citizen male who is traveling without baggage on a one-way cash ticket purchased that morning should damm well score higher than your 90 year old maiden aunt traveling to see her sister. Persons below a certain threshold are not questioned, while those over a certain threshold may get to be a guest of TSA for a while.
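A scoring and weighting scheme of this general kind can be sketched in a few lines of Python. This is only an illustration of the idea: the factor names, the weights, and the threshold below are all invented, and nothing here reflects how the real CAPPS system is configured.

```python
# Hypothetical factors and weights, invented for illustration only.
WEIGHTS = {
    "one_way_ticket": 3,
    "paid_cash": 3,
    "bought_same_day": 2,
    "no_baggage": 2,
}
THRESHOLD = 6  # assumed cutoff; a score above it means extra questions


def score(passenger):
    """Sum the weights of every factor that is true for this passenger."""
    return sum(w for factor, w in WEIGHTS.items() if passenger.get(factor))


def needs_questioning(passenger):
    """Return True when the weighted score is over the threshold."""
    return score(passenger) > THRESHOLD


# Two made-up passengers: one hits every factor, one hits none.
last_minute_flyer = {"one_way_ticket": True, "paid_cash": True,
                     "bought_same_day": True, "no_baggage": True}
maiden_aunt = {"one_way_ticket": False, "paid_cash": False}
```

Because each factor contributes a fixed, visible amount, a scheme like this can always report which factors pushed someone over the line, which is where the speed and transparency come from.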

The advantage of this approach is speed and a certain transparency -- combined with a decent watchlist system (something that for some reason or other we have had a hard time putting together), you should have a verdict, with a reason if you want it, very quickly once you enter the data. You can even set it up as a branching tree of questions, with extra questions added if you have some early hits, or very few questions if you check out correctly in the early ones. The problem is that it only finds the people you already know to look for -- the people that meet criteria that you have put together from previous problems. So what about the next attack, using a different approach?

The authors' ideas (which seem consistent with what we have heard about CAPPS II) involve using outlier analysis to determine just how unusual a particular traveler is based on existing databases. Such algorithms can be powerful but very time consuming -- minutes or hours. What we need is a score right now. In their case they use a number of techniques to get relatively fast results from large data sets. (CAPPS II has been reported to analyze information from a number of large commercially available databases -- and the big ones are larger than what was used here by orders of magnitude. But the production system will presumably be using bigger hardware as well.)
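To make "how unusual is this record" concrete, here is a minimal Python sketch of one common kind of outlier score: the average distance from a record to its k nearest neighbors in a reference data set. This is a generic textbook technique chosen for illustration, not necessarily the method in the study, and it is the slow brute-force version with none of the speed-up techniques mentioned above. The function name, the two columns, and the toy values are all assumptions.

```python
import math


def knn_outlier_score(point, data, k=3):
    """Average Euclidean distance from point to its k nearest records.

    Brute force: compares the point against every record in data,
    which is assumed to be non-empty.
    """
    distances = sorted(math.dist(point, row) for row in data)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Toy records: (days booked in advance, trip length in days). Invented values.
history = [(14, 7), (21, 5), (10, 3), (30, 10), (18, 6), (12, 4)]

typical = knn_outlier_score((15, 6), history)   # sits among the other records
unusual = knn_outlier_score((0, 90), history)   # far from every other record
```

A record that resembles many others gets a small score, and one that resembles nothing in the data gets a large one, without anyone having to say in advance which characteristics matter.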

A possible result, if properly designed, would be a system that could score just how unusual a particular passenger was for that flight for that day in defined but fuzzy ways when compared to databases that are continuously updated. Sort of like a box that could say "this person seems funny -- for some reason or other. Ask some more questions about how they paid for their trip and when they are coming home." Data mining can really work this way -- you find trends (sometimes) that you can't explain at first, that you have to dig to explain, but that prove out as significant when investigated. If this technology is being used on a test basis (who knows?) it would explain the reports of people who check out one day and are held on another -- the thresholds could shift from day to day.

The special Census sampled microdata data sets are publicly available (something I didn't know) and appear to have been used because they were big, current, and had the kind of characteristics CAPPS II would need. In the case of the Census data, the information is scrambled and in some cases aggregated before release to make it difficult to match individual data items to real people. Names and addresses have been scrubbed. The big news, actually, was the passenger data set -- this is the airline passenger information that certain airlines denied passing over but finally admitted that they supplied when they should not have.

I have a lot of other questions about this approach as well as the other things in the document, but that will take a little more time, reflection, and some professional reading. I am reassured. Sort of.

Simon ::: (view all by) ::: January 20, 2004, 11:11 PM:

"A 30 year old unmarried non-citizen male who is traveling without baggage on a one-way cash ticket purchased that morning should damm well score higher than your 90 year old maiden aunt traveling to see her sister."

OK. So terrorists of the non-stupid variety will do their best to look like 90 year old maiden aunts on innocent visits. They may not be able to do much about the 90 year old maiden aunt part, but they sure enough can avoid buying one-way tickets in cash on the morning of the flight.

Meanwhile, out here in the real world, anyone who buys a one-way ticket or in cash or on the morning of the flight - any one of which can be a reasonable thing to do in certain circumstances - is treated like the grand prize trophy in the terrorist hunt.

I know someone whose travel plans regularly lead her to fly to city A and home from city B on one airline, and in the middle of the trip to take a one-way flight between the two on another airline. She is pretty close to being a 90 year old maiden aunt, but these one-way tickets have led to so much security hassle that she's thinking next time of buying a round trip and just not using the second half. I advised her not to: not using your ticket will probably get you on even more security watch lists.

Did you see Dilbert today? Panel 2: "There are some people you should never phone from a plane." Panel 3, earlier that day, on a plane: "Hi, Jack!"

Xopher ::: (view all by) ::: January 20, 2004, 11:45 PM:

And again, unmarried people are treated with suspicion...heh.

Steve Taylor ::: (view all by) ::: January 21, 2004, 01:31 AM:

Simon wrote:
> OK. So terrorists of the non-stupid variety will
> do their best to look like 90 year old maiden
> aunts on innocent visits.

In younger days I was 'randomly' searched a fair few times, presumably on the basis of "young male not wearing business gear, entering the country from South East Asia". I never quite realised why this irritated me until I heard someone else talking about Customs searching people who fitted the profile of "the dumbest imaginable drug smuggler".

Erik V. Olson ::: (view all by) ::: January 21, 2004, 08:38 AM:

Did you see Dilbert today? Panel 2: "There are some people you should never phone from a plane." Panel 3, earlier that day, on a plane: "Hi, Jack!"

(This is from memory, but I was shown the NTSB writeup on the incident, and was assured that this wasn't FOAF, this was "I was there, see, my name is in the report" -- this was at-the-time DFW gate agent.)

Not a joke. Happened to an AA flight at DFW in the early 80s. The FO was deadheading in, delayed by storms. When the Captain got word that the FO was landing, he went out, did the walkaround, and started working some of the preflight list, to try and get the flight out as least late as possible.

Since he's now in a powered up aircraft, the beacons are on. Once they've been on for over 15 minutes, he calls a tower (DFW has three) as he is supposed to, to tell them that, no, he's not moving for some time, but he is under power. While he's doing that, the FO walks in, and seeing his old buddy, utters the friendly greeting, "Hi, Jack."

Yes, the tower heard. Yes, there was much consternation. Yes, the flight went out a bit later than intended. Thankfully, it became very clear that this wasn't a hijack attempt, but the tower had pressed lots of Loud Buttons when they heard that transmission, and much paperwork was required.

Thus, AA now has many flight crew members named John, but none called Jack, and you say "hello" when you walk into a cockpit.

Patrick Nielsen Hayden ::: (view all by) ::: January 21, 2004, 10:34 AM:

Noah Shachtman, Wired News reporter and maintainer of the very useful weblog Defense Tech, has a Wired News story about NASA's involvement with "homeland security" data mining. You may already be a terrorist!

Jeremy Leader ::: (view all by) ::: January 21, 2004, 12:02 PM:

I'm actually somewhat reassured by the descriptions of this study. Looking for something out of the ordinary seems much more productive than looking for "the dumbest terrorist imaginable". After all, any criteria that security officials can dream up, terrorists can probably also think of, and find ways to avoid. Having thresholds that vary from day to day and flight to flight lowers the ability of terrorists to do dry runs to find out who'll be scrutinized and who won't.

It's also a lot more useful than "watch lists", given how easy it is to obtain sufficient false identification to purchase a ticket and board a plane. However, I'm still a little concerned about identity theft, though such a system might flag a terrorist because "a person like that (the victim of the theft) is unlikely to be doing this (the choice of flight, payment method, etc. of the terrorist)".

Varia ::: (view all by) ::: January 21, 2004, 12:19 PM:

Pardon if this is a stupid question, but is the "you folks" rule a joke or not? I poked around a bit to see if there actually were rules for the comment threads, but I didn't see much. Are there?

Jeremy Leader ::: (view all by) ::: January 21, 2004, 12:56 PM:

Varia, it's a joke of the "Ha ha, only serious" kind.

A while back, someone here (or possibly on Making Light, Teresa's blog) pointed out that people who use the phrase "you folks" in their first post generally only have rude, uninformative things to say.

Teresa's the moderator here, and her usual punishment for excessive rudeness is disemvowelment. I think she was warning Chris that while his post itself wasn't over the line, the wording suggested he might be headed in that direction.

Patrick Nielsen Hayden ::: (view all by) ::: January 21, 2004, 01:09 PM:

Actually, they often have very informative things to say, but it's a good bet that anyone who comes into an online conversation with the string "you people" in their first post is toting some kind of chip on their shoulder.

However, Teresa's warning should be taken less as a threat than as a signal to be more amusing. When Teresa is actually annoyed, the signs will be much clearer.

Jim Henry ::: (view all by) ::: January 21, 2004, 01:17 PM:

The outlier analysis of passengers vs. large data sets sounds similar (in an analogical way) to the way statistical spam filtering works.

Simon ::: (view all by) ::: January 21, 2004, 01:20 PM:

Erik V. Olson, you should send that story to Scott Adams.