DANIEL B. PRIETO: (In progress)—information age. Please note that our comments today are solely our own and do not represent the views or positions of the Markle task force or our respective employers.
So with that, let’s begin.
I don’t know how many of you have seen the recent movie, The Constant Gardener, based on the John le Carre spy novel. But there is an interesting exchange in the middle of the movie where one of the main characters says to another, “I thought you spies knew everything.” The second person responds—the spy—and says: “Only God knows everything. He works for Mossad.” (Laughter.) In light of recent events, I’m sure that many of you feel that maybe God also works for the NSA.
Let’s kick off with Bryan.
Bryan, in light of today’s topic and obviously the hearings on the Hill, what are the challenges and opportunities that face the intelligence community as a result of information technology change—the Internet, mobile devices, data mining? And how is the ongoing debate at the NSA a reflection or microcosm of those challenges and opportunities?
BRYAN CUNNINGHAM: Thanks. Well, like all lawyers, I need to do a couple of disclaimers before I say anything. The first one is, as far as I’m aware, neither Jeff nor Dan nor I have ever been read into any of these NSA programs. We don’t have any classified or other information about them. So that makes us perfectly qualified to comment on them.
Secondly, I’m recovering from a cold, so I apologize if my voice gives out a little bit here. But people will probably appreciate that being that lawyers like to talk and talk and talk and talk.
I divide up these issues of the changes in law and technology since let’s just say 1978, when the Foreign Intelligence Surveillance Act was passed, into essentially three baskets or three buckets.
The first one is the line at the border problem. Almost all of our law and executive orders and directives and policy on the collection of intelligence are based on essentially two ways to divide up the world.
The first is, was the information collected inside the United States physically, geographically, or not? And does the intelligence pertain, or do you think it pertains, to a United States Person, that is a citizen or a permanent resident alien?
That way to divide up the world probably made perfect sense in 1978, and perhaps for some years after that. But with the Internet, with Skype-encrypted telephone calls that can be had from anywhere worldwide, with the ability to sit in Iraq or Afghanistan and be on an Internet service provider in California, all of those legal distinctions that are based on geography, in my judgment, have become and are becoming increasingly outdated.
And similarly the U.S. person issue: How do you determine if you’re thinking about listening in to a Skype-encrypted phone call that you don’t know where on the globe it’s taking place, how do you determine that the person you may be wanting to listen to is or is not a U.S. person?
So those legal distinctions, it seems to me, need to be looked at and need to be retooled for the 21st century.
The second is what I would call the line at the courthouse problem. Are you going to use the information you’re going to collect for purposes of intelligence operations, for purposes of trying to unravel a network of terrorists, for purposes of trying to get a terrorist suspect to flip and report on his comrades overseas? Or are you going to use the information to prosecute somebody in a court of law?
And the third, and I think in some ways the most intriguing issue, which I think we’re going to talk about extensively today—and my prediction, being an information and privacy lawyer now in private practice, will be the major issue for the next four or five years in this area—is what I call the line at the person problem.
And that is, if you have machines, whether you’re the government or a health care institution or an insurance company or a bank, if you have machines that are looking at information, and that information is not attached to the name or address or other personal identifier of a human being, at what point is your privacy even at issue if no human being looks at the information?
You will get people who will debate that on all sides, but I think that is the critical issue, because both from an operational and a privacy and civil liberties standpoint, we just cannot do this with human beings.
Computers are going to have to look at all this information, and if you have 2 trillion records, select out the 10 records that may be talking to the terrorists, or whatever—or the 10 health care records that match patient X to patient Y in a different hospital.
And let me just give you—I had three or four hypotheticals, but let me just give you one and then shut up and let Jeff talk.
Suppose you are living in the country of Narnia, and you are an officer with the Narnia Support Agency and you have to determine whether or not the White Witch or the Queen of Narnia, depending on your political point of view, is talking to the bad guys.
So you may want to get the records of the communications that are being had between one bad guy in one part of Narnia and another bad guy in another part of Narnia.
If you could—as the government of Narnia, if you could have the records be created and be accessed in such a way that it would be impossible for you as the government to know the names and addresses of those people and you could have your computer search against those records, and only when you get hits on those records would they come back to you and you would seek legal authority to, in effect, unmask who those people are, wouldn’t you have both increased operational efficiency, because now you don’t have to look at every single telephone call in the country; you only have to look at a few? And wouldn’t you also have increased your privacy and civil liberties protection because the government is not in fact looking at trillions of records; they’re only looking at a few?
And I think Jeff has some technology ideas of how that may be possible.
PRIETO: Jeff, in light of a couple of the things that Bryan has brought up, sort of the U.S. persons distinction, the line at the person, and his implication in his hypothetical, the power of data analysis and network analysis, what are some real-world examples from what you do at IBM or from your prior software activities on what’s the art of the possible with technology to find needles in the haystack?
JEFF JONAS: All right. Well, let me start with this notion that there’s a lot of conversation about information sharing, and I think that that’s kind of like a second principle.
And I have kind of come up with what I’m calling the information-sharing paradox, and I think it’s—if you start with information sharing, we’re likely to fail. And the reason is that it’s not really practical to share everything with everybody. And if you can’t share everything with everybody, your next option is, can you ask everybody every question? And it turns out that’s not practical either.
So this information sharing paradox is, if you can’t share everything with everybody and you can’t ask everyone every question every day, how is someone going to find something? And that’s the information sharing paradox.
And I think that the solution is discovery. You have to know who to ask for what. So thinking about this in terms of the card catalog at the Library of Congress, the Reader’s Digest version is no one goes to the library and roams the hall to look for the book; you go to a card file and the card file tells you where to go. If someone were to be putting books in the halls or one of the aisles of the library and not put a card in the card file it would be nondiscoverable.
So a first principle is, holders of data should be publishing some subset of the data, subject, title, author, to card files, and card files are used for discovery, and then you know who to ask.
So the first principle is discovery.
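To make the card-file idea concrete, here is a minimal sketch in Python. It assumes holders publish only a small subset of fields (such as author or title) to a shared index, and that the index returns pointers to holders rather than the records themselves. The class name `CardFile`, its methods, and the holder names are hypothetical and chosen only for illustration; nothing in the discussion specifies an implementation.

```python
from collections import defaultdict


class CardFile:
    """Toy discovery index: stores pointers to holders, never the records themselves."""

    def __init__(self):
        # Maps a normalized (field, value) pair to the set of holders that published it.
        self._cards = defaultdict(set)

    def publish(self, holder, field, value):
        """A data holder contributes one card, e.g. ("author", "John le Carre")."""
        self._cards[(field, value.strip().lower())].add(holder)

    def who_has(self, field, value):
        """Return the holders to ask; the caller must still request the record from them."""
        return sorted(self._cards.get((field, value.strip().lower()), set()))


card_file = CardFile()
card_file.publish("aisle_7", "author", "John le Carre")
card_file.publish("aisle_7", "title", "The Constant Gardener")
card_file.publish("aisle_12", "author", "John le Carre")

print(card_file.who_has("author", "john le carre"))  # ['aisle_12', 'aisle_7']
print(card_file.who_has("title", "Unshelved Book"))  # [] (no card, so not discoverable)
```

A book shelved without a card never appears in `who_has`, which is the sense in which undeclared data is nondiscoverable.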
And from a policy standpoint, the question then is, how do you get people to contribute data to the card file? What data are they putting in the card file? And when I think about that, I think about motivating data holders. If you have an owner of a system, their value to the enterprise is the degree to which their data is useable. But before it’s useable, it must be discoverable.
So if you quantified people’s contributions to the card file, if you had an aisle at the library, a system, and they contributed no cards to the card file, their enterprise value would be less.
So I’m speaking here to a metric about how one would quantify discoverability.
And it turns out as you create these things that the card file itself becomes the target. When you put a few billion things in there that point to the documents in the holdings and the assets across many different systems or silos, after a while it starts to feel like the card file is the risk, the risk of unintended disclosure, the risk of that running away from you, having an insider run off with it.
So one of the things that I’ve been pursuing is the ability to anonymize the card file so that even if the database administrator who oversees this card file is corrupt, he or she can’t actually look through it and shop or scan for names or addresses. The data in it has been scrambled in a way that it’s—when I use the word anonymize, by the way—some of the recent news has indicated that a phone number by itself is anonymized; I would differ. My view of anonymize is that it—to me, anonymized data is data that is nonhuman-interpretable, and nonreversible. So therefore if there is a match, no single person can unlock it. They actually have to go back to the holder of it and request the record, which in an information-sharing model with public sector to private sector, that request would then come out as consent or a subpoena, NSL FISA.
So I thought I’d start there.
PRIETO: There are a couple of big themes here, obviously. You’re implying in your discussion, Jeff, the notion of—beyond information sharing, the notion of pattern recognition in lots and lots of data, and this is really at the crux of, I think, this most recent NSA disclosure.
I think the further details of what you discussed introduce the notion of while you have this ability to sift through massive amounts of data, you can also use technology to protect privacy and provide oversight. You can do it by anonymizing records, you can do it by encrypting records, and you can do it by implementing a system of audits whereby you watch what people are doing with all of this data.
Another major theme, I think, is that, as Bryan mentioned, Internet devices and mobile devices challenge a whole host of existing laws, including wiretap rules and FISA rules, and globalization in technology further challenge the current legal framework.
Bryan, back to you for a second: Jeff started out talking about information sharing. And certainly from the 9/11 commission report, this notion of everyone needing to focus on connecting the dots has really driven a lot of the intelligence reform since 9/11. But where we find ourselves today—even though connecting the dots has been the focus of reform, the real quandary, the real controversy seemed not to be on the sharing side, even though that still needs to be implemented; the controversies of is the CIA capable to go into the future and fight terrorism, and is the NSA capable and legally justified in what they’re doing—it seems that real controversy seemed to be on the collection side in some of these things.
Can you comment on that?
CUNNINGHAM: Well, of course, it depends on what you mean by collection versus sharing. And as—let me do a little lawyer’s trick on you and answer the question I wanted you to have asked, not the question you actually asked.
You said, I believe, Dan, to Jeff, that one of the issues is, in terms of technology, how do you decide when information has actually been seen? And I just want to draw a distinction between two different kinds of what’s now being sort of loosely called data mining, and I’m going to plagiarize from some of Jeff’s work here.
But there is what I could call predicate-based link analysis, which is what prosecutors and law enforcement officers do all the time. I was a federal prosecutor for a couple of years doing international drug cases. You have the phone number of a Cali cartel leader and you want to know in very short order who he’s talking to in the United States, not only because you want to find those people and investigate whether or not they’re actually part of his network or if it’s just sort of incidental that he talked to them, but you also want to understand what their tradecraft is, to use an intelligence term. You want to know, do they call in the middle of the night? Do they always call a pay phone and have their local cell member call back from that phone? Do they keep their calls under 30 seconds?
You can gather an immense amount of useful what we used to call leads in law enforcement but what now would be called intelligence in terrorism cases by just understanding how the bad guys do their business.
But the key is you have a data entry point; you have a point of data; you have the terrorist’s phone number overseas; you have his address; you have his phone card number; you have his credit card number—something to start the search. So I would call that predicate-based link analysis.
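What Cunningham calls predicate-based link analysis can be sketched as a bounded graph search that starts from one known identifier. The example below is only an illustration under assumptions: the call records are made-up (caller, callee) pairs, the function name `contacts_within` is hypothetical, and real systems would also weigh timing, duration, and other tradecraft signals that this sketch ignores.

```python
from collections import defaultdict, deque


def contacts_within(call_records, seed, max_hops=1):
    """Return numbers reachable from `seed` within `max_hops` calls, with hop distance."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)  # treat a call as a link in both directions

    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        number = queue.popleft()
        if hops[number] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[number]:
            if neighbor not in hops:
                hops[neighbor] = hops[number] + 1
                queue.append(neighbor)
    hops.pop(seed)
    return hops


# Made-up records: "SEED" stands in for the known suspect's number.
records = [("SEED", "A"), ("B", "SEED"), ("A", "C"), ("D", "E")]
print(contacts_within(records, "SEED"))              # A and B, one hop away
print(contacts_within(records, "SEED", max_hops=2))  # adds C at two hops
```

Numbers D and E are never returned because nothing links them to the starting point, which is the contrast with pattern-based searching that has no predicate to start from.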
Now contrast that with—for example, there was an editorial in The New York Times a couple of days ago by a mathematician who talked about his view of the sort of futility of pattern-based what he called data mining, where you really don’t have any data to start with, but you’re sort of looking around through records trying to figure out generally what are the factors that make somebody look like a terrorist in that particular case. That, I think Jeff will agree, is fundamentally less effective. That raises many, many more privacy and civil liberties concerns.
My own prediction, not knowing anything about the programs that the government is actually engaged in, is that’s probably not what they’re doing. They probably are in a position where they want to be able to when they get the captured terrorist’s cell phone number to actually in a very quick and robust fashion understand who that person is talking to in the United States to prevent attacks, to understand what the network is, to understand what their tradecraft is.
Now, after two minutes I’ll get to your actual question.
The—I think the CIA and the National Security Agency are salvageable. I don’t think we’ve reached the point of vanishing for those agencies. But I think it’s a serious challenge. And I think it’s a serious challenge for a couple of reasons.
One, as I said at the beginning, and we can talk more about this in detail later if you want, the laws and the regulations that govern these agencies were not remotely built for the kind of technology and the kind of communications and the kind of threat that we face today. And I can’t tell you how many hundreds of hours I spent when I was in the government, both in the Clinton administration and in the Bush administration, trying to help the lawyers at these agencies figure out how you could adapt the words of these laws, which were intended to prevent the Soviets from launching ballistic missiles across the North Pole, to the world of counternarcotics and the world of counterterrorism, et cetera.
And so I think the government officers are basically good people; the capabilities are there; but they need sort of a major jump start to drag them into the 21st century.
PRIETO: Jeff, back to you for a minute. Clearly there’s a dilemma of going through lots of data while also trying to stay on the right side of the law, as well as going through lots of data and trying to stay on the right side of your customer, if you’re a company.
Can you talk about that a little bit? What solutions are out there? Clearly, AT&T is now being sued for its role in this NSA stuff. The other companies are under reputational and potential legal threats for their role.
(At ?) IBM, what solutions can you provide to help them with this quandary, to be helpful to the feds while at the same time protecting, you know, sensitive corporate data and customer privacy?
JONAS: Using Bryan’s trick of first answering anything I want to answer and then answering the question –
PRIETO: Which is very clever. (Laughs.)
JONAS: Which was very clever. I have to learn to use that one more. (Laughter.) I have this notion, and I think a lot of people might think this to be true as well, is that one in a million things happens millions of times a day—like we’re all unique in our very own way. So when we think about data mining and looking for anomalies in the data to find a few bad guys, it is a problematic thing because you end up with too many false positives.
And nobody has spent more on data mining than the world of direct marketers, whether it’s been hundreds of millions or billions spent. But they have huge systems and huge resources have been applied to this.
And the goal is to, you know, if you send direct mail out blindly, you get a 1 or 2 percent response rate. But with data mining where you’ve taken your customers and you’ve found out what magazines they subscribe to, the information they provided on their warranty card, and there is a lot of rich information out there available to marketers, with all of that data and all of these enhanced tools, the best you can get today is roughly maybe 6 to 8 percent response rates; you’re basically talking 94 percent false positives.
It’s still quite useful when you’re dealing with direct marketing and improving your, you know, response rates. But it’s not useful when you’re trying to find a few bad guys. So I wanted to just touch upon that point.
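The arithmetic behind the "94 percent false positives" remark is simple: if everyone you mail counts as a predicted responder, everyone who does not respond is a false positive. The short sketch below just restates that calculation using the response rates quoted above; the function name is hypothetical.

```python
def false_positive_share(response_rate):
    """Share of contacted people who did not respond, i.e. were wrongly targeted."""
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must be between 0 and 1")
    return 1.0 - response_rate


# Rates quoted in the discussion: 1-2 percent blind, roughly 6-8 percent with data mining.
for rate in (0.01, 0.02, 0.06, 0.08):
    print(f"response rate {rate:.0%} -> false positives {false_positive_share(rate):.0%}")
```

An error rate that is acceptable for a mailing campaign looks very different when each false positive is a person flagged as a suspect.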
And then your question.
PRIETO: Do you see a solution for the companies and the government to help you assist in this gray area of providing information while protecting—
JONAS: So, you know, it’s funny: We see all these mandates for information sharing, like Executive Order 13356, which was later redrafted; I don’t know the next number that came after that. But you know, thou shalt share.
The interesting thing is, organizations have difficulty even sharing with themselves. You go to any one organization, behind one door they’re working on counterdrug, one door down they’re working on anti-money laundering, and the door down from that they’re working on counterterrorism. Those three doors, the only way they have to discover whether or not they’re dealing with subjects in common, is go fish. You call them up and you say, do you have Majed Moked (ph)? Do you have Khalid al-Mihdhar? How about any threes? Tens? Jacks? This is a highly inefficient way of discovery.
So this notion of trying to find data across different compartments is very challenging. So one of the things that I’ve been working on is a technique that leverages anonymization that allows multiple data holders—and you could use your imagination, but let’s use my example of three different doors on the same floor at the same agency all of whom are unable to connect their own dots.
And the notion is that it’s now possible—and this is being done actually in—I kind of call this cross-compartment exploitation—it’s possible that each of those three units can grind their data, shred their data, basically anonymize it. Kind of like turning a pig into a sausage; if you get the grinder and the sausage, you’re not going backwards to make—you can’t go backwards and make the pig. The three compartments each grind their data into an anonymized form; it’s non-human-readable and nonreversible. But each of these anonymized values has its pedigree on it about from where it came, and who—where did it get made from, which compartment, what record.
And the trick that’s now possible is, you can grind the data first and match it after it’s been ground up. Usually in encryption technologies, you encrypt it, I would send it to you, and you would decrypt it to use it. In contrast, I’m proposing a technique where the parties encrypt it and the analysis is done while it’s encrypted. And when dots are connected, the holder of this connected information can’t learn anything. There’s no names. There’s no risk—or there is a greatly reduced risk of unintended disclosure of this process of connecting the dots. And what you end up with is, you know, you get ESP, go fish, which is, basically, you know exactly what question to ask which other compartment. You say, do you have a three of spades? Do you have Khalid al-Mihdhar? And again, this is a form of discovery. It’s a form of a card file. And it’s a form of using anonymization to do discovery across silos. So that would be an example.
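One simple way to picture "grind first, match afterward" is with a keyed one-way hash. The sketch below is an assumption made for illustration, not the technique Jonas actually uses, which the discussion does not specify. It assumes the compartments share a secret key that the matching party never sees, that names are normalized before hashing, and that each token carries its pedigree (compartment and record ID). The key, compartment names, and record IDs are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Assumption: the compartments share this key but the party doing the matching does not.
SHARED_KEY = b"hypothetical-key-held-by-compartments-only"


def anonymize(value):
    """One-way keyed hash of a normalized identifier; not readable by a human."""
    normalized = " ".join(value.lower().split())
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def grind(compartment, records):
    """Turn {record_id: name} into (token, pedigree) pairs; names stay with the holder."""
    return [(anonymize(name), (compartment, record_id)) for record_id, name in records.items()]


def match(*ground_sets):
    """Group pedigrees by token; keep only tokens seen in more than one compartment."""
    by_token = defaultdict(list)
    for ground in ground_sets:
        for token, pedigree in ground:
            by_token[token].append(pedigree)
    return [sorted(p) for p in by_token.values() if len({c for c, _ in p}) > 1]


counterdrug = grind("counterdrug", {"r1": "Person A", "r2": "Person B"})
laundering = grind("money_laundering", {"r7": "person  a", "r8": "Person C"})
terrorism = grind("counterterrorism", {"r3": "Person D"})

print(match(counterdrug, laundering, terrorism))
# [[('counterdrug', 'r1'), ('money_laundering', 'r7')]]
```

The matcher learns only that record r1 in one compartment and record r7 in another refer to the same token, so it knows which question to ask which holder. A plain keyed hash like this only matches exact values after normalization, and anyone who obtains the key could test guessed names against the tokens, so it should be read as a toy model of the idea rather than a privacy guarantee.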
PRIETO: Okay. Let’s switch gears for a second.
Bryan, a close colleague at the Defense Department likes to call the Internet a force multiplier for modern insurgencies, for terrorists and other asymmetric threats. Terrorists have been highly successful in using the Internet for operational planning, propaganda, recruiting. You know, this phenomenon that was highlighted most recently in the recently released State Department Country Report on Terrorism, and in testimony on the Hill by Ambassador Crumpton.
Is our intelligence community paying adequate attention to this phenomenon? Are we using technology against our enemies as well as they are against us? And if not, what more could we be doing?
CUNNINGHAM: Well, first the short answer is, I don’t know. I left the White House in August of 2004, and I have no special access to what’s going on in the government.
I think, though, from what you read in the press—clearly, the government understands that this ability and desire of terrorists and other asymmetric threats to use the Internet as a way to not only evade surveillance but also conduct propaganda operations, conduct recruiting operations, conduct planning operations, is well under way. They’re probably as sophisticated as, or more sophisticated than, we are at using the Internet and other 21st-century communication tools to get their business done.
And it seems to me there are at least two elements to how you counter that. The first one, which I’m quite confident the government is working hard at—and some people would say to bad effect, and some people would say to good effect—is tracking people across the Internet, understanding what they’re saying to each other, using it as a way to collect intelligence. Now whether or not we have the legal and policy and technical tools to win that fight, is another issue. I’m not sure we do yet. But we’re on it. I mean the government is doing that.
The larger issue that I think is probably largely unaddressed by the government is how do you counter the propaganda and the recruiting effect that’s being—efforts that are being conducted across the Internet? Do we have the right public diplomacy? Do we have the right message? Are we putting it out in the right places? Do we remotely have enough cultural understanding and sophistication to tailor the American message, if you will, to the various places where this propaganda is being put out? And I think probably the answer to that is a resounding no. I think from what I’ve read we’re probably doing better at that than we used to be. I think the State Department is devoting a lot of effort to it. But I don’t think we’re there yet.
Let me just touch on something that Jeff said, though, before we entirely leave the subject. This notion of being able to anonymize or shred or hash—or whatever words you want to use—data and be able to search against the data when it’s anonymized is, in my judgment, a huge, revolutionary idea, because what has been reported in the press about the toll—the phone toll records program that the NSA allegedly is doing is, in my view, and I think in the view of the task force that we all worked on, a baby step in the right direction.
That is to say that the government has now understood that you can get a lot of operational benefits even if you don’t have immediately accessible the names and addresses of the people that you’re looking at. But it’s just a baby step. The idea that the FBI and the CIA could search every single bit and byte of each other’s databases in a way that would not be threatening from a civil liberties standpoint and would not be threatening from a turf standpoint, is—will change everything.
Because on the privacy and civil liberties side—and I know there are some folks in this room that practice this every day—the ability to have data talk to data without a risk that government, whether it’s the FBI or the CIA in my example, can on its own with no other authorization decrypt that data and have the clear text is a significant thing.
From the operational standpoint, the idea that you can have computers talk to each other and fish out of 2.5 trillion records the 10 that you need is also a good deal. I mean, I don’t think anybody here believes that whatever their views of the NSA toll records program are, that the phone companies back up 15 semis of paper records to Ft. Meade and unload them for a human being to look at.
And likewise I doubt anybody believes that they deliver 1,000 CD-ROMs for a human being to look at. The—you couldn’t do that. I mean even the U.S. government doesn’t have that kind of capability.
So from both the operator’s standpoint—efficiency, speed, operational success, efficient use of resources—and the privacy and the civil liberties standpoint, this idea that Jeff talked about that you can search data in a way that has fidelity, that you’re going to get the right matches; you’re not going to miss Mihdhar in San Diego, but also you don’t expose all of the clear-text unencrypted data to the government is a big deal. And it’s a big deal not only for the government. It’s a big deal for the health care industry; it’s a big deal for the banking industry; it’s a big deal for higher education, all of which are sectors that I work in. To be able to have confidence that you can get the right results but you’re not exposing people’s privacy in the meantime is a huge, revolutionary idea, in my judgment.
JONAS: I’d like to just jump on—
PRIETO: Any quick comments on that? And then we’ll move to audience.
JONAS: One of your—your question was directed at how do bad guys maybe use and take advantage of technology. And you know it’s—I just see the world right now as there’s just a big competition. Technology is advancing so fast because, you know, in the corporate world companies are trying to do—you know beat out their competition. And then governments are competing and governments compete with asymmetric threats and whatnot.
And if you look at 1996, 10 years ago, the Cali drug cartel took—you know, their bad guys are the moles; they’re our good guys. So they had a counterintelligence operation to find their bad guys. And what did they do? They took all of the data out of Colombia’s national phone system for the city of Cali.
CUNNINGHAM: Did they have a warrant for that?
JONAS: (Laughs.) Probably not, sir. (Laughter.)
They ran link analysis, not looking for generalized patterns, but they ran link analysis looking for members of the Cali cartel related to American and Colombian drug—you know, government counternarcotics folks. And the result of this helped them narrow down their bad guys, over which 12 people lost their lives.
So this is 10 years ago—10-year-ago tradecraft. So I wanted to mention that.
And then this notion—
PRIETO: (Off mike)—quick, so we can—
JONAS: Yeah, yeah, real fast. Minute and a half? Okay.
This notion of anonymizing U.S. persons data to prevent—or to improve the privacy enhancement—
CUNNINGHAM: (Off mike)—NSA.
JONAS: If you say so. I guess you’re not going to give me that time back. It turns out if you anonymize the U.S. persons data, then in that form, it’s not easily relatable to—this could be—
PRIETO: Wrap up.
JONAS: Yeah. So wrapping up, if you are going to anonymize the data of U.S. persons, it turns out it’s more efficient to also anonymize the rest of the records from the rest of the world. And I think that’s a good thing too from a privacy perspective.
PRIETO: Great. We now invite council members to join in the discussion. Please wait for the microphone and speak directly into it.
When you do get the microphone, please stand, state your name and affiliation, and keep questions concise, please, to allow as many members to speak as possible.
QUESTIONER: Rob Quartel. And I should preface this that I’m involved in managing a lot of data for risk assessment on container security.
It seems to me there are really two questions here. One is, lawyers always, for the government, say, it’s lawful. One question is, when do we get past lawful and what’s right? And I think a lot of the discomfort in society today is a lot of what maybe is being done is lawful but doesn’t feel right.
And then the second part of that question is, where is the line between it being easy to track through trillions of records for a terrorist who is trying to hide versus all of a sudden the government on its whim deciding to go after people who are not trying to hide from something that they make unlawful.
So where is the new paradigm? You know the Church commission came up with a paradigm and kind of a metaphor here for what the line was. Given those two things I just suggested, where are we going and what’s the new metaphor?
CUNNINGHAM: That’s a great question. And it’s not only government lawyers that obscure the policy versus legal implications; I do it in private practice all the time. (Laughter.)
Look, the concept you’re talking about, in my judgment, which is, what is “right,” quote-unquote, is really perfectly embodied in the Fourth Amendment to the Constitution. Those guys were pretty smart; they knew what they were doing.
And what the Supreme Court said in 1979, for example, in a case called Smith v. Maryland, is, essentially, Americans do not have any legitimate expectation of privacy in not only toll records, which is what we’re talking about today with the reported new NSA toll records collection, but also the real-time transmission of who you’re calling and who’s calling you, the so-called pen register and trap-and-trace data.
And that was not just a legal analysis; that was a societal judgment. They were saying what do Americans expect, and I think the poll numbers about this NSA program sort of bear that out. I mean most Americans—you’ll get different results depending on how you ask the question, but most Americans say, look, these records are available to the phone companies; who knows what they’re doing with them with telemarketing and et cetera. And if the government can look at these in a way that protects the privacy of innocent Americans and helps fight the war on terror, I’m okay with that.
Now part of the issue is, how informed are people when they make that judgment? Do we really know everything about what this program is? Do people understand it? Or are they just sort of reacting to the way the question is phrased?
I think we really need to have a debate in this country and a discussion about particularly the issue I talked about earlier: If a machine sees your data, is your privacy invaded or not? You’re going to get—if we talked to everybody in this room you’d get 100 different answers to that question. But we need to talk about it. And I think you can talk about it without revealing sources and methods, without revealing classified information. We need to have that debate, and we need to I think redraw the line in a way that people in the 21st century are comfortable with.
And I forgot the other question.
PRIETO: Just quickly, the polls that Bryan references, the one I saw was a Washington Post-ABC poll, which showed 63 percent of respondents in favor of this program. And I think the dilemma about what we can do and what we should do is certainly clear between those poll numbers and the hue and cry immediately following on the Hill.
CUNNINGHAM: Oh, let me just say, I remember the second part of your question—how do we know the government is not going to use this for bad purposes, even though they may have collected it for what most people I think would say are good purposes? The answer to that is, of course, we don’t. I mean if a bad FBI agent or a bad CIA officer wants to track down their ex-wife or their ex-husband or their mistress, there is not a law you can write that’s going to stop them from doing that.
But what you can do is have greatly enhanced oversight, including technical oversight. I think Jeff hinted earlier at the notion of immutable audit, which—we didn’t use those words, but if you have all of these transactions, all of—the government’s looking at things being recorded electronically, you can go back and figure out whether people are violating the rules, whether people are trying to chase down their ex-wives instead of fighting terrorism. And you can punish people that do it wrong and you can refine the rules and make the rules better.
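Editor's sketch: one common way to make an audit trail tamper-evident is a hash chain, where each log entry includes the hash of the entry before it. This is only an illustration of the general idea of "immutable audit"; the panel does not describe a specific mechanism, and the names `append_entry`, `verify`, and `audit_log` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Hash a canonical JSON encoding so the same fields always hash the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, user, action):
    """Append an entry that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"user": user, "action": action, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {"user": entry["user"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "analyst-17", "query: calls to number X")
append_entry(audit_log, "analyst-17", "query: subscriber lookup")
print(verify(audit_log))
```

Changing any earlier entry breaks every later link, so a reviewer can detect after-the-fact edits. A real deployment would also need to protect the most recent hash, for example by handing it to an independent overseer.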
JONAS: Just a really quick response to this is that my general rule that I’ve been advising governments and companies on as they think about how to take better advantage of information, the rule is: avoid consumer surprise. And that means, regardless of what the law is, do something in a way that you’re less likely to—have everyone be shocked when it makes the front page.
As I synthesize—40 percent of my time is involved with the private community, and that’s how they synthesize now all of that conversation.
PRIETO: David Ensor in the back.
QUESTIONER: Hi. David Ensor of CNN.
On anonymized, or “made anonymous,” you rightly said, sir, that the newspapers have lately talked about it in a way that’s inaccurate. Obviously bunches of phone numbers—I mean I’ve got Google; I can find out—and other tools—I can find out who somebody is if I just have their phone number.
So I’d be interested in what you really mean by anonymize, and how sure can we really be that anonymous is really anonymous? I mean, right now we’re talking in the newspaper about anonymous, and you and I both know it isn’t. I can find a tool that will tell me what a phone number leads to.
And on another point, you, sir, said you don’t believe that the government is really using the data that it gets, the phone call data, to just look for patterns. You believe it’s using it with some knowledge, some external call by a terrorist or something.
I’ve talked to officials who tell me that they are using it; they are looking for patterns without that kind of information. They won’t tell me exactly how, or whether it’s in reference to past knowledge of patterns, information perhaps about Khalid al-Mihdhar’s phone calls or whatever. But they say they are using patterns.
So I’d just ask the panel, how might that work? I have to try to explain it to people, and I’m not sure I can.
PRIETO: Jeff. (Laughter.)
JONAS: Thank you.
Clearly—like I said, phone number alone is to me not anonymizing. It is a trivial task to take a phone number and figure out the name and address. There’s—it’s called a reverse directory, and you can buy databases that are cheap to do that. So I don’t call that anonymized.
When I use the word anonymize, I’m talking about a cryptographic function, a function that an organization like NIST has certified it’s part of the kinds of cryptographic—you know, when you encrypt an e-mail and send it to somebody, the same mechanisms that are used to do that are used for anonymizing.
Now the question is there’s a variety of attacks for cryptographic data. So I’m careful to not say that it’s you know bullet-proof, but I do offer up that if it is—the degree of effort it takes to go and unlock it and attack it is higher. So your risk of having the data run away then is less.
So I have this view that if you’re going to—this other view is, if you’re going to share data with somebody, and if I can tell you that you can share it in an anonymized form and get a similar result, why would anyone share their data any other way? I mean, the risk of unintended disclosure is just too great. In the corporate America sense, the bank’s customer list suddenly escapes, this is really bad news. So I just see the anonymized world as better than sharing clear text.
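Editor's sketch: the kind of cryptographic anonymization described here can be illustrated with a keyed one-way hash. The example below assumes HMAC-SHA256 and a secret key held by a trusted party; the speaker does not name an algorithm, so this is an assumption, and `anonymize` is a hypothetical helper. Two organizations that apply the same function with the same key can find overlapping records without exchanging clear text.

```python
import hashlib
import hmac


def anonymize(phone_number: str, secret_key: bytes) -> str:
    """Return a keyed one-way token for a phone number."""
    # Normalize first so that formatting differences do not hide a match.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    return hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()


# Assumption: key management is out of scope for this sketch.
key = b"secret-held-by-a-trusted-party"

# Each side tokenizes its own list and shares only the tokens.
list_a = {anonymize(n, key) for n in ["(202) 555-0143", "202-555-0199"]}
list_b = {anonymize(n, key) for n in ["2025550143", "202 555 0100"]}

matches = list_a & list_b
print(len(matches))
```

An unkeyed hash would not be enough here, because the space of phone numbers is small enough to enumerate; the secret key is what raises the effort needed to reverse the tokens.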
CUNNINGHAM: To both your questions, David: First of all, I’m not going to concede that it is so easy to de-anonymize trillions of phone call records. I have run my own phone numbers through Google. I’ve run my own phone numbers through 411.com. My home number, my land line number, is pretty easy to re-engineer. My cell phone number I can’t. I’ve paid the subscription. My cell phone number does not pop up in a reverse directory.
More importantly than that, though, to my mind is, if—it’s a question of operational efficiency. Doesn’t make any sense, even if the government could essentially de-anonymize these numbers, for them to take 260 million—however many people there are in the United States—and put together a database that has those things matched up. I contend it doesn’t make any operational sense. It only makes operational sense to go to that effort even if it’s a minor effort when you’ve narrowed the universe down to the 10 phone calls between Detroit and Osama bin Laden, or whatever it is. Number one.
Number two, as a legal matter, not a policy matter, not a societal matter, I don’t think that question makes any difference. If the government has access to the records in a way that they don’t have the names, the fact that they could go and put the names with the numbers doesn’t matter legally.
Just like if you have corrupt government officials, they could always call the IRS up and bully them and say, give me all the tax records that attach to these phone numbers. Again, if you’re going to have people that are going to do the wrong thing, you’re not going to be able to write a law that’s going to enforce that. You have to have internal audit policies; you have to have oversight to do it.
On the pattern question, I don’t know who your sources are, obviously, and I wasn’t read into this program when I was in the government. I would be extremely surprised from an operational standpoint if this data is being looked at purely on the basis of pattern analysis.
Now where you are going to have a dispute is, what is a predicate? My hypothetical is, you have Khalid Shaikh Mohammed’s cell phone, you run his phone numbers. That’s fairly obvious and easy. I’d be—if they’re not doing that, the president should be impeached, in my view.
If you don’t have that phone number, you still might have data that would give you a predicate for searching a database. For example, suppose—and I’m making this up; I have no idea whether this is true or not—but suppose the tradecraft that al Qaeda gets trained to do in the camps in Afghanistan is you don’t ever call somebody directly in the United States. You call a person; you talk for less than 30 seconds; you give them a pay phone number, they call back from the pay phone.
If you were to search the database for calls that fit that tradecraft, that mode of operations, you might get some people that would say that was pattern-based. I wouldn’t say that. I would say that’s predicate-based. You have some intelligence; you’re searching against the intelligence.
You know, it depends on how your sources are using their terminology. I doubt they’re just saying, hey, let’s look at all these records and try to figure out what a terrorist looks like, not because I think they’re sitting around every day thinking of the privacy and civil liberties implications of that, although I hope they are, but because it makes no sense operationally. You would waste so much effort and so many resources to do that, you’d never do it. You’re not going to catch terrorists that way.
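Editor's sketch: the hypothetical tradecraft above (a short call, then a callback from a pay phone) can be written as a query against call records. Everything here is an assumption made for illustration: the `Call` record layout, the 30-second threshold taken from the hypothetical, the one-hour callback window, and the idea that a list of pay phone numbers is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str
    callee: str
    start: int     # seconds since an arbitrary reference time
    duration: int  # seconds


def callback_pattern(calls, pay_phones, max_first_call=30, window=3600):
    """Find short calls followed by a pay phone calling the original caller back."""
    hits = []
    for first in calls:
        if first.duration >= max_first_call:
            continue  # only very short first calls fit the hypothetical
        for second in calls:
            if (second.caller in pay_phones
                    and second.callee == first.caller
                    and 0 < second.start - first.start <= window):
                hits.append((first, second))
    return hits


calls = [
    Call("X", "A", 0, 20),         # short call from X
    Call("PAY1", "X", 600, 300),   # pay phone calls X back ten minutes later
    Call("X", "B", 1000, 400),     # long call, ignored
    Call("PAY1", "C", 1200, 25),   # short call, but nobody calls PAY1 back
]
hits = callback_pattern(calls, pay_phones={"PAY1"})
print(len(hits))
```

The search starts from a piece of prior intelligence (the calling pattern) rather than from the records alone, which is the distinction being drawn between predicate-based and pattern-based searching.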
PRIETO: Just quickly, before I take a second question, I think there is a counter-hypothetical to Bryan’s case, which is—
CUNNINGHAM: Well, I’m sure there’s not.
PRIETO: (Laughs.) This notion of fishing expeditions—can you find patterns in large amounts of data without a predicate? Just quickly, I don’t want to interject too long here, but you know, scanning all the calls in Lackawanna and then seeing within two or three steps if those calls go out to Afghanistan or Syria. And if that’s a fishing expedition, I raise the general question, is it okay as a matter of policy, and maybe is the public comfortable with that, as long as it’s a computer doing the searching and it’s in some black box—therefore it’s completely anonymized—raising the next question, if as a policy matter or as a social matter, if folks are okay with that, because there is some protection in the anonymity, you can then—you need to get into the discussion of what are the rules for escalation. If the box tells me something suspicious is there, who do I then go to ask for a human to look inside of it?
So that’s part of an alternative case—well, maybe there’s knowledge coming out of fishing expeditions without a predicate, but which there is protection, and if you put the right rules in place, you have escalation procedures that include the Congress and the courts before a human actually takes a look at them.
With that, Shane.
QUESTIONER: Thanks. Shane Green at the Map Network. And I’m in the search business, and I really want to build on Dan’s question. I wanted to ask, assuming all of the things he just said and that anonymized data does live in a black box and so forth and everyone else in the United States is comfortable with this kind of process of searching data, I wanted to ask you, Jeff, about the sort of search algorithms that the Googles of the world are developing and how that applies to this sort of encrypted data.
Google’s founders themselves say they’re about 5 percent as efficient as they’d like to be in search and they’re developing whole new worlds of search algorithms that track behavior. And they literally say we want to know what you want to find better than you do. And with access to your past behavior, we’ll be able to tell you better than you could know yourself what it is you’re looking for. And that’s a pretty daunting proposition, but the more I explore it, I think there is a lot of validity to. Trying to apply that to what mostly sounds like a very binary type of search process that you’re doing—and I’m not asking for inside information, just how you apply this sort of new, robust search methodology and processes, algorithms to a world where maybe this encryption makes most of that data pretty blind and pretty hard to make sense of.
JONAS: Oh, I could go about this forever, but let me try to bring it up to the—(inaudible)—level. I see this convergence of data and queries. This notion that queries are different than data I think is a misconception. I see queries as data, and I see data as queries. And you see things like Google; you see things like Amazon, where the query—what you’ve been looking at, what you’ve been clicking on—helps tune these engines to give you better results. So we’re going to see this convergence, I think, of data and queries.
I think technology is going to take us to a place where we’re—right now this funny notion that we expect the analysts to think of the question. You can’t expect these analysts to think of every question every day. So I think where we’re going is where systems are connected and the data is finding the data and the relevance is finding the user. And I think this is the direction. And when you see that that’s the direction, then you immediately have to stand back and say, what are the policies? What do we consider the observer, the person or the system? What data are you willing to let systems observe?
MR. : (Off mike.)
CUNNINGHAM: Can I just add one thing to that? I think it’s a very important, powerful point that needs to be debated and discussed in the future.
This sort of paradigm that I think we’re suggesting of being able to electronically and in an immutable, non-changeable way monitor what the government’s doing, what questions they’re asking, what responses they’re getting—even if it’s all done by a system and no human being sees it—is—we’ve already talked about it—in my view helpful from a civil liberties standpoint; it’s helpful from an operational standpoint. But it also can create, in and of itself, powerful intelligence.
Suppose the system before 9/11 had known that, over the past 10 years, 15 analysts in different parts of the intelligence community had looked at the question of whether you could fly airplanes into buildings. Remember there was this big debate, which I really don’t want to get into, between Condi and Dick Clarke about whether or not anybody knew that could be done. Well, of course there were 10 people who, at some point—I mean, Tom Clancy put it in his book. But there was no system to tell the President or to tell the National Security Adviser or to tell the DNI that people cared about this.
If you have an electronic system that is monitoring queries and results, and a year from now, 15 analysts out of the 10,000, or whatever there are in the government, are asking the question, how does this particular strain of Ebola virus work, that’s some damn important information that the director of National Intelligence, it seems to me, would want to know. It doesn’t matter what the answer is. What matters is that a significant number of analysts are all asking the question. You could create a top 10 list every morning for the DNI. Here’s the top 10 things that your analysts around the government are looking at. That’s a big deal. That’s new intelligence. And in my view, that’s not invading anybody’s privacy.
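Editor's sketch: the "top 10 list" idea amounts to counting how many distinct analysts have asked about each topic and ranking the topics. The log format (pairs of analyst ID and topic), the `min_analysts` threshold, and the function name `top_topics` are assumptions made for illustration; grouping real free-text queries into topics would be much harder than the simple normalization used here.

```python
from collections import defaultdict


def top_topics(query_log, limit=10, min_analysts=2):
    """Rank topics by the number of distinct analysts who queried them."""
    analysts_by_topic = defaultdict(set)
    for analyst, topic in query_log:
        # A set per topic means repeat queries by one analyst count once.
        analysts_by_topic[topic.strip().lower()].add(analyst)
    ranked = sorted(
        ((topic, len(who)) for topic, who in analysts_by_topic.items()
         if len(who) >= min_analysts),
        key=lambda item: (-item[1], item[0]),  # most analysts first, ties by name
    )
    return ranked[:limit]


query_log = [
    ("a1", "Ebola strain"),
    ("a2", "ebola strain"),
    ("a3", "Ebola strain "),
    ("a1", "Ebola strain"),    # same analyst again, not counted twice
    ("a4", "port security"),
    ("a5", "port security"),
    ("a6", "currency flows"),  # only one analyst, below the threshold
]
print(top_topics(query_log))
```

Note that the ranking uses only who asked what, not the answers returned, which matches the point that the signal is in the questions themselves.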
PRIETO: In the gray jacket in the fourth row.
QUESTIONER: Thank you. Miriam Sapiro from Summit Strategies International. I do a lot of work on Internet policy and governance issues.
I have a question for each speaker.
For Bryan: On the international front, we’re not the only country that is doing this or struggling with these issues—some for good purposes, some for not. So my question is, do you think there are any current legal, international legal regimes or frameworks or even guidelines that we either have or we should think about developing? It may not work, but at least there would be some constraints on the ability of others—other countries to mine our data, domestic data, that belongs to all of us.
And a question for Jeff: There’s an irony here in that we’re so concerned about anonymizing data because we don’t want most of the people to have their data out there. Yet, at the same time, we really want to be able to drill down on the data of certain individuals that we do need to know more about. My question is, as a policy matter, who is deciding on anonymizing the data and what the constraints would be and what they wouldn’t be?
And then I was really intrigued with your observation that you can’t make a pig out of sausage. Everybody understands why we want to anonymize data, but to what extent are we then constrained from rebuilding the data profiles for those people that we do care about?
CUNNINGHAM: Let me just touch on the international legal issue here. We, before the NSA program was revealed, I think we were planning to talk about a couple of examples of international issues between the United States and, particularly, the European Union, where these technology issues come up. And the most obvious one is the fight that we’ve been having, the discussion we’ve been having for the last five years with the European Union on passenger data—airline passenger data.
Now, I’ll be a little cynical here and say that my general view, and I do a lot of work with European clients, is that the Europeans have much better laws than we do, but they ignore them. We have weaker laws but we, generally speaking, tend to follow them. And I’m not making a political statement about any administration. But in the private sector and in the government, generally we tend to try to follow our laws.
Their privacy and information-sharing regimes are much stricter than ours. But I don’t think the governments follow them, and I don’t think they enforce them against their own companies. So there are sort of two ways you can square that circle: one is you try to make our laws fit theirs, which will never work, because for one thing, we’ve got 50 states that are making different laws all the time.
My day job is information security. So I advise a lot of companies on these information security breach disclosure laws, and there are more than 20 of them in different states. You try to talk to a British company about how they have to comply with the one in Illinois versus the one in California, it makes your head spin. And you’re never going to rationalize that. There may be a federal law, but the federal law will probably say, you do everything we tell you as the U.S. government, and then if any state has stricter laws, you’ve got to comply with those too.
So circling back to answer your question, I think this sort of technology that we’re talking about of truly anonymizing data and allowing in my example of the French airlines and our TSA to search each other’s data in a way that the U.S. government can’t see the results unless you go to some sort of third party to decrypt it is a major way—maybe THE way—that you’re going to deal with these different legal regimes. Because in that hypothetical, the Europeans could say, well, we’re not going to turn the key and let the U.S. government see all this information until we think they’ve met our privacy laws. And we can do the same thing. But you don’t fail to search the data because you haven’t resolved all the legal issues.
You know, I am a lawyer, but lawyers are in some ways the worst enemy of everybody in the sense that we want every single potential legal question to be answered before we let anybody do anything. And if you can gather the intelligence and then answer the legal question, once you’ve narrowed the universe down to 10 records instead of 10 billion, I think that’s a big deal.
PRIETO: Back row, gentleman in the pink shirt.
QUESTIONER: Thank you. My name is Mike Haltzel, DLA Piper Rudnick.
First, an historical point before I ask a question about whether we knew that anybody was thinking about the feasibility of flying a plane into a building. I mean, in 1995 a terrorist group hijacked a plane in North Africa with the intent of flying it into the Eiffel Tower. They had the misfortune, and the French government had the good fortune, that they landed in southern France to refuel—I think Marseilles. They were boarded by French commandos and the operation was over. So we knew that people were thinking about it. That’s not even debatable.
But my question is kind of—and I’m not a techie like everybody else here, so this is sort of a lame and simple-minded question—it strikes me that open-source data on the Internet is an area that our counterterrorism people ought to be spending a lot more time on.
I know there’s the well-known case that a Norwegian institute that scrutinized Arabic-language websites—radical Arabic-language websites—found that on September 12th, 2001, essentially tremendously relevant information that could have led to uncovering September 11th.
My question is this: Are we—first of all, would you in general agree that if we posit that the government, in spite of all evidence to the contrary, has finite resources, would you agree that looking at these sort of open-source things, instead of the kind of data mining or whatever you call it with a 94 percent false-positive rate, would be a better way to allocate resources?
And secondly, technically, are we able to hack into, let’s just say, radical Islamist websites in order to re-engineer and figure out who’s making the hits on these websites? Is this feasible?
PRIETO: Let me interject quickly.
I have, according to my watch, about four and a half minutes left. So let me do this, take your question, but also take two or three as a last grab bag and let the speakers address what they will in closing remarks.
Jim Landé, make them brief, please.
QUESTIONER: Thank you. My name is Jim Landé. I’m in the Office of the DNI.
My question is similar to Mike’s. This discussion began and focused, as they often do, on hot pursuit of terrorists. But intelligence reform was also driven by the need for better analysis. In fact, the WMD commission focused on that.
And so Jeff mentioned and referred to analysts. And Bryan had one of the most interesting ideas I’ve ever heard, which is advise the DNI of what’s the top 10 list being thought about by analysts? But with reference to open-source and all this technical collection and other collection, there’s just so much out there—perhaps millions of bits of information per day—what can technology provide to filter and provide lenses to bring the information down to a level where your expert analyst can actually use it and come up with these sorts of important themes?
PRIETO: One last very quick question. Make it as punchy as you can, and I’ll let these guys close.
In the blue shirt and the black suit jacket. The woman in the next-to-last row.
QUESTIONER: Hi. I’m Linda Robinson with U.S. News.
For Bryan: Can you say what changes in the law specifically need to be made?
And Jeff, I understand that you developed the technology that really does permit this anonymized data. Can you say that the government is using this in these programs that have become controversial?
PRIETO: Okay. That’s it for questions.
CUNNINGHAM: All right. Let me try to do four bullets—and then shut up, which is hard.
On the 1995 historical example—of course. I mean, there were many, many other examples, including Tom Clancy’s book. The issue isn’t did “we” collectively know it, in my view. The issue is was there any mechanism to sift that wheat out of the massive amount of chaff of everything else we knew about what terrorists would do and bring that as a priority issue to the national leadership. I don’t think that was done to President Clinton; I don’t think it was done to President Bush. You’re not going to get a president to internalize everything that people have ever known. You have to have a mechanism to prioritize.
On the open-source question, I completely agree with you; we need to be exploiting more open-source. I would just simply quarrel with what I think is your premise, that it’s some sort of binary choice, that we should be doing open-source more instead of trying to understand what phone records mean. You’ve got to be doing both.
On the technology for better analysis, I’ll throw that one over to Jeff.
But again, I think a lot of the game here is having mechanisms to let us know what we know. There are thousands of analysts in the U.S. Government and in state and local government. We need to know what issues they’re looking at, and I think that will give us a lot of intelligence and there’s technology to do that.
Changes in the law specifically: I think, as a general matter, without getting into FISA and the dozens of other statutes, we really need to have a serious discussion in this country about whether it makes sense anymore to divide up the world between what was collected physically in the United States versus overseas and whether information pertains to a quote-unquote “U.S. person.” And what are the transaction costs and the time loss for the government in trying to make those determinations?
Now if we change the law to not make that determining factor, though, we need to have something in its place that will equally protect civil liberties and privacy. And I think there’s a lot of good ideas about that out there, and I’ll talk to you about them afterwards if you want.
But I’ll throw it to Jeff.
PRIETO: Jeff, over to you for a minute or two.
JONAS: Whether you’re talking about structured data in databases or open-source, it’s my observation that if you are looking for anomalies and you have no entrance point, the risk of false positives makes such systems of less use.
I therefore believe that the best way to find a few bad guys is when you have a starting point and then you’re pulling those threads. It’s also more privacy and civil-liberty protective. There are governments that are already using some of this anonymization work that I created. I’ve been very careful to make sure that when you talk about anonymization, it’s not so that you can bypass an existing policy or law. This would be a misuse of the technology. The notion is if you are sharing the data or you’re going to, then using anonymization is just a better way to do it.
There; how’s that?
PRIETO: It seems that our time has come to a close. Let me close by thanking everyone for coming today on what is a very interesting topic. And when it comes to technology and national security, to quote—to paraphrase; excuse me—President Eisenhower, I think it’s essential that we have an alert and knowledgeable citizenry, which you all now are, to ensure that security and liberty prosper together.
Thanks very much. (Applause.)
© COPYRIGHT 2006, FEDERAL NEWS SERVICE, INC., 1000 VERMONT AVE. NW; 5TH FLOOR; WASHINGTON, DC - 20005, USA. ALL RIGHTS RESERVED. ANY REPRODUCTION, REDISTRIBUTION OR RETRANSMISSION IS EXPRESSLY PROHIBITED.
FEDERAL NEWS SERVICE, INC. IS A PRIVATE FIRM AND IS NOT AFFILIATED WITH THE FEDERAL GOVERNMENT. NO COPYRIGHT IS CLAIMED AS TO ANY PART OF THE ORIGINAL WORK PREPARED BY A UNITED STATES GOVERNMENT OFFICER OR EMPLOYEE AS PART OF THAT PERSON’S OFFICIAL DUTIES.
THIS IS A RUSH TRANSCRIPT.