

Foreign Affairs Media Conference Call: Kenneth Cukier and Michael Flowers on "Big Data"

Speakers: Michael Flowers, Director, Financial Crime Task Force, City of New York, and Kenneth Neil Cukier, Co-author, "Big Data: A Revolution That Will Transform How We Live, Work and Think," and Data Editor, the Economist
Presider: Gideon Rose, Editor, Foreign Affairs and Peter G. Peterson Chair
May 9, 2013
Council on Foreign Relations



OPERATOR: I would now like to turn this conference call over to Mr. Gideon Rose. Sir, please begin.

GIDEON ROSE: Thanks. Hi, everybody. Gideon Rose here, editor of Foreign Affairs. And we have a great call for you today. We have all heard about big data. We've all started to read about it. But very few people know exactly what it actually is, what it means, what the consequences of using it will be. And we are here to talk about that today.

Our participants are two leading experts on the subject. Ken Cukier is the data editor of the Economist and one of the co-authors of "Big Data: A Revolution That Will Transform How We Live, Work and Think," a great new book that's come out, and also the author of -- co-author of "The Rise of Big Data: How It's Changing the Way We Think About the World" in the latest issue of Foreign Affairs.

Mike Flowers is director of Mayor Bloomberg's Financial Crime Task Force of the City of New York, and actually an expert in the practical use of big data to address major issues in public policy. And just a slightly different perspective and can talk with us about that. So let's get right to it. We'll have Q-and-A with the participants and then I'll throw it open to you guys for your questions and bring everybody in.

Ken, very briefly, what is big data and why indeed is it so important?

KENNETH CUKIER: Sure. Thanks, Gideon. Yeah, so there's no one hard and fast, concrete definition of big data. It refers to a basket of new technologies, but it all sort of points in the same direction, and that is to say that there's something new happening, that the amount of information in society is vastly larger than ever before. And we're learning that with new tools, that we can learn something from a huge amount of information that we couldn't when we were only dealing with smaller amounts.

Now, in some instances, we can look at some of these techniques, and they have weird names like nonlinear correlations or network mapping or something called machine learning, and all of these are facets of it. But it doesn't matter what domain you look at, whether it's, you know, biotechnology or whether it's how to spot a structural fire in the five boroughs of New York City, that the world is being transformed because finally for the first time we can really collect a lot of information, more than we ever could before and learn from it in new ways.

ROSE: OK. Mike, let me bring you in here. You were a DA in Manhattan, and you were doing everything from prosecuting homicides to financial crimes, and then you went into corporate law. And now you've realized in your last several years that you could use things like big data and various kinds of analysis to combat crime in new ways and in interesting ways and that's what you're doing inside the city. So why don't you tell us what this kind of big data stuff means for actual practical people in public policy.

MICHAEL FLOWERS: What it means essentially -- and, you know, the -- what it means essentially is that we can send our resources, our ever-stretched resources to where they need to go in a way that will actually get us a return on that dollar much more effective than we had previously. And in a sense, the morphing of my titles reflects the morphing of the role in the city. I started out as a director of the Financial Crime Task Force and still am that, but was recently appointed the chief analytics officer for the city, which is really reflective of the fact that we see the utility of this approach going well beyond crime to code enforcement, economic incentives, disaster recovery, et cetera, et cetera. So what it really means is for us to be able to stretch our tax dollar a lot more effectively than we previously had.

ROSE: So how in practice do you use it? So how do you -- what is -- how is your job different now because of this in ways -- in a practical basis than it would have been a decade ago?

FLOWERS: A decade or two ago, I would have -- when I was a prosecutor, it was an extremely paper-driven environment, it had to do with face-to-face interviews, trying to catch people, make people slip up, you know, finding places where there should have been entries when there were none, if you're talking about a financial matter, whereas now that practice -- all of those hours that were allocated towards doing that can be shrunk down to minutes, and then, moreover, because prosecution by its nature is reactive, to wit, something happened and we're trying to respond to it, law enforcement can move to the proactive, which is, you know, detecting things before they actually cause the damage that the crimes themselves cause, thus saving ourselves and the citizenry a lot of pain.

ROSE: OK. So Ken, this sounds suspiciously like the pre-cogs and "Minority Report"-type stuff of science fiction. Is that where we're going, in your view?

CUKIER: I think, yes, we are going in that direction. And there's reasons to be fearful of it, but there's also reasons to be at ease with it and realize, hey, this is a great tool that we can use to help prevent crime and help to learn things. And we just need to use this tool in the right way.

So I'll give you an example, building on what Mike said, you can imagine that in the past, to understand cartel behavior, you had to have someone as a turncoat, you had to have someone wearing a wire or you had to have someone (stink ?) on the others. And now what you can do is you might be able to do a big data analysis of, say, the price of gasoline in a certain metropolitan area and the distribution of gas stations, and by looking at it, you can see what is the normal distribution and range of prices and how should the prices fluctuate given a certain concentration of gas stations. And if you see something that's out of the norm and considerably out of the norm, you might then have the suspicion that there's a cartel-like behavior happening in that area. It was naked to the human eye, but you can see it when you process it, you know, because you have the data. So that's how big data can, if you will, predict that a crime might be happening and you could take preventive steps.

Now what do you do with that information? In a world of "Minority Report," you're Tom Cruise, you bust in, crash through the window and you arrest everyone and you throw them into jail with a weird halo on their head. Now that, of course, I think is an affront to people who think deeply about the law and realize that that -- we don't know whether a crime has been committed. We only have a probabilistic reason to believe that that's the case. So the solution here would be you tell the investigators, here's where you want to go. This is directional. It points where we then need to build a case in the traditional way that we always did with hitting the leather of our shoes on the pavement and then working out. But this had helped us because it put us in the direction of where we wanted to go.
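As an illustration of the gas-price example above, the following is a minimal sketch and not something described on the call. It assumes a simple approach: fit a straight line of average price against the number of stations in each area, then flag areas whose price sits far from that line, using a robust score based on the median absolute deviation. The area names, the figures, the function names and the 3.5 threshold are all hypothetical, and the numbers are deliberately clean. A flagged area is only a pointer for investigators, in the "directional" sense described above, not evidence of a cartel.

```python
from statistics import median

# Hypothetical, made-up data: (area, number of gas stations, average price per gallon).
AREAS = [
    ("A", 5, 3.90),
    ("B", 10, 3.80),
    ("C", 15, 3.70),
    ("D", 20, 4.40),  # priced well above what similar areas charge
    ("E", 25, 3.50),
    ("F", 30, 3.40),
    ("G", 35, 3.30),
    ("H", 40, 3.20),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_unusual_areas(areas, threshold=3.5):
    """Return names of areas whose price is far from the fitted trend.

    The threshold is an assumed cut-off on the robust z-score,
    not a legal or statistical standard.
    """
    xs = [stations for _, stations, _ in areas]
    ys = [price for _, _, price in areas]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    centre = median(residuals)
    mad = median([abs(r - centre) for r in residuals])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored

    flagged = []
    for (name, _, _), r in zip(areas, residuals):
        robust_z = 0.6745 * (r - centre) / mad
        if abs(robust_z) > threshold:
            flagged.append(name)
    return flagged


print(flag_unusual_areas(AREAS))
```

The median-based score is used here because a single extreme area would inflate an ordinary standard deviation and could hide itself; that is a design choice of this sketch, not a method attributed to the speakers.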

ROSE: OK. Ken, I mean, I hear about these things and it all sounds -- and it makes logical sense, but it seems in some ways pretty familiar. I mean, for a century and a half at least we've been starting to apply statistical analysis to public policy. You know, we all watched Nate Silver get the polls better than the qualitative analysts in terms of election predictions last November. Is that the kind of thing you're talking about? Is big data just sort of wise use of statistical analysis with modern computing behind it, or is there something more and different?

CUKIER: There's something more and different, but it builds on the wise use of statistical analysis. Statistics are ever more important, especially in the world of big data, because the data sets are so large that this is the way that we make sense of large data sets. Traditionally in statistics, however, what we try to do is to get the biggest powerful finding from the smallest sample of observations. That was axiomatic, and the reason why was because the cost of collecting the data, storing it and processing it was just so high that the only thing we could do is defer to samples. And we tried to optimize on the sampling by going to random samples, realizing that we could do better by increasing the randomness than we could by increasing the number of observations by collecting more data. And the reason for that is that we learn marginally less with more information, but that's because we were asking really blunt questions. When we collect all the data, or at least all the data that is available relative to the phenomenon that we want to look at, we can now ask different questions. For example, we can drill down into the granular that traditional sampling always missed, and so that's new.

But there's a second thing -- and I'll be quick about it, as well -- which is that in the past, when we wanted to do analyses like this, we traditionally looked at factors that were intrinsic to what we were studying. So think of a credit score, when we look at a credit score today, we're looking at maybe 30 known variables that all relate to the financial property of the person that we're looking at. Right? They have their loan repayment record, whether there's delays, whether there's been suspensions, et cetera.

Now, in a big data world, it might be different. First, what we're looking at might be completely extrinsic to what we think we're -- we're analyzing. We might be looking at someone's hair color or whether -- what browser they're coming from, because that happens to be a powerful signal that improves our model of whether they'll repay a loan or not. These are real examples, by the way -- not hair color, but in the case of the browser you use is a predictor of certain things, whether you type in all-caps or all small lowercase or normally is also a predictive model used by a company called Just (ph) Finance to identify whether someone's likely to repay a loan or not. It's, if you will, an extrinsic factor to loan repayment, but it correlates well.

And then finally, there's going to be a lot more variables than ever before. It's fine in a small-data world to think about, say, 30 variables that we can explain to someone else. So what happens if we have a thousand variables? Where is the explainability? In that instance, big data is a black box.

ROSE: OK. So let me ask you a question about that. It seems to me you're describing something else that's a black box that aggregates a vast amount of information almost sort of instantaneously and unconsciously beyond the bounds of individual control or analysis, and that's prices in the market. Is, in effect, pricing a form of instinctive, unconscious use of big data by the market?

CUKIER: Not really. It's very bounded, right, because you have a buyer and a seller. You have a good that's being transacted, and so there's -- if a marketplace -- I mean, it's an interesting point. A marketplace is different because the rules, if you will, are very bounded of what it is. If you will, you can imagine saying checkers or chess looks like there's an infinite number of moves that you can make. But of course, the number's really not infinite; it's just a really, really large number that we normally couldn't otherwise comprehend. Now, with computers, we've sort of cracked chess and we've cracked checkers because we can learn from data; we can process it.

And actually, machine learning algorithms first apply themselves to games, right? So if you will, the market starts to look a little bit more like a game than it does like the great hurlyburly of reality in which we need to -- we can't use our intuition to know how to respond, if you will. In this instance, how do we know if someone's going to repay a loan or not? We would never have thought, in an online application, to look at something like whether they typed the application in all-caps, all lowercase or in the normal punctuation, but that turns out to improve the power of the predictive model, (invest cashes and invest ?) finances model. And so it's being used today.

ROSE: It could be just my father-in-law who doesn't know how to use a computer yet.

Mike, so the Obama administration just released some new rules today about data and transparency. Is this -- is this going to help facilitate the processing of data, and does it relate to what we're talking about?

FLOWERS: I think it will certainly facilitate the leveraging of the data, and by that I mean it's a great thing. I mean, it's self-congratulatory to a certain degree to say, because that executive order tracks very closely a lot of what New York City's own open data law requires. But --

ROSE: Can you explain briefly what that is?

FLOWERS: Well, agencies have to make data available that can be made available statutorily in a machine-readable, readily useable format such that somebody could then upload it from the public side -- I think it's -- then -- and then embed it in their own applications or use it as they see fit to learn what they want to learn or build a product or whatever. And we see that a lot in New York City. People download our restaurant licenses constantly and then build apps off of that. So that's essentially what the executive order mandates.

The other part about it, leaving aside just, you know, relation to the citizenry and transparency out among the governed, is transparency between government actors, which is incredibly elusive. I've been involved in various stages and various branches of federal, state and local government my entire career, with a slight exception, and one of the hardest things to do was to go to another agency and say, please give me this information about whatever, and that took weeks and weeks and weeks and weeks, if ever, to get it. And now you just go to the open data portal and you'll be able to pull what you need. So you're in many ways incenting entrepreneurialism inside the government as much as you are outside of it.

ROSE: This is going to be really, really interesting. We have a lot of people on the call, and they have a lot of good questions. So I'm going to cut my time short and throw it open to our audience, to let them get in on the action and have at you two guys.

OPERATOR: (Gives queuing instructions.)


ROSE: My god, you guys have explained it so well, there's not a single question.

OPERATOR: No, sir. We do have one. Our first question comes from Brian Cutson (sp) from PBS "Frontline."

QUESTIONER: Can you guys hear me?

MR. : Yes, sir.

QUESTIONER: Well, I guess a couple-part questions which I can toss out, since I appear to be the only one. But this is -- more for Mike Flowers, and how does the -- what resources do you have when it comes to big data? I mean, what kinds of databases are you able to access, and how does what you guys have access to and what you guys are able to do compare to some of the private firms, such as K2?

FLOWERS: Well, I have access, under the executive order that the mayor signed, to any data in New York City, but then the question is how is it being used, because we have to -- that access must be consistent with our federal, state and local legal obligations. So by way of example, I have full access to every ambulance transaction that occurs in New York City, and we get, you know, tens of thousands of them a day. However, those transactions are HIPAA-protected, meaning it's a -- the patient privacy law, and so I only have access to those insofar as I am assisting the EMS division of the fire department in terms of improving the delivery of medical services. But I would say generally speaking that about 90 to 95 percent of what we end up playing with is what we push out to the public.

QUESTIONER: And what do you mean by push out to the public? You mean like what you actually put on -- available online so that people can download like you were talking about earlier?

FLOWERS: Yeah, varying degrees of that. So there's an open data portal where we aggressively push out to the public, some of it on a dynamic refresh, some of it quite static, the information I just described. But others of it -- but, you know -- and I use that primarily for static data, so that's like property tax law, et cetera. More dynamic things I'll tap directly into the back end of the agency systems. What we're trying to do as a city is marry up those two. I built a platform that enabled me to do scrapes of the back ends of about 85 different city systems and put into a central analytics platform. What we're doing now is we're going to put that platform upstream of our open data portal, you know, with obviously some security built into it so that data dynamically can be pushed out to the public in a more curated manner than it currently is.

ROSE: And how does this -- I mean, Mike -- or Ken gave the example earlier of the gas station looking for anomalies that might point towards the -- you know, the idea that there might be a crime going on. I mean, what kinds of databases are -- do you guys find useful there that might point toward more kinds of financial crime?

FLOWERS: I mean, and so I guess -- I don't really focus so much on fraud anymore, right? It's more about operational efficiency. So I tend -- and a lot of that has to do with early failures, right? So when I first got started, it was as the director of the Financial Crime Task Force. The whole point of it was to try and get our arms around mortgage fraud, and we knew a tremendous -- the idea being, if we could figure out what we knew before the fraud occurred, perhaps we could then, as Ken put it, you know, allocate our -- enforce our resources directionally to say look there instead of over there, right?

What -- the reason that didn't really go very far is that it failed to take in -- and I'm ashamed of this, to a certain degree, as a former prosecutor -- it failed to take into account all the other factors that come into play in the charging determination made by prosecutors' offices. You know, the exercise of prosecutorial discretion is not based solely on the evidence. There are a whole lot of other -- a host of other things that come into play. So we couldn't really get much feedback about how well it was doing in terms of predictiveness.

So then -- but it was because of that exercise, where we pulled everything we knew about our locations, that we just simply changed the cross tab from mortgage fraud to fire, or (grease ?), or building a business or whatever, and it was that same nucleus of data that enabled us to allocate our prophylactic resources in the form of an inspector or, you know, our other civil regulatory resources that we have in terms of expediting business licenses or things like that.

So that -- I mean, it's those kinds of things that we'll use it for. It's more -- I think as Ken put it perfectly, it's directional. So I guess the one thing I kind of want to stress is that this isn't really binary. The old system that I used to use when I was a DA is still in place. You still have people involved who go out and assess the situation, but we only have so many of those people, so we have to send them where they are likeliest to turn up things that we don't -- that we need to prevent as a city or incent as a city, and that's how we're using -- that's really how we're using the data.

CUKIER: It's funny. I can actually build on that and point to an example that Mike didn't mention but I think it will -- it will jar his memory and he can tell the story, which is to think of it in terms of crime prevention or pointing in the direction, think of where would you want -- we know that pharmacies sometimes dispense drugs illegally. They're prescriptions, but they're sort of -- they might be for amphetamines and they're being dispensed illegally, and he's found a very clever way of knowing where you might be seeing a crime. And it's only probabilistic, but it points in the direction of what you might want to look at closer. Mike, why don't you tell that story?

FLOWERS: Yeah, no, that's exactly right. I did forget about that. So, you know, New York City has pretty granular information on its Medicaid reimbursements to pharmacies for prescription drugs, so we were able to look at all 2,100 or so of our pharmacies and identify by reimbursement for which drug and for what type of drug and the purity of it and drill down on the fact that, you know, about 20-some-odd pharmacies out of that 2,150 or so were accounting for north of 75 percent of the highest dosage of OxyContin for Medicaid reimbursement, which, I mean, that just screams out at you. Now that doesn't necessarily mean that you send somebody in to make arrests. At that point, it went back to the Human Resources Administration, which is responsible for Medicaid reimbursements, for them to then audit those 20 pharmacies. And of those 20 pharmacies, it turns out that there was shenanigans with about 18 of them. So it was a very effective way for us to say, this is a place where we should be looking.
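The kind of concentration check described in the pharmacy example above can be sketched in a few lines. This is an illustration only and not the city's actual method: it assumes reimbursement totals have already been summed per pharmacy, and the pharmacy names, the figures and the function name are hypothetical. It ranks pharmacies by total and returns the smallest group that together accounts for a chosen share of the overall amount, which would then be a list for auditors to review rather than a finding of wrongdoing.

```python
# Hypothetical, made-up reimbursement totals per pharmacy for one drug and dosage.
REIMBURSEMENTS = {
    "P1": 400,
    "P2": 350,
    "P3": 10,
    "P4": 12,
    "P5": 8,
    "P6": 15,
    "P7": 5,
}


def smallest_group_covering(totals, share=0.75):
    """Return the fewest pharmacies, largest first, that reach `share` of the total."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())

    group = []
    running = 0
    for name, amount in ranked:
        if running >= share * grand_total:
            break  # the group already covers the requested share
        group.append(name)
        running += amount
    return group


print(smallest_group_covering(REIMBURSEMENTS))
```

If the returned group is tiny compared with the number of pharmacies, as in this made-up data, that is the sort of pattern that might prompt an audit.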

ROSE: We've just learned in real time the questioner is about to drink a 32 ounce Mountain Dew.

MR. : (Laughs.)

OPERATOR: Thank you.

Our next question comes from Eamon Murphy from AOL DailyFinance.

QUESTIONER: Hi. Can you hear me?

MR. : Yep.

QUESTIONER: So I was wondering, since one of the speakers was involved in financial crime as a prosecutor, do you anticipate -- you know, there's so many cases in the past few years where people were expecting charges to be brought against, you know, large powerful institutions and no cases ever came. It seems to me that the public might feel not only just uncomfortable about the idea of pre-crime, but what about the fact that you -- that you see so many -- so much lack of prosecution for crimes previously committed and now resources are being shifted towards, you know, looking for patterns among sort of lower-level entities of criminal activity?

FLOWERS: Well, I mean, I would question the premise of the question, right, which is that they're looking for lower-level efforts, right? I mean, the patterns can detect anything. I mean, the same patterns were used to identify the fact that, you know, Lloyds Bank was involved in wire stripping and permitting the Iranians to use our financial system to help build -- to get around sanctions, just as it would be used for somebody, you know, stealing a thousand dollars from someone else. So there's that.

And the other part of it is, you know, I'm not involved in federal, state or local prosecution anymore. What I can say is that those cases are a lot harder to prosecute than a murder, because with murder you have physical evidence and, you know, the statutes are fairly clear, the killing -- the unlawful killing of another person with malice aforethought, essentially -- versus with fraud, fraud is essentially, and financial crime generally, is the exploitation of informational asymmetry that is illegal, right? And that last part makes things really interesting because the statutes themselves aren't very clearly written. So I -- you know, in the defense of my former brethren, I would say that they probably -- if I know prosecutors at all, I know people were aggressively looking to bring somebody to account on this, and they just couldn't make the facts fit within the statute.

OPERATOR: Thank you.

Our next question comes from Brian Netson (sp) from PBS "Frontline."

QUESTIONER: Thanks. I guess it was sort of a follow-up along that line. I was wondering if there was any big data -- I mean, it sounds like what you guys do is a little bit more operational than what you were describing earlier, but is there any big data that's helping you guys look back at the causes of the financial crisis to look for possible cases to prosecute? And if so, like, what kind of data is helpful?

FLOWERS: Well, again, those would be broad -- I mean, we're the city of New York. The prosecutors' offices in the -- in New York City are each at the borough level, and so they're separately elected officials. So they're not connected to this office, or not connected to the city of New York except we do fund their operations through a city tax levy. And then, of course, the federal and state actors are their own entities. So I can't -- I can't speak to what they're doing.

What I can tell you is that there's a very interested market in our data. And as I said earlier, financial crime is the exploitation of informational asymmetry for financial gain that is unlawful. The fact that there is this much interest in the data of the city of New York from these various actors tells me that they're probably looking.

QUESTIONER: Well, can you expand on that a little bit? I mean, what kind of interest and in what kind of data?

FLOWERS: Well, I mean, we can check who's downloading data from our open data site. We kind of have to in order to make sure that the system doesn't crash. And there's been a lot of interest in businesses and property transactions and things of that nature, all of which would be reflective of -- I mean, when you steal money, you've got to put it somewhere. And some -- and, you know, a lot of times it ends up moving through the legitimate economy and real estate transactions as much as it does anywhere else. And those transactions may be reflected in the data that's available on our open data site.

QUESTIONER: And you said these are federal- and state-level investigators that are accessing this data?

FLOWERS: I can't say that. What I can tell you is that there's been a lot of interest among the enforcement community generally to leverage this information. But I -- you know, what we do at that point is just give it -- you know, give it to them just like we would give them to the regular public, because, like I said, about 90 to 95 percent of it is publicly available. I still can't turn over a tax return to a prosecuting office that -- you know, absent some very significant, robust hurdles that they have to jump through to get it, because that would be a violation of federal tax privacy law.

OPERATOR: Thank you.

Our next question comes from Enga Turner (sp) from Polish Press Agency.

QUESTIONER: Hello. I have this question, actually: What does this all mean, this rise of big data, for personal data protection law? Thank you.

CUKIER: Sure. This is Ken Cukier. I'll take that first and if Mike wants to respond, he can, too.

Well, it's interesting. The personal privacy law, it was basically created a little over 30 years ago when the OECD enshrined its principles. And among those OECD principles that are nonbinding but are basically the bedrock of most Western privacy law is that you should discard the information after its primary purpose has been completed, and that made a lot of sense in a small data world. It made a lot of sense at the time at the beginning of data processing when you could have otherwise kept it.

So, you know, I don't disagree that that was probably the right answer at the right time. But things have changed now. Of course, the cost of processing information is low, but the cost of storage and collection is very low as well.

And what we're seeing with big data is that the value of data is in its reuse. And you can never know today all of the myriad of ways you can use the data in the future. So it seems almost nonsensical to delete data today. It makes a lot more sense to keep it, store it -- you know, hopefully, you know, ascertain that it's in as good a quality as possible, and then you'll know in the future different ways to do this.

So we're seeing this everywhere we look. So one example of this is Google Flu Trends, in which the search engine looked at their searches over a five-year period and noticed that they could identify a certain number of terms that correlated strongly to where the outbreak of the flu was in the United States. Now, that made perfect sense. They had -- to do that -- they get, of course, about 3 billion search queries a day -- they used the top 50 million search terms as their -- sort of their end, the number that they were processing, and they had to run it through 450 million mathematical models.

But the most important, germane aspect of this is that not only is this beneficial for public health because they can actually spot flu outbreaks in real time, whereas there's a reporting lag from the public health authorities, which in a time of a pandemic would be a problem, but they had to tap data that they had saved from previous searches.

Now, there is some -- there will be -- people will bring up the fact, well, you know, Google Flu Trends didn't perform as well, you know, in the current flu season, a couple months ago, as it had in the past. Well, that's true. We could have a conversation about that. But the point is that they needed to use data that was generated in the past and they had to save that.

And that's just one example. We're going to find thousands of these examples in the next few years, and it's going to be a customary way of operating that we're going to be using past data to inform what we do and to save lives, make society better. And there's going to be dangers with that as well.

But the key thing is this. It's an affront, it actually clashes with the principle of current privacy law because current privacy law has a sort of timorous fear of saving data and reusing it. So I'm not saying that we should -- you know, I think it's interesting to put it forward to say we should save everything, but before we go forward in such a blunt way, I think we need to have a societal debate and ask ourselves the question, does current privacy law restrict this big-data world that we're walking into that we see a lot of gains to be had. So if that's the case, we need to redraw those boundaries.

OPERATOR: Thank you --

FLOWERS: Can I just add one thing to that? So pretty much all of my statisticians are in their early 20s, and, you know, I understand that data is not the plural of anecdote, but they have such a different take on it than -- you know, I'm in my mid-40s now -- than people of my contemporaries have on the privacy issue. I don't even have a Twitter account. I don't have a Facebook account. I haven't signed up for Google Plus or anything like that, and that's because -- I mean for two reasons. A, I don't think it's anybody's business what I'm up to; and B, I don't know why anybody would be interested. But from their standpoint, they totally don't care. When I talk to them about, you know, how they engage in social media and how people would be using it to do X or Y, they don't -- they seem to think it's ridiculous to care about it at all.

OPERATOR: Thank you. Our next question comes from Joy Olson from WOLA.

QUESTIONER: Hi. I have a less technical question. I'm with the Washington Office on Latin America, and one of the things I've been trying to think about lately is the issue of kidnappings of migrants in Mexico where extortion payments are being paid in the United States. And it's -- and to figure out how to go about gathering the information that would be useful to prosecuting some of these cases, which are mostly going -- almost entirely going unprosecuted.

And I'm just wondering if from your world of big-data analysis you might have thoughts for me.

FLOWERS: I mean, I can opine about ways they would go about it, without knowing more about the activity, but, you know, I think the Department of Treasury has an intelligence division called the Financial Crimes Enforcement Network that collects things called Suspicious Activity Reports, Currency Transaction Reports and other things. The larger Treasury Department has a similar division engaged in enforcement.

In some ways, I think the issue would be you would -- the way I would go about it is I would find -- I would define outcomes, whatever outcomes you're talking about -- (inaudible) -- you have a kidnapping case where -- or whatever where somebody received funds and walked away, and you know that event, and you need, like, you know, maybe 50 to a hundred of those, and then you deconstruct those to find out if there's a pattern to them that you can then search through the relevant financial data to see where they're putting the money after they do what they do and then target that way. That would -- I mean, that's a -- that would be a global approach without knowing more.

QUESTIONER: OK, that's helpful. I mean, I think what's going on is that there are actually thousands of these cases every year, and what I was trying to figure out is, you know, in the big data sense, is there an ability to access the kinds of information in terms of trend patterns that would identify locations in Mexico, say, where unusual levels of transfers were being made?

FLOWERS: Perhaps. Perhaps. Again, I just don't know enough about the issue, ma'am, but the -- you know, you would collect as much information as you could about the actors involved and the activity involved and the locations where they're doing what they're doing. So yeah, sure, if they're using wire services or cellphone activity or anything along those lines, that would be -- those would be relevant data points.

QUESTIONER: OK, thank you.

OPERATOR: Thank you. Our next question comes from Will Trouble (sp) from Zurich Insurance Group.

QUESTIONER: Hi. Thanks a lot. The question goes to the general group here and relates to -- more towards disaster preparedness and resiliency. I'm just curious, have you seen that the big data and analytics movement has accelerated statutory reforms to drive disaster preparedness and resiliency, you know, and maybe using Storm Sandy as a -- as a recent event, as a frame of reference?

FLOWERS: Absolutely. Absolutely. In many ways, I think Sandy was the most aggressive example of why it is we need to know what we know more effectively. My office, because of the exercises we've been engaged in in terms of collecting all of this information and sorting it by location and business, we're really the only ones who are able to do the translation for the activity in the city.

The city of New York was -- about one-quarter was impacted by Sandy. I think about an eighth took on water and another eighth lost power, and we had businesses in those zones. There's obviously people in those zones. There are structures that were damaged in those zones. And then we had a massive cleanup after it that involved the Department of Sanitation working with the Army Corps of Engineers, et cetera, et cetera, et cetera. We had firemen fighting fires in 50-mile-an-hour winds while there were, you know, dire problems with water pressure.

So all of those things came from different sources that previously, absent the technology that Ken discussed earlier, we really wouldn't have been able to marry up. It's Con Ed who's the major utility, and LIPA is -- which is the utility provider out in the southern part of New York, the Rockaways specifically. They speak languages that didn't readily fit into the city lingua franca. And -- however, we needed to know who didn't have power when the temperature dipped below 35 a few days after the storm and then to figure out where to -- for the whole point being we need to send people out with trucks and blankets and whatnot to assist these people.

So -- and then in terms of resiliency, we are using the exact same approach in terms of how to respond and brace ourselves for the next storm. We have hurricane season in three months, and we are feverishly analyzing what happened and what we did during Sandy to ensure that we can, you know, meet or exceed expectations for the next one.

QUESTIONER: And is this -- is this output or data that you're gathering -- is this all publicly available?

FLOWERS: Much of it is. I wouldn't say all. I don't think -- I mean, I don't think a lot of the utility data is available, although it does raise an interesting question about smart grid technology. The only utility provider in the city of New York that gives us real-time understanding of the usage of that utility at the property level is the one that's owned by the city, which is the Department of Environmental Protection, our water department. Neither Con Ed nor LIPA nor National Grid, which is the gas provider, has that capacity. So even if we get the data in the right way and are able to use it to leverage our resources, it's still going to be stale because it's dependent on people going out to the actual house and seeing if the meter's on.

QUESTIONER: Has it taken form in terms of changing, you know, building codes already and, you know, what kind of requirements that -- you know, to provide power in this area, you are required, you know, to supply X information about those that you're -- your connectivity or, you know, tries -- you know, gets to those issues that you previously didn't have the information for but you know that it's a problem because people didn't have power when it's 35 degrees?

FLOWERS: Right. And so, you know, changes in the building code are under contemplation. A lot of it depends on -- you know, when you talk about a change in the building code, a lot of it means that we have to read it over all five boroughs and all million structures, et cetera, when in reality, it just needs to be reflective of the areas that we believe are within the 500-year flood zone as put forth by FEMA. So, you know, you could either do a building code change, or you could do variances based on location. It would be, I think -- we're not ready to change a building code yet, but what we are doing is changing things that are specific to the locations that we believe would bear the most impact and then make them more, you know, resilient.

In terms of leveraging all this other information, that's exactly what we're doing. We're starting light towers, you know, to be readily deployed in certain areas based on what we know now about outages and how they're likely to occur. Same goes with, you know, heating issues and shelter issues. That -- those places, those pre-deployments, I guess you could call them, are being driven in large part by the data analysis that we were able to do now.

QUESTIONER: Right. And then looking at the scenario to the unique, you know, area, right? So, like, what happened -- you know, a hurricane in New Orleans is different than a storm in New York or different than a tornado in Joplin. Is there --


QUESTIONER: -- and the definition of resiliency is different as well, but, you know, using data to, you know, apply the more unique and -- more unique and purpose-driven for that region across an organization like FEMA, from your perspective in New York, is -- the data's been able to make a case that, you know, New York is different than it is in Jacksonville, Florida?

FLOWERS: I -- yeah. I mean, certainly, and that experience played out. And more to the point, there are about 50,000 New Yorks, right? (Chuckles.) So you have -- you know, to say that -- it just -- by -- just by concentration of building type, for example, Manhattan has an enormous Z axis, so we're -- you know, we have a lot of tall buildings, right? I think anybody who's ever taken a train in here can see that -- whereas if you go out to Queens, there is a large number of one- and two-family dwellings. Same goes for certain parts of Brooklyn and The Bronx. So -- and then on a community district by community district level, the nuances become even more extreme -- what kind of -- how narrow are the alleys in Gerritsen Beach versus Broadway.

So we ourselves have to take into account an extreme range of variables that in and of itself, you know, drives our conversations with state and federal partners. I mean, there are certain things that happened in New Orleans that make sense in certain parts of New York City that are slightly similar, but New York City itself is so incredibly large and diverse that we -- you know, we can learn as much from all those other places you mentioned in terms of how to respond.

QUESTIONER: Thanks very much. Appreciate -- (inaudible).

FLOWERS: Certainly.

OPERATOR: Thank you. Our next question comes from Garrett Mitchell from The Mitchell Report.

QUESTIONER: Thanks very much. And by the way, this is a question, I think, for Kenneth Cukier and -- whose article in Foreign Affairs I've read and I think is really excellent. I'm particularly interested in this concept of big data being used not for causality-related questions but in looking for correlations. And as you say, big data helps answer what, not why.

So two examples occurred to me when I read that. The first was: So if I'm a cancer researcher, and I've been spending my life trying to figure out why, is it -- is it a reasonable interpretation of what you've written about regarding big data to assume that perhaps the key to solving big challenges like cancer will come from the what-related big data source instead of the path of inquiry about why these things happen? And second, is it reasonable to assume that big data, if properly used, may reduce -- and I'm thinking here about large public policy questions like universal health care -- will reduce our experience of unintended consequences?

CUKIER: So both are really good questions. So to the first one, there's a great -- there's a great utility in knowing why about things, because then, when you generalize, you can then address the particular. As you also know, it's useful; you can apply it to other things. The problem, though, is that often, when we think we find the cause and effect -- when we think we have causality, we don't. We've just deluded ourselves, because the world is very, very complex, and we tend to use models that simplify it, and it doesn't actually account for everything.

So that's one reason why so much medical knowledge becomes obsolete every -- say, every 25 years, because we're always learning new things and we're realizing that what we thought was an answer to one thing actually wasn't really the answer, that it was something else -- a third intervening reason for some things happening.

So although there's a usefulness in knowing why, we have put less stock into the what, into just looking for a correlation and going with that, and what we're finding now is -- with big data, that it's going to -- that it's going to be very useful to know what rather than why and just simply go with that because it's good enough. Often, it's going to get us in the direction where we want to go so that we can then apply, say, trials and regressions to isolate different variables, but at the outset, we'll uncover things that we didn't expect beforehand.

So in the book and also in the article, as you mentioned, I talk about the case of premature babies. And as we -- in the past, we didn't collect all of the data that we could possibly collect, or at least as much data as we could. Now we collect, you know, 16 real-time streams of -- in these studies -- it's not implemented yet -- but in the -- in the research, we collect 16 streams of real-time data from the vital signs -- over a thousand data points a second. And by doing that, we can spot the onset of an infection 24 hours before full-blown symptoms appear. We couldn't do that before.

So just like with cancer -- the way we treat cancer today is, someone comes in with a problem often because the cancer has already been formed, and then we diagnose it, and then, by that time, frankly, it's almost too late. I mean, then we -- then we have to have massive interventions. It's chemotherapy, it's surgery or worse, we'd have to sort of put someone into a position where we have to ease their suffering.

That's going to look like bloodletting in two decades' time. It is going to look so preposterous that where -- it's going to be hard to imagine that we never actually looked at data and tapped it in a predictive model, in a preventative way. So you can just simply imagine what it's -- what it might look like in the future. Will -- you know, our earbuds from our iPhones are going to be monitoring our vital signs, right?

We might be able to find, through these -- through collecting lots of data that we never could before, what matches the data signature of someone who has a prostate cancer that is just starting to -- excuse me, a prostate gland that is just starting to become cancerous, and we might find it a year in advance or three years in advance when the cells are very small, not when they're the size of a golf ball, right?

And so suddenly, then we can intervene in a -- in a -- in a way that actually reduces costs and increases the quality of life and quality of care. That's going to be the way to do it, and the way that we will have gotten there is going to have been through correlation -- it's going to be through big data rather than causality. It's useful to know the biological mechanisms of why something grows this way and how it's working and how things function.

In some instances, we're going to still want to do that. We might want to do that with the drugs before we put them on the market, although a skeptic would say, well, actually, all of what we think is causation really is correlation, we just use a fancier term for it because we're really, really -- we really, really believe this correlation, but it's not -- but it's -- but the world is just simply a world of correlations and not causation.

Nevertheless, that's an academic argument. What we'll find is that for lots of things, it's going to -- just knowing what, not why, is going to be good enough. So I don't think the cancer researchers should -- you know, should be in a puddle of tears that the person has been going through and looking for causation, but I think that correlation adds to our toolkit.

The second point that you raised is also a good one, this idea of, will big data reduce this world of unintended consequences? Probably not. All right, the whole point of unintended consequences is that they were unintended. I think we're going to have -- in terms of public policy, we will have evidence-based policymaking in more domains, but the same problems that we've had in a small data world of unintended consequences will still exist in the big data world. Big data is going to be great because it's going to solve so many of our -- it's going to help us solve so many of our problems, but it's not a silver bullet, and it has limitations, too.

FLOWERS: Could I just echo that? I think that's such a good point, which is absolutely not is it going to solve everything. So the fact that we're able to be more efficient in terms of our housing enforcement, for example, that means people can't -- that's one fewer residence that can be occupied, and that's adding to the housing crunch that is, you know, chronic in New York City. So simply because we're able to target our inspectors to the worst places and make them more effective in terms of removing people from unsafe conditions, that's actually creating other problems in terms of, OK, now we've got to find places for people to live.

QUESTIONER: Great, thanks.

OPERATOR: Thank you. Our next question comes from Eamon Murphy from AOL DailyFinance.

QUESTIONER: Hi. I'm wondering which companies are involved in the efforts that you gentlemen are familiar with.

CUKIER: Gee, there's a lot. I mean, it's strange. I don't want to sort of, you know, mention different startups. I think there's -- you know, it's so vast. I would -- maybe let me look at it large and small.

First, the pioneers of big data are Google and Amazon. They're certainly ahead of most other entities because they have so much data, and it's really their business to learn from data. Amazon interestingly has been focused amazingly inwardly at retailing in the broadest possible way of thinking about it, and logistics, right -- so, I mean, everything around the penumbra of retailing. But they've not been, say -- although they could have had the Ngram Viewer, with which you could digitize all books or datify, in fact, all books and then have people play around with it -- there was a special tool, like Google Books does. But Amazon doesn't, right? They use it just simply to sell books. They're not doing it as a sort of -- for the cultural patrimony of mankind to study and understand itself through its language, which Google is doing. So Google datifies books by scanning them, digitizing them and then putting them -- thinking of them as data and it makes it available as a tool.

So likewise, you know, it's in the -- it's in the burning interest of Detroit and Stuttgart to think about the future of transportation, yet it was Google that came out with, you know, the self-driving car that captured everyone's imagination because it went the farthest and it seems to have the best scalability.

So Google -- and so Google, unlike Amazon, has been thinking about the uses of data all throughout society, not just in one single domain, as they search for mobile phone operating systems.

But then when you go -- when you look beyond those two companies, you see that lots of companies are transforming themselves around data. So GE is the best example, right? They are -- they are making an incredible company-wide bet on not only having a big data operation at the very heart of what they do in all of their businesses as a diversified conglomerate, but they're going to then be selling big data services to other companies who want to do that. They're sort of a pioneer in that -- in that from an offline, old-world company way.

IBM has -- is betting its company on big data as well, through the Smarter Planet initiative, and then you can see lots of -- well, of course all the technology companies are piling in.

But beyond that, what we can see is that you're going to have large companies, small boutique companies who are analytics companies -- Splunk might be an example -- all are going to be piling into this domain because it looks a lot like computing in the 1960s, which is to say, it's not that it's going to go into helping us do air traffic control better like the SAGE system and then eventually Sabre, which did air traffic reservations, right, big industrial computing projects of the '60s, instead it's going to touch all domains. Everything is going to be touched by this. So in 30 years, it's not going to say that big data worked here. It worked everywhere. We implemented it like we implement computers throughout society and improve practices and do that.

ROSE: "Manana." With that, I'd like to wrap this up. Thank you all very much for participating. I was following avidly until I heard Ken say something about evidence-based policymaking, and now I realize that it's just all fantasy land. (Laughter.) But we look forward to having future conversations and to following the progress of this and other issues in the pages of Foreign Affairs. Thank you all very much.





