Kalev Leetaru, 2013–2014 Yahoo! fellow in residence at Georgetown University, joins Media Executive Vivian Schiller to discuss big data and its uses in policymaking, education, and beyond. Using current geopolitical examples, Leetaru explains how big data can show hidden patterns that can be used to forecast, and to shape future strategies.
This meeting is part of the Council on Foreign Relations' Voices of the Next Generation series, which seeks to bring CFR members together with fresh, young voices in the nation's foreign policy discourse.
SCHILLER: Ok—welcome everybody, welcome to the Council on Foreign Relations. Our meeting today is "Voices of the Next Generation with Kalev Leetaru"; we're going to be talking about big data. Kalev was most recently the—I forgot to introduce myself, my name is Vivian Schiller, by the way. Kalev was most recently the Yahoo! Fellow in Residence, in—I have to read this because this is a very long title—Yahoo! fellow in residence in international values, communications technology, and the global internet at the Edmund A. Walsh School of Foreign Service at Georgetown University. That is not a title that easily fits on a business card, but Kalev's work is equally complex: it is in and around examining the world of big data and applying high-performance computing to better understand global challenges. Most of his work runs through a project he founded called GDELT—we're going to be hearing a lot more about that today—which is the largest event database in existence. The GDELT project monitors open source intelligence, including the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages, and identifies the people, locations, organizations, themes, sources, and events driving our global society every second of every day, creating a free, open platform for computing on the entire world—not too small an ambition there. Kalev holds three U.S. patents and has been an honoree and invited speaker, panelist, and discussant at an incredibly long list of prestigious institutions, including the one in which we sit today—but my very favorite fun fact about Kalev is that he founded his first internet company before entering high school, which I'm also going to ask you about before we turn it over to audience questions. Anyway, welcome.
LEETARU: Oh thank you, this is an honor to be here today.
SCHILLER: You will discover—as I just did chatting with him earlier—that this man has more energy and speaks faster than just about anybody I know, so I may break in going, wait, slow down.
LEETARU: This is me low energy today.
SCHILLER: If this is him low energy, I can't even imagine him high energy. Alright, well I want to start by defining terms—and I'm going to assume that we have a range of expertise about data and computing in the room, so, with apologies to the experts in the room, I'm really going to aim this towards those slightly less informed about big data and computing, and the more informed amongst you can ask questions when we get to Q&A shortly. But let's just start by defining terms. So big data has come to mean a range of different things. When it comes to the work you were doing at Georgetown and with GDELT around international relations, what kind of data are you looking at? And what kind of information can it reveal about geopolitical events?
LEETARU: Yeah, you know, that's the interesting part. You think about, you know, the term Big Data has become—it's become sort of the buzzword that everyone's using. You know, for a long time, everyone was a Web 2.0 person. And, you know, and for a while there, everyone, you know, was a social media marketer.
You know, the term Big Data has become so popular that you're seeing it tossed around for almost any purpose. And I really draw a distinction between—you know, you think about every company out there, you know, every Fortune company, probably has tons of data it has to manage. To me, though, that's Big Data storage. That's not really Big Data in my world. Again, it's a very complex task.
But if you're just thinking about, like, all the backups that your company's maintaining, you know, yes, there's a lot of complexity to that, but to me, Big Data is really the—sort of the potential for what it can give us, in terms of allowing—when you have all this data, you think about today, you know, people from Bangladesh to Buenos Aires, you know, they're basically telling one another and the world how they feel, how they think, what's happening around them.
You know, when you have this kind of data on society today, to me, Big Data is really allowing that data to tell us things that we've never been able to see before. And you think about, you know, today as—you know, sort of society as a whole, if you think about social media, for example—again, social media, obviously, has not spread to the entire world. I was at a panel recently where someone was talking about how, you know, conflict is live-tweeted today.
I said, well, you know, in the Congo or Sri Lanka or in the case of, for example, the origins of the Ebola outbreak in the forest regions of Guinea, not everyone's walking around tweeting, but yet we still have so much data that's emerging from this, whether it's CDR records from cellphones in these areas, whether it's satellites, you know, up above. The volume of data that we have today can show us things that we never dreamed of seeing, because we've never been able to observe that before.
You know, in terms of what this can reveal about geopolitics, you know, I think that's the interesting, the fascinating sort of untapped territory. You know, so much of the work today has really been about—you know, it's about advertising, selling ads. You know, you think about how much of the Big Data revolution has really been on medicine. Medicine has definitely had a huge investment, things like genomics. But so much of the work has really been about, how do we sort of, you know, commoditize people to sell them ads?
You think about from a standpoint—what's so interesting about that is, the notion of, you know, this person here and, you know, whether they're going to want a Pepsi in an hour or whether they're going to want this, there's so much that goes into sort of thinking about how people—how we use all this data to understand people that has tremendous, phenomenal applications to, you know, what are the people in North Korea thinking right now? You know, are they going to reach that breaking point of, you know, standing up against their government?
What about, you know, another country? What about, you know, let's take another example here, Ukraine. You know, are we going to have more unrest there? The ability to really look at data—and I think a great example of this—you know, data I kind of liken to a devil's advocate.
You know, I did a piece for Foreign Policy magazine earlier this year, right before the president fled in Ukraine. It was the day before, actually. And it was basically a map of protests and unrest across the country. And this map got uniformly negative reception here in Washington. Very senior people reacted to it and said, well, you know, you've got Crimea. You know, Crimea, you're showing Crimea getting ready to pull apart and you're showing eastern Ukraine really coming apart. But, you know, he just signed a peace deal with the protesters and they're already leaving. You know, Ukraine's at peace.
And this is I think where—you know, data is never going to say at 5:05 next Friday, you know, people are going to stand up. I think where data—though, the potential it offers is the ability to scoop up all this material and show us something that we're not expecting. In the case of Ukraine, you could see little small townships and villages, for example, in the vicinity of Kiev—you could see something, for example, like a broadcast saying, by the way, you know, the local police force is on buses heading up to reinforce Kiev, so we're going to have a reduction of police service today.
In the case of—I'm trying to think of other good examples—you know, this, I think, is a really powerful, potent tool, the ability to sort of show us those macro-level patterns or—actually, this is the example I was thinking of. When Egypt started coming apart again last year, a map that I produced that was widely circulated here in town was basically a live map pulling together all this media about Egypt, so anything that was being said about Egypt at this moment. And, again, this is based on all open source information, so this is all public information.
But the ability to create a single map and say, you know, here's what all the media is reporting is happening moment by moment, so you could drill down and say, wow, you know, there's something happening down here. You could drill in, and then you could say, well, you know, I've got someone there who's—and this isn't—this isn't validated yet, but that's by itself very useful.
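The kind of live event map described above can be sketched in a few lines: aggregate geolocated media mentions by place and surface the densest clusters for a given country. The records and field names below are invented stand-ins for illustration, not GDELT's actual export schema.

```python
from collections import Counter

# Invented event records standing in for geolocated media mentions.
events = [
    {"country": "EG", "place": "Cairo",      "lat": 30.04, "lon": 31.24},
    {"country": "EG", "place": "Alexandria", "lat": 31.20, "lon": 29.92},
    {"country": "EG", "place": "Cairo",      "lat": 30.04, "lon": 31.24},
    {"country": "UA", "place": "Kiev",       "lat": 50.45, "lon": 30.52},
]

def hotspots(events, country):
    """Count mentions per location for one country, busiest first."""
    counts = Counter(e["place"] for e in events if e["country"] == country)
    return counts.most_common()

print(hotspots(events, "EG"))  # [('Cairo', 2), ('Alexandria', 1)]
```

A real pipeline would parse the tab-separated GDELT event files and plot the counts on a map rather than printing them, but the drill-down logic is the same filter-and-count step.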
Or in the case, for example, in Russia, to be able to see that—what was it, I think two weeks after the jetliner went down, you were still seeing some, not major Russian press, but you were still seeing some Russian press that was reporting it's a botched assassination attempt by the Ukrainian military, you know, trying to shoot down Putin's plane.
And that to me, I think, is a very powerful ability of data to both show you the things that you're missing, but also it's high level—how are people perceiving and understanding the world? Because, again, people don't react based on facts. People base—they react based on their interpretation, understanding of the world. It's the ability to also reach into the media and social media and understand emotional undercurrents. And this is a huge emerging area of my work, is thinking of the news media not as sort of a factual chronology of the world, because, again, there is no such thing as truth. There's always versions of the truth.
But what's so powerful is when we start looking, say, at emotion, we move beyond positive and negative, we start looking at anxiety. For example, my last Foreign Policy column out this past weekend was to look at the Ebola outbreak in domestic American media. And what you see there is this fascinating transition of the media actually becoming less negative about the outbreak. And this is kind of interesting. You know, you hear all these claims that, oh, well, the media is causing all this. It's actually social media where things are going wild. Mainstream media has been very, very level.
But you see things like anxiety or panic. You see these emotions. You see them actually—you don't see much of them until all of a sudden, you know, the nurses get infected. And so all of a sudden you see a burst, but then you see kind of this calmness, especially when they get shipped out to Emory and the other hospitals—you see calmness return to the coverage. People are still concerned, but they're not in abject panic.
And then I think it's a powerful way to really look at these dimensions. You think about communication, you know, things like what the CDC needs to do from here from a policy standpoint—A, I think this has really taught them that you can't just do a press conference, you know, have all the media line up, and do it nicely on CNN. You have to engage in social, and they've learned that lesson.
But you also have to be able—you can't just put out a periodic tweet, you know, a reassuring message. You have to be able to be reactive. This world is moving so fast now—so the ability to look inside of all that data and be able to see, for example, when people are just concerned, you know how to react to that.
But if all of a sudden people are panicking, you want to react to that in a very different way. You want to react faster, and you want to have a different type of message that you wouldn't otherwise have. And so I think this is—this is sort of the untapped potential, the ability to—you know, you're thinking about today, we have all this data. I think really, you know, if I kind of, you know, think about it, the real challenge today is how to make this data sort of speak from a policy perspective.
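The burst-and-calm pattern described above lends itself to simple spike detection: flag the days where an emotion's density jumps well above its recent baseline. A minimal sketch, with invented numbers standing in for daily anxiety-term density:

```python
from statistics import mean, stdev

# Invented daily series: share of "anxiety" terms in coverage, per day.
anxiety = [0.8, 0.9, 0.7, 0.8, 0.9, 2.6, 1.1, 0.9]

def spikes(series, window=5, threshold=3.0):
    """Return indices where a value exceeds the trailing-window
    mean by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        base = series[i - window:i]
        if series[i] > mean(base) + threshold * stdev(base):
            flagged.append(i)
    return flagged

print(spikes(anxiety))  # [5] -- the 2.6 burst stands out from its baseline
```

A production system would need a more robust baseline (and a floor on the standard deviation so quiet periods don't over-trigger), but the idea of reacting differently to a statistical burst than to steady concern is captured by this one comparison.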
SCHILLER: Right. Well, there were so many different threads in what you just talked about, I could go at it from about twelve different angles. I want to follow up on a couple of things. You talked about what—you know, what is sort of—you didn't use the term sentiment analysis, but you talked about sentiment analysis, which is notoriously difficult. It's hard for machines to look at language, whether it's in a tweet or newsprint, and understand what the intent is, because people use sarcasm, they use all kinds of things. How do you—how are you able to overcome that?
LEETARU: So this is the fascinating thing about things like sarcasm and humor. So humor, of course—we all know that a joke that's funny to one person might not be funny to another. Sarcasm is very interesting. So a project that I was involved in several years ago, we had a bunch of PhD students. We sat them down. We had them read New York Times editorials about the presidential campaigns from the '40s to the present.
I forget—was it the '50s or '60s? One of those periods, there was—there was a candidate who was known to be a terrible public speaker. And so all the New York Times editorials at the time, of course, lampooned this and said, you know, once again, he wowed his audience. He rivals Shakespeare for his command of the English language.
So, of course, you know, our students coded those as uniformly positive, because sarcasm basically means you're making a false statement that you know that the other person knows is false. And this becomes one of my favorite examples, actually, from all the Twitter discussion on Hurricane Sandy. There was a tweet that came out of Ohio where the person said, "Come on, guys, it's really not that bad." And this is all you have, and there's no tweets before or after that contextualize this.
So you have a tweet, and it's unclear, you know, is this person, you know, just being sarcastic? Or are they really, you know, sort of making a statement there, because it wasn't bad where they are? And these are these fascinating questions. I mean, of course, we've all had—you know, we've all sent an e-mail where someone on the other side reacted poorly to it, because they can't see that emotion that comes with that.
But one of the powerful things about sentiment is also, you know, you talk to psychologists, for example. And they don't talk about positive or negative, you know? They talk about this whole range of psychological sort of states that we undergo. And also I think one of the interesting parts about this, there actually are ways to have machines understand sarcasm.
In particular, one of the ways that's emerging is you, for example, take a statement, like a tweet, take that and run that against Wikipedia, and then see, does it agree with what Wikipedia says—now, again, Wikipedia isn't necessarily the best source of information, but it's a source. And you could say, well, this statement that was made is, you know, false according to Wikipedia.
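That check against a reference source can be sketched as a simple heuristic: if a statement's literal sentiment contradicts what a reference source says about the topic, treat it as a sarcasm candidate. The tiny in-memory knowledge base below is an invented stand-in for an actual Wikipedia lookup:

```python
# Invented stand-in for a fact lookup against a reference source.
knowledge = {
    "hurricane hitting the coast": "bad",
}

def sarcasm_candidate(statement, literal_sentiment, topic):
    """Flag statements whose upbeat literal reading contradicts known facts."""
    known = knowledge.get(topic)
    return (known is not None
            and known != literal_sentiment
            and literal_sentiment == "good")

# "It's really not that bad" reads literally positive, but the known
# situation is bad -- so the statement is flagged as possibly sarcastic.
print(sarcasm_candidate("Come on, guys, it's really not that bad",
                        "good", "hurricane hitting the coast"))  # True
```

This is only the contradiction half of the idea; as the discussion notes, a real system would then weigh other emotional cues before concluding the author was being sarcastic rather than simply unaffected.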
And so then potentially you can look at some other sort of emotional pieces to go with that, and you can say, well, you know, likely this person was trying to be sarcastic. But what I think is the more challenging problem is the social-cultural issue. And so all these companies are doing, for example, social media analysis. The majority of them are coming from a Western perspective, so I can't tell you how many companies I've seen that are examining Arabic, looking at hashtags. Well, for those of you that know sort of that world—here's a good example.
So, you know, most Americans, if they saw a cartoon today that had Prophet Mohammed and it was—he was being lampooned, he was doing something bad, most of us, you know, here in the U.S.—I mean, obviously, all of you are much more aware of these things—but you think about a typical American probably recognizes that that's going to be problematic. If you say, there's a cartoon today lampooning Prophet Mohammed, it was attributed to an American cartoonist, oh, and Secretary Kerry's landing there tomorrow, you already know there's going to be a problem.
But if you say, for example, oh, here's a cartoon in this paper, it's a picture of a dog next to a cat, and there's a star next to it—you know, you see all this very, very interesting symbolism, and the problem is, you need social-cultural expertise to understand what that means. And, of course, there's the rise of imagery now.
In Jakarta, for example, it's less likely that someone will tweet "It's a beautiful day." They'll take a photograph of the sky, of a beautiful sky. This is an interesting challenge, because we don't have the machine tools today to really understand imagery, and it's interesting, because even from a human perspective, even in academia and other places, we lack a lot of the tools—we lack a lot of sort of structured tools available to be able to look at, for example, an image and say, what would this connote?
You know, what would this—because the classic example I always give with emotion: any word has different connotations depending on who you are. If you almost drowned in the swimming pool as a kid, you know, the word "water" or the word "ocean" will evoke terrible fear in you. If you grew up on boats and boating with your parents, the word "ocean" will connote wonderful positiveness to you.
So emotion is deeply personal, and there are ways you can actually get into that on a person-by-person basis. But I think one of the greatest powers in emotion is being able to look at sort of the higher level, at scale, and across so many pieces, you can start bringing things in.
So I did another column for Foreign Policy magazine, where what I looked at was the emotion of the world's media towards Assad. And this is quite fascinating. If you look at all the world's news media coverage of Assad, and you look for periods where it sort of plunged—the whole world's just darkening fast about him—these are the periods when he launches large attacks, the chemical weapons attack, the barrel bombs attack.
What was so interesting, though, is in the beginning, you were seeing the world really, really darken on him. You saw him collapse. You saw the chemical weapons attack. And then you saw kind of this pause, and the whole world kind of paused, and you saw all these editorials around the world saying, you know, his palaces are going to start vaporizing in hours.
And then, at the 72-hour mark, when nothing happened, it became clear there was going to be no reaction against him. One of the dimensions that I use is a dimension of tone that takes into account military invulnerability. So basically sort of this perception—because you think about, a leader who's killing people right and left, that's a terrible person, but, you know, they functionally—they have sort of an air of invulnerability, that they can do this at will, so they have power. Even though they're a bad person, they have absolute power.
And you see that particular emotion just skyrocket across the world, and you see kind of this pullback, it skyrockets up. And about this point is when you see the U.N. issue their blanket order to all U.N. news agencies to halt all negative coverage of him, because, you know, he's here to stay. And this is a very fascinating element. And it turns out, if you run this for other leaders, when you see these collapses and you see sort of these freefalls—I call it sort of the freefall of death—this is when leaders enter a certain trajectory, without external intervention, this is usually when they're forced to exit. And oftentimes you need, you know, sort of credible opposition to rise at the same time.
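The freefall pattern described above amounts to detecting a sustained, steep decline in a smoothed tone series. A minimal sketch, with invented tone values standing in for average daily tone of coverage about a leader:

```python
def smooth(series, window=3):
    """Trailing moving average to damp day-to-day noise."""
    return [sum(series[max(0, i - window + 1):i + 1]) /
            len(series[max(0, i - window + 1):i + 1])
            for i in range(len(series))]

def in_freefall(series, days=4, min_drop=2.0):
    """True if smoothed tone fell by at least `min_drop` over the last `days`."""
    s = smooth(series)
    return len(s) > days and (s[-1 - days] - s[-1]) >= min_drop

# Invented daily tone: mildly negative, then plunging steadily.
tone = [-1.0, -1.2, -1.1, -1.5, -2.4, -3.1, -3.9, -4.6]
print(in_freefall(tone))  # True
```

The thresholds here are arbitrary; in practice they would be calibrated against historical cases of leaders who did and did not exit, which is what makes the "freefall of death" claim testable rather than anecdotal.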
These were all these fascinating things that, you know, companies—you take Nike shoes and, you know, take any brand you're familiar with, they're all doing sentiment analysis today, trying to look at, you know, how sales are doing, and they're correlating it—but they have something to correlate with. Because the pretty graph of tone—you know, tone's going up and down, up and down, up and down—but what does that mean? OK, people are upset. What does that mean?
And, again, Assad isn't, you know, doing things like this and using this as an opinion poll, but he's reacting again to his sort of internalization of this, because, again, the media—they're acting as sort of a proxy for all these different beliefs and views, and also seeing how that's varying. You know, I think one of the greatest pieces of insight from the emotional data was the recent Israel-Hamas conflict again. As that conflict began—in all the previous conflicts, you'd see the usual sort of chorus of support for Hamas and condemnation of Israel and what it was doing. But this time, you basically heard crickets. You didn't see it. In the first twenty-four hours, you saw none of the traditional support.
Now again, you know, this to me is a very powerful indicator that, at the very least, you know, that may not tell you, hey, this is what's going to happen at the end, but you can say something fundamentally has changed. And that to me, I think, is a great piece. It's essentially telling you, things are different than they used to be.
Or Tunisia—you say, well, when the first fruit vendor set himself on fire and began the Arab Spring, you know, why was that not a catalytic point where everyone said, oh my God, things are going to fall apart? Well, that's because it had happened—yeah, I believe twice before you'd had similar circumstances.
What you had to know was that sort of public perception had really darkened in that period. And so you're essentially adding a catalyst to a situation where, again, the data—I don't think we're going to be at a point anytime soon where we could say 5:05 next Friday the protests are going to begin, but I think that it's sort of like a weather forecast. You know, the rain clouds are darkening, and at the right moment, things could happen.
SCHILLER: So I want to come back to something else that you said in your—in the first piece. You evoked North Korea—I don't remember what the context was—so how can you apply the work that you're doing with GDELT against closed societies where, you know, there is no Twitter, there is no open media? So how can you—how can you extrapolate from your data information about those societies?
LEETARU: These are the interesting questions. So a dear friend and colleague of mine, Tony Olcott, who some of you may know—he's retired now—he has a wonderful book that focused primarily on open source intelligence. And he talks a lot about, you know, how do we sort of triangulate into societies? And, you know, it's interesting, I sort of look at it as there's sort of two different dimensions. One, there's countries like Kiribati, where there's just not much media to analyze.
So in terms of our awareness of Kiribati, it's not that Kiribati is a nation trying to evade people understanding it. There just isn't much media there. Countries like North Korea are more difficult, but this is—this is where the media becomes so powerful. The media is a fabric. So you think about it, North Korean media may not tell us much, but it's interesting, because North Korean media is very scripted. So it's sort of like, you know, Russian media used to be—and still is, to some degree, today—where it gives you an interesting—you know, it's sort of the Kremlinology almost, you know, sort of the notion of, well, what is the government putting out there?
So you saw, for example, when the South China Morning Post had that very interesting—you know, there's been some of these stories about North Korea. These are giving you very interesting sort of insights into sort of the political dynamics between the two countries.
But if you look, for example, at China and Russia and South Korea and some of the neighboring countries, they're all reporting. You know, information is constantly getting out there. And if you begin triangulating across all these, you start getting a better picture of this. It's sort of like—if you take any repressive regime, you know, and—and let's say there's a big labor protest, and a bunch of people get shot, you know, the government might say, well, nothing happened here. You know, Amnesty International might say, you know, millions slaughtered in cold blood, but what you'll see is, in between all that, enough information propagates. And that's the beautiful part about our connected world today, is that it's increasingly difficult for countries to have total isolation, total control.
And, you know, you roll back the clock to the rise of satellite television. You saw, you know, of course, Russia reacting to it as a sovereignty violation. You know, for the first time, you can force information to a country and it has no way of cutting that off at the border.
And, of course, you know, today, those are sort of quaint notions, but I think the flow of information—we can't get a perfect view, but in North Korea, we don't know, of course, where the leader is at any moment, and health issues are things that they closely guard. But in terms of the day-to-day flow of the nation and where different ministers are, what they're talking about, those actually all come out, and we're able to actually map a lot of that in real-time.
SCHILLER: Yeah. So, you know, with Big Data, one of the things that sort of gets associated or appended to the phrase is concerns about privacy. Now, I know you've been very clear that you're looking at sort of open-source material. But there's still, I would say, you know, concern that analyzing even open-source material can lead to inadvertent invasions of privacy. So how do you address that?
LEETARU: Yeah, and there's actually two pieces. I think there's the privacy, and there's also the over-hype. So Big Data has kind of gotten this bad rep, especially in this town, I think, first off, for hype, because, you know, there's so much, you know, people—especially all the contractors in town, you know, they'll go and say, hey, look, you know, we can—you know, for $100 million, we'll give you everything happening across the planet. And, of course, this never works out.
So I think there's a lot of hype that's come with it that's kind of given it sort of a backlash. It's sort of like machine translation in the 1950s, you know—the government spent freely, all these companies coming out saying, look, for just, you know, a couple of tens of millions of dollars, we'll have a perfect translation system. And so, you know, you had that. And I think we're at that point.
But the privacy piece—I actually think open-source has a phenomenal ability to help us with privacy. You know, you look at the post-Snowden world and all the revelations about monitoring—I think there are very few people, at least in the West, that would say, you know, tapping Bin Laden's telephone is a bad idea and that's a privacy violation. I think most people agree that's legitimate, because that's a criminal investigation.
I think where people would draw the line is tapping everyone's phone just in case someone might do something bad. And what I think is so interesting about this is, you look at all these situations, even the lone-wolf situations—you know, I mean, you look at the very sad school shooting recently in Washington and the fact that, you know, this person was putting out all this information allegedly ahead of time, providing basically strong foreshadowing of what was going to take place there.
I think, you know, social—I mean, open media—I think gives us enormous possibilities. And I think it allows us to sidestep a lot of those issues. I mean, obviously, there's always privacy implications, but I think it allows us to sidestep that. It's more difficult.
You think about—if you can tap someone's phone and hear what they're plotting, yeah, that's an easy case, but I think that opens a lot of Pandora's boxes about, sort of, privacy. If you are looking purely at what people are putting out there, I think—I think that we're able to draw a line.
When you think about, you know, what is the point? Like, if you look back to the foundations of open-source intelligence, all the way back in World War II, you know, what was the purpose of open-source intelligence? And really, the purpose of open-source intelligence was to do exactly that. If you had a particular person that you thought was a bad person, you went through a traditional criminal process.
You wanted to know, what was the population thinking? What are they feeling? This is where open-source came from. And you look at, you know, Admiral Studeman, you know, claimed in—was it '92?—that 80 percent to 90 percent of actionable intelligence on the Soviet Union came from open sources. And, again, whether that's true or not—again, he had a vested interest in open source.
But I think that there's phenomenal possibilities that allow us to understand societies at a macro level and understand sort of criminal elements at a macro level, in a way that allows us to at least avoid a lot of the things that people have been wrestling with. I think, again, any piece of information you put out there—and the famous example, if you tweet and said, I had, you know, dinner with Uncle John, you know, at, you know, at Something Bistro, you know, near my house—in one fell swoop, you've given your mother's maiden name, your birthday, and everything else.
LEETARU: But I think—I think—I think as a whole, there's a lot of...
SCHILLER: It's true.
LEETARU: ... possibilities there.
SCHILLER: Yeah, yeah. I want to begin—I'm going to ask one last question, and then I want to throw it open to your questions, so I would ask you to just be prepared and think about what those are, and we'll have—we have mikes that will be going around, right? But I have to ask you before I turn it over to the audience, so what was your middle school start-up? And I have to ask, too, is there any hope for those of us that, like, weren't expert coders before we hit puberty?
LEETARU: So—so I actually founded my first company in eighth grade, the year after Mosaic came out, so I've been doing web for twenty years now, almost since the dawn of the modern web itself. So I've got kind of a unique perspective on how the web has kind of reshaped, you know, society.
You know, I think this is one of the most exciting things about where we are in society today. You know, when I first started, I mean, even accepting credit cards, you know, you had to have the—I mean, I don't know how many of you have run a small business, but, you know, way back in the '90s—which is funny to say; you think about where technology's come from—you know, we didn't have PayPal. We didn't have these things. So to take a credit card, you know, you'd get that credit card information, and you had this terminal from the bank that had a phone line that went to it. Half the time, you'd swipe it, it'd come back and say there was an error and the bank couldn't process it, try it again in an hour.
You know, and so technology—I think the greatest part today is really the era of cloud computing, and, of course, businesses really face this. Twenty years ago, if you wanted a map, say, of sales by territory, you sent that off to your business intelligence unit, and they went off and sent you back a nice map. And it was great, it was easy, but it could take days or weeks, depending on where they were in the cycle.
Today, you know, with the rise of tools like Tableau and all these other tools, anyone within a business can right-click and have access to all that data. And it simplifies it. They may be touching, you know, petabytes of data. My favorite example is the Google search. I usually open my Big Data talks by saying, you know, yesterday, I conducted over 200 searches on a 100-petabyte data set involving 200 variables. And, of course, the audience goes, "Oh my God." And I say, yeah, I ran 200 Google searches, because the Google index is about 100 petabytes that sit behind that, which is a lot of data.
And this, I think, is the interesting part: it's become transparent. You don't think about that anymore. When you do a Google search or you graph something or you map something, the tools are there. If you want to be at the bleeding edge, you still need to understand coding. But there are so many tools out there today that let you, you know, sort of drag and drop a spreadsheet. There are tools today where, if you want to make a map of something, you can drag a spreadsheet into it, and it figures out the geographic locations in there, does all the necessary stuff, and just gives you a map.
And I think that's kind of, I think, if you want to really push the boundaries, I think you still obviously need that, but I think...
SCHILLER: So I appreciate that there's hope for us, but what was the start-up?
LEETARU: So my start-up actually was a web company, Gamacles Software, and basically what we did was make web authoring software. This was back in the day when you still had to write webpages by hand, actually handwrite code. And long story short, my parents bought me a robot for Christmas in eighth grade, basically this little robot that goes around. But you could program it.
And I didn't like the way you programmed it, so I built my own tool. And my father—he's a geologist, but at the time, he was one of the early people at the Geological Survey in Illinois making webpages. He said, you know what, what you've built looks a lot like webpage programming. So we started putting it out there, and everyone was downloading it, so we became a company and sold that off.
I'm one of those sad dotcom crash stories. We had a buyer lined up. I could have been retired. But this was right as the dotcom crash hit full acceleration, so I'm one of those sob stories where everything was done except the final sign-off by the venture capitalist and the buyer. So, sadly, I was not one of those people.
But, again, founded my second company, went from there, went to the supercomputing center where Mosaic was created, started working there as a high schooler, actually, then, and finally, you know, went off from there. So—yeah.
SCHILLER: And now as a very old man, here you are. OK, we're opening up for Q&A. Please wait for the mike. And when you are getting ready to ask your question, please say your name and your affiliation. And we ask also that you keep your questions very concise.
I saw the woman in the green had her hand up first, so we're going to go there, but we need...
QUESTION: Hi, I'm (OFF-MIKE)
SCHILLER: Wait, mike, wait. No? No mike? OK, just yell.
QUESTION: (OFF-MIKE) can you hear me?
QUESTION: OK, I'm (OFF-MIKE) from Google. I was wondering if you could talk a little bit about education and (OFF-MIKE) also, how would you advise policymakers? The White House had its report on Big Data (OFF-MIKE) know about and how—what would you recommend to a policymaker?
LEETARU: Yeah, you know, it's an interesting question. You know, the policy piece, I think, I actually don't think that the U.S. is doing a good job in terms of understanding. In fact, I have Estonian heritage, as you can probably tell from my name. You know, Estonia really—you think about innovation and how we can use data for governance. I mean, Estonia really is the pioneer in that.
And, really, I think the philosophy of how we think about data here in the U.S.—the way American policymakers have treated it—has really been, it's the hot new thing, so let's put out a few paper reports and do some things. There's been investment—obviously, there's been the Big Data initiative.
But there hasn't been the same investment you've seen in other countries. Japan, about—what was it, five or so, however many years ago—started fundamentally investing in Big Data and saying, you know, this is the future. Here in the U.S., when the National Science Foundation buys supercomputers, they're still focusing on the old world of the hard sciences: lots of CPUs, no disks, no memory. It's not a data-driven world.
So I think there's phenomenal possibility. And I think there are other governments, especially, that are doing a much, much better job of thinking both how do we use it domestically to improve society, but then also how do we use it in ways to encourage business, in particular, entrepreneurship?
In terms of education, I think there's phenomenal possibilities, especially the ability for, you know, a child, for example, to build a tool using some of these things. So there was a project I was helping with last fall, where there was basically—I think it was a thirteen-year-old—trying to build an anti-bullying tool, basically to monitor tweets on Twitter that discuss bullying.
And this is fascinating to me, that basically, you know, back where I was, you know, way back in the day, like all the stuff I had to do to build that. Today, it's sort of point and click. You can pull things together.
And in terms of the ability for kids—so there's a project called Fantasy Geopolitics that makes use of my data. It's a social sciences teacher—I think middle school or high school—who was trying to get his kids engaged. You sit down in social sciences class and say, what's happening in Ukraine? And, of course, the kids yawn. Who cares about what's happening in Ukraine?
So he made it into a game, essentially like fantasy football, but for geopolitics. And those kids are basically leveraging my project, which is supported by Google Ideas—the ability to process, you know, the world's material, all of the stuff happening around the world—putting it in a format that someone like him can use to really get kids engaged and interested in it.
And so I think the best part about Big Data is really the ability—whether we're looking at what's happening around the world in terms of global events—to simplify that and put it in terms that people can interact with, where you don't need to know C code or write assembler code or know all this fancy stuff.
There's an incredible amount you can do without knowing programming. You know, if you think about it, this top part's what you still need the programming for, but the vast, vast majority of it you can do without really programming today.
SCHILLER: Yes, right here. Yeah. Got you next.
QUESTION: Mike Mosettig, PBS Online NewsHour. Speaking from a more traditional CFR demographic and experience, is the arms race analogy appropriate here, you know, U.S. versus Soviet missiles, one makes an advance, the other catches up quickly? And then specifically apply that to the situation in Hong Kong, where the demonstrators are trying to do all sorts of technological things and the government on the mainland is trying just as hard to make sure they don't work.
LEETARU: Yeah, you know, this is the fascinating part. You know, what was particularly interesting to me is, is you think about how social media in particular has reshaped the environment. You know, I—if you think about the example of ISIL or ISIS, you know, they are probably the most adept users of social media in the world today. They beat every government, every corporation in the way that they're using this as a tool and the impact it is having on their organization.
Now, again, they're using it for a very, very bad purpose, but the way in which they have beaten every government—I mean, when you think about it here, for example, we look at how many retweets an ambassador's tweets get, and that's considered state-of-the-art. You know, we kind of leverage some meme and we hold up a sign and say, hey, look, we use social media. Hey, look, we use Twitter for this.
It's sort of a knee-jerk of everyone's using Twitter, so we'd better put out a tweet and pretend that, you know—and it still boggles me that, you know, you talk to ambassadors here, and many that are currently serving, and, you know, it's still considered a big deal that, you know, someone is tweeting.
And to me, it's kind of—you know, you talk to some of these other countries, and, you know, like every—you know, the president of Estonia tweets, you know? And in a lot of countries, you know, everyone says—but, you know, he actually tweets. This isn't while some press secretary is, you know, typing something in for him.
And I think there's tremendous possibility. The example of Hong Kong, I think, is an interesting one. You have a lot of governments, like Iran, for example, that have been very adept—Iran is also an interesting example, where they don't block a lot of material. They do block some, but they're much more adept in terms of throttling it down. So Internet access that would have a strong impact on business is allowed through at full speed, but things like YouTube or other platforms that could be used more for social dissent are dramatically throttled.
In the case of Hong Kong, I think this is an interesting case, because you're seeing this more and more now: protestors who are leveraging social media to try to sway the government, while at the same time the government's trying to work around that. And you see it in most countries now. You're seeing this ebb and flow.
This is an interesting question. There are two parts to the arms race. One is the constant one-upmanship: the moment the government blocks Twitter, well, we'll use a proxy. The moment the government blocks that, you use some other platform. You're never going to be truly able to block information in this world. Even cutting off the Internet entirely, you have satellite phones that are becoming more and more available today. I think that part is dead.
But I think there's also the other piece of this, the arms race in terms of innovation. You know, you have governments like ours that, again, talk Big Data—the Big Data initiative this, Big Data initiative that. And I always come back to book digitization. The U.S. government put out all these beautiful working committees talking about what it's going to take to digitize libraries and all this stuff. I mean, you could stack this entire room full of all the reports that were commissioned. And you had a company like Google that just came along and said, hey, let's just do it, you know? Let's not write about it; let's just do it.
And, of course, today we have this massive collection. And I think there are governments out there that are being very innovative about this. I think the biggest thing in terms of the arms race is that there are many governments that talk about what we can't do in Big Data and what all the limitations are, and then there are those who are asking—again, we all know the limitations; there's a limitation to everything—what can we do?
And I think that's—that's the arms race. I think there's the two arms races, the innovation arms race of who's going to leap ahead, and it's going to be governments that are using this for innovation, and then in terms of citizens, I think there's always going to be that arms race, but I think ultimately countries that try to repress information are going to find that increasingly difficult, short of totally closing the door, and even totally closing the door—you know, you had Cuba for a long time. You had the Sneakernet, you know, people walking around, you know, with USB flash drives, you know, disguised as, you know, anything you can imagine.
And I think even in the most repressive regimes—North Korea, for example—the common things people have, music players and other devices, will have hidden data segments in them. In this modern era, I think it is an arms race, but I think increasingly citizens are going to win.
But the other piece of that—and I'm giving a long answer, as I always do—is interesting, too. You look at the rise of all the new social platforms coming out there, things like Snapchat, all these platforms for ephemeral messaging, where people want to be able to talk and have it instantly destroyed and not attributable to them. And this, I think, is a backlash against the privacy situation; people are realizing.
The new generation that's coming out there, they're realizing, wow, you know, everything I put out there, it's permanent. People are going to use it against me. I want to be able to communicate without—I want to go back to the old days where I could say something and I can't have it used against me twenty years later.
I think this is an interesting case, because to what degree is that going to be used, you know, for dissidents to, you know, communicate? Or to what degree is it going to be used for people to put stuff out there and then back away from those comments later? So I think it's interesting...
SCHILLER: Well, as you and I discussed, ephemeral is not always ephemeral...
LEETARU: Exactly. Exactly.
SCHILLER: ... so—yeah. Question over here?
QUESTION: Marisa Lino from Northrop Grumman. The data-driven world that you are describing is fundamentally a reactive model. How do you mesh that with governments and their need to provide strategy—long-term strategy, in particular, if you are constantly focusing on the reaction?
LEETARU: Well, this is actually—I think—I think governments oftentimes, you know, they say that they're strategizing, but I think, as we see each of the crises that emerge, you can come up with all of these beautiful strategies. And, of course, the Pentagon, you know, loves having, you know, these million things. But the problem is that, invariably, what actually happens is not what was in your strategy. It was the one thing you didn't think of.
And so I'm always less interested in this, because, you know—you know, when I look at all the things that sort of batter—that batter countries, you know, you look, for example, at the Ebola crisis and you look at, for example, all the strategies that came about, what would happen if we had, you know, an outbreak here in the U.S.? And, of course, all those—you know, all those books have sort of gone out the window now, as the reality of what's actually happening has occurred.
And this, I think, is an interesting case, because—again, I'm one of those rare people who's both an evangelist for Big Data, about what we can do with the bleeding edge, and also a realist. And my Foreign Policy column from two columns ago talked about the first glimmers of the Ebola outbreak.
And, you know, there were a lot of claims that this was seen by data-mining millions of tweets, highly reactive, but it turns out, no, the government of Guinea went on national television to announce it. But it was in broadcast, and it was in French. And I think what's interesting about this is, if you were actually watching all the data from that area, you actually saw all the pieces that you needed come into play.
And when that initial announcement came out, again, it wasn't tagged as Ebola. It was tagged with all the characteristics that would represent Ebola. But if you kind of rewound the clock and looked at all the data that was there, in terms of being forward-looking—I think right now we think of data mining as, you know, we're looking at Twitter, and what's trending on Twitter at this moment? Oh, look, there's a shooting here, a bombing here. That's, again, highly reactive: well, how do we sort of calm people down from this?
I think where data becomes so powerful—and it's the area that few governments are using properly—is the ability to really look very far into the future using all of this. So, long-term trends. There's a major organization, for example, that uses my data to look at hidden patterns. Take a country in Africa and say, for example, that six Christian missionaries have died in the last X number of months. Human reporters on the ground, or intelligence officers, they see that, but it means nothing to them. You know, people die.
But machines can look at that and say, of everything that's occurring in this entire region, those six have now died in such proximity—both spatially and temporally—that this cannot be random chance. Someone's targeting these people. And then they dispatch that to someone to dig further.
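The kind of spatio-temporal anomaly test described here, flagging event clusters too tight in space and time to be chance, can be sketched in a few lines. This is a minimal illustration with made-up event records and thresholds, not the system Leetaru describes:

```python
import math

# Hypothetical event records: (latitude, longitude, day-of-year).
events = [
    (9.05, 7.49, 10), (9.10, 7.52, 14), (9.00, 7.45, 21),
    (9.07, 7.50, 25), (9.12, 7.47, 30), (9.03, 7.51, 33),  # tight cluster
    (-1.29, 36.82, 100), (6.52, 3.37, 200),                # scattered noise
]

def km_between(a, b):
    # Rough equirectangular distance; good enough at this scale for a sketch.
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371 * math.hypot(x, y)

def clustered_pairs(events, max_km=50, max_days=30):
    """Count event pairs close in both space and time."""
    pairs = 0
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            if (km_between(events[i], events[j]) <= max_km
                    and abs(events[i][2] - events[j][2]) <= max_days):
                pairs += 1
    return pairs

# Six events within ~15 km and ~23 days of each other yield 15 tight pairs,
# far more than the same events scattered randomly across a region would.
print(clustered_pairs(events))
```

A real system would compare that pair count against a baseline for the region before flagging it for a human analyst.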
And so, again, you can say, well, that's kind of reactive, but a good example, too, is Boko Haram. One of the things that's interesting in my data is that Boko Haram doesn't just move into an area; they basically set up shop. What you see from the data is a glimmer: an arms depot will go off, for example, because they were building one and it detonated, or you'll see not an attack but rumors that a few people were seen in this area. Or with ISIL, it's amazing to watch from a 30,000-foot view, kind of moving in and pulling back, the tactical spread.
And, again, if you bring in the social-cultural piece—you look at ISIL, for example, rewind the clock, and the data strongly suggested this is what you were going to see. You have the de-Baathification; you have all these military experts who know this area, and they've got to eat, they've got to get a salary somehow.
You see all these—all the right pieces. You see disenfranchisement from the government. And you see the connections beginning to form across all the—at least open-source data that's out there. You know, to me, I mean, I think, you know, ultimately, everything is kind of reactive, but I think, you know, certainly a year out, a year ago, you were seeing strong, strong glimmers of this.
And, for example, with Mubarak in 2011: back in 2005 is when you really start seeing a fundamental restructuring in Egyptian media. Again, no one was taking to the streets saying down with Mubarak, but you were seeing a fundamental restructuring that you would never have seen earlier in his rule. And you start seeing this accelerating as we get towards the Arab Spring.
So, again, I liken it, really, to weather forecasting. The pieces are all there. I think the challenge is that we have to go beyond shiny-new-object syndrome—"cool, big data, let's prepare another 500-page report with a panel of experts"—towards, how do we think about this? How do we actually think holistically? And how do we bring in the right people? So much of the work right now in the data world is being done by tech companies. And you have to have the tech companies there, but you have to have the experts who actually know the society and understand what it means if I see this cartoon, or if people are doing this. What does that mean? And I think these are the missing pieces.
SCHILLER: Question over here in the—either, well, I'll get to you both, so that's fine. Go ahead.
QUESTION: Hi, Ronit Avni with Just Vision. I'm curious, just to get a little bit granular right now, about who you're following in terms of the intellectuals who are at the cutting-edge of this and which apps are you watching that you think are very helpful, just for us to know, who can you not, you know, live a week without as you're following this?
LEETARU: You know, it's funny, because, again, you know, in terms of what's happening around the world, you know, it's funny, I know more about what's happening in the rest in the world than I usually do here in the U.S. It changes every day, really.
You know, in terms of—I mean, I think in terms of technology, in terms of tools, you know, obviously, you know—you know, it's funny, because I used to be what was called a performance guy, so I was one of those supercomputing people who were writing, you know, basically machine code and optimizing every little piece, so I was deeply in that world.
And I sort of pivoted all the way now to being an applications person, where, you know, I just want things to work. You know, I don't want to deal with things.
SCHILLER: Like the rest of us.
LEETARU: And so, for example, last week, Felipe Hoffa, a cloud developer from Google, was here. He's one of the evangelists for Google BigQuery, and that's a tool, actually, that my data sits inside of. And I think that's a great example: my data set is 275 million records, with a huge amount of detail for every record.
By traditional database standards, you'd say, well, that's not a lot of data, but every single person that queries it queries it in a different way, accesses it in a different way, and is doing things that are brute force. So one of the things that we did was actually look at—take right now, today in Ukraine, take the last two months of what's happened, make a timeline of unrest, walk that through all of world history, find all the periods in the past most similar to today, and look at what happened after each of those. It turns out that actually gives you a decent forecast of the overall macro-level movements. That's brute force. That literally requires, right now, taking 2.5 million correlations, basically. Brute force.
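The brute-force analog search just described, correlating the current window of events against every window in history and averaging what followed the best matches, can be sketched roughly like this. A random series stands in for the unrest timeline; the real computation runs millions of correlations over GDELT's event data:

```python
import random

random.seed(0)
# Hypothetical daily unrest-intensity series standing in for world history.
history = [random.random() for _ in range(2000)]

def correlation(a, b):
    # Plain Pearson correlation, written out to keep the sketch dependency-free.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def analog_forecast(series, window=60, horizon=30, top_k=5):
    """Slide the last `window` days across all of history, keep the most
    similar past windows, and average what happened after each of them."""
    current = series[-window:]
    scored = []
    for start in range(len(series) - 2 * window - horizon):
        past = series[start:start + window]
        after = series[start + window:start + window + horizon]
        scored.append((correlation(current, past), after))
    scored.sort(key=lambda t: t[0], reverse=True)
    best = [after for _, after in scored[:top_k]]
    # Average the top-k historical continuations day by day.
    return [sum(day) / top_k for day in zip(*best)]

forecast = analog_forecast(history)  # a 30-day macro-level trajectory
```

Every historical window is scored, which is exactly the brute-force character of the approach; a platform like BigQuery makes that tractable at scale.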
And so tools like BigQuery, these tools that are transparent, I just load my data and I just start, you know, writing queries. And, you know, I don't know if it's at five processors or a million processors sitting behind it. It's just magic. And to me, you know, the rise of sort of these tools that allow us to really have the just magic piece of it, that I think is—you know, is a huge piece of this.
You know, I think I'm also interested in how platforms are evolving, so things like VKontakte—VKontakte is really changing in terms of its representativeness, and I've got some pieces coming out later on this. It's not as representative of the anti-Putin movement as it used to be. Balatarin, which is sort of the Tumblr of Iran, is totally government now. It used to be a really vibrant community.
So I think also platforms are constantly evolving. A colleague and I did a piece a number of years ago about MySpace. And at the time, you know, everyone was like, there's never going to be a replacement. MySpace will forever be here.
And so the world is constantly changing. And so I kind of look at it day by day, but also at what platforms are interesting in what parts of the world. Things like WhatsApp—you know, on the street here, not everyone uses it, but you look at how it's being used in other areas. Or things like Weibo. I'm very fascinated to see where a lot of these platforms are going.
And it's kind of a roundabout answer, because, you know, again, I'm—you know, I'm constantly moving—I'm a one-person show, so my—you know, I'm constantly sort of focusing across the planet. And so it kind of depends on what day it is. But I think that the big part is tools and platforms that allow us to take, you know, this—allow us to do things transparently, where we're able to take sort of the technology from a technical standpoint.
And then from a messaging platform standpoint, I think what's emerging with the teenagers is always very interesting to me, because you think about—you know, they are the future. I mean, this is where this stuff comes from. And I think oftentimes what they do, you know, things like Snapchat and these things, like, you know, ultimately goes all the way to Wall Street and, you know, allegedly Wall Street bankers are using it to exchange insider trading tips. You know, and it's kind of interesting, because to what degree will that start happening in other platforms?
And these I think are also very interesting, because if you have, for example, an encryption app on your phone and you're in a repressive regime and the police seize your phone, if they see one of these apps on here, you're going to stand out as a dissident.
If you have, say, Snapchat on—now, again, Snapchat would be considered bad in some countries, but in many countries they see, you got Snapchat, they're just going to think you're just, you know—they're not going to see that, what it's being used for. And I think that's also kind of interesting, is kind of the evolution of sort of the mobile and physical divide and spatial information.
I know that was a poor roundabout answer, but...
SCHILLER: No, no, that was—somebody—right there. Yeah, thank you.
QUESTION: That's me. My name is Ivan Sellin. Everybody seems to have his or her own definition of what Big Data is. I've counted six that you've gone through so far. But the sixth one is the most interesting. You didn't really bring that up until two questions ago. And to me, what Big Data is, is a way of looking at data without knowing what your hypotheses are in advance—in other words, allowing the data to suggest the hypotheses.
You used the analogy of forecasting weather, but that's not a good analogy, because the weather forecasters know the questions. They just don't know the answers. I would like to ask you—you talked a little bit about apps, but not much, you know, how to figure out in retrospect what was going on in Ukraine or what have you.
What's—what's new? What's coming? What is possible in starting out where you don't really know what questions you have? You just sort of general—and please don't limit yourself to somebody else's definition of Big Data, which is large, unstructured, crappy databases, you know? We can be talking about scientific databases, as well, where the inputs are very highly structured, but there are just a lot of them and not necessarily in the same formats.
LEETARU: Yeah, I mean, I think the short answer is my next book from Wiley, coming out at the end of next year or early the year after. It's actually on defining Big Data and how it's reshaping society.
And I think, you know, to me, I always use that analogy of, you know, to me, Big Data really is about—the true promise of Big Data is really in what you do with it. So you think about, you know, Twitter, and you could say, well, you know, there's how many, 160 billion tweets? And you can say, well, you know, that's a lot of data. And people always use Twitter as the example of Big Data. And they say, well, how is the typical person...
SCHILLER: Yeah. Yeah.
LEETARU: ... you know, how do people use Twitter? Well, the average person isn't looking at 160 billion tweets. They're doing a keyword search. Even companies are doing a keyword search. If you're Nike, you're doing a keyword search and bringing back the few tens of thousands or hundreds of thousands of tweets that mentioned your brand that day. That's what you're actually analyzing. You're not looking at all 160 billion tweets.
To me, the power comes in—when you look beyond keyword searching for your brand or if you're, say, the United States, instead of keyword searching, what—you know, what's the mention of President Obama across the world, what you want to look at is, how are people reacting not just to the United States? How are they sort of reacting to the notion of democracy? How are they reacting to all the sort of the policy perspectives that we have?
So it's really about looking holistically. And I did an interview for Wired when I was in London two years ago—of course, Wired has the famous "end of theory" piece. And I think ultimately, theory isn't quite dead. What's changed is where it sits: theory used to front-end the exploration process. Data was so expensive to collect, you had to have a theory to say, what should I be looking for, and collect the data for that. And sometimes you still need to do that in some areas.
But oftentimes—think about experimental physics. They build these massive, billion-dollar facilities, and so many of the fundamental things that come out of those aren't what they built the facility for. It's when people look through all the data that's there and say, well, this is kind of odd. Why am I seeing this?
And that, to me, is the true promise of Big Data: taking these massive data sets and plowing them into machines. I did a famous piece, Culturomics 2.0, in 2011 that did exactly this. It started with 100 million news articles, plowed them into a big machine, extracted all this information, and basically said, what's interesting here?
And that, to me, is the future, because what the machine does is go through and say, here are some really interesting patterns. In that particular paper, it said the global news tone about a country moves in highly non-random ways that seem to track the stability of that country.
Now, that's all the machine can do. That's kind of the end right now. So then the human comes in and says, well, what does this mean? Why might this be? What's the connection there? So to me, theory now sits at the end, once we have all this data. Because humans are very limited. You look at every theory that's ever come out about why countries go to war; a lot of those are going out the window in the modern era, because they're based on understandings of what happened in, say, 1950s Europe.
You look at the PITF—the Political Instability Task Force, which is kind of the intelligence community's liaison with the academic community. A lot of the models they come up with for why countries collapse don't hold in our modern world. And that's, again, because humans—you know, we have to really dumb things down to understand.
The most brilliant human on this Earth can't absorb everything on the planet, so they have to kind of simplify things. So they have to look at, you know, why did the Egyptian revolution occur? Well, it's because people were unhappy. Well, that's great. Let's bring that a little bit further down.
Well, again, humans can't deal with that. Machines can look at every available data point there and find these fascinating little pieces. And the way I liken it is, you know, if you want to know, for example, will North Korea go—you know, will it collapse next year? What you ideally do is you go person by person, interview every person in the country and see whether they're—whether they want to rebel or not. The problem is, you can't do that.
And so that's where this stuff comes out. Or you look at the London protests. The London protests were easy to foresee. The problem is the government data—I forget if it's quarterly there—the data you want, like high youth unemployment: if you had that by household on a real-time basis, you would have foreseen that long, long before. The problem is, you don't have that type of data.
So the data that you would want—that you should be looking for—you can't access today. So what happens is all the early precursors bubble up in all these different places, and theory would never tell you to go look at the price of wheat in Zimbabwe to give you the price of gold in London. Theory would never tell you to look for that because there is no direct relationship.
The problem is the thing you want to look at, you can't observe, so it manifests itself in all of these ways, and data can bring all those to the forefront. It can tell us these hidden patterns. And then we as humans can then kind of layer a theory on top of that and say, well, here's why I think we're seeing this.
QUESTION: Let me ask a quick follow-up. These are all unit-by-unit questions. You know, what's this one doing? What's that one doing? What about networks? What about the future of trying to find what a group of people is doing, or who else is doing the same thing...
LEETARU: Absolutely. So...
QUESTION: ... which may not be coded in the basic data.
LEETARU: Absolutely. In fact, most of my data (inaudible) what's physically happening around the world—I make a catalog of physical events: riots, protests, et cetera. I think this will capture all the interconnections between those, and there actually is some work I have coming out soon that builds influencer graphs—that says, for example, who are the top three most influential journalists on a particular topic in a particular geographic area—and evolves this in real time.
Who are the most influential? If you need, for example, to get to, say, the agricultural minister of a certain country, and you're having difficulty through traditional diplomatic channels, who are the other sort of persons of influence? It turns out we can actually extract a lot of this, because you're looking for sort of the residue that those people leave in the open world. So, for example, you might see that when this reporter writes something—regardless of topic, regardless of who they're interviewing, regardless of anything—things change in that world. Or when this particular politician says something, things change. When that politician says something, nothing changes.
Or what we see in some countries, which is very interesting—without naming the country—you'll see, for example, a business leader over here, and this politician will say things, but that business leader always seems to comment, and whichever direction this business leader comments is where things are going. So that tells us that the real power behind the throne is this person over here, not this person.
Or, you know, you think about the oil and gas industry in a country like Brazil or Nigeria or elsewhere—there are external influences that have a lot more control over that than the domestic government. So these are all the networks that we can pull out of this. And, again, these are induced networks, so these are things where you're looking at very, very sort of nuanced patterns of information activity. There's a lot we can evolve. And that is a big piece of the future.
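To make the "induced network" idea concrete, here is a deliberately tiny sketch of one piece of it: rank actors by how often some observable signal moves sharply right after they appear in the record. The event log, the threshold, and the scoring rule are all invented for illustration; the real influencer graphs are built from far richer news data:

```python
from collections import defaultdict

def rank_influencers(events, lookahead=1):
    """events: time-ordered list of (actor, signal_value) pairs.
    Credit an actor each time the signal moves sharply right after
    they appear -- a crude proxy for 'when this person comments,
    things change.'"""
    hits = defaultdict(int)
    mentions = defaultdict(int)
    for i, (actor, value) in enumerate(events[:-lookahead]):
        mentions[actor] += 1
        nxt = events[i + lookahead][1]
        if abs(nxt - value) > 1.0:  # 'sharp move' threshold (illustrative)
            hits[actor] += 1
    # Score = fraction of an actor's appearances followed by a move.
    scores = {a: hits[a] / mentions[a] for a in mentions}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# The politician speaks, nothing happens; the business leader speaks,
# the signal jumps -- the "power behind the throne" pattern.
log = [("politician", 5.0), ("business_leader", 5.0), ("politician", 7.5),
       ("business_leader", 7.5), ("politician", 9.9), ("business_leader", 9.9),
       ("politician", 9.8)]
print(rank_influencers(log))
```

Even in this toy form, the business leader surfaces at the top of the ranking despite never being labeled as powerful anywhere in the data.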
SCHILLER: Over here. Thank you.
LEETARU: Got a question from this side of the room.
QUESTION: Curtis Valentine. The question is more about the role of Big Data and data mining in actually pushing people closer to people who are like them and further away from people who are different—in an America that's as divided as it's ever been—to the point that when I go on Twitter, Facebook, Instagram, they say, all your friends are like this. Here are some more people just like you...
QUESTION: ... people who like this also watch this TV show. Go watch this TV show. And it's pushing us closer and closer to people who are like us and further and further away from people who may have varying opinions—which, I think, is exactly what we need right now in America. So the question is, what role can Big Data play?
What about flipping that proposition and saying, Curtis, here's someone who may have a different opinion about this, but someone you should be speaking with if you're trying to get anything done?
LEETARU: This is a fantastic question. And it's actually—I mean, data mining plays a huge role in this. It's been a fascinating piece. So you think about it: fifty years ago, you opened the New York Times, and you were exposed to all kinds of different views. Now, again, if you hold a certain viewpoint, you might not open the New York Times.
But, you know, ultimately, you were still exposed to a lot of things, because you might open it because you're interested in a particular story, but you're flipping through and you say, wow, I didn't know this was happening in Haiti, I didn't know this was happening in Ukraine. You're being exposed to what we in the information science world call serendipitous discovery.
Increasingly, that's dying. Because you think about it, you go to the New York Times website now, and you're—you're zooming in right to that article, where someone's sharing and telling you, you should check this out. And then it's recommending other things, so the echo chamber effect.
What's so fascinating to me is, you know, the majority of the money that's being spent right now on trying to understand people is being spent on that type of targeting. Because you think about it: if you're, say, a staunch Democrat and someone comes along and says, here's this great Fox News piece, or you're a staunch Republican and someone says, here's this great Huff Post piece, you're probably not going to be happy about that. You're probably not going to read it all the way through. So in terms of ad sales or other things, it's not useful for them. It doesn't monetize you.
And so there's this real push towards, you know, making you happy. And you look at, for example, the Facebook study that got all the press about manipulating people's news feeds, you know, that—the only reason that was so controversial is because we saw it. You know, and this happens at companies all the time. I mean, that is the type of research that happens, because the goal is to make you happy. So if we give you all these wonderful stories that make you feel wonderful, it's almost a drug, essentially. You keep coming back to it.
I think, though, this has profound implications for what we understand about the world. I think it's also interesting in that the gatekeepers used to be the journalists, for example. And whenever people say, oh, well, social media gives us all these things, I say, OK, well, what's the latest that's occurring in Haiti right now?
Now, someone in this room probably does know what's happening in Haiti, but usually when I give this talk, I say, well, for all this data we have out there, and people are talking about all these things—well, what's happening in Haiti? Most people don't know, because it's not of interest right now; there are, you know, far more interesting news stories, so to speak. And the New York Times—it's been a while since they've run a story on Haiti.
And so it used to be, you know, sort of editors were making that decision about what they thought was useful to their audience and what they thought was relevant to what's happening in domestic politics. So there was some accountability.
In the data world, it's very interesting, because these are cold, hard algorithms that are making these calls. In the news world, again, if they really screw it up over time, people will stop buying that paper. But algorithms are seeing this in real time. They point a bunch of people to go see this link, and no one's clicking on it, or they're leaving it real fast. So they know that was a bad choice to make.
So algorithms are learning so fast, they know more about what we like, our innermost desires, than we ourselves know. And this, I think, is kind of an interesting piece, in that these algorithms are always watching. You think about Google, for example: when you go to that search page, it's looking at what you're searching for, what link you click on, and if you then hit the back button and go right back and click on a different link, it knows, hey, this was a mistake.
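That click-and-back-button feedback loop can be sketched as a simple online score update: demote a result when the user bounces straight back, promote it when they stay. The function names and learning rate here are illustrative only; production ranking systems are enormously more complex:

```python
def update_score(score, stayed, lr=0.1):
    """Nudge a link's ranking score toward 1 if the user stayed on it,
    toward 0 if they hit 'back' immediately (a bounce)."""
    target = 1.0 if stayed else 0.0
    return score + lr * (target - score)

# Two links start with equal scores; one keeps getting bounced,
# one keeps its visitors. The ranking diverges quickly.
good, bad = 0.5, 0.5
for _ in range(20):
    good = update_score(good, stayed=True)
    bad = update_score(bad, stayed=False)
print(round(good, 3), round(bad, 3))
```

After a few dozen observations the two links are far apart, which is exactly the "they know that was a bad choice" dynamic described above, operating with no human editor in the loop.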
This is fascinating to me in terms of the power it has to make us happy, but is that—is that the purpose? I mean, should we be exposed to things that make us unhappy, you know, opposite viewpoints? And I think this is a—this is a fascinating question, because I think this is part of what's driving us towards a hyper-partisan world, is that we have the ability today to wall ourselves off from anything.
And again, we've always kind of had this, because elites have always controlled that world. You know, you think back, you know, 100 years ago, only elites had the ability to get information out there. You know, you could maybe hang a pamphlet on a light post, but, you know, it was the elites that controlled the world.
Today, it's still the elites—it's algorithms. But at least before, there was always some kind of push to give a comprehensive view, to some degree. I think this is the interesting question. And we're only going down this pathway further and further.
And also the rise, especially, of things like cable television—I mean, this is not new with the social data-mining world. You see the rise of things like Fox News and—I forget its equivalent on the Democratic side. You saw this rise. This predates the social media world.
But the social media world has really systematized it and brought it to levels that we've never seen before. And I think, too, as information has exploded, we don't have the ability to process this stuff anymore. So the notion of sharing—you log onto Facebook or something, and you see what all your friends are looking at, and you go click those links. By the time ten minutes or twenty minutes or probably more has elapsed, you don't have time to go look at other things. So that's your daily news-fest.
So I think this is an interesting thing. I don't think we've quite comprehended yet what the long-term impact will be. But I do think that it is a very interesting path that it's pushing us down. The polar opposite, though, is that I think technology does give us the ability, if we do want to reach out—there are tools that can do that. In fact, a lot of my work goes around, you know, here's a topic and here are the different sides.
So I just did a piece with my father. He's one of the carbon capture FutureGen people. And so one of our latest papers looked at the whole media landscape around that. Tried to break that down to, how do people talk about carbon capture, CCS? Like, what are the different ways it's portrayed in the media? And being able to then separate out, where are the journalists, then, that cover each side? Where are the politicians, the business and political and academic elites?
So the ability to instantly, with a few mouse clicks—and it's interesting, because even people that specialize in that field, we found a lot of very interesting ways in which these were connected. So it gives you that ability to say, well, what is the other side saying about this? So it's there. I think oftentimes we just don't listen to it, because, again, it's in the hyper-partisan world.
SCHILLER: We have a question right over here in the front.
QUESTION: So I'm Sara Agarwal with Hewlett-Packard. And, of course, we have the Autonomy and Vertica Big Data units. And my question for you is, I think we need to understand more about how this is going to become a reality in organizations today, and what does the intermediary look like? Because there's a small number of people like you who really understand this stuff well, and then there are, like, normal people...
QUESTION: ... who want to incorporate it...
QUESTION: ... and is it—is it going to be that there's going to be some companies that are very specialized that figure out how to do this? Is it that organizations themselves need to hire more data scientists? And the reason I ask that question—and I talk to a lot of development organizations on a regular basis, and I had this conversation with the—you know, innovation lab at the World Bank last week. And they were...
LEETARU: I work with them.
QUESTION: ... hosting this big challenge for development institutions that want to—you know, or people within their own organization that want to promote a Big Data project, right? And they were like, you know, what we really need to do is, like, hire some data scientist who will help us figure this out and help these people who have these great ideas, right?
And I'm like, well, why don't you just take the data and send it to, you know, a software organization? Or we could do a pilot project for you through Autonomy. And, you know, we can run the data and give it back to you in two weeks, and, voila, there you have it.
And he looked at me with this blank expression on his face, like—but, god, what data do we send you? How do we know what data to send you? And how do we know what to analyze? And how do we do that? Right, so we really are missing an intermediary in this link, and I'm just wondering what your views are...
LEETARU: Yeah, I mean...
SCHILLER: Before you answer the question—because that's a great question, and actually, it's going to be our last question. It's a great one to finish on: the interpretation of the data and how you make it actionable. But since this will be the last question, I just want to remind everybody this session has been on-the-record. Go ahead.
LEETARU: Excellent. Yeah, so I think that's a fantastic question. And I think, you know, that is one of the challenges, because you have lots of these boutique companies out there that basically they can, you know, sort of do anything under the sun.
The challenge is, it's not about sort of boxing up your data and shipping it over. You really have to embed in an organization to understand it. So I've worked with a large number of the Fortune 50 over the years—most recently, actually, with the World Bank.
And what you have to do is—you can't just simply say, here are our data archives, have at it, because you really have to understand how an organization works. So, for example, one large consumer products company I worked with was very interested in consumer sentiment and how consumers viewed their product and what the future of their industry was going to be.
And so, you know, when I arrived, they were all focusing narrowly on that, and I was the one who said, well, let's look at every piece of your organization. What are all the data sets your entire organization has? And how do you go about this right now?
And it turns out, from sitting there and interacting across the company, that in the way they purchased new retail stores, their retail unit—a whole different part of the company—deeply tried to understand each community they built a store in. Well, that's incredible data that other parts of the company aren't seeing.
And I think the challenge really is interpreting and contextualizing. So the classic example, obviously, in the policy space—I'm sure you've all had companies come to you; I do a lot with the DoD community. And you have all these contractors coming up and saying, hey, look, we grabbed this data, and, oh, look, we've got some great findings here.
The problem is that they lack the understanding of the domain. The way I want to answer this question is with an example: I went to a public, on-the-record DoD meeting. There was a graph up there showing how the Syrian rebels felt about their government, and it was down to the block level—the most beautiful graphs I've ever seen. I won't name the company or the organization, to avoid embarrassing them.
And it was amazing, because this was a room with a lot of policy folks in it, and I was the only one that raised my hand at the end and said, well, this is amazing—this is an unclassified presentation, and this is amazing data. Where are you getting block-level data on this?
And they specifically had at the bottom, you know, billions of data points, and they said, well, these are English-language geotagged tweets. And there are so many things wrong with that statement, because, I mean, Syrians are using Facebook as their primary platform, not Twitter, right now. They're writing in Arabic—dialectical Arabic, not even formal Arabic—and very few of them have GPS tagging or cellular triangulation turned on for their tweets.
And I said, well, why aren't you, you know, looking at Arabic instead of English? And they said, well, none of us—you know, none of on our technical team speak Arabic. And I said, well, why Twitter? Well, you know, Twitter is a much easier API to use than Facebook. And I said, well, why geotagged tweets? And they said, well, these are easier to just shove on a map.
And the thing that struck me the most is that this was a very, very skilled company doing this. This is not, you know, some random boutique. These people knew what they were doing.
But the problem is that someone like me was just floored—and I love the gasps from this room, because I've actually given that example in technical presentations and no one saw what was wrong with it. And this is the problem: the technical folks oftentimes don't know what's happening on the ground, because how many people, you know, know what the Syrian rebels—one of my junior fellows, (inaudible), she was actually married to a Syrian. She was actually there in Syria for a big portion of the conflict. And so that was part of what her work focused on.
And you get a different perspective from being in country and understanding the local—like, how people use media. You know, classic example: somewhere like Jakarta, you'll see all these photographic tweets on Instagram or other platforms, and you'll kind of set those aside and say, well, these aren't useful.
Well, the problem is that if you know the culture, you know that those are more important than the other tweets. So I think that is the challenge today: the approach so far has been outsourcing. Most companies—especially here in Washington—most organizations kind of outsource it: hey, get some data analysts, make me some graphs. You really have to embed and understand, and that's the piece that's been missing.
SCHILLER: Great. Thank you so much. Thank you, Kalev. Thank you, everybody, for the great questions.