Debating Big Data for Intelligence

I’m always afraid of engaging in a “battle of wits” only half-armed.  So I usually choose my debate opponents judiciously.

Unfortunately, I recently had a contest thrust upon me with a superior foe: my friend Mark Lowenthal, Ph.D. from Harvard, an intelligence community graybeard (literally!) and former Assistant Director of Central Intelligence (ADCI) for Analysis and Production, Vice Chairman of the National Intelligence Council – and as if that weren’t enough, a past national Jeopardy! “Tournament of Champions” winner.

As we both sit on the AFCEA Intelligence Committee and have also collaborated on a few small projects, Mark and I have had occasion to explore one another’s biases and beliefs about the role of technology in the business of intelligence. We’ve had several voluble but collegial debates about that topic, in long-winded email threads and over grubby lunches. Now, the debate has spilled onto the pages of SIGNAL Magazine, which serves as something of a house journal for the defense and intelligence extended communities.

SIGNAL Editor Bob Ackerman suggested a “Point/Counterpoint” short debate on the topic: “Is Big Data the Way Ahead for Intelligence?” Our pieces are side-by-side in the new October issue, and are available here on the magazine’s site.

Mark did an excellent job of marshalling the skeptic’s view on Big Data, under the not-so-equivocal title “Another Overhyped Fad.” Below you will find an early draft of my own piece, an edited version of which is published under the title “A Longtime Tool of the Community”:

Visit the National Cryptologic Museum in Ft. Meade, Maryland, and you’ll see three large-machine displays, labeled HARVEST and TRACTOR, TELLMAN and RISSMAN, and the mighty Cray XMP-24. They’re credited with helping win the Cold War, from the 1950s through the end of the 1980s. In fact, they are pioneering big-data computers.

Here’s a secret: the Intelligence Community has necessarily been a pioneer in “big data” since inception – both our modern IC and the science of big data were conceived during the decade after the Second World War. The IC and big-data science have always been intertwined because of their shared goal: producing and refining information describing the world around us, for important and utilitarian purposes.

What do modern intelligence agencies run on? They are internal combustion engines burning pipelines of data, and the more fuel they burn the better their mileage. Analysts and decisionmakers are the drivers of these vast engines, but to keep them from hoofing it, we need big data.

Let’s stipulate that today’s big-data mantra is overhyped. Too many technology vendors are busily rebranding storage or analytics as “big data systems” under the gun from their marketing departments. That caricature is, rightly, derided by both IT cognoscenti and non-techie analysts.

I personally get the disdain for machines, as I had the archetypal humanities background and was once a leather-elbow-patched tweed-jacketed Kremlinologist, reading newspapers and HUMINT for my data. I stared into space a lot, pondering the Chernenko-Gorbachev transition. Yet as Silicon Valley’s information revolution transformed modern business, media, and social behavior across the globe, I learned to keep up – and so has the IC. 

Twitter may be new, but the IC is no Johnny-come-lately in big data on foreign targets.  US Government funding of computing research in the 1940s and ‘50s stretched from World War II’s radar/countermeasures battles to the elemental ELINT and SIGINT research at Stanford and MIT, leading to the U-2 and OXCART (ELINT/IMINT platforms) and the Sunnyvale roots of NRO.

In all this effort to analyze massive observational traces and electronic signatures, big data was the goal and the bounty.

War planning and peacetime collection were built on collection of ever-more-massive amounts of foreign data from technical platforms – telling the US what the Soviets could and couldn’t do, and therefore where we should and shouldn’t fly, or aim, or collect. And all along, the development of analog and then digital computers to answer those questions, from Vannevar Bush through George Bush, was fortified by massive government investment in big-data technology for military and intelligence applications.

In today’s parlance big data typically encompasses just three linked computerized tasks: storing collected foreign data (think Amazon’s cloud), finding and retrieving relevant foreign data (Bing or Google), and analyzing connections or patterns among the relevant foreign data (powerful web-analytic tools).
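Stripped to essentials, those three tasks can be sketched in a few lines of Python. This is a toy illustration of the store/search/analyze triad only, not a depiction of any real system; all record IDs and terms are invented:

```python
from collections import defaultdict
from itertools import combinations

# Task 1: storage -- keep collected records keyed by an ID (think object store).
store = {}

def ingest(doc_id, text):
    store[doc_id] = text

# Task 2: search/retrieval -- a tiny inverted index mapping terms to documents.
index = defaultdict(set)

def index_doc(doc_id):
    for term in store[doc_id].lower().split():
        index[term].add(doc_id)

def search(term):
    return sorted(index[term.lower()])

# Task 3: analysis -- count how often pairs of terms co-occur across documents,
# the germ of link/pattern analysis.
def cooccurrence(terms):
    counts = {}
    for a, b in combinations(sorted(t.lower() for t in set(terms)), 2):
        counts[(a, b)] = len(index[a] & index[b])
    return counts

# Invented toy data:
ingest("d1", "radar emitter observed near airfield")
ingest("d2", "new emitter signature near coastline")
ingest("d3", "airfield construction continues")
for d in store:
    index_doc(d)
```

The point of the sketch is that the three tasks share one substrate: the same indexed collection serves retrieval and pattern analysis alike, which is why vendors can rebrand either half as a “big data system.”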

Those three Ft. Meade museum displays demonstrate how NSA and the IC pioneered those “modern” big data tasks. Storage is represented by TELLMAN/RISSMAN, running from the 1960s throughout the Cold War using innovation from Intel. Search and retrieval were the hallmark of HARVEST/TRACTOR, built by IBM and StorageTek in the late 1950s. Repetitive what-if analytic runs boomed in 1983, when Cray delivered a supercomputer to a customer site for the first time ever.

The benefit of IC early adoption of big data wasn’t only to cryptology – although decrypting enemy secrets would be impossible without it. More broadly, computational big-data horsepower was in use constantly during the Cold War and after, producing intelligence that guided US defense policy and treaty negotiations or verification. Individual analysts formulated requirements for tasked big-data collection with the same intent as when they tasked HUMINT collection: to fill gaps in our knowledge of hidden or emerging patterns of adversary activities.

That’s the sense-making pattern that leads from data to information, to intelligence and knowledge. Humans are good at it, one by one. Murray Feshbach, a little-known Census Bureau demographic researcher, made astonishing contributions to the IC’s understanding of the crumbling Soviet economy and its sociopolitical implications by studying reams of infant-mortality statistics, and noticing patterns of missing data. Humans can provide that insight, brilliantly, but at the speed of hand-eye coordination.

Machines make a passable rote attempt, but at blistering speed, and they don’t balk at repetitive, mind-numbing data volume. Amid the data, patterns emerge. Today’s Feshbachs want an Excel spreadsheet or Hadoop table at hand, so they’re not limited to the data they can reasonably carry in their mind’s eye.

To cite a recent joint research paper from Microsoft Research and MIT, “Big Data is notable not because of its size, but because of its relationality to other data.  Due to efforts to mine and aggregate data, Big Data is fundamentally networked.  Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself.” That reads like a subset of core requirements for IC analysis, whether social or military, tactical or strategic.

The synergy of human and machine for knowledge work is much like modern agricultural advances – why would a farmer today want to trudge behind an ox-pulled plow? There’s no zero-sum choice to be made between technology and analysts, and the relationship between CIOs and managers of analysts needs to be nurtured, not cleaved apart.

What’s the return for big-data spending? Outside the IC, I challenge humanities researchers to go a day without a search engine. The IC record is just as clear. ISR, targeting, and warning are better because of big data; data-enabled machine translation of foreign sources opens the world; correlation of anomalies amid large-scale financial data pinpoints otherwise unseen hands behind global events. Why, in retrospect, the Iraq WMD conclusion was a result of remarkably-small-data manipulation.

Humans will never lose their edge in analyses requiring creativity, smart hunches, and understanding of unique individuals or groups. If that’s all we need to understand the 21st century, then put down your smartphone. But as long as humans learn by observation, and by counting or categorizing those observations, I say crank the machines for all their robotic worth.

Make sure to read both sides, and feel free to argue your own perspective in a comment on the SIGNAL site.

Education for Information Security in a Connected World

Much of what I work on involves technologies which address information security and cyber security. So I have to ask: who is training our next generation of technologists? And are those educators doing enough to focus on the dynamically changing demands of information security?

Those fundamental questions took me to Chicago recently, to take part in a roundtable discussion sponsored by DeVry University, “The Demand for Information Security in a Connected World.”

Continue reading

Google, Microsoft, and Medical Research

Fact: Two stark numbers are published today about Google co-founder Sergey Brin. First, the annual update of the “Forbes 400” wealthiest billionaires reports that Brin’s personal net worth is $15.9 billion (though that’s down some $2.7 billion from last year, due to the decline of Google’s stock price by 40% since last November). More importantly, Brin himself wrote in his personal blog today that by having genetic research done on himself, “I learned something very important to me — I carry the G2019S mutation… it is clear that I have a markedly higher chance of developing Parkinson’s in my lifetime than the average person. In fact, it is somewhere between 20% to 80% depending on the study and how you measure.”

Analysis: Sergey Brin’s own blog account of his discovery is a remarkably personal and touching piece, dealing with his mother and her own belated diagnosis of Parkinson’s, and the scientific boundaries of current genetic research and the implications one can draw from this immature field of science.


This was only the second post on Sergey’s new blog; the blog’s name is “Too” – and the first post merely stated the rationale for that name (“Welcome to my personal blog. While Google is a play on googol, too is a play on the much smaller number – two. It also means ‘in addition,’ as this blog reflects my life outside of work”). 

If today’s refreshing honesty and thoughtfulness are indicative of the caliber of his writing, I’m going to be a regular reader.

His piece reminds me of Steve Jobs’ modern classic, his 2005 Stanford Commencement Address.  If you’ve never read that, then stop reading my words right now, and go read that. You’ll find yourself over the weekend thinking about your own approach to life.

But back to Brin and genetic research. It will be interesting to watch what Google’s research arm is able to do in the area of medical and health research. To make progress in bioengineering and genetics, “organizing the world’s information” is absolutely paramount, and of course that’s Google’s mission statement.

Continue reading

Invisibility, Mind-Control, Great Coffee, and a New OS

Lots of interest and blogosphere commentary are building around “The Mojave Experiment.”

The reaction is reminiscent of one of those provocative Obama or McCain ads posted online, generating far more attention and buzz on the web than they ever get from being broadcast.

Sure, it’s a sales pitch, and pretty narrowly geeky at that (thanks GoogleFight!).

But at least it’s an innovative one – as the Wall Street Journal puts it today, “Give Microsoft people credit: They did it with humor, and they weren’t afraid to air the negative stuff.”

Continue reading

Test for Prediction Markets: They Say Obama, but Polls Say It’s Tied

Fact: According to the latest Rasmussen poll released Saturday July 12, and promptly headlined by the Drudge Report, “The race for the White House is tied. The Rasmussen Reports daily Presidential Tracking Poll for Saturday shows Barack Obama and John McCain each attract 43% of the vote.” Newsweek is reporting a similar result in its own poll, with Obama moving down and McCain up (“Obama, McCain in Statistical Dead Heat”), and other polls increasingly show a similarly close race.

Analysis: I’ve been tracking the growing divide between two quite different methods purporting to offer statistical predictive analysis for the November presidential election. Polls are saying one thing, but Prediction Markets are saying another. 
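The two methods measure different things, which is why they can diverge. A quick sketch makes the distinction concrete; the 65-cent market price below is hypothetical, while the 43%–43% figures are the Rasmussen numbers quoted above:

```python
def market_implied_probability(price_cents):
    """A winner-take-all contract paying $1 on victory, trading at N cents,
    approximates the crowd's estimate of an N% win probability."""
    return price_cents / 100.0

def poll_toss_up(share_a, share_b, margin_of_error=3.0):
    """A poll reports current vote share; it reads as a toss-up when the
    gap between candidates falls inside the margin of error."""
    return abs(share_a - share_b) <= margin_of_error

p = market_implied_probability(65)   # hypothetical market price: ~65% to win
tied = poll_toss_up(43.0, 43.0)      # the cited poll: a statistical tie
```

In other words, a market can price one candidate as a clear favorite to win in November even while today’s head-to-head polling is dead even, because the market aggregates expectations about the outcome rather than a snapshot of current preference.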

Continue reading

Early Bill Gates, and Bill’s Last Email

From Bill Gates’s final email today to “All at Microsoft”:

 

As Microsoft has grown, one of the most exciting and fulfilling things for me has been to watch new leaders develop. I am thrilled to have Ray and Craig playing key roles in guiding the company’s strategy… For over a decade I had hoped that we could convince Ray to join Microsoft — and in the three years he has been here, he has made a huge difference in helping us focus on the challenge and opportunity of software plus services. I have worked with Craig for more than 15 years. His ability to anticipate the future direction of technology is a key asset, as is his deep interest in and understanding of emerging markets.

Of course, I’ll continue to be involved in the work of the company as part-time Chairman. As part of this I will help with a handful of projects that Steve, Ray, and Craig select.

Continue reading

“The Largest Social Network Ever Analyzed”

FACT: According to ComScore data cited in a story in Monday’s Financial Times, “Facebook, the fast-growing social network, has taken a significant lead over MySpace in visitor numbers for the first time… Facebook attracted more than 123 million unique visitors in May, an increase of 162 per cent over the same period last year… That compared with 114.6 million unique visitors at MySpace, Facebook’s leading rival, whose traffic grew just 5 per cent during the same period… The findings mark the first time that Facebook, launched in 2004, has taken a significant lead in unique visitors, [and] come at a time of change inside Facebook, as the one-time upstart attempts to transform itself into a leading media company.”

ANALYSIS:  This week several members of the Microsoft Institute met in Redmond with a visiting friend from government, and among other talks we had a very interesting discussion with Eric Horvitz, a Microsoft Research principal researcher and manager.  Eric’s well known for his work in artificial intelligence and currently serves as president of the Association for the Advancement of Artificial Intelligence (AAAI).

We talked about one of Eric’s recent projects for quite a while: “Planetary-Scale Views on a Large Instant-Messaging Network,” a project which has been described by his co-author as “the largest social network ever analyzed.” 

Continue reading
