Debating Big Data for Intelligence

I’m always afraid of engaging in a “battle of wits” only half-armed.  So I usually choose my debate opponents judiciously.

Unfortunately, I recently had a contest thrust upon me by a superior foe: my friend Mark Lowenthal, Ph.D. from Harvard, an intelligence community graybeard (literally!) and former Assistant Director of Central Intelligence (ADCI) for Analysis and Production, Vice Chairman of the National Intelligence Council – and as if that weren’t enough, a past national Jeopardy! “Tournament of Champions” winner.

As we both sit on the AFCEA Intelligence Committee and have also collaborated on a few small projects, Mark and I have had occasion to explore one another’s biases and beliefs about the role of technology in the business of intelligence. We’ve had several voluble but collegial debates about that topic, in long-winded email threads and over grubby lunches. Now, the debate has spilled onto the pages of SIGNAL Magazine, which serves as something of a house journal for the defense and intelligence extended communities.

SIGNAL Editor Bob Ackerman suggested a “Point/Counterpoint” short debate on the topic: “Is Big Data the Way Ahead for Intelligence?” Our pieces are side-by-side in the new October issue, and are available here on the magazine’s site.

Mark did an excellent job of marshalling the skeptic’s view on Big Data, under the not-so-equivocal title “Another Overhyped Fad.” Below you will find an early draft of my own piece, an edited version of which is published under the title “A Longtime Tool of the Community”:

Visit the National Cryptologic Museum in Ft. Meade, Maryland, and you’ll see three large-machine displays, labeled HARVEST and TRACTOR, TELLMAN and RISSMAN, and the mighty Cray XMP-24. They’re credited with helping win the Cold War, from the 1950s through the end of the 1980s. In fact, they are pioneering big-data computers.

Here’s a secret: the Intelligence Community has necessarily been a pioneer in “big data” since inception – both our modern IC and the science of big data were conceived during the decade after the Second World War. The IC and big-data science have always been intertwined because of their shared goal: producing and refining information describing the world around us, for important and utilitarian purposes.

What do modern intelligence agencies run on? They are internal combustion engines burning pipelines of data, and the more fuel they burn the better their mileage. Analysts and decisionmakers are the drivers of these vast engines, but to keep them from hoofing it, we need big data.

Let’s stipulate that today’s big-data mantra is overhyped. Too many technology vendors are busily rebranding storage or analytics as “big data systems” under the gun from their marketing departments. That caricature is, rightly, derided by both IT cognoscenti and non-techie analysts.

I personally get the disdain for machines, as I had the archetypal humanities background and was once a leather-elbow-patched tweed-jacketed Kremlinologist, reading newspapers and HUMINT for my data. I stared into space a lot, pondering the Chernenko-Gorbachev transition. Yet as Silicon Valley’s information revolution transformed modern business, media, and social behavior across the globe, I learned to keep up – and so has the IC. 

Twitter may be new, but the IC is no Johnny-come-lately in big data on foreign targets.  US Government funding of computing research in the 1940s and ‘50s stretched from World War II’s radar/countermeasures battles to the elemental ELINT and SIGINT research at Stanford and MIT, leading to the U-2 and OXCART (ELINT/IMINT platforms) and the Sunnyvale roots of NRO.

In all this effort to analyze massive observational traces and electronic signatures, big data was the goal and the bounty.

War planning and peacetime collection were built on collection of ever-more-massive amounts of foreign data from technical platforms – telling the US what the Soviets could and couldn’t do, and therefore where we should and shouldn’t fly, or aim, or collect. And all along, the development of analog and then digital computers to answer those questions, from Vannevar Bush through George Bush, was fortified by massive government investment in big-data technology for military and intelligence applications.

In today’s parlance big data typically encompasses just three linked computerized tasks: storing collected foreign data (think Amazon’s cloud), finding and retrieving relevant foreign data (Bing or Google), and analyzing connections or patterns among the relevant foreign data (powerful web-analytic tools).
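To make that trio concrete, here is a minimal sketch in Python – purely my own toy illustration, with hypothetical records and entity names, not a description of any real system or tool. It stores a handful of records, builds a simple index for retrieval, and counts co-occurring entities as a crude stand-in for connection analysis.

```python
# Toy illustration of the three linked big-data tasks described above:
# (1) store collected records, (2) find and retrieve the relevant ones,
# (3) analyze connections among them. All data here is hypothetical.
from collections import defaultdict
from itertools import combinations

# 1. Storage: a tiny in-memory "collection" standing in for a cloud store.
records = [
    {"id": 1, "text": "ship X departed port A", "entities": {"ship_X", "port_A"}},
    {"id": 2, "text": "ship X refueled at port B", "entities": {"ship_X", "port_B"}},
    {"id": 3, "text": "cargo 7 offloaded at port B", "entities": {"cargo_7", "port_B"}},
]

# 2. Search/retrieval: an inverted index from entity -> record ids.
index = defaultdict(set)
for rec in records:
    for entity in rec["entities"]:
        index[entity].add(rec["id"])

def retrieve(entity):
    """Return the stored records that mention the given entity."""
    return [rec for rec in records if rec["id"] in index[entity]]

# 3. Analysis: count how often pairs of entities co-occur across records.
cooccurrence = defaultdict(int)
for rec in records:
    for a, b in combinations(sorted(rec["entities"]), 2):
        cooccurrence[(a, b)] += 1

print(retrieve("ship_X"))      # records 1 and 2
print(dict(cooccurrence))      # ("port_B", "ship_X") co-occur once, and so on
```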

Those three Ft. Meade museum displays demonstrate how NSA and the IC pioneered those “modern” big data tasks. Storage is represented by TELLMAN/RISSMAN, running from the 1960s throughout the Cold War using innovation from Intel. Search and retrieval were the hallmark of HARVEST/TRACTOR, built by IBM and StorageTek in the late 1950s. Repetitive what-if analytic runs boomed in 1983 when Cray delivered a supercomputer to a customer site for the first time ever.

The benefit of IC early adoption of big data wasn’t only to cryptology – although decrypting enemy secrets would be impossible without it. More broadly, computational big-data horsepower was in use constantly during the Cold War and after, producing intelligence that guided US defense policy and treaty negotiations or verification. Individual analysts formulated requirements for tasked big-data collection with the same intent as when they tasked HUMINT collection: to fill gaps in our knowledge of hidden or emerging patterns of adversary activities.

That’s the sense-making pattern that leads from data to information, to intelligence and knowledge. Humans are good at it, one by one. Murray Feshbach, a little-known Census Bureau demographic researcher, made astonishing contributions to the IC’s understanding of the crumbling Soviet economy and its sociopolitical implications by studying reams of infant-mortality statistics, and noticing patterns of missing data. Humans can provide that insight, brilliantly, but at the speed of hand-eye coordination.

Machines make a passable rote attempt, but at blistering speed, and they don’t balk at repetitive, mind-numbing data volume. Amid the data, patterns emerge. Today’s Feshbachs want an Excel spreadsheet or Hadoop table at hand, so they’re not limited to the data they can reasonably carry in their mind’s eye.

To cite a recent joint research paper from Microsoft Research and MIT, “Big Data is notable not because of its size, but because of its relationality to other data.  Due to efforts to mine and aggregate data, Big Data is fundamentally networked.  Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself.” That reads like a subset of core requirements for IC analysis, whether social or military, tactical or strategic.

The synergy of human and machine for knowledge work is much like modern agricultural advances – why would a farmer today want to trudge behind an ox-pulled plow? There’s no zero-sum choice to be made between technology and analysts, and the relationship between CIOs and managers of analysts needs to be nurtured, not cleaved apart.

What’s the return for big-data spending? Outside the IC, I challenge humanities researchers to go a day without a search engine. The IC record’s just as clear. ISR, targeting and warning are better because of big data; data-enabled machine translation of foreign sources opens the world; correlation of anomalies amid large-scale financial data pinpoints otherwise unseen hands behind global events. Why, in retrospect, the Iraq WMD conclusion was the result of remarkably small-data manipulation.

Humans will never lose their edge in analyses requiring creativity, smart hunches, and understanding of unique individuals or groups. If that’s all we need to understand the 21st century, then put down your smartphone. But as long as humans learn by observation, and by counting or categorizing those observations, I say crank the machines for all their robotic worth.

Make sure to read both sides, and feel free to argue your own perspective in a comment on the SIGNAL site.

2012 Year in Review for Microsoft Research

The year draws to a close… and while the banality and divisiveness of politics and government have been on full display around the world during the past twelve months, the year has been rewarding for me personally whenever I could retreat into the world of research. Fortunately there’s a great deal of it going on among my colleagues.

2012 has been a great year for Microsoft Research, and I thought I’d link you to a quick set of year-in-review summaries of some of the exciting work that’s been performed and the advances made:

Microsoft Research 2012 Year in Review

The work ranges from our Silicon Valley lab work in “erasure code” to social-media research at the New England lab in Cambridge, MA; from “transcending the architecture of quantum computers” at our Station Q in Santa Barbara, to work on cloud data systems and analytics by the eXtreme Computing Group (XCG) in Redmond itself.

Across global boundaries we have seen “work towards a formal proof of the Feit-Thompson Theorem” at Microsoft Research Cambridge (UK), and improvements for Bing search in Arab countries made at our Advanced Technology Labs in Cairo, Egypt.

All in all, an impressive array of research advances, benefiting from an increasing amount of collaboration with academic and other researchers as well. The record is one more fitting tribute to our just-departing Chief Research and Strategy Officer Craig Mundie, who is turning over his reins, including MSR oversight, to Eric Rudder (see his bio here), while Craig focuses for the next two years on special work reporting to CEO Steve Ballmer. Eric’s a great guy and a savvy technologist, and has been a supporter of our Microsoft Institute’s work as well … I did say he’s savvy 🙂

There’s a lot of hard work already going on in projects that should pay off in 2013, and the New Year promises to be a great one for technologists and scientists everywhere – with the possible exception of any remaining Mayan-apocalypse/ancient-alien-astronaut-theorists. But even to them, and perhaps most poignantly to them, I say Happy New Year!

Tearing the Roof off a 2-Terabyte House

I was home last night playing with the new Kinect, integrating it with Twitter, Facebook, and Zune. Particularly because of the last service, I was glad that I got the Xbox 360 model with the 250-gigabyte (gb) hard disk drive. It holds a lot more music and photos, and of course primarily games and game data.

So we wind up with goofy scenes like my wife zooming along yesterday in Kinect Adventures’ River Rush – captured not only in my photo (right) but in in-game photos taken by the Kinect Sensor sitting there below the TV monitor.

Later as I was waving my hands at the TV screen, swiping magically through the air to sweep through Zune’s albums and songs as if pawing through a shelf of actual LP’s, I absent-mindedly started totting up the data-storage capacity of devices and drives in my household.  Here’s a rough accounting:

  • one Zune music-player, 120gb;
  • two old iPods, 30gb and 80gb;
  • an iPad 3G at 16gb;
  • one HP netbook, 160gb;
  • an aging iMac G5 with 160gb;
  • three Windows laptops of 60gb, 150gb, and 250gb;
  • a DirecTV DVR with a 360gb disk;
  • a single Seagate 750gb external HDD;
  • a few 1gb and 2gb SD cards, plus a single 32gb card, for cameras;
  • a handful of 2gb and 4gb USB flash drives, and one 16gb;
  • and most recently a 250gb Xbox 360, for Kinect. 

All told, I’d estimate that my household data storage capacity totals 2.5 terabytes. A terabyte, you’ll recall, is 10^12 bytes, or 1,000,000,000,000 (1 trillion) bytes, or equivalently a thousand gigabytes.
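For anyone who wants to check the arithmetic, here is the tally as a quick Python sketch. The per-device figures come from the list above; the SD-card and flash-drive subtotals are rough guesses on my part, since the list only says “a few” and “a handful.”

```python
# Back-of-the-envelope tally of the household storage listed above, in gigabytes.
# The SD-card and flash-drive subtotals are rough estimates.
devices_gb = {
    "Zune music player": 120,
    "two old iPods": 30 + 80,
    "iPad 3G": 16,
    "HP netbook": 160,
    "iMac G5": 160,
    "three Windows laptops": 60 + 150 + 250,
    "DirecTV DVR": 360,
    "Seagate external HDD": 750,
    "assorted SD cards (approx.)": 40,
    "assorted USB flash drives (approx.)": 30,
    "Xbox 360 for Kinect": 250,
}

total_gb = sum(devices_gb.values())     # about 2,456 gb
total_tb = total_gb / 1000              # 1 terabyte = 1,000 gb = 10^12 bytes
print(f"{total_gb} gb is roughly {total_tb:.1f} terabytes")   # ~2.5 terabytes
```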

Continue reading

Mix, Rip, Burn Your Research

You’ve done research; you’ve collected and sifted through mounds of links, papers, articles, notes and raw data. Shouldn’t there be a way to manage all that material that’s as easy and intuitive as, say, iTunes or Zune – helping you manage and share your snippets and research the way you share and enjoy your music?

Continue reading

Four Score and Seven Years Ago

Today, August 5, marks a number of interesting anniversaries in the world of technology and government. In 1858 the first transatlantic telegraph cable was completed, allowing President James Buchanan and Queen Victoria to share congratulatory messages the following week. (Unfortunately, within a month the cable had broken down for good.) The first quasar (“quasi-stellar radio source”) was discovered on Aug. 5, 1962. And exactly one year later, on August 5, 1963, the Nuclear Test Ban Treaty was signed between the U.S., U.S.S.R., and Great Britain.

But one important date I’d like to commemorate is a bit different: eighty-seven years ago today, on August 5, 1923, my father was born in Greensboro, North Carolina. Happy Birthday, Dad!

There’s a shorthand way of telling my father’s life history that fits the theme of technological advance: he graduated from college (his beloved N.C. State) as an early recipient of a B.S. degree in Mechanical Engineering; he worked for decades for a growing company interested in adopting new technologies to drive its business; and he capped his career as Corporate Vice President for Research and Development at a Fortune 300 company.

But that misses the fun he had along the way, and the close-up view he had of innovation. He was an early adopter, even before college. (I like to think I get that from him.) So I thought I’d share a couple of vignettes I’ve heard over the years about his interactions with computers, simply to portray the pace of radical change over the course of one man’s life.

Continue reading

Bing vs Google, the quiet semantic war

On Wednesday night I had dinner at a burger joint with four old friends; two work in the intelligence community today on top-secret programs, and two others are technologists in the private sector who have done IC work for years. The five of us share a particular interest besides good burgers: semantic technology.

Oh, we talked about mobile phones (iPhones were whipped out, as was my Windows Phone, and apps were debated) and cloud storage (they were stunned that Microsoft gives 25 gigabytes of free cloud storage with free SkyDrive accounts, compared to the puny 2 gig they’d been using on Dropbox).

But we kept returning to semantic web discussions, semantic approaches, semantic software. One of these guys goes back to the DAML days of DARPA fame, the guys on the government side are using semantic software operationally, and we all are firm believers in Our Glorious Semantic Future.

Continue reading

A Technical Computing revolution

Last week I enjoyed hosting a visit in Redmond from Chris Kemp, NASA’s new Chief Technology Officer for information technology. Our discussions were with folks from the Windows Azure cloud computing team, the high-performance computing and large-data folks, and our Extreme Computing Group. I smiled when Chris said he was a fan of the book Total Recall: How the E-Memory Revolution Will Change Everything, written by Microsoft’s Gordon Bell and colleague Jim Gemmell. (I wrote about their research projects in an earlier post, Total Recall for Public Servants.)

Continue reading

Using the body in new virtual ways

This is CHI 2010 week, the Association for Computing Machinery’s Conference on Human Factors in Computing Systems in Atlanta. Top researchers in human-computer-interaction (HCI) are together April 10-15 for presentations, panels, exhibits, and discussions. Partly because of our intense interest in using new levels of computational power to develop great new Natural User Interfaces (NUI), Microsoft Research is well represented at CHI 2010 as pointed out in an MSR note on the conference:

This year, 38 technical papers submitted by Microsoft Research were accepted by the conference, representing 10 percent of the papers accepted. Three of the Microsoft Research papers, covering vastly different topics, won Best Paper awards, and seven others received Best Paper nominations.

Continue reading

Total Recall for Public Servants

MyLifeBits is a Microsoft Research project led by the legendary Gordon Bell, designed to put “all of his atom- and electron-based bits in his local Cyberspace…. MyLifeBits includes everything he has accumulated, written, photographed, presented, and owns (e.g. CDs).”

SenseCam

Among other technical means, Bell uses the SenseCam, a remarkable prototype from Microsoft Research. It’s a nifty little wearable device that combines high-capacity memory, a fisheye lens passively capturing 3,000 images a day, an infrared sensor, a temperature sensor, a light sensor, an accelerometer, and a USB interface. My group has played with SenseCam a bit, and shared it with quite a few interested government parties and partners. More info on SenseCam here, and more on its parent Sensors and Devices Group in MSR.

Continue reading

Inside Cyber Warfare

One year ago, the buzz across the government/technology nexus was focused on a pair of political guessing games. Neophytes mostly debated whom the newly elected President would name as the nation’s first Chief Technology Officer. Grizzled Pentagon veterans and the more sober Silicon Valley types wondered instead who would get the nod as President Obama’s “Cyber Czar.”

Continue reading
