Building the Next Virtual Machine

I have a great new job, allowing me to spend several weeks recently in the center of the universe, and I’m loving it. I’m going to spend even more time there from now on.

By that I mean Palo Alto, Silicon Valley’s capital and VMware HQ, where I am now Senior Director, National Technology Strategy, working primarily with the R&D team. But I can’t help putting that “Valley capital” term in a bit of historical context. Back in ancient times (late ’80s-early ’90s) when I worked for the Mayor of San Jose, San Jose City Hall was dealing with a bit of civic insecurity. Although San Jose’s population was already larger than San Francisco’s (it is now the tenth largest city in the country), our mayor (my boss Tom McEnery, the first government leader ever elected to the Silicon Valley Business Hall of Fame) believed that we needed to brand the city explicitly as “The Capital of Silicon Valley.” So that became a multi-million-dollar marketing campaign, and we punched the message home every chance we got.

Yet as the mayor’s policy adviser and speechwriter, I laughed each time I used the phrase. I had just moved to San Jose from Palo Alto, where I got a graduate degree at Stanford. Just twenty miles up Highway 101, Palo Alto had a much better claim to being the center of the geographically hazy electronics domain. I knew the arguments we used in San Jose (see here for example). But I also had already met Bill Hewlett and Dave Packard in person in Palo Alto, and had walked many times on the sidewalk by the legendary garage at 367 Addison where HP was born in the late 1930s; and I had also seen a different historic marker four blocks from the garage, at the corner of Channing and Emerson, commemorating Palo Alto’s very first electronics startup – Federal Telegraph Company, founded in 1909.

Palo Alto itself has spawned thousands of startups for many decades, and it has never stopped. Fast forward to the turn of the millennium, just 20 years ago: when Microsoft and Amazon were trying to shift attention to Seattle/Redmond, Palo Alto struck back and fostered yet another legendary Valley startup: VMware – now my new home. Here’s the origin context for VMware, from an official history of Stanford Research Park:

It can be said that one of the cornerstones of Silicon Valley was laid when Varian Associates broke ground as Stanford Research Park’s first company in 1951. The Stanford Industrial Park, as it was first called, was the brainchild of Stanford University’s Provost and Dean of Engineering, Frederick Terman, who saw the potential of a University-affiliated business park that focused on research and development and generated income for the University and community.

Dean Terman envisioned a new kind of collaboration, where Stanford University could join forces with industry and the City of Palo Alto to advance shared interests. He saw the Park’s potential to serve as a beacon for new, high-quality scientists and faculty, provide jobs for University graduates, and stimulate regional economic development.

In the 1950s, leaders within the City of Palo Alto and Stanford University forged a seminal partnership by creating Stanford Research Park, agreeing to annex SRP lands into the City of Palo Alto to generate significant tax revenues for the County, City, and Palo Alto Unified School District.

Throughout our history, an incredible number of breakthroughs have occurred in Stanford Research Park. Here, Varian developed the microwave tube, forming the basis for satellite technology and particle accelerators. Its spin-off, Varian Medical, developed radiation oncology treatments, medical devices and software for medical diagnostics. Steve Jobs founded NeXT Computer, breaking ground for the next generation of graphics and audio capabilities in personal computing. Hewlett-Packard developed electronic measuring instruments, leading to medical electronic equipment, instrumentation for chemical analysis, the mainframe computer, laser printers and hand-held calculators. At Xerox’s Palo Alto Research Center (PARC), innovations such as personal work stations, Ethernet cabling and the personal computer mouse were invented. Lockheed’s space and missile division developed critical components for the International Space Station. Mark Zuckerberg grew Facebook’s social networking platform from 20 million to 750 million people worldwide while its headquarters were in the Park.

Today, Tesla’s electric vehicle and battery prototypes are developed and assembled here in its headquarters. Our largest tenant, VMware, continues to create the virtualization hardware and software solutions they pioneered, leading the world in cloud computing. With over 150 companies in 10 million square feet and 140 buildings, Stanford Research Park maintains a world-class reputation.

source: Stanford Research Park, “About Us”

In the summer of 2017, I got an email from a former Microsoft research colleague and one of the most eminent leaders in American technology R&D, David Tennenhouse. David has held key leadership roles in dream positions over the past quarter-century – everyone has wanted him on their team. He was Chief Scientist at DARPA; a research professor at MIT; President of Amazon’s R&D arm A9; VP & Director of Research at Intel; a senior leader in Microsoft’s Advanced Strategy and Research division. Smart companies have wooed him in serial fashion. Now David is VMware Chief Research Officer building and leading a stellar team, and over several months into 2018 we had some great conversations about where VMware had been and was going, and what I could bring to that journey. I had a chance to speak with several of the dozens of Ph.D.s he has been hiring to flesh out a comprehensive R&D agenda. I excitedly joined recently and we’ve been off to the races.

For a 20-year-old startup, the company’s growing like gangbusters (the stock market obviously still loves it), and it ranks high every year on lists of Best Employers. But what really attracted me was the stress on R&D and innovation culture, driving an unbelievably ambitious vision. I had always been impressed by VMware’s early virtualization technology; at DIA we were pioneering federal customers fifteen years ago, and wound up using it as a foundation of what would become our private cloud infrastructure. But VMware scientists and research engineers took virtualization much further, with abstraction becoming almost addictively popular. After the server and the OS were virtualized, so was storage, and then networks, and then the data center itself. Now our research agenda is energetically broad, across the following areas:

[Image: VMware research areas]

In fact, any large complex orchestration of resources, hardware, and processes may actually be just the next big virtual machine. We intend to build it, with disruptively great software. In 2011, web pioneer and Netscape cofounder Marc Andreessen wrote a famous manifesto in the Wall Street Journal, “Why Software Is Eating the World”:

“More and more major businesses and industries are being run on software and delivered as online services—from movies to agriculture to national defense. Many of the winners are Silicon Valley-style entrepreneurial technology companies that are invading and overturning established industry structures. Over the next 10 years, I expect many more industries to be disrupted by software, with new world-beating Silicon Valley companies doing the disruption in more cases than not.”

That’s why I smiled last month, just after joining VMware, when our CEO Pat Gelsinger rebuffed talk of him moving to Intel as that company’s new CEO. He began his career at Intel, was its first-ever CTO and the father of the fabled 486 processor. But today he’s virtualizing the world’s computational resources, and Pat tweeted his response to a CNBC anchor’s comments about the Intel CEO job: “I love being CEO of VMware and not going anywhere else. The future is software!”

I still intend to live in Virginia and work closely with DC government friends and colleagues on research, reflecting the Valley’s traditionally close working partnership with the federal government. In fact, if you’re in a government position and are wondering “What’s going on inside VMware Research labs?” – drop me a line 🙂


Problem Number One, Watching for Superintelligence

Two years ago, the AFCEA Intelligence Committee (I’m a member) invited Elon Musk for a special off-the-record session at our annual classified Spring Intelligence Symposium. The Committee assigned me the task of conducting a wide-ranging on-stage conversation with him, going through a variety of topics, but we spent much of our time on artificial intelligence (AI) – and particularly artificial general intelligence (AGI, or “superintelligence”).

I mention that the session was off-the-record. In my own post back in 2015 about the session, I didn’t characterize Elon’s side of the conversation or his answers to my questions – but for flavor I did include the text of one particular question on AI which I posed to him. I thought it was the most important question I asked…

(Our audience that day: the 600 attendees included a top-heavy representation of the Intelligence Community’s leadership, its foremost scientists and technologists, and executives from the nation’s defense and national-security private-sector partners.)

Here’s that one particular AI question I asked, quoted from my blogpost of 7/28/2015:

“AI thinkers like Vernor Vinge talk about the likelihood of a “Soft takeoff” of superhuman intelligence, when we might not even notice and would simply be adapting along; vs a Hard takeoff, which would be a much more dramatic explosion – akin to the introduction of Humans into the animal kingdom. Arguably, watching for indicators of that type of takeoff (soft or especially hard) should be in the job-jar of the Intelligence Community. Your thoughts?”

Months after that AFCEA session, in December 2015 Elon worked with Greg Brockman, Sam Altman, Peter Thiel and several others to establish and fund OpenAI, “a non-profit AI research company, discovering and enacting the path to safe artificial general intelligence (AGI).” OpenAI says it has a full-time staff of 60 researchers and engineers, working “to build safe AGI, and ensure AGI’s benefits are as widely and evenly distributed as possible.”

Fast-forward to today. Over the weekend I was reading through a variety of AI research and sources, keeping current in general for some of my ongoing consulting work for Deloitte’s Mission Analytics group. I noticed something interesting on the OpenAI website, specifically on a page it posted several months ago labelled “Special Projects.”

There are four such projects listed, described as “problems which are not just interesting, but whose solutions matter.” Interested researchers are invited to apply for a position at OpenAI to work on the problem – and they’re all interesting, and could lead to consequential work.

But the first Special Project problem caught my eye, because of my question to Musk the year before:

  “1. Detect if someone is using a covert breakthrough AI system in the world. As the number of organizations and resources allocated to AI research increases, the probability increases that an organization will make an undisclosed AI breakthrough and use the system for potentially malicious ends. It seems important to detect this. We can imagine a lot of ways to do this – looking at the news, financial markets, online games, etc.”

That reads to me like a classic “Indications & Warning” problem statement from the “other” non-AI world of intelligence.

I&W (in the parlance of the business) is a process used by defense intelligence and the IC to detect indicators of potential threats while sufficient time still exists to counter those efforts. The doctrine of seeking advantage through warning is as old as the art of war; Sun Tzu called it “foreknowledge.” There are many I&W examples from the Cold War, from the overall analytic challenge (see a classic thesis, “Anticipating Surprise”), and from specific domain challenges (see for example this 1978 CIA study, Top Secret but since declassified, on “Indications and Warning of Soviet Intentions to Use Chemical Weapons during a NATO-Warsaw Pact War”).

The I&W concept has successively been transferred to new domains of intelligence like Space/Counter-Space (see the 2013 DoD “Joint Publication on Space Operations Doctrine,” which describes the “unique characteristics” of the space environment for conducting I&W, whether from orbit or in other forms), and of course since 9/11 the I&W approach has been applied intensely in counter-terrorist realms in defense and homeland security.

It’s obvious Elon Musk and his OpenAI cohort believe that superintelligence is a problem worth watching. Elon’s newest company, the brain-machine-interface startup Neuralink, sets its core motivation as avoiding a future in which AGI outpaces simple human intelligence. So I’m staying abreast of indications of AGI progress.

For the AGI domain I am tracking many sources through citations and published research (see OpenAI’s interesting list here), and watching for any mention of I&W monitoring attempts or results by others which meet the challenge of what OpenAI cites as solving “Problem #1.” So far, nothing of note.

But I’ll keep a look out, so to speak.

 

 

Docere et Facere, To Teach and To Do

“Helping aspiring data scientists forge their own career paths, more universities are offering programs in data science or analytics.” – Wall Street Journal, March 13, 2017

George Bernard Shaw’s play Man and Superman provides the maxim, “He who can, does. He who cannot, teaches.” Most of us know this as “Those who can’t do, teach.” (And Woody Allen added a punch line in Annie Hall: “… and those who can’t teach, teach gym.”)

I’m determined both to do and to teach, because I enjoy each of them. When it comes to data and advanced analytics, something I’ve been using or abusing my entire career, I’m excited about expanding what I’m doing. So below I’m highlighting two cool opportunities I’m engaging in now…

 

Teaching Big Data Architectures and Analytics in the IC

I’ve just been asked by the government to teach again a popular graduate course I’ve been doing for several years, “Analytics: Big Data to Information.” It’s a unique course, taught on-site for professionals in the U.S. intelligence community, and accredited by George Mason University within GMU’s Volgenau Graduate School of Engineering. My course is the intro Big Data course for IC professionals earning a master’s or Ph.D. from GMU’s Department of Information Sciences and Technology, as part of the specialized Directorate for Intelligence Community Programs.

I enjoy teaching enormously, not having done it since grad school at Stanford a million years ago (ok, the ’80s). The students in the program are hard-working data scientists, technologists, analysts, and program managers from a variety of disciplines within the IC, and they bring their A-game to the classroom. I can’t share the full syllabus, but here’s a summary:

This course is taught as a graduate-level discussion/lecture seminar, with a Term Paper and end-of-term Presentation as assignments. Course provides an overview of Big Data and its use in commercial, scientific, governmental and other applications. Topics include technical and non-technical disciplines required to collect, process and use enormous amounts of data available from numerous sources. Lectures cover system acquisition, law and policy, and ethical issues. It includes discussions of technologies involved in collecting, mining, analyzing and using results, with emphasis on US Government environments.

I worry that mentioning this fall’s class now might gin up too much interest (last year I was told the waiting list had 30+ students who wanted to get in but couldn’t, and I don’t want to expand beyond a reasonable number), but when I agreed this week to offer the course again I immediately began thinking about the changes in the syllabus I may make. And I solicit your input in the comments below (or by email).

For the 2016 fall semester, I had to make many changes to keep up with technological advance, particularly in AI. I revamped and expanded the “Machine Learning Revolution” section, and beefed up the segments on algorithmic analytics and artificial intelligence, just to keep pace with advances in the commercial and academic research worlds. Several of the insights I used came from my onstage AI discussion with Elon Musk in 2015, and his subsequent support for the OpenAI initiative.

More importantly I provided my students (can’t really call them “kids” as they’re mid-career intelligence officials!) with tools and techniques for them to keep abreast of advances outside the walls of government – or those within the walls of non-U.S. government agencies overseas. So I’m going to have to do some work again this year, to keep the course au courant, and your insight is welcome.

But as noted at the beginning, I don’t want to just teach gym – I want to be athletic. So my second pursuit is news on the work front.

 

Joining an elite Mission Analytics practice

I’m announcing what I like to think of as the successful merger of two leading consultancies: my own solo gig and Deloitte Consulting. And I’m even happy Deloitte won the coin-toss to keep its name in our merger 🙂

For the past couple of years I have been a solo consultant and I’ve enjoyed working with some tremendous clients, including government leaders, established tech firms, and great young companies like SpaceX and LGS Innovations (which traces its lineage to the legendary Bell Labs).

But working solo has its limitations, chiefly in implementation of great ideas. Diagnosing a problem and giving advice to an organization’s leadership is one thing – pulling together a team of experts to execute a solution is entirely different. I missed the camaraderie of colleagues, and the “mass-behind-the-arrowhead” effect to force positive change.

When I left Microsoft, the first phone call I got was from an old intelligence colleague, Scott Large – the former Director of NRO who had recently joined Deloitte, the world’s leading consulting and professional services firm. Scott invited me over to talk. It took a couple of years for that conversation to culminate, but I decided recently to accept Deloitte’s irresistible offer to join its Mission Analytics practice, working with a new and really elite team of experts who understand advanced technologies, are developing new ones, and are committed to making a difference for government and the citizens it serves.

Our group is already working on some impressively disruptive solutions using massive-scale data, AI, and immersive VR/AR… it’s wild. And since I know pretty much all the companies working in these spaces, I decided to go with the broadest, deepest, and smartest team, with the opportunity for highest impact.

Who could turn down the chance to teach, and to do?

 

Meet the Future-Makers

Question: Why did Elon Musk just change his Twitter profile photo? I notice he’s now seeming to evoke James Bond or Dr. Evil:

[Image: Twitter photos, Elon v Elon]

I’m not certain, but I think I know the answer. Read on…

Young Americans and the Intelligence Community

A few days ago I travelled down to Orlando – just escaping the last days of the DC winter. I was invited to participate in a conference hosted by the Intelligence Community’s Center of Academic Excellence (IC CAE) at the University of Central Florida. The title of my speech was “The Internet, 2015-2025: Business and Policy Challenges for the Private Sector.” But I actually learned as much as I taught, maybe more.

Debating Big Data for Intelligence

I’m always afraid of engaging in a “battle of wits” only half-armed.  So I usually choose my debate opponents judiciously.

Unfortunately, I recently had a contest thrust upon me with a superior foe: my friend Mark Lowenthal, Ph.D. from Harvard, an intelligence community graybeard (literally!) and former Assistant Director of Central Intelligence (ADCI) for Analysis and Production, Vice Chairman of the National Intelligence Council – and as if that weren’t enough, a past national Jeopardy! “Tournament of Champions” winner.

As we both sit on the AFCEA Intelligence Committee and have also collaborated on a few small projects, Mark and I have had occasion to explore one another’s biases and beliefs about the role of technology in the business of intelligence. We’ve had several voluble but collegial debates about that topic, in long-winded email threads and over grubby lunches. Now, the debate has spilled onto the pages of SIGNAL Magazine, which serves as something of a house journal for the defense and intelligence extended communities.

SIGNAL Editor Bob Ackerman suggested a “Point/Counterpoint” short debate on the topic: “Is Big Data the Way Ahead for Intelligence?” Our pieces are side-by-side in the new October issue, and are available here on the magazine’s site.

Mark did an excellent job of marshalling the skeptic’s view on Big Data, under the not-so-equivocal title, “Another Overhyped Fad.” Below you will find an early draft of my own piece, an edited version of which is published under the title “A Longtime Tool of the Community”:

Visit the National Cryptologic Museum in Ft. Meade, Maryland, and you’ll see three large-machine displays, labeled HARVEST and TRACTOR, TELLMAN and RISSMAN, and the mighty Cray XMP-24. They’re credited with helping win the Cold War, from the 1950s through the end of the 1980s. In fact, they are pioneering big-data computers.

Here’s a secret: the Intelligence Community has necessarily been a pioneer in “big data” since inception – both our modern IC and the science of big data were conceived during the decade after the Second World War. The IC and big-data science have always intertwined because of their shared goal: producing and refining information describing the world around us, for important and utilitarian purposes.

What do modern intelligence agencies run on? They are internal combustion engines burning pipelines of data, and the more fuel they burn the better their mileage. Analysts and decisionmakers are the drivers of these vast engines, but to keep them from hoofing it, we need big data.

Let’s stipulate that today’s big-data mantra is overhyped. Too many technology vendors are busily rebranding storage or analytics as “big data systems” under the gun from their marketing departments. That caricature is, rightly, derided by both IT cognoscenti and non-techie analysts.

I personally get the disdain for machines, as I had the archetypal humanities background and was once a leather-elbow-patched tweed-jacketed Kremlinologist, reading newspapers and HUMINT for my data. I stared into space a lot, pondering the Chernenko-Gorbachev transition. Yet as Silicon Valley’s information revolution transformed modern business, media, and social behavior across the globe, I learned to keep up – and so has the IC. 

Twitter may be new, but the IC is no Johnny-come-lately in big data on foreign targets. US Government funding of computing research in the 1940s and ’50s stretched from World War II’s radar/countermeasures battles to the elemental ELINT and SIGINT research at Stanford and MIT, leading to the U-2 and OXCART (ELINT/IMINT platforms) and the Sunnyvale roots of NRO.

In all this effort to analyze massive observational traces and electronic signatures, big data was the goal and the bounty.

War planning and peacetime collection were built on collection of ever-more-massive amounts of foreign data from technical platforms – telling the US what the Soviets could and couldn’t do, and therefore where we should and shouldn’t fly, or aim, or collect. And all along, the development of analog and then digital computers to answer those questions, from Vannevar Bush through George Bush, was fortified by massive government investment in big-data technology for military and intelligence applications.

In today’s parlance big data typically encompasses just three linked computerized tasks: storing collected foreign data (think Amazon’s cloud), finding and retrieving relevant foreign data (Bing or Google), and analyzing connections or patterns among the relevant foreign data (powerful web-analytic tools).

Those three Ft. Meade museum displays demonstrate how NSA and the IC pioneered those “modern” big data tasks. Storage is represented by TELLMAN/RISSMAN, running from the 1960s throughout the Cold War using innovation from Intel. Search/retrieval were the hallmark of HARVEST/TRACTOR, built by IBM and StorageTek in the late 1950s. Repetitive what-if analytic runs boomed in 1983 when Cray delivered a supercomputer to a customer site for the first time ever.

The benefit of IC early adoption of big data wasn’t only to cryptology – although decrypting enemy secrets would be impossible without it. More broadly, computational big-data horsepower was in use constantly during the Cold War and after, producing intelligence that guided US defense policy and treaty negotiations or verification. Individual analysts formulated requirements for tasked big-data collection with the same intent as when they tasked HUMINT collection: to fill gaps in our knowledge of hidden or emerging patterns of adversary activities.

That’s the sense-making pattern that leads from data to information, to intelligence and knowledge. Humans are good at it, one by one. Murray Feshbach, a little-known Census Bureau demographic researcher, made astonishing contributions to the IC’s understanding of the crumbling Soviet economy and its sociopolitical implications by studying reams of infant-mortality statistics, and noticing patterns of missing data. Humans can provide that insight, brilliantly, but at the speed of hand-eye coordination.

Machines make a passable rote attempt, but at blistering speed, and they don’t balk at repetitive, mind-numbing data volume. Amid the data, patterns emerge. Today’s Feshbachs want an Excel spreadsheet or Hadoop table at hand, so they’re not limited to the data they can reasonably carry in their mind’s eye.

To cite a recent joint research paper from Microsoft Research and MIT, “Big Data is notable not because of its size, but because of its relationality to other data.  Due to efforts to mine and aggregate data, Big Data is fundamentally networked.  Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself.” That reads like a subset of core requirements for IC analysis, whether social or military, tactical or strategic.

The synergy of human and machine for knowledge work is much like modern agricultural advances – why would a farmer today want to trudge behind an ox-pulled plow? There’s no zero-sum choice to be made between technology and analysts, and the relationship between CIOs and managers of analysts needs to be nurtured, not cleaved apart.

What’s the return for big-data spending? Outside the IC, I challenge humanities researchers to go a day without a search engine. The IC record’s just as clear. ISR, targeting and warning are better because of big data; data-enabled machine translation of foreign sources opens the world; correlation of anomalies amid large-scale financial data pinpoints otherwise unseen hands behind global events. Why, in retrospect, the Iraq WMD conclusion was a result of remarkably-small-data manipulation.

Humans will never lose their edge in analyses requiring creativity, smart hunches, and understanding of unique individuals or groups. If that’s all we need to understand the 21st century, then put down your smartphone. But as long as humans learn by observation, and by counting or categorizing those observations, I say crank the machines for all their robotic worth.

Make sure to read both sides, and feel free to argue your own perspective in a comment on the SIGNAL site.

Bullshit Detector Prototype Goes Live

I like writing about cool applications of technology that are so pregnant with the promise of the future, that they have to be seen to be believed, and here’s another one that’s almost ready for prime time.

The Washington Post today launched an exciting prototype that applies powerful new technologies to journalism and democratic accountability in politics and government. As you can see from the screenshot (left), it runs an automated fact-checking algorithm against the streaming video of politicians or other talking heads and displays in real time a “True” or “False” label as they’re speaking.

Called “Truth Teller,” the system uses technologies from Microsoft Research and Windows Azure cloud-computing services (I have included some of the technical details below).

But first, a digression on motivation. Back in the late 1970s I was living in Europe and was very taken with punk rock. Among my favorite bands was the UK’s anarcho-punk collective Crass, and in 1980 I bought their compilation LP “Bullshit Detector,” whose title certainly appealed to me because of my equally avid interest in politics 🙂

Today, my driving interests are in the use of novel or increasingly powerful technologies for the public good, by government agencies or in the effort to improve the performance of government functions. Because of my Jeffersonian tendencies (I did after all take a degree in Government at Mr. Jefferson’s University of Virginia), I am even more interested in improving government accountability and popular control over the political process itself, and I’ve written or spoken often about the “Government 2.0” movement.

In an interview with GovFresh several years ago, I was asked: “What’s the killer app that will make Gov 2.0 the norm instead of the exception?”

My answer then looked to systems that might “maintain the representative aspect (the elected official, exercising his or her judgment) while incorporating real-time, structured, unfiltered but managed visualizations of popular opinion and advice… I’m also a big proponent of semantic computing – called Web 3.0 by some – and that should lead the worlds of crowdsourcing, prediction markets, and open government data movements to unfold in dramatic, previously unexpected ways. We’re working on cool stuff like that.”

The Truth Teller prototype is an attempt to construct a rudimentary automated “Political Bullshit Detector,” and it addresses each of those factors I mentioned in GovFresh – recognizing the importance of political leadership and its public communication, incorporating iterative aspects of public opinion and crowd wisdom, all while imbuing automated systems with semantic sense-making technology to operate at the speed of today’s real world.

Real-time politics? Real-time truth detection.  Or at least that’s the goal; this is just a budding prototype, built in three months.

Cory Haik, the Post’s Executive Producer for Digital News, says it “aims to fact-check speeches in as close to real time as possible,” whether the source is a speech, a TV ad, or an interview. Here’s how it works:

The Truth Teller prototype was built and runs with a combination of several technologies — some new, some very familiar. We’ve combined video and audio extraction with a speech-to-text technology to search a database of facts and fact checks. We are effectively taking in video, converting the audio to text (the rough transcript below the video), matching that text to our database, and then displaying, in real time, what’s true and what’s false.

We are transcribing videos using Microsoft Audio Video indexing service (MAVIS) technology. MAVIS is a Windows Azure application which uses State of the Art of Deep Neural Net (DNN) based speech recognition technology to convert audio signals into words. Using this service, we are extracting audio from videos and saving the information in our Lucene search index as a transcript. We are then looking for the facts in the transcription. Finding distinct phrases to match is difficult. That’s why we are focusing on patterns instead.

We are using approximate string matching or a fuzzy string searching algorithm. We are implementing a modified version Rabin-Karp using Levenshtein distance algorithm as our first implementation. This will be modified to recognize paraphrasing, negative connotations in the future.

What you see in the prototype is actual live fact checking — each time the video is played the fact checking starts anew.

 – Washington Post, “Debuting Truth Teller”
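To make the fuzzy-matching step in that description a little more concrete, here is a minimal Python sketch. It is not the Post’s code: it compares sliding windows of transcript words against each stored claim using plain Levenshtein distance, and it leaves out the modified Rabin-Karp search the Post mentions. The names `find_claims`, `fact_db`, and `max_ratio`, the 20% tolerance, and the sample claim are all assumptions invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_claims(transcript: str, fact_db: dict, max_ratio: float = 0.2) -> list:
    """Return (claim, verdict, distance) for each claim found approximately.

    fact_db is a hypothetical mapping of claim text to a verdict label.
    max_ratio is an assumed tolerance: allowed edits as a fraction of claim length.
    """
    words = transcript.lower().split()
    hits = []
    for claim, verdict in fact_db.items():
        claim_words = claim.lower().split()
        target = " ".join(claim_words)
        n = len(claim_words)
        best = None
        # Slide a window of n words across the transcript, keep the closest one.
        for start in range(len(words) - n + 1):
            window = " ".join(words[start:start + n])
            distance = levenshtein(window, target)
            if best is None or distance < best:
                best = distance
        if best is not None and best <= max_ratio * len(target):
            hits.append((claim, verdict, best))
    return hits


# Hypothetical fact database and a rough transcript containing a typo.
fact_db = {"unemployment has doubled since 2008": "False"}
transcript = "well i think unemployment has dubled since 2008 and that is a fact"
matches = find_claims(transcript, fact_db)
```

The tolerance matters more than the distance function here: too strict and speech-recognition errors hide real matches, too loose and unrelated sentences get labeled. As the Post notes, paraphrase and negation would need something beyond character-level edit distance.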

The prototype was built with a grant from the Knight Foundation’s Prototype Fund. You can read more about the motivation and future plans over on the Knight Blog, and TechCrunch discusses some of the political ramifications of the prototype in light of the fact-checking movement in recent campaigns.

Even better, you can actually give Truth Teller a try here, in its infancy.

What other uses could be made of semantic “truth detection” or fact-checking, in other aspects of the relationship between the government and the governed?

Could the justice system use something like Truth Teller, or will human judges and juries always have a preeminent role in determining the veracity of testimony? Will police officers and detectives be able to use cloud-based mobile services like Truth Teller in real time during criminal investigations as they’re evaluating witness accounts? Should the Intelligence Community be running intercepts of foreign terrorist suspects’ communications through a massive look-up system like Truth Teller?

Perhaps, and time will tell how valuable – or error-prone – these systems can be. But in the next couple of years we will be developing (and be able to assess the adoption of) increasingly powerful semantic systems against big-data collections, using faster and faster cloud-based computing architectures.

In the meantime, watch for further refinements and innovation from The Washington Post’s prototyping efforts; after all, we just had a big national U.S. election, but the congressional elections in 2014 and the presidential race in 2016 are just around the corner. Like my fellow citizens, I will be grateful for any help in keeping candidates accountable to something resembling “the truth.”
