Bullshit Detector Prototype Goes Live

I like writing about cool applications of technology that are so pregnant with the promise of the future that they have to be seen to be believed, and here’s another one that’s almost ready for prime time.

Truth Teller prototype (screenshot)

The Washington Post today launched an exciting new prototype that applies powerful new technologies to journalism and democratic accountability in politics and government. As you can see from the screenshot (left), it runs an automated fact-checking algorithm against the streaming video of politicians or other talking heads and displays a “True” or “False” label in real time as they’re speaking.

Called “Truth Teller,” the system uses technologies from Microsoft Research and Windows Azure cloud-computing services (I have included some of the technical details below).

But first, a digression on motivation. Back in the late 1970s I was living in Europe and was very taken with punk rock. Among my favorite bands were the UK’s anarcho-punk collective Crass, and in 1980 I bought their compilation LP “Bullshit Detector,” whose title certainly appealed to me because of my equally avid interest in politics :)

Today, my driving interests are in the use of novel or increasingly powerful technologies for the public good, by government agencies or in the effort to improve the performance of government functions. Because of my Jeffersonian tendencies (I did after all take a degree in Government at Mr. Jefferson’s University of Virginia), I am even more interested in improving government accountability and popular control over the political process itself, and I’ve written or spoken often about the “Government 2.0” movement.

In an interview with GovFresh several years ago, I was asked: “What’s the killer app that will make Gov 2.0 the norm instead of the exception?”

My answer then looked to systems that might “maintain the representative aspect (the elected official, exercising his or her judgment) while incorporating real-time, structured, unfiltered but managed visualizations of popular opinion and advice… I’m also a big proponent of semantic computing – called Web 3.0 by some – and that should lead the worlds of crowdsourcing, prediction markets, and open government data movements to unfold in dramatic, previously unexpected ways. We’re working on cool stuff like that.”

The Truth Teller prototype is an attempt to construct a rudimentary automated “Political Bullshit Detector,” and it addresses each of the factors I mentioned in GovFresh – recognizing the importance of political leadership and its public communication, incorporating iterative aspects of public opinion and crowd wisdom, and imbuing automated systems with semantic sense-making technology that operates at the speed of today’s real world.

Real-time politics? Real-time truth detection.  Or at least that’s the goal; this is just a budding prototype, built in three months.

Cory Haik, the Post’s Executive Producer for Digital News, says the system “aims to fact-check speeches in as close to real time as possible,” whether the statements come in speeches, TV ads, or interviews. Here’s how it works:

The Truth Teller prototype was built and runs with a combination of several technologies — some new, some very familiar. We’ve combined video and audio extraction with a speech-to-text technology to search a database of facts and fact checks. We are effectively taking in video, converting the audio to text (the rough transcript below the video), matching that text to our database, and then displaying, in real time, what’s true and what’s false.

We are transcribing videos using Microsoft Audio Video Indexing Service (MAVIS) technology. MAVIS is a Windows Azure application that uses state-of-the-art Deep Neural Network (DNN)-based speech recognition technology to convert audio signals into words. Using this service, we are extracting audio from videos and saving the information in our Lucene search index as a transcript. We are then looking for the facts in the transcription. Finding distinct phrases to match is difficult. That’s why we are focusing on patterns instead.

We are using approximate string matching, also known as fuzzy string searching. Our first implementation is a modified version of Rabin-Karp using the Levenshtein distance algorithm. It will be extended to recognize paraphrasing and negative connotations in the future.

What you see in the prototype is actual live fact checking — each time the video is played the fact checking starts anew.

– Washington Post, “Debuting Truth Teller”
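
To make that matching step concrete, here is a minimal sketch (in Python) of the fuzzy phrase-matching idea the Post describes: slide a window across the transcript and score each candidate phrase against a database of already fact-checked claims using Levenshtein (edit) distance. The claims, threshold, and sample transcript below are invented placeholders for illustration; Truth Teller’s actual implementation, which pairs Rabin-Karp with Levenshtein, is certainly more sophisticated.

```python
# Minimal illustrative sketch of fuzzy claim matching: slide a word window
# over the transcript and score each candidate against known, pre-checked
# claims with Levenshtein (edit) distance. The claims, threshold, and sample
# transcript are placeholders, not the Post's actual data or code.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(current[j - 1] + 1,            # insertion
                               previous[j] + 1,               # deletion
                               previous[j - 1] + (ca != cb))) # substitution
        previous = current
    return previous[-1]


def match_claims(transcript: str, fact_checks: dict, max_norm_distance: float = 0.25):
    """Return (heard phrase, known claim, verdict) for close-enough matches."""
    words = transcript.lower().split()
    hits = []
    for claim, verdict in fact_checks.items():
        window = len(claim.split())
        for start in range(max(1, len(words) - window + 1)):
            candidate = " ".join(words[start:start + window])
            if levenshtein(candidate, claim.lower()) / max(len(claim), 1) <= max_norm_distance:
                hits.append((candidate, claim, verdict))
                break
    return hits


# Hypothetical database of statements that have already been fact-checked.
FACT_CHECKS = {
    "the deficit has been cut in half": "False",
    "unemployment is at a five year low": "True",
}

print(match_claims("he said the deficit has been cut in half since then", FACT_CHECKS))
```

Anything that lands within a small edit distance of an already-checked claim inherits that claim’s existing verdict, which is what lets the display flip to “True” or “False” while the speaker is still talking.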

The prototype was built with funding from a Knight Foundation Prototype Fund grant. You can read more about the motivation and future plans on the Knight Blog, and TechCrunch discusses some of the prototype’s political ramifications in light of the fact-checking movement in recent campaigns.

Even better, you can actually give Truth Teller a try here, in its infancy.

What other uses could be made of semantic “truth detection” or fact-checking, in other aspects of the relationship between the government and the governed?

Could the justice system use something like Truth Teller, or will human judges and  juries always have a preeminent role in determining the veracity of testimony? Will police officers and detectives be able to use cloud-based mobile services like Truth Teller in real time during criminal investigations as they’re evaluating witness accounts? Should the Intelligence Community be running intercepts of foreign terrorist suspects’ communications through a massive look-up system like Truth Teller?

Perhaps, and time will tell how valuable – or error-prone – these systems can be. But in the next couple of years we will be developing (and be able to assess the adoption of) increasingly powerful semantic systems against big-data collections, using faster and faster cloud-based computing architectures.

In the meantime, watch for further refinements and innovation from The Washington Post’s prototyping efforts; after all, we just had a big national U.S. election, and the congressional elections in 2014 and the presidential race in 2016 are just around the corner. Like my fellow citizens, I will be grateful for any help in keeping candidates accountable to something resembling “the truth.”

Total Recall for Public Servants

MyLifeBits is a Microsoft Research project led by the legendary Gordon Bell, designed to put “all of his atom- and electron-based bits in his local Cyberspace…. MyLifeBits includes everything he has accumulated, written, photographed, presented, and owns (e.g. CDs).”

SenseCam (photo)

Among other technical means, Bell uses the SenseCam, a remarkable prototype from Microsoft Research.  It’s a nifty little wearable device that combines high-capacity memory, a fisheye lens passively capturing 3,000 images a day, along with an infrared sensor, temperature sensor, light sensor, accelerometer, and USB interface. My group has played with SenseCam a bit, and shared it with quite a few interested government parties and partners. More info on SenseCam here, and more on its parent Sensors and Devices Group in MSR.  
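
For flavor, here is a rough, purely illustrative sketch (in Python, since the device’s actual firmware is obviously not mine to show) of the kind of passive-capture loop a wearable like SenseCam runs: pictures on a timer, plus extra captures when the onboard sensors suggest the wearer moved or the scene changed. The sensor functions and thresholds are invented placeholders.

```python
# Purely illustrative: roughly how a wearable like SenseCam might decide when
# to passively capture a frame. The sensor callables and thresholds are
# invented placeholders, not the actual device firmware.
import time

CAPTURE_INTERVAL_SECONDS = 30   # ~3,000 images a day is roughly one every 30 seconds
LIGHT_CHANGE_THRESHOLD = 0.4    # fractional change suggesting the scene changed
MOTION_THRESHOLD_G = 1.5        # crude "the wearer just moved" test

def capture_loop(read_light_level, read_accelerometer, take_photo):
    last_light = read_light_level()
    last_capture = 0.0
    while True:
        now = time.monotonic()
        light = read_light_level()
        moved = abs(read_accelerometer()) > MOTION_THRESHOLD_G
        scene_changed = abs(light - last_light) > LIGHT_CHANGE_THRESHOLD
        # Capture on a timer, or early when the sensors suggest something new.
        if now - last_capture > CAPTURE_INTERVAL_SECONDS or scene_changed or moved:
            take_photo()
            last_capture, last_light = now, light
        time.sleep(1)
```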


Inside Cyber Warfare

One year ago, the buzz across the government/technology nexus was focused on a pair of political guessing games. Neophytes mostly debated whom the newly elected President would name as the nation’s first Chief Technology Officer. Grizzled Pentagon veterans and the more sober Silicon Valley types wondered instead who would get the nod as President Obama’s “Cyber Czar.”


Gunning the Microsoft Semantic Engine

New Bing Maps Beta with embedded data layers from Twitter and other social feeds (screenshot)

There’s a lot of information on the Internet already. Every day, more is added – a lot more. And while there are a concomitant number of new analytic or sense-making tools on the web, they butt up against the fact that the data – the all-important data – is held in multiple places, formats, and platforms.

How are we going to deal with all this? One approach is almost mechanical: ensuring that datasets can be accessed commonly, as with our new Microsoft Dallas service on the Windows Azure cloud platform. In the government realm, the anticipated reliance on “government-as-a-platform” (a meme popularized by Tim O’Reilly) holds promise for making aggregated datasets openly accessible.
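
As a toy example of that mechanical problem, here is a small sketch (field names, agencies, and feeds invented purely for illustration) of the normalization step every sense-making tool needs before it can do anything clever: the same kind of record arrives as CSV from one source and JSON from another, and has to be mapped onto one common shape.

```python
# Illustrative only: records describing the same kind of thing arrive as CSV
# from one source and JSON from another; map them onto one common schema.
# The field names, agencies, and feeds are invented for this sketch.
import csv
import io
import json

def normalize(record: dict) -> dict:
    """Map differently named source fields onto one common schema."""
    return {
        "agency": record.get("agency") or record.get("dept") or "unknown",
        "title": record.get("title") or record.get("name") or "",
        "date": record.get("date") or record.get("published") or "",
    }

csv_feed = "dept,name,published\nNASA,Mars imagery,2009-11-17\n"
json_feed = '[{"agency": "NOAA", "title": "Sea surface temps", "date": "2009-11-16"}]'

records = [normalize(row) for row in csv.DictReader(io.StringIO(csv_feed))]
records += [normalize(row) for row in json.loads(json_feed)]
print(records)  # two records, one shared shape
```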


43 Gigabytes of Mobile Data per Day

Here’s a nifty infographic, created by Online Education, with several striking statistics about “an average day on the Internet” and the volume of data involved in mobile talk and data, Twitter, blogs, wikis, email, news sites, and the like. The numbers are staggering!

Why a Cloudlet Beats the Cloud for Mobile Apps

Sure, you know cloud computing. You also know a bit about so-called “private clouds,” which enterprises and government agencies are exploring as an option to combine the power and scale of virtualized cloud architectures with security and control over data.

But what do you know of Cloudlets? They may just be a key to the future of mobile computing.

That’s a possible conclusion from the results so far of a Microsoft Research family of projects called MAUI, short for Mobile Assistance Using Infrastructure. The MAUI approach is to enable a new class of CPU- and data-intensive applications for mobile devices – but to enable them in a new way. Today’s mobile devices can’t run such apps, at least not well. And if they stick to the cloud, they may never do so.
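
The underlying trade-off is easy to sketch. Very roughly, and with a made-up cost model rather than MAUI’s actual profiler, the decision to offload a costly method comes down to whether remote execution plus the network transfer beats running it on the handset; that is exactly why a nearby, one-hop cloudlet can win where a distant cloud may not.

```python
# A back-of-the-envelope sketch of the offload decision behind the cloudlet
# idea. The cost model and the numbers are assumptions for illustration only,
# not MAUI's actual profiling machinery.
from dataclasses import dataclass

@dataclass
class Method:
    name: str
    local_runtime_s: float       # estimated time to run on the handset
    state_to_transfer_kb: float  # input/output state that must cross the network
    remote_runtime_s: float      # estimated time on the cloudlet or cloud VM

def should_offload(m: Method, bandwidth_kbps: float, rtt_s: float) -> bool:
    """Offload when remote execution plus transfer time beats running locally."""
    transfer_s = (m.state_to_transfer_kb * 8) / bandwidth_kbps + rtt_s
    return m.remote_runtime_s + transfer_s < m.local_runtime_s

face_recognition = Method("recognize_faces", local_runtime_s=4.0,
                          state_to_transfer_kb=600, remote_runtime_s=0.3)

# A one-hop Wi-Fi cloudlet: fast link, tiny round trip, offloading wins.
print(should_offload(face_recognition, bandwidth_kbps=20000, rtt_s=0.02))  # True
# A distant cloud over a slow, high-latency link: better to stay on the phone.
print(should_offload(face_recognition, bandwidth_kbps=500, rtt_s=0.3))     # False
```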

I’ve just read a fundamental MAUI paper published last month in the IEEE’s Pervasive Computing journal: “The Case for VM-based Cloudlets in Mobile Computing” (November 2009, co-authored by MSR’s Paramvir Bahl along with colleagues from Carnegie Mellon University, AT&T Research, and Lancaster University).


The Purple History of Intelink

When I first began talking with DIA CIO Mike Pflueger and Deputy CIO Mark Greer in the fall of 2003 about the work I’d be doing with them inside government, most of the ideas were big ones: let’s re-architect the DoDIIS enterprise, let’s find and deploy revolutionary new analytical software. One of our thoughts was a little one, but for me personally it turned out to be a most valuable project. They let me pull together a panel for the upcoming 2004 DoDIIS Conference called “Geeks and Geezers,” featuring some of the grand old names of intelligence technology. The panel was a success, and in organizing it, I spent quite a bit of time talking to those giants, or should I say listening to them. I learned an enormous amount about “the early days.” This post describes the important work of one of those fellows. 

Data in the Cloud from Dallas to Mars

There’s a lot going on at this week’s Microsoft Professional Developers Conference (PDC 09); it’s a traditional launchpad for cool new stuff. I thought I’d point out several of the government-relevant announcements and technology roll-outs.

I specifically want to spotlight something called Codename Dallas, and how NASA and others have begun using it. In the keynote this morning Microsoft’s Chief Software Architect Ray Ozzie told PDC attendees (and his streaming-video audience) that a landslide of new sensors and observational systems is changing the world by recording “unimaginable volumes of data… But this data does no good unless we turn the potential into the kinetic, unless we unlock it and innovate in the realm of applications and solutions that’s wrapped around that data.”

Here’s how we’re addressing that, with a bit of step-by-step context on the overall cloud-computing platform enabling it.  The steps are: 1. Azure, 2. Pinpoint, and 3. Dallas.
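
To give a feel for where step 3 ends up, here is a hedged sketch of the consumption pattern Dallas aims at: a subscriber pulls a hosted dataset over plain HTTP with an account key and gets structured records back. The endpoint URL, header name, and dataset ID below are hypothetical placeholders, not the actual Dallas service API.

```python
# Hedged sketch of the Dallas consumption pattern: fetch a hosted dataset over
# HTTP using an account key. The URL, header name, and dataset ID are
# hypothetical placeholders, not real Dallas endpoints.
import json
import urllib.request

DALLAS_ENDPOINT = "https://api.example-dallas.net/datasets/mars-imagery"  # placeholder
ACCOUNT_KEY = "YOUR-ACCOUNT-KEY"                                          # placeholder

def fetch_dataset(endpoint: str, account_key: str, top: int = 10) -> list:
    """Request the first `top` records of a hosted dataset and parse the JSON."""
    request = urllib.request.Request(
        f"{endpoint}?$top={top}",
        headers={"AccountKey": account_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# rows = fetch_dataset(DALLAS_ENDPOINT, ACCOUNT_KEY)
# for row in rows:
#     print(row)
```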


Para Bellum Web

Tim O'Reilly, Ray Ozzie

Tim O’Reilly created a bit of a stir last night in the tech world by writing a thoughtful essay entitled “The War for the Web.” He’ll be expanding on his thoughts in his keynote address today at the Web 2.0 Expo in New York. From the essay, here’s the core argument:

“[W]e’ve grown used to a world with one dominant search engine, one dominant online encyclopedia, one dominant online retailer, one dominant auction site, one dominant online classified site, and we’ve been readying ourselves for one dominant social network. But what happens when a company with one of these natural monopolies uses it to gain dominance in other, adjacent areas? I’ve been watching with a mixture of admiration and alarm as Google has taken their dominance in search and used it to take control of other, adjacent data-driven applications.

It could be that everyone will figure out how to play nicely with each other, and we’ll see a continuation of the interoperable web model we’ve enjoyed for the past two decades. But I’m betting that things are going to get ugly. We’re heading into a war for control of the web. And in the end, it’s more than that, it’s a war against the web as an interoperable platform. [emphasis added] Instead, we’re facing the prospect of Facebook as the platform, Apple as the platform, Google as the platform, Amazon as the platform, where big companies slug it out until one is king of the hill.

… P.S. One prediction: Microsoft will emerge as a champion of the open web platform, supporting interoperable web services from many independent players, much as IBM emerged as the leading enterprise backer of Linux.”


Cyber Deterrence Symposium webcast

As I type this, I’m sitting in a seventh-floor conference area at George Washington University’s Elliott School of International Affairs, listening to the keynote speaker for the second of five panels today in the “Cyber Deterrence Symposium,” a joint production of INSA (the Intelligence and National Security Alliance), and the Homeland Security Policy Institute.

If you’re reading this on the day of the symposium (Monday, November 2, 2009), you can tune in to the live webcast of the speakers and panels. It is a stellar line-up; see the roster below.

