Twitter Search as a Government case study

In addition to periodic think-pieces here at Shepherd's Pi, I also contribute a monthly online column over at SIGNAL Magazine on topics relating to intelligence. This month I keyed off a recent discussion I had onstage with Elon Musk at the 2015 AFCEA Spring Intelligence Symposium, particularly a colloquy on the implications of the emerging post-Edward Snowden cleavage between Silicon Valley technology companies and their erstwhile innovation partners, the U.S. intelligence agencies.

That discussion sparked some thinking on the public/private sector divide on tech innovation – and on basic operational performance in building or adopting new technologies. It's always been a hobbyhorse topic of mine; see previous pieces from as far back as 2007-08, like "Pentagon's New Program for Innovation in Context," "A Roadmap for Innovation – From the Center or the Edge?" and "VC-like Beauty Contests for Government."

I have an excerpt from my new SIGNAL piece below, but you can read the entire piece here: “The Twitter Hare Versus the Government Turtle.”

Is the public/private divide overstated? Can the government compete? Without going into the classified technology projects and components discussed at the symposium, let's try a quick proxy comparison in a different area of government interest: archiving online social media content for public use and research. Specifically, since Twitter data has become so central to many areas of public discourse, it's worth examining how the government and the private sector are each addressing that archive/search capability.

First, the government side. More than half a decade ago, in April 2010, the Library of Congress (LoC) announced with fanfare that it was acquiring the "complete digital archives" of Twitter, going back to its first internal beta tweets. At the time, the LoC noted, the 2006-2010 Twitter archive already consisted of 5 terabytes, so the federal commitment to archiving the data for search and research was significant…

  … Fast forward to today. Unbelievably, after even more years of "work," there is no progress to report – quite the opposite. A disturbing new report this week in Inside Higher Ed entitled "The Archive is Closed" shows the LoC at a dead stop on its Twitter archive search. The publicly funded archive still is not open to scholars or the public, "and won't be any time soon."

  … Coincidentally this week, just as the Library of Congress was being castigated for failing in its mission to field a usable archive after five years, Twitter unveiled a new search/analytics platform, Twitter Heron – yes, after just six months [since releasing its previous platform, Twitter Storm]. Heron vastly outperforms the original version in throughput and latency; yet in a dramatic evocation of Moore's Law, it does so on one-third the hardware.

Twitter Storm vs Twitter Heron

Oh, and as the link above demonstrates, the company is far more transparent about its project and technology than the Library of Congress has been.

All too often, government technology projects prove clunky and prone to failure, while industry efforts are better incentivized and managerially optimized for success. There are proven methods to combat that dynamic. But the Twitter search case is one more cautionary example of the need to reinvigorate public/private partnerships – in this case, in an area directly relevant to big-data practitioners in the intelligence community.

 – Excerpts from SIGNAL Magazine, “The Twitter Hare Versus the Government Turtle.” © 2015 AFCEA International.

Tearing the Roof off a 2-Terabyte House

I was home last night playing with the new Kinect, integrating it with Twitter, Facebook, and Zune. Particularly because of that last service, I was glad I got the Xbox 360 model with the 250-gigabyte (GB) hard disk drive. It holds a lot more music and photos – and, of course, primarily games and game data.

So we wind up with goofy scenes like my wife zooming along yesterday in Kinect Adventures' River Rush – captured not only in my photo (right) but also in in-game photos taken by the Kinect sensor sitting below the TV monitor.

Later, as I was waving my hands at the TV screen, swiping magically through the air to sweep through Zune's albums and songs as if pawing through a shelf of actual LPs, I absent-mindedly started totting up the data-storage capacity of the devices and drives in my household. Here's a rough accounting:

  • one Zune music player, 120 GB;
  • two old iPods, 30 GB and 80 GB;
  • an iPad 3G, 16 GB;
  • one HP netbook, 160 GB;
  • an aging iMac G5, 160 GB;
  • three Windows laptops, 60 GB, 150 GB, and 250 GB;
  • a DirecTV DVR with a 360 GB disk;
  • a single Seagate 750 GB external HDD;
  • a few 1 GB and 2 GB SD cards for cameras, plus a single 32 GB card;
  • a handful of 2 GB and 4 GB USB flash drives, plus one 16 GB;
  • and most recently, a 250 GB Xbox 360, for Kinect.

All told, I'd estimate that my household data storage capacity totals 2.5 terabytes. A terabyte, you'll recall, is 10^12 bytes, or 1,000,000,000,000 (1 trillion) bytes, or alternately a thousand gigabytes.
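For the curious, here's a quick back-of-the-envelope tally in Python. It's just a sketch: the SD card and flash drive counts are my own assumptions, since the list above only says "a few" and "a handful," but the total lands right around that 2.5-terabyte figure.

```python
# Back-of-the-envelope tally of the household storage listed above, in gigabytes.
# The SD card and USB flash drive counts are assumptions ("a few" / "a handful"),
# not figures from the post.
devices_gb = {
    "Zune music player": 120,
    "old iPods": 30 + 80,
    "iPad 3G": 16,
    "HP netbook": 160,
    "iMac G5": 160,
    "Windows laptops": 60 + 150 + 250,
    "DirecTV DVR": 360,
    "Seagate external HDD": 750,
    "SD cards (assumed: two 1 GB, two 2 GB, one 32 GB)": 2 * 1 + 2 * 2 + 32,
    "USB flash drives (assumed: three 2 GB, two 4 GB, one 16 GB)": 3 * 2 + 2 * 4 + 16,
    "Xbox 360 with Kinect": 250,
}

total_gb = sum(devices_gb.values())
print(f"Total: {total_gb} GB, or about {total_gb / 1000:.1f} TB")
# Prints: Total: 2454 GB, or about 2.5 TB
```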

Continue reading

Bing vs Google, the quiet semantic war

On Wednesday night I had dinner at a burger joint with four old friends; two work in the intelligence community today on top-secret programs, and two others are technologists in the private sector who have done IC work for years. The five of us share a particular interest besides good burgers: semantic technology.

Oh, we talked about mobile phones (iPhones were whipped out, as was my Windows Phone, and apps were debated) and cloud storage (they were stunned that Microsoft gives 25 gigabytes of free cloud storage with free SkyDrive accounts, compared to the puny 2 gigabytes they'd been using on Dropbox).

But we kept returning to semantic web discussions, semantic approaches, semantic software. One of these guys goes back to the DAML days of DARPA fame; the guys on the government side are using semantic software operationally; and we are all firm believers in Our Glorious Semantic Future.

Continue reading

A Technical Computing revolution

Last week I enjoyed hosting a visit in Redmond from Chris Kemp, NASA’s new Chief Technology Officer for information technology. Our discussions were with folks from the Windows Azure cloud computing team, the high-performance computing and large-data folks, and our Extreme Computing Group. I smiled when Chris said he was a fan of the book Total Recall: How the E-Memory Revolution Will Change Everything, written by Microsoft’s Gordon Bell and colleague Jim Gemmell. (I wrote about their research projects in an earlier post, Total Recall for Public Servants.)

Continue reading

Using the body in new virtual ways

This is CHI 2010 week, the Association for Computing Machinery's Conference on Human Factors in Computing Systems, in Atlanta. Top researchers in human-computer interaction (HCI) are together April 10-15 for presentations, panels, exhibits, and discussions. Partly because of our intense interest in using new levels of computational power to develop great new Natural User Interfaces (NUI), Microsoft Research is well represented at CHI 2010, as pointed out in an MSR note on the conference:

This year, 38 technical papers submitted by Microsoft Research were accepted by the conference, representing 10 percent of the papers accepted. Three of the Microsoft Research papers, covering vastly different topics, won Best Paper awards, and seven others received Best Paper nominations.

Continue reading

Follow the USS Carl Vinson to Haiti

As I write on Wednesday afternoon (EST), the scenes of chaos, death, and destruction in Haiti are only now beginning to be visible to the outside world through media. As horrific and heart-rending as those scenes are, they serve a purpose in letting other nations comprehend the magnitude of the crisis and the urgency required in lending direct aid. The U.S. military is uniquely positioned to contribute.

Flight Deck of the USS Carl Vinson

What a difference a day makes: barely 24 hours ago, several hours before the earthquake struck, the Nimitz-class supercarrier USS Carl Vinson (CVN 70) was cranking up its nuclear engines and setting a peaceful course out of Hampton Roads, at the mouth of the Chesapeake Bay in Virginia. At long last, after completing a complex overhaul and new sea trials, she was heading south to make the South America turn and return to her homeport in San Diego as part of the Pacific Fleet.

Continue reading

To fix intelligence analysis you have to decide what’s broken

“More and more, Xmas Day failure looks to be wheat v. chaff issue, not info sharing issue.” – Marc Ambinder, politics editor for The Atlantic, on Twitter last night.

Marc Ambinder, a casual friend and solid reporter, has boiled down two likely avenues of intelligence "failure" relevant to the case of Umar Farouk Abdulmutallab and his attempted Christmas Day bombing aboard Northwest Airlines Flight 253. In his telling, they're apparently binary – for this case at least, one is true and the other is not.

The two areas were originally signaled by President Obama in his remarks on Tuesday, when he discussed the preliminary findings of "a review of our terrorist watch list system … so we can find out what went wrong, fix it and prevent future attacks."

Let’s examine these two areas of failure briefly – and what can and should be done to address them.

Continue reading
