Blogs are great for supplementing real-life events, giving space and time for specific examples and links that can’t be referenced in the moment. I was invited to give a talk last week at the first-ever NASA Information Technology Summit in Washington DC, and the topic I chose was “Government and the Revolution in Scientific Computing.” That’s an area that Microsoft Research has been focusing on quite a bit lately, so below I’ll give some examples I didn’t use in my talk.
One ground rule was that invited private-sector speakers were not allowed to give anything resembling a “sales pitch” of their company’s wares. Fair enough – I’m no salesman. The person who immediately preceded me, keynoter Vint Cerf, slightly bent the rules and talked a bit about his employer Google’s products, but gee whiz, that’s the prerogative of someone who is in large part responsible for the Internet we all use and love today.
I described in my talk the radical new class of super-powerful technologies enabling large-data research and computing on platforms of real-time and archival government data. That revolution is happening now, and I believe government could and should be playing a different and less passive role. I advocated for increased attention to the ongoing predicament of U.S. research and development funding.
Alex Howard at O’Reilly Radar covered the NASA Summit and today published a nice review of both Vint’s talk and mine. Some excerpts:
[Shepherd] focused his talk on whether humanity is on the cusp of a fourth research paradigm as the “scale and expansion of storage and computational power continues unabated.” Shepherd put that prediction in the context of the evolution of science from experimental to theoretical to computational. Over time, scientists have moved beyond describing natural phenomena or Newton’s Laws to simulating complex phenomena, an ability symbolized by comparing the use of lens-based microscopes to electron microscopes. This has allowed scientists to create nuclear simulations.
Shepherd now sees the emergence of a fourth paradigm, or “eScience,” where a set of tools and technologies support data federation and collaboration to address the explosion of exabytes of data. As an example he referenced imagery of the Pleiades star cluster from the Digitized Sky Survey synthesized within the WorldWide Telescope.
“When data becomes ubiquitous, when we become immersed in a sea of data, what are the implications?” asked Shepherd. “We need to be able to derive meaning and information that wasn’t predicted when the data sets were constructed. No longer will we have to be constrained by databases that are purpose-built for a system that we design with a certain set of requirements. We can do free-form science against unconstrained sets of data, or modeling on the fly because of the power of the cloud.”
… In particular, Shepherd looked at the growth of cloud computing and data ubiquity as an enabler for collaboration and distributed research worldwide. In the past, the difficulty of replicating scientific experiments was a hindrance. He doesn’t see that as a fundamental truth anymore. Another liberating factor, in his view, is the evolution of programming into modeling. “Many of the new programming tools are not just visual but hyper-visual, with drag and drop modeling. Consider that in the context of continuous networking,” he said. “Always-on systems offer you the ability to program against data sets in the cloud, where you can see the emergence of real-time interactive simulations.”
What could this allow? “NASA can design systems that appear to be far simpler than the computation going on behind the scenes,” he suggested. “This could enable pervasive, accurate, and timely modeling of reality.”
Much of this revolution is enabled by open data protocols and open data sets, posited Shepherd, including a growing set of interactions — government-to-government, government-to-citizen, citizen-to-citizen — that are leading to the evolution of so-called “citizen science.” Shepherd referenced the Be A Martian Project, where the NASA Jet Propulsion Laboratory crowdsourced images from Mars.
To supplement those points from my talk, here are some items from Microsoft Research’s new focus on scientific tools, available for free here. Most of these are open-source tools and “research accelerators”:
- Dryad and DryadLINQ for Data Intensive Research: Dryad is a high-performance general-purpose distributed computing engine that simplifies the task of implementing distributed applications on clusters of Windows-based computers. DryadLINQ allows developers to implement Dryad applications in managed code by using an extended version of the LINQ programming model and API.
- Microsoft Web N-gram Services: Access petabytes of data via this public beta, made available via a cloud-based platform, to drive discovery and innovation in web search, natural language processing, speech, and related areas by conducting research on real-world web-scale data, taking advantage of regular data updates for projects that benefit from dynamic data.
- NodeXL, the Network Analysis and Visualization tool: Network analysis is of growing importance in academic, commercial, and Internet social media contexts, and NodeXL uses a spreadsheet model as host to lower the usability and training barriers to network data analysis and display.
- ESSE, Environmental Scenario Search Engine: Data mining tools to explore exponentially growing archives of environmental sciences, in multiple domains such as space, terrestrial weather, oceans and terrain. Allows fuzzy queries on terabyte datasets, using fuzzy-logic data mining web-services to perform searching and statistical analysis of the distribution of identified events. ESSE will allow parallel mining over web services of distributed data sources, possibly from different subject areas of the Earth sciences, but sharing the same metadata scheme and data exchange formats.
- Zentity, a Research-Output Repository Platform: A rich platform with a suite of building blocks, tools, and services to create and maintain a scientific organization’s digital library ecosystem. Includes a built-in ScholarlyWorks data model with pre-defined semantic entities, such as Lecture, Publication, Paper, Presentation, Video, File, Person, and Tag along with basic properties for each of these and well-known relationships such as Author, Cites, Version, etc. Provides support to create custom entities and design custom data models using an Extensibility API. Includes support for RSS, OAI-PMH, OAI-ORE, AtomPub and SWORD services, as well as a pluggable Security model for Authentication and Authorization to secure repository content.
- Trident, the Scientific Workflow Workbench: Allows users to automate analysis and then visualize and explore data; and to design and schedule experiments as workflows over HPC clusters or cloud computing resources. The workflow workbench provides a tiered library that hides the complexity of different workflow activities and services for ease of use.
- Computational Biology Web Toolkit: Enables and accelerates fundamental advances in biology. One tool, PhyloDViewer, is an interactive visualization tool for phylogenetic dependency networks or any other set of associations among amino acids or between amino acids and environmental traits.
- Microsoft Biology Foundation (MBF): A language-neutral bioinformatics extension to the Microsoft .NET Framework, which includes parsers for common bioinformatics file formats, algorithms for manipulating protein sequences, and connectors to related Web services.
- Chemistry Add-in for Word: Makes it easier for researchers and chemists to insert and modify chemical information, such as labels, formulas, and 2-D depictions, from within Microsoft Office Word. In addition to authoring functionality, Chemistry Add-in for Word enables user denotation of inline “chemical zones,” the rendering of high-quality and print-ready visual depictions of chemical structures, and the ability to store and expose chemical information in a semantically rich manner.
- Ontology Add-in for Word: An open-source tool that simplifies the development and validation of semantic ontologies, making ontologies more accessible to a wide audience of authors and enabling semantic content to be integrated in the authoring experience, capturing the author’s intent and knowledge at the source, and facilitating downstream discoverability.
- Microsoft Academic Search is a free academic search engine developed by Microsoft Research. It provides innovative, semantically linked ways to explore scientific papers, conferences, journals, and authors, connecting millions of scholars, students, librarians, and other users, along with a neat visualization of the link analysis.
That’s a good starter list 🙂
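To make the “fuzzy query” idea in the ESSE entry a bit more concrete, here is a minimal Python sketch. It is not ESSE’s code or API: the Observation record, the temperature and wind thresholds, and every function name are assumptions invented for illustration. The point is only that a fuzzy query scores each record by degree of match (between 0 and 1) and ranks the results, rather than applying a hard yes/no filter.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical daily weather record (not an ESSE data type)."""
    day: int
    temperature_c: float
    wind_speed_ms: float


def ramp_up(value, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def ramp_down(value, low, high):
    """Membership that falls linearly from 1 at `low` to 0 at `high`."""
    return 1.0 - ramp_up(value, low, high)


def cold_and_windy(obs):
    """Degree to which a day is 'cold AND windy' (thresholds are assumptions)."""
    cold = ramp_down(obs.temperature_c, -10.0, 5.0)
    windy = ramp_up(obs.wind_speed_ms, 5.0, 15.0)
    return min(cold, windy)  # a common choice for fuzzy AND


def fuzzy_search(observations, score, threshold=0.5):
    """Return (score, observation) pairs at or above `threshold`, best first."""
    hits = [(score(o), o) for o in observations]
    hits = [h for h in hits if h[0] >= threshold]
    return sorted(hits, key=lambda h: h[0], reverse=True)


sample = [
    Observation(1, 8.0, 3.0),     # mild and calm
    Observation(2, -4.0, 12.0),   # fairly cold, fairly windy
    Observation(3, -12.0, 16.0),  # very cold, very windy
    Observation(4, 2.0, 9.0),     # borderline on both counts
]

for match, obs in fuzzy_search(sample, cold_and_windy):
    print(f"day {obs.day}: match {match:.2f}")
```

With these made-up thresholds, day 3 is a full match, day 2 a partial one, and days 1 and 4 fall below the cutoff. A real system would run the same kind of scoring over far larger archives and then study how the matched events are distributed.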
By the way, I contend that commercial software is (once again) surpassing government IT systems along multiple vectors of “Information Work,” performed for example by intelligence-community collectors and analysts. Microsoft is aggressively deploying new web-scale capabilities to lead this hyper-competitive space (I did mention Google before, did I not?). Several examples are already live on the web:
Bing Maps Immersive 3D with real-time and near-real-time (NRT) semantic data layers is already used by millions on the web. Government has nothing as powerful.
EntityCube is a research prototype (sometimes it’s up, sometimes it’s down for testing), used to explore object-level search technologies, which automatically summarizes the Web for entities such as people, locations and organizations. It’s also a fun way to examine the Kevin Bacon six-degrees-of-separation theory between you and anyone else on the web.
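For the curious, the six-degrees game boils down to a shortest-path search over a graph of who is linked to whom. Here is a minimal Python sketch using breadth-first search. It says nothing about how EntityCube itself works: the edge list, the names in it, and the degrees_of_separation function are all hypothetical.

```python
from collections import deque


def degrees_of_separation(edges, start, target):
    """Shortest number of links between two entities, or None if unconnected.

    `edges` is an iterable of (a, b) pairs, treated as undirected links.
    """
    # Build an adjacency map from the edge list.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    if start not in graph or target not in graph:
        return None

    # Breadth-first search visits entities in order of distance from `start`,
    # so the first time we reach `target` we have the shortest path length.
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Made-up links purely for illustration.
links = [
    ("you", "colleague"),
    ("colleague", "director"),
    ("director", "actor"),
    ("stranger", "hermit"),
]

print(degrees_of_separation(links, "you", "actor"))
print(degrees_of_separation(links, "you", "hermit"))
```

In this toy graph, "you" reach "actor" in three hops, while "hermit" sits in a separate cluster and is unreachable. The hard part at web scale is building the entity graph in the first place, not walking it.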
Microsoft Translator is a web-scale machine-translation service, featuring a rich API and collaborative features that enable developers to combine human edits with machine translation intelligently. The API lets any developer, for any research project, integrate automatic translation deeply into an application, similar to what Microsoft has already embedded in Internet Explorer, Office, Bing, and Instant Messenger. Imagine if national, state or local governments used multilingual tools so nimbly.
As I said at NASA, the newest scientific revolution continues, in the face of declining government R&D support. I would like to see government regain and maintain a reputation as “early adopter” for new technologies, and what better realm than the advance of science!
If you’d like to see the slides I used to discuss these issues at the NASA Summit, here you go: