Category: Standards
August 10th, 2009
Moving Data.gov towards the Semantic Web
Government transparency in all its forms would appear to be very much in vogue at present, spanning everything from the Obama administration’s Data.gov portal and Prime Ministerial pronouncements in the UK Parliament to municipal proclamations of openness in Vancouver and compelling grass-roots demonstrations by activists and even newspapers.
At the heart of many of today’s initiatives lie programmes to surface Government data for use and re-use by third parties. The ‘open’ in ‘Open Data’ is, of course, a very loaded term, and I’ve looked before at some of the ways in which data might become ‘open’ whilst remaining effectively useless. Nevertheless, Governments’ current enthusiasm for being seen to embrace transparency should certainly be both welcomed and encouraged, and there are real opportunities to work with Government in ensuring that today’s transparency fervour does not fade, whether through omission or commission.
Given the complex and varied nature of the data involved, and the obvious linkages between the entities (you and I, our communities, our schools, our hospitals) described in numerous different databases, there’s a clear opportunity for technologies and approaches from the Semantic Web community to play a significant role in simplifying the whole process of moving these legacy databases online.
Already interested in Open Government from previous roles, and (obviously!) committed to encouraging real-world adoption of semantic technologies, I’ve recently spent some time talking to a number of those involved. Several of those conversations are now available as podcasts, and I’ll continue to seek out fresh examples and perspectives to share.
My most recent podcast conversation, released today, is with Professor Jim Hendler and Dr Li Ding of the Tetherless World Constellation at Rensselaer Polytechnic Institute in Troy, NY. The team at Rensselaer have been working with some of the US Federal Government’s data sets on Data.gov, and so far they’ve converted sixteen data sets from their original form, resulting in 2,927,398,352 freely available RDF triples and a number of demonstration applications.
Other conversations already released in the series include:
- David Eaves, talking about Vancouver’s commitment to Open Data
- John Sheridan, Head of e-Services at the UK Government’s Office of Public Sector Information, talking about his Department’s efforts to get Government data online
- Mark Birbeck, talking about work with the UK Government’s Central Office of Information to embed lightweight RDFa into workflows and web pages
Each offers an example of ways in which ‘open data’ contributes to Government transparency, or to increasing the value of the massive sunk investment in collecting, managing and curating the data upon which Governments depend. The Semantic Web’s notion of Linked Data (whether actually in RDF or not!) offers a means to increase the utility of the data we have, without a massive programme of reengineering the systems used to manage it. The examples we see today, and the work of the individuals and teams with whom I have been speaking, will teach us a lot about how to make this work at Government scale.
April 8th, 2009
Ivan Herman discusses Semantic Web activity at the World Wide Web Consortium
Ivan Herman is Semantic Web Activity Lead at the World Wide Web Consortium (W3C), and in this podcast he talks about a range of current activities across the Semantic Web community.
January 14th, 2009
Thomson Reuters bets on Content remaining King with Calais 4.0
Global information behemoth Thomson Reuters today announces the latest version of its Calais web service, delivering on earlier promises with respect to ‘Linked Data’ and firmly staking out the company’s intention to be a significant player in the shifting market for timely and authoritative information.
I’ll take a more in-depth look at the importance of authoritative sources in the emerging Linked Data ecosystem in this related post, and concentrate on the specifics of the Calais 4.0 release here.
Thomson Reuters’ Tom Tague describes version 4.0 as
“a fundamental change to the underlying service; it’s basically a new service”
This re-engineering of Calais will deliver the functionality that users have come to rely upon, whilst ensuring Thomson Reuters’ ability to continue to scale in a timely and cost-effective manner on the back of Amazon’s Web Services offering.
Tague describes the service released today as a technology preview to run alongside the existing Calais service for a period, but he is confident that it is at production strength from Day 1. Developers, Tague suggested, would
“try it and stay.”
In addition to this strengthening of the core offering, Calais 4.0 includes five substantive developments.
First, the company has followed through on earlier talk about ‘Linked Data,’ ensuring that any of around 25 entity types (company names, geographic areas, album titles, etc.) discovered in content submitted to Calais will now be returned to the submitter with a ‘dereferenceable URI’ that may be followed by either people or software in order to discover further information. The URI resolves to a Calais-hosted page of RDF with pointers to the Linked Data community’s usual suspects: DBpedia, MusicBrainz, GeoNames, the CIA Factbook, etc.
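For readers wondering what ‘dereferencing’ involves in practice: it is simply an HTTP GET on the URI. Linked Data publishers commonly use content negotiation, serving HTML to browsers and RDF to software that asks for it via the HTTP Accept header, often by way of a 303 See Other redirect. The minimal Python sketch below, using only the standard library, shows a client asking for RDF. The entity URI is a hypothetical placeholder rather than a real Calais identifier, and the preferred media types are an assumption, not something Calais is documented to require.

```python
import urllib.request

# Hypothetical entity URI standing in for one returned by an entity-extraction
# service; it is NOT a real Calais identifier.
ENTITY_URI = "http://example.org/entity/company/12345"

# Media types a Linked Data client might ask for: RDF/XML preferred, Turtle
# acceptable. This preference order is an assumption for illustration.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


def dereference(uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Fetch the URI (urlopen follows redirects such as 303 See Other)
    and return the server's Content-Type together with the raw body."""
    request = build_rdf_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()


if __name__ == "__main__":
    # Only builds the request; no network access is needed for this part.
    req = build_rdf_request(ENTITY_URI)
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `dereference()` needs network access to a real Linked Data URI; the returned Content-Type shows which serialisation the server actually chose.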
More unusually, and importantly, the second development sees the document include pointers to Thomson Reuters’ own content, such as the (current) stock ticker, Board membership data, etc.
As the Press Release notes,
“In keeping with its commitment to the Linked Data standard, Thomson Reuters has also made a subset of its core data assets available for public use on the Web. The collection of business information represents the first contribution to the ‘Linked Data cloud’ made by a major publisher. It enables developers to programmatically query and use fundamental facts on hundreds of thousands of publicly traded companies, including company descriptions, stock tickers, management teams, locations, boards of directors and more.”
Thirdly, Calais 4.0 includes a ‘metadata transport layer’ to simplify the process of exposing and sharing large bodies of semantically rich data. Tague suggested that 200-300 million persistent and dereferenceable URIs are available today (and capable of servicing tens or hundreds of millions of hits per day) for content previously submitted to Calais, with many more to come as the service scales.
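To put those traffic figures in perspective, here is a rough back-of-envelope conversion to average request rates. It assumes hits are spread evenly across a 24-hour day (86,400 seconds), which real traffic never is, so peak rates would be considerably higher. The 10 million and 100 million figures are simply the lower bounds of ‘tens’ and ‘hundreds’ of millions, not numbers Tague gave.

```latex
\frac{10^{7}\ \text{hits/day}}{86\,400\ \text{s/day}} \approx 116\ \text{hits/s},
\qquad
\frac{10^{8}\ \text{hits/day}}{86\,400\ \text{s/day}} \approx 1\,157\ \text{hits/s}
```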
Fourth, Calais is making its first move beyond English language content, and version 4.0 now supports entity extraction in French. French-language relationship and event extraction will follow shortly, as will other languages. Tague suggested that Hebrew, Arabic and Chinese will be amongst those rolled out during 2009. Behind the scenes, the team are also experimenting with automated translation services, which Tague reports to be ‘working very well’ in the lab.
Fifth, and finally, the Calais team is publishing an RDFS version of their schema, giving developers far more flexibility as to the ways in which they integrate the Calais web service into their own applications.
All in all, a welcome set of incremental improvements to Calais that also serves to raise an interesting set of questions about the role of ‘professional’ data in the Linked Data ecosystem.
Thomson Reuters’ Tom Tague is a regular member of the Semantic Web Gang, and should be discussing the release of Calais 4.0 in more depth on this month’s show, due to be recorded on 15 January.
December 8th, 2008
Mark Greaves of Vulcan sees business opportunities in the Semantic Web
Vulcan shares many traits with its reclusive founder, Paul Allen, yet behind the scenes the company is responsible for philanthropic support to research and community-building activities, as well as investing commercially in the likes of Radar Networks (the company behind Twine) and Evri.
Last week, I had the opportunity to talk with Mark Greaves, Vulcan’s Director of Knowledge Systems Research, and the resulting podcast was released earlier today.
Drawing upon a background that includes the likes of Boeing and DARPA, Greaves is persuaded of the benefits to be found in applying semantic technologies to existing business problems and processes.
Greaves identifies four broad areas ripe for development:
- Search
- Enterprise Information
- Social Semantic Web Applications
- Web-scale Knowledge Publishing
It will be interesting to see the extent to which Vulcan - and others - invest in these areas next year.
November 28th, 2008
Hapax CEO recognises importance of shared infrastructure moving forward
I had an enjoyable conversation with Mark Redgrave recently, ahead of his company’s unveiling of their ‘meaning platform,’ Amplify.
Mark is CEO of London-based Hapax, a company that has been applying patented technology to natural language processing (NLP) since 2000.
According to the Press Release,
“Amplify is a web service that brings human understanding to content. Amplify analyses content and returns its meaning in a usable and actionable structure. Amplify enables brands, advertisers and publishers to extract greater value from online content, allowing them to ensure brand safety and target more effectively.
By applying its meaning platform to the online advertising industry, Amplify can eliminate the guesswork in brand safety and targeting decisions. Using patented computational linguistics technology, Amplify enables publishers, social networks, ad networks and media agencies to automatically surface the significant topics, attitudes and pending decisions within any text. Whether to enhance existing targeting mechanisms, create a safe advertising environment or build brand specific products, Amplify provides the core foundation: the meaning of content.”
Amplify is currently being tested by ‘a couple of big Ad networks,’ and an open API is expected in the New Year, which will enable web developers to call upon Amplify within their own applications. This will be free below a certain number of transactions, and chargeable for more intensive use.
There are a lot of companies in the NLP space, and a lot of those are like Hapax in recognising the opportunities for both Advertising and Search Engine Optimisation (SEO). Unlike less advanced solutions that might indiscriminately place advertising for a particular hotel chain on web pages mentioning hotels or cities where the chain has a presence, the emerging generation of NLP-backed solutions are more accurate. Do you want to prominently advertise your hotel on a page discussing crime at the location? Or on a site bemoaning the soulless nature of hotel chains such as yours?
As Redgrave commented,
“This is the missing part of the jigsaw - until now, online advertising has relied on making assumptions based on very limited data. Existing classification techniques such as keyword or statistical analysis provide only half the story as they’re unable to capture the actual meaning. Amplify can now do this - not just accurately but also on a massive scale.”
Sitting beneath Amplify - and almost all of its competitors - is an ontology. This provides Amplify with much of its understanding of the world, and captures the meanings and structures that it will use as the basis of interpreting any text it is given for analysis.
These ontologies tend to be painstakingly constructed, and there is currently very little evidence that companies are pooling their efforts in order to reduce duplication, cut costs, and produce more comprehensive shared offerings. I talked about this elsewhere, recently, and noted at the time that Redgrave was unusual in the readiness with which he recognised the need to pool effort on the general background information that every ontology probably starts out by defining.
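As a rough illustration of the kind of general background knowledge such an ontology encodes, the sketch below models a handful of classes linked by subclass relationships (in the spirit of rdfs:subClassOf) and infers every broader class a term belongs to. It is a generic toy in Python, not Amplify’s proprietary ontology or method, and every class name is hypothetical. Classes such as Place and Organisation near the top are exactly the sort of shared background that each company’s ontology tends to define afresh.

```python
from collections import deque

# A toy ontology: each class maps to its direct parent classes, in the spirit
# of rdfs:subClassOf. All class names here are hypothetical.
SUBCLASS_OF: dict[str, set[str]] = {
    "BoutiqueHotel": {"Hotel"},
    "Hotel": {"Accommodation", "Business"},
    "Accommodation": {"Place"},
    "Business": {"Organisation"},
    "Organisation": set(),
    "Place": set(),
}


def superclasses(cls: str) -> set[str]:
    """Return every ancestor of `cls`, following subclass links transitively
    with a breadth-first walk. Unknown classes have no ancestors."""
    seen: set[str] = set()
    queue = deque(SUBCLASS_OF.get(cls, set()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(SUBCLASS_OF.get(parent, set()))
    return seen


print(sorted(superclasses("BoutiqueHotel")))
# → ['Accommodation', 'Business', 'Hotel', 'Organisation', 'Place']
```

A system holding this hierarchy can recognise that text about a boutique hotel is also about a place and a business, without that link ever being stated in the text itself.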
October 13th, 2008
Jim Hendler and Dean Allemang to kick off Semantic Universe webcasts
Semantic Universe, the people behind the Semantic Technology Conference, have just announced a new series of free webcasts featuring Jim Hendler and Dean Allemang. These begin on 16 October with a ‘Semantic Web Overview,’ and initially continue through November with investigations of RDF and OWL.
I have recorded podcasts with both Jim and Dean in the past, and am sure that they will do a great job sharing the wealth of their combined experiences with those who are able to sign up.
October 7th, 2008
Does the Semantic Web matter?
My business card reads ‘Technology Evangelist,’ and this is a blog about the Semantic Web. So that should be an unequivocal ‘Yes!’ then, right?
However, it’s frequently worthwhile to revisit and question presumptions, beliefs and ‘truths,’ and the Semantic Web that dominates so much of my working day should be no exception.
I am surrounded by smart people who live and breathe the Semantic Web, and I’m an employee and shareholder in a company that is betting its future growth (and, therefore, mine) on the significance and viability of this next wave.
Through this blog, my podcasts and other activities, I’ve probably met (or will meet) the cream of the Semantic Technology crop, and it is both an honour and daunting to move in such circles. The vision of these individuals is often compelling, although the gleam of zealous fervour in some eyes can prove unsettling.
The ‘excitement’ of squeezing another million triples into your triple store is, frankly, not that exciting in the bigger scheme of things. Talk of huge and all-encompassing ontologies is deeply unsettling, both philosophically and practically. Expectations, whether implicit or explicit, that vast legions will ‘do’ the Semantic Web and express themselves in RDF are, frankly, lunacy. The speed with which ‘RDF’ or ‘OWL’ enters any conversation about the Semantic Web is worrying, and must ultimately prove self-defeating as potential adopters retreat from a barrage of terminology and an opaque glut of unnecessary detail.
Continuing landgrabs by startups that seek to attract, trap and exploit eyeballs stand unashamedly on the shoulders of Semantic Web promise whilst running counter to its basic tenets of linking and openness. On the other hand, companies ‘just’ doing perfectly reasonable - and valuable - things with the meanings of words, phrases and documents latch on to the Semantic Web’s buzz, whilst being all about Semantics and not at all about the Web.
New entrants, hopefully building viable and useful businesses upon the Semantic Web’s ideas, are pilloried by stalwarts of the ‘community,’ because the reality of their business model does not permit a whole-hearted embracing of the entire Semantic Web stack from Day One. Intellectual purity clashes with pragmatism and reality on a daily basis. Well-meaning guidelines and best practices morph in the minds of too many to become laws, ‘truths’, and rods with which to beat outsiders. Visions of Orwellian pigs fill my brain, and I don’t like what I see as they rise up onto two feet and gaze disdainfully around.
Deep down, though, the core principles upon which the specifications, recommendations and code rest resonate powerfully with my views about how the world should be.
The Web is a truly wonderful thing. It has made previously scarce resources available to many by removing real barriers to seeing information as a non-rival good; it has dramatically lowered the cost of reaching an audience or market, and offers a platform from which many millions can have their say; it ensures that the most obscure topics can develop communities of interest, even when spread over continental distances. It’s fun, it’s informative, it’s profitable, it’s transformational, it’s educational. And yes, it’s often ‘wrong,’ it can be shallow, and it remains far from universal.
The mainstream Web is becoming ever-richer, as the crowd embraces Web 2.0 principles and participates in conversation across a growing number of online places. This very richness and diversity, though, poses a problem if we are to progress to the next level.
It took the rise of Google in the closing years of the last century to square that circle, dramatically increasing the discoverability of new resources and shortly thereafter settling upon a model by which Google and its dependants could make money. We take much of this for granted today, but the development of the Web - and its viability - have been both remarkable and dramatic. I remember the wonder of those early images from the Vatican and the Louvre, brought across the water and onto my desk. I remember the initial bubbling-up of ‘amateur’ content from individuals inside universities, the bizarrely compelling observation of a not-so-distant coffee pot, and the breathlessness with which select mainstream media (such as the UK’s Guardian newspaper) tracked the journey.
Much that was once amazing is now taken for granted. Many that were once ‘the next big thing’ are no more. The number of people connected, the ways in which they connect, and the things they seek to do once online grow every day, yet the fundamental means of connection between all of these people, all of these places, and all of these things remains the dumb hyperlink. A simple ‘look here.’ A blind pointer into the Void. An impediment to further progress.
This is what the so-called Semantic Web sets out to address. All of the specifications, all of the technology, are about enabling the description of ‘stuff’ - and the connections between one piece of stuff and another - to be declared in ways that are explicit, intelligible and actionable to both humans and software applications acting on their behalf.
Author: Paul Miller
This tells you, the human reader, quite a lot. Yet it is almost opaque to the growing band of software aggregators and agents that trawl the web on behalf of users.
By simply adding the semantics that associate name with person, person with the authored work, and both person and work with the ‘act’ of authorship, that same statement becomes more meaningful. By following the so-called Linked Data Design Issues and expressing these semantics in a ‘linkable’ fashion, the network of relationships between (in this case) me, my communities of interest and my authored works grows stronger and more useful, across the artificial boundaries imposed by ‘communities,’ applications and the like.
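To make that concrete, here is one way the ‘Author: Paul Miller’ statement might be expressed as RDF triples, serialised as N-Triples. It is a minimal sketch using only Python’s standard library and two widely used vocabularies: FOAF for the person and their name, and Dublin Core Terms (dcterms:creator) to link the work to its author. The example.org URIs for the person and the post are hypothetical placeholders, and a real deployment would more likely use an RDF library than hand-built strings.

```python
# Well-known vocabulary namespaces: FOAF, Dublin Core Terms and RDF itself.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers for the person and the blog post; any stable,
# dereferenceable HTTP URIs would serve.
PERSON = "http://example.org/people/paul-miller#me"
POST = "http://example.org/posts/does-the-semantic-web-matter"


def uri(value: str) -> str:
    """Format a URI as an N-Triples IRI reference."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and double quotes
    (enough for this example, not a complete N-Triples escaper)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


triples = [
    (uri(PERSON), uri(RDF_TYPE), uri(FOAF + "Person")),    # this is a person
    (uri(PERSON), uri(FOAF + "name"), literal("Paul Miller")),  # with this name
    (uri(POST), uri(DCTERMS + "creator"), uri(PERSON)),    # who wrote this post
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Because the person is identified by a URI rather than a bare name, any other dataset that uses the same URI can link to, and be linked from, these statements.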
A wealth of data and connections exist today, with most remaining woefully under-exploited. We’re already seeing big industries such as Pharmaceuticals apply Semantic Web techniques in realising the potential in the data they already have, and lowering the costs of developing new medicines as a result.
We face serious problems in the world today. Not everything can be solved by analysing and using data, but it should surely be an important tool in support of all our other efforts. By moving from a mentality that sees data ‘closed’ by default to one in which data is ‘open’ by default, we have much to gain. By embracing ‘the Web’ within our applications, rather than continuing to see it, practically, as merely an adjunct, we can unlock more of the potential that already exists.
The Semantic Web is not some ‘new’ Web. It is not a replacement for what we have today. It is a progression, and an embracing of shifting perceptions as to what is ‘normal’ and what is possible.
So yes, the Semantic Web does matter. And it’s my job to play my small part in showing you how, and why.
Bring it on.
Image shared on Flickr by Matthew Jett Hall.
October 7th, 2008
Whisky, Whiskey, and the Semantic Web?
Huge ontologies and taxonomies that attempt to boil the ocean and describe ‘the sum of human knowledge’ tend to make me deeply uncomfortable. At the other end of the scale, though, there is clearly a place for reaching some shared understanding on how we describe things. Where does the line lie between ‘deeply uncomfortable’ and ‘clearly a place’? I’m not sure… but I know the extremes when I see them.
It was in this vein that my colleague, Tom Heath, recently devoted his time to working with like-minded peers from across the Semantic Web community to organise, host, and participate in the first VoCamp.
As Tom writes in his report on the Nodalities blog:
“We need more vocabularies because people are increasingly motivated to share their data online, and need some way of describing the data itself in a structured fashion. If people use the same vocabularies when describing data of the same type, or at least some of the same terms, it makes sharing and integrating those data sets much easier.”
And yes, one of those upon which the group chose to focus their attention was an ontology for whisky. Not, perhaps, fully in the spirit of Tim O’Reilly’s recent call for us to tackle the problems that ‘matter,’ but a useful learning exercise for this disparate group of technologists.
VoCamp moves to Ireland next month, and I look forward to seeing whether the group can move beyond the inevitable disputes over spelling the name of their favourite tipple to embrace O’Reilly’s call.
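Tom’s point about shared vocabularies can be illustrated with a toy example. Below, two hypothetical publishers describe whiskies using the same (equally hypothetical) property URIs, so combining their data is a simple set union and a single query covers both. This is a plain-Python sketch with invented example.org URIs, not the vocabulary the VoCamp group actually drafted.

```python
# Hypothetical shared vocabulary terms; example.org stands in for whatever
# namespace a real, agreed vocabulary would use.
VOCAB = "http://example.org/whisky-vocab#"
REGION = VOCAB + "region"
AGE_YEARS = VOCAB + "ageYears"

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Two independently published datasets that happen to use the same terms.
distillery_a: set[Triple] = {
    ("http://a.example/whisky/1", REGION, "Islay"),
    ("http://a.example/whisky/1", AGE_YEARS, "10"),
}
retailer_b: set[Triple] = {
    ("http://b.example/item/42", REGION, "Speyside"),
    ("http://b.example/item/42", AGE_YEARS, "12"),
    ("http://b.example/item/43", REGION, "Islay"),
}

# Because the predicates match, integration is just a set union.
merged = distillery_a | retailer_b


def subjects_with(graph: set[Triple], predicate: str, value: str) -> set[str]:
    """Find every subject that has the given predicate/value pair."""
    return {s for s, p, o in graph if p == predicate and o == value}


print(sorted(subjects_with(merged, REGION, "Islay")))
# → ['http://a.example/whisky/1', 'http://b.example/item/43']
```

Had one publisher coined its own term for region, the same query would silently miss its records; agreeing on terms is what removes that mapping step.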
Whisky glass image (c) Kyle May, 2007.
September 23rd, 2008
Thomson Reuters bootstraps Semantic Web of Linked Data with SemanticProxy
The Calais team inside Thomson Reuters continues to impress, and today’s release could in many ways be the best yet as it promises to contribute massively to the growing body of ‘Linked Data’ on the Web. As regular readers will remember, this ‘Linked Data’ is the same stuff being described by Sir Tim Berners-Lee as
“the Web done right.”
Calais Initiative lead and Semantic Web Gang regular Tom Tague took time out of setting up at Emerging Technology 2008 in Cambridge last night to go through the news with me.
Built on top of Calais and scalably hosted on Amazon’s EC2 service, the new site at SemanticProxy.com enters public beta today, and enables anyone to easily generate rich semantic metadata for pages on the open web, simply by passing the URL to SemanticProxy. Human visitors can do so via a standard web form, but the same results can also be achieved programmatically via an API, and it is uses of this sort that will enable a real growth in the availability of Linkable Data out on the open Web.
Tague said:
“The [Semantic Web] market today is largely building little semantic kingdoms - little self-contained ecosystems - rather than the Semantic Web.”
The successes of the Linking Open Data Project’s enthusiasts aside, this sentiment is unfortunately one with which it is hard to disagree. Many of the pieces in the Semantic Web stack are being deployed today, but deployed in such a way that they often build better, richer, more expressive silos, rather than being deployed with a technical and procedural presumption that the data should play its full role out on the open Web.
A tool like SemanticProxy makes it straightforward to generate structured metadata from pages on the open Web. Furthermore, it respects the Linked Data community’s ‘rules’, and Tague stressed that
“SemanticProxy will return dereferenceable Linked Data URIs by the end of this quarter.”
When I mentioned this to one of the Linked Data community’s proponents, his immediate response of
“Wow, nice!”
was cut short as he headed off to explore the new service.
A simple online demonstration allows users to paste a URL into the site and see the Calais service return identified terms… and a measure of their relevance to the wider story. Optimised at the moment for the content covered by major news sites, the demonstration works best for factual items such as this one from the BBC.
Used for real, as a proxy service that Web applications might routinely query in front of any news item, the possibilities are diverse and compelling. And once all those big news sites are effectively appearing to generate Linked Data? The cloud just got an awful lot bigger, an awful lot more current, and an awful lot more powerful.
June 18th, 2008
Kingsley Idehen opens Linked Data Planet in New York City
Recent podcast subject Kingsley Idehen opened the Linked Data Planet conference in New York yesterday morning, demonstrating the evolution of thought and practice from the world of big databases toward the Semantic Web; a journey that he and his company are well positioned to describe.
Kingsley began by pointing to some of the well-understood trends that made Web 2.0 feasible, and discussed the corresponding growth in ‘user generated content’ in the consumer space. He argued that similar trends are also at work within the enterprise, and that the previously clear lines between enterprise and individual are becoming increasingly permeable.
Assuming this to be true, the enterprise faces increasingly complex challenges in engaging with and empowering its employees on the one hand, and recognising and responding to the blurring lines between work time and personal time, employee and customer on the other.
Linked Data, Kingsley argued, offers a powerful means to “mesh disparate and heterogenous data” over the web in ways that cross some of the boundaries he touched upon earlier in his presentation.
The corporate data silo, Kingsley claimed, “will die.” He didn’t seem saddened by the prospect.
Context will replace content as ‘king’, facilitating a move from ‘mashing up’ (characterised by Kingsley as ‘brute force data linking’) to ‘meshing’ (characterised as ‘natural data linking’) of data across the Web.
Paul Miller provides consultancy and analysis services at the interface between the worlds of Cloud Computing and the Semantic Web. See his full profile and disclosure of his industry affiliations.
Produced by
ZDNet and






