
January 14th, 2009

Thomson Reuters bets on Content remaining King with Calais 4.0

Posted by Paul Miller @ 5:04 am

Categories: Commercialisation, Open Data, Semantic Web, Semantic Web Companies, Standards

Tags: Thomson Reuters Corp., URI, Tague, Team Management, Corporate Governance, Management, Business Operations, Corporate Law, Paul Miller

Global information behemoth Thomson Reuters today announces the latest version of its Calais web service, delivering on earlier promises with respect to ‘Linked Data’ and firmly staking out the company’s intention to be a significant player in the shifting market for timely and authoritative information.

I’ll take a more in-depth look at the importance of authoritative sources in the emerging Linked Data ecosystem in this related post, and concentrate on the specifics of the Calais 4.0 release here.

Thomson Reuters’ Tom Tague describes version 4.0 as

“a fundamental change to the underlying service; it’s basically a new service”

This re-engineering of Calais will deliver the functionality that users have come to rely upon, whilst ensuring Thomson Reuters’ ability to continue to scale in a timely and cost-effective manner on the back of Amazon’s Web Services offering.

Tague describes the service released today as a technology preview to run alongside the existing Calais service for a period, but he is confident that it is at production strength from Day 1. Developers, Tague suggested, would

“try it and stay.”

In addition to this strengthening of the core offering, Calais 4.0 includes five substantive developments.

First, the company has followed through on earlier talk about ‘Linked Data,’ ensuring that any of around 25 entity types (company names, geographic areas, album titles, etc.) discovered in content submitted to Calais will now be returned to the submitter with a ‘dereferenceable URI’ that either people or software may follow in order to discover further information. The URI resolves to a Calais-hosted page of RDF with pointers to the Linked Data community’s usual suspects: DBpedia, MusicBrainz, GeoNames, the CIA Factbook, etc.
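To make ‘dereferenceable’ a little more concrete, here is a minimal Python sketch of what a client might do with such a URI: ask for RDF rather than HTML, then pull out the pointers to other datasets. Everything specific in it is an assumption made for illustration, not a description of the Calais service itself: the example.org URI, the sample RDF/XML document, and the use of owl:sameAs as the linking property are all hypothetical. The request is built but never sent, so the sketch runs offline.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def build_rdf_request(uri):
    """Prepare (but do not send) a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def same_as_links(rdf_xml):
    """Return the target of every owl:sameAs statement in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    resource_attr = "{%s}resource" % RDF_NS
    return [
        element.attrib[resource_attr]
        for element in root.iter("{%s}sameAs" % OWL_NS)
        if resource_attr in element.attrib
    ]


# Hypothetical response body: an entity page pointing at two external datasets.
# The URIs below are placeholders, not real Calais, DBpedia or GeoNames records.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://example.org/entity/acme">
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Example"/>
    <owl:sameAs rdf:resource="http://sws.geonames.org/0000000/"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    request = build_rdf_request("http://example.org/entity/acme")
    print(request.full_url, request.get_header("Accept"))
    for link in same_as_links(SAMPLE_RDF):
        print("see also:", link)
```

In practice a developer would more likely hand the response to a proper RDF library than to a generic XML parser; the standard library is used here only to keep the sketch self-contained.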

More unusually, and importantly, the second development sees the document include pointers to Thomson Reuters’ own content, such as the (current) stock ticker, Board membership data, etc.

As the Press Release notes,

“In keeping with its commitment to the Linked Data standard, Thomson Reuters has also made a subset of its core data assets available for public use on the Web. The collection of business information represents the first contribution to the ‘Linked Data cloud’ made by a major publisher. It enables developers to programmatically query and use fundamental facts on hundreds of thousands of publically-traded companies, including company descriptions, stock tickers, management teams, locations, boards of directors and more.”

Third, Calais 4.0 includes a ‘metadata transport layer’ to simplify the process of exposing and sharing large bodies of semantically rich data. Tague suggested that 200 to 300 million persistent and dereferenceable URIs are available today (and capable of servicing tens or hundreds of millions of hits per day) for content previously submitted to Calais, with many more to come as the service scales.

Fourth, Calais is making its first move beyond English language content, and version 4.0 now supports entity extraction in French. French-language relationship and event extraction will follow shortly, as will other languages. Tague suggested that Hebrew, Arabic and Chinese will be amongst those rolled out during 2009. Behind the scenes, the team are also experimenting with automated translation services, which Tague reports to be ‘working very well’ in the lab.

Fifth, and finally, the Calais team is publishing an RDFS version of their schema, giving developers far more flexibility as to the ways in which they integrate the Calais web service into their own applications.

All in all, a welcome set of incremental improvements to Calais that also serves to raise an interesting set of questions about the role of ‘professional’ data in the Linked Data ecosystem.

Thomson Reuters’ Tom Tague is a regular member of the Semantic Web Gang, and should be discussing the release of Calais 4.0 in more depth on this month’s show, due to be recorded on 15 January.

Paul Miller provides consultancy and analysis services at the interface between the worlds of Cloud Computing and the Semantic Web. See his full profile and disclosure of his industry affiliations.

