
May 12th, 2008

Powerset shows semantic search solution

Posted by Paul Miller @ 6:38 am

Categories: Commercialisation, Semantic Web, Semantic Web Companies

Tags: Natural Language Processing, Team, Xerox PARC, Wikipedia, Powerset, Wiki, Online Communications, Paul Miller

Beating the rush of press releases likely to flood inboxes during next week’s Semantic Technology Conference, Powerset today announced the public availability of a service that adds a whole new dimension to searching for information from Wikipedia.

Whilst much of the functionality unveiled today has been visible to those granted access to the company’s Powerlabs for some time, the Powerset team has clearly been busy optimising code and ensuring that the various components work together much better.

May 8th, 2008

Peter Mika offers bananas at Yahoo! Research

Posted by Paul Miller @ 6:30 am

Categories: Podcasts, Research, Semantic Web, Talking Semantics

Tags: Yahoo! Inc., Semantic Web, Internet, Paul Miller

Yahoo! are certainly being a lot more open than competitors such as Google and Microsoft when it comes to talking about their use of semantic technologies. They’ve been active for several years in recruiting stalwarts of the Semantic Web community such as Dave Beckett, and there is a long tradition of the company’s employees actively contributing to the research side of the Semantic Web world.

More recently, at least part of that sometimes-esoteric research has begun to make the transition toward Yahoo!’s consumer-facing properties. FireEagle and, most recently, SearchMonkey are obvious examples of this transition. SearchMonkey in particular has real potential to demonstrate the case for the Semantic Web compellingly, and is likely to drive a scramble toward structured markup within the SEO sector.

It’s hard to believe that Microsoft and Google are not also actively engaged in this area, although their reticence in speaking about it creates a perfect opportunity for Yahoo! to make the most of the attention… for now.

SearchMonkey came out of hiding late last month, when Yahoo! CTO Ari Balogh introduced it to attendees at the Web 2.0 Expo in San Francisco. There’s a Developer event in Sunnyvale next week, and Yahoo! looks likely to be pretty visible at the Semantic Technology Conference in San Jose, 18-22 May.

It was in the context of the Semantic Technology Conference that I found myself in conversation with Peter Mika of Yahoo! Research earlier today. We talked about the potential for SearchMonkey before considering some of the issues posed by moving Semantic Web specifications such as RDF out of the relatively well behaved academic sphere and onto the open Web, where honesty is not everyone’s priority. Peter is speaking at Semantic Technology, a few days after Yahoo!’s own SearchMonkey Developer event. I look forward to seeing how much more Yahoo! shares on those occasions. Peter did suggest that public access to SearchMonkey will be ‘sooner than [we] think.’ Next week, maybe?

As Peter enjoins listeners at the end of our conversation, ‘Follow the Monkey!’ It will be interesting to see where it leads. I will also be intrigued to see whether Google and Microsoft quietly join the followers… or are found hiding behind a tree waiting for us when we get where we’re going.

April 29th, 2008

Sir Tim Berners-Lee addresses WWW2008 in Beijing

Posted by Paul Miller @ 6:48 am

Categories: Standards, Web 2.0, Web 3.0, Semantic Web, Semantic Web People, W3C

Tags: Web, Tim Berners-Lee, Channel Management, Marketing, Paul Miller

Speaking from the stage in China’s Great Hall of the People last Thursday evening, World Wide Web inventor and Director of the World Wide Web Consortium Sir Tim Berners-Lee shared some of his hopes for the Web with his audience of WWW2008 delegates, impeccably polite and ever-helpful conference volunteers and a collection of local dignitaries.

April 22nd, 2008

Linked Data on the Web, WWW2008

Posted by Paul Miller @ 8:40 pm

Categories: Semantic Web, W3C, Open Data

Tags: Web, Tim Berners-Lee, Channel Management, Marketing, Paul Miller

The main programme of this year’s World Wide Web Conference gets underway here in Beijing today (Wednesday); yesterday, ahead of it, was devoted to workshops.

With my colleague Tom Heath as one of the co-chairs, a paper (pdf) from colleagues Rob Styles, Nadeem Shabir and (the absent) Danny Ayers, and a paper (pdf) from myself, Rob and Tom, my choice of workshop was an easy one to make: Linked Data on the Web.

Tom outlined some of his hopes for the day in a recent article, and the workshop website explains;

“The Web is increasingly understood as a global information space consisting not just of linked documents, but also of Linked Data. More than just a vision, the Web of Data has been brought into being by the maturing of the Semantic Web technology stack, and by the publication of large datasets according to the principles of Linked Data. During 2007, the size of the Web of Data has grown to several billion RDF triples which are served by a network of interlinked data sources and which cover domains such as geographic information, people, companies, online communities, films, music, books and scientific publications. In addition to publishing and interlinking datasets, there is also ongoing work on Linked Data browsers, Linked Data crawlers, Web of Data search engines and other applications that consume Linked Data from the Web.”
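To make the idea of interlinked data sources a little more concrete, here is a minimal sketch in Python. It assumes RDF triples can be modelled as plain (subject, predicate, object) tuples, and every URI other than the FOAF and RDFS terms is a hypothetical example. Two datasets published independently become useful together because one uses, as the object of a triple, a URI that the other describes.

```python
# A minimal sketch of Linked Data merging, modelling RDF triples as plain
# (subject, predicate, object) tuples. The *.example URIs and the
# "basedNear" predicate are hypothetical; the FOAF and RDFS terms are real.

Triple = tuple[str, str, str]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASED_NEAR = "http://music.example/vocab/basedNear"  # hypothetical predicate

# Dataset published by a (hypothetical) music site.
music: set[Triple] = {
    ("http://music.example/artist/1", FOAF_NAME, "Example Band"),
    # The object is a URI minted by someone else: this triple is the link.
    ("http://music.example/artist/1", BASED_NEAR, "http://geo.example/place/42"),
}

# Dataset published independently by a (hypothetical) gazetteer.
geo: set[Triple] = {
    ("http://geo.example/place/42", RDFS_LABEL, "Exampleton"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Because both sources name the place with the same URI, merging is set union."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def follow(graph: set[Triple], start: str, path: list[str]) -> set[str]:
    """Follow a chain of predicates from `start`, whichever dataset holds each hop."""
    nodes = {start}
    for predicate in path:
        nodes = {o for (s, p, o) in graph if s in nodes and p == predicate}
    return nodes


web = merge(music, geo)
print(follow(web, "http://music.example/artist/1", [BASED_NEAR, RDFS_LABEL]))
```

Neither dataset alone can give the name of the place where the artist is based; the merged graph can, without either publisher having coordinated with the other.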

Opening the workshop, Sir Tim Berners-Lee emphatically declared,

“Linked Data is the Semantic Web done as it should be. It is the Web done as it should be.”

Presentations throughout the day went a long way toward validating Berners-Lee’s opening assertion. Many - on their own - only described a small part of the problem space, and few - on their own - would have proved compelling to those not on the ‘inside’ of this group. Brought together, though, the collection of resources, ideas, and demonstrations clearly illustrated the potential of a web on which distributed data are drawn together programmatically in order to enrich the user experience.

With the notable exception of Renault’s François-Paul Servant, the emphasis remained very much one of experimentation and research. As we move to the next phase, a number of assumptions need to change, and ‘inconvenient’ practicalities such as licensing, permissions, sustainability, persistence, and quality become increasingly important. The ad hoc mashup that, perhaps, doesn’t fully respect the letter of a third party’s terms and conditions may be acceptable for a demonstrator or proof of concept; the proposition becomes radically different when a service is being delivered to, by, or from an enterprise.

Fundamentally, as Berners-Lee has seen, the work of the Linked Data projects points to a very different way of thinking about the role that data plays in enabling the next phase of the Web. Many of today’s Web applications actually behave much like traditional offline applications: a single application uses a single user interface to provide access to information from a single store of data. That store of data may now be remote to the user, but the value of linking it to other data goes largely untapped. The Linked Data projects demonstrate something else: any number of applications, exposed via any number of user interfaces (and machine-readable APIs), drawing upon data stored across the Web in any number of stores.

Working through the implications may take a while, but we saw some good starts yesterday, and repeated calls from OpenLink’s Kingsley Idehen to ‘make it real.’

Update: My colleagues Rob Styles and Nadeem Shabir provide their perspectives on the workshop. Other attendees sharing their perspectives include Alexandre Passant.

April 14th, 2008

A Semantic view of the Wikipedia for Data idea

Posted by Paul Miller @ 2:02 pm

Categories: Semantic Web, Open Data

Tags: Paul Miller

Last week CNET’s Dan Farber picked up on a post by ex-Googler Bret Taylor, entitled ‘We need a Wikipedia for data.’ Sarah Perez followed up on ReadWriteWeb with a useful roundup in ‘Where to find Open Data on the Web,’ and the usual flurry of interested individuals commented on each.

The notion of a ‘wikipedia for data’ is nothing new, and commenters were quick to point to such exemplars of the type as Metaweb’s Freebase.

However, as we see ever-more mainstream implementation of Semantic Web ideas, the need for usable, addressable, linkable, persistent data becomes ever more pressing. Semantic Web technologies can be deployed inside the firewall with useful results, but the network effects should really begin to resonate at Web scale: out on the open Web.

Bret’s post does a pretty good job, up front, of summarising the problem;

“I have come to realize how hard it is for a everyday programmer to get access to even the most basic factual data. If you want to experiment with a new driving directions algorithm, it is infinitely more difficult than coming up with an algorithm; you have to hire a lawyer and a sign a contract with a company that collects that data in the country you are developing for. If you want to write an open source TiVo competitor, you need television listings data for every cable provider in the country, but your options are tenuous at best.”

Data can be hard to obtain. It’s a legal minefield. Comprehensiveness is necessary… but virtually impossible. He goes on to highlight the dubious tactics of current data owners, many of whom make it prohibitively expensive to access commercial data or almost (and it’s an important almost) criminally difficult to access public domain data in useful form.

Bret continues, tellingly;

“I think all of these barriers to data are holding back innovation at a scale that few people realize. The most important part of an environment that encourages innovation is low barriers to entry. The moment a contract and lawyers are involved, you inherently restrict the set of people who can work on a problem to well-funded companies with a profitable product. Likewise, companies that sell data have to protect their investments, so permitted uses for the data are almost always explicitly enumerated in contracts. The entire system is designed to restrict the data to be used in product categories that already exist.”

Give that man a standing ovation. Exactly.

In a 2006 report on the Commercial Use of Public Information, the UK Government’s Office of Fair Trading suggested that;

“more competition in public sector information could benefit the UK economy by around £1billion [almost $2bn] a year.

The study found that raw information is not as easily available as it should be, licensing arrangements are restrictive, prices are not always linked to costs and PSIHs may be charging higher prices to competing businesses and giving them less attractive terms than their own value-added operations.”

Semantic Web applications thrive on data, and assertions about those data in the form of provenanced links from one resource to another. By locking data away, or by exposing crippled subsets of the whole via web interfaces that only a human might traverse, we miss these opportunities.
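One way to picture “provenanced links”, and the trust problem that arises once RDF leaves the academic sphere for the open Web, is to keep the source of every assertion alongside the triple itself, similar in spirit to RDF named graphs. The sketch below is a minimal, hypothetical illustration under that assumption; the Dublin Core `creator` term is real, but the book, people and source URIs are invented.

```python
# A minimal sketch of provenance-aware link consumption. Each assertion keeps
# the URI of the document it came from, similar in spirit to RDF named graphs.
# All *.example URIs are hypothetical; the Dublin Core term is real.
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # where this assertion was published (its provenance)


DCT_CREATOR = "http://purl.org/dc/terms/creator"
BOOK = "http://library.example/book/1"
CATALOGUE = "http://library.example/catalogue"
SPAM_PAGE = "http://spam.example/page"

quads = [
    Quad(BOOK, DCT_CREATOR, "http://people.example/person/9", CATALOGUE),
    Quad(BOOK, DCT_CREATOR, "http://spam.example/person/x", SPAM_PAGE),
]


def sources_for(quads: list[Quad], subject: str, predicate: str, obj: str) -> set[str]:
    """Who asserts this link? Provenance lets a consumer answer that question."""
    return {q.source for q in quads if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)}


def trusted_links(quads: list[Quad], subject: str, predicate: str, trusted: set[str]) -> set[str]:
    """Keep only links asserted by sources the consumer has chosen to trust."""
    return {
        q.obj
        for q in quads
        if q.subject == subject and q.predicate == predicate and q.source in trusted
    }


print(trusted_links(quads, BOOK, DCT_CREATOR, {CATALOGUE}))
```

The design choice is that trust is decided by the consumer at query time rather than by the publisher: the same pool of assertions can serve a cautious application and a permissive one.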

Yes, (some) businesses would suffer irreparable harm if they opened access to their money tree without also rethinking their Victorian business model. But the UK Government figures (and others) clearly suggest that business (and society) benefits from increased access to this contextual data, even if individual businesses might not.

Wealthy players such as Microsoft and the incumbent search engines might do much here (as they have begun to do with map data) to force a widespread shift in business model, away from enforced scarcity of supply toward plentiful supply and more innovative monetisation of value-added services atop the basic and increasingly commoditised data.

I can - and do - see value in the sort of approach taken by Freebase, in which they set out to become the canonical source of knowledge within a wide range of subjects, and their recent release of data dumps strengthens their case in my eyes.

Personally I am rather more persuaded by the aspirations of the Linked Data projects, which freely expose data on the Web, and actively encourage third parties to use and reuse their data, and to link to it, through it, and from it in an ever-richer web of relationships. As I argued in SemanticReport last year, ready access to data permits the Internet to move inside our next generation of applications in compelling and transformative ways.

Although Powerset is not itself a Linked Data project, the relationships that it is finding and manipulating in data sets such as those from Wikipedia, Freebase and WordNet are closer to this ideal… and more on Powerset soon.

I am persuaded that a single canonical space cannot succeed, except for a very short time or in a very narrow niche. Instead, we need resilient and distributed mechanisms that enable data to be made available, for that data to be found and enmeshed with other resources to create some new and unanticipated application beyond the ken of the data’s original curators.

We do, of course, need appropriate protections to ensure that any explosion of usable data does not see those data abused. For this, we turn to efforts such as the Open Data Commons, whose Open Data license my employer was involved in developing and financing.

Many of these topics are exactly the sort of thing with which the Linked Data projects have been grappling, and I shall be reporting (and speaking) from next week’s Linked Data on the Web workshop in Beijing, ahead of the main WWW2008 conference.

It’s a pity that Bret will not be with us, as I expect there to be a room full of people who would applaud his desire to see the data, whilst questioning the utility of the (one) DataWiki. The Web is, fundamentally, a distributed creature. It is predicated upon the link. So whilst there is utility in hosting data for those unwilling or unable to do so themselves, why require data to go anywhere before it can be used?

Link to data where it sits, link to it again, and put it to work. The result will be amazing.

March 28th, 2008

Why kill Google?

Posted by Paul Miller @ 3:56 am

Categories: Semantic Web, Semantic Web People, Semantic Web Companies, W3C

Tags: Google Inc., Semantic Web, Internet, Paul Miller

Technology journalists from the mainstream media appear obsessed with locating some magic bullet with which to topple Google from its dominant position in today’s Web, and use of violent language seems part and parcel of this obsession. Have Larry and Sergey done something to upset them? Did they all have AltaVista stock? Have they been playing too much Halo? Or can they just not handle the fact that a company is doing pretty well in the stock market whilst actually managing to deliver a valuable user experience?

Whatever the reason, ‘Google Killers’ crop up with depressing regularity, even if (allegedly) you need to put words in the mouths of your commentators to find one.

I spoke with Powerset CTO Barney Pell last night, and one of the topics we explored was his company’s billing last year as a ‘Google Killer.’ More on that conversation in a later post, because for now I want to turn to a related item from overnight; Tim Berners-Lee’s latest post to his low-volume blog.

Tim talks about the attention that one of his recent flurry of press interviews has attracted. In this case, he was talking to the British broadsheet The Times, and notes of the article;

“the Times online mis-states that I think ‘Google could be superseded’. Sigh. In an otherwise useful discussion largely about what the Semantic Web is and how it will affect people, a misunderstanding which ended up being the title of the blog.”

He continues,

“The Semantic Web will not supersede the current Web. They will coexist. The techniques for searching and surfing the different aspects will be different but will connect. Text search engines [like Google] don’t have to go out of fashion.”
(my emphasis)

Noting the speed with which news stories such as that from The Times spread into the blogosphere, Tim comments on the difficulty that he is experiencing in getting the paper to correct its misrepresentation of his words and uses this as a trail into a wider consideration of data re-use online.

This (the ability to combine, recombine, use, reuse, link, link, and link again), he would appear to suggest, is the Semantic Web’s (forgive my slip into the language of violence) ‘Killer App.’

“The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.”

Bravo. I couldn’t agree more.

I must also admit, though, to being surprised at the extent to which too many ‘Semantic Web’ companies appear not to get this. Too many of those I speak with are proudly, happily, and expensively building yet another data silo. Semantics may run through their applications, and World Wide Web Consortium (W3C) specifications may even get a mention. But the essential primacy of linkable reuse outside the carefully managed boundaries of their application is greeted - at best - with carefully spun hedging and - at worst - with outright horror. “Why would some poor misguided user want to do anything outside the Nirvana that my application gives them? Are you mad?”

Tim gives due credit to the great work going on in the Linked Open Data project, and trails the associated workshop (at which I’ll be speaking, along with several of my Talis colleagues) at the Web Conference in Beijing next month.

I, for one, want to see more of the commercial Semantic Web startups embracing a lot more of those ideas. Linked Open Data as a university research project is one thing. Linked Open Data at the heart of a business model is something else entirely, and it appears to be something that either the investors or the startups are not yet taking seriously enough. This is the promise of the Semantic Web: linking. If the Semantic Web only results in yet another generation of silos then what’s the point? It’s probably easier to build a silo using MySQL and some PHP. The investment in enhancing the linkability, the citeability, of a data resource can only be realised once third parties can link, and can cite.

Rant over, for now. But I really do want to hear from those who get it. And, like Tim;

“So in scanning new Semantic Web news, I’ll be looking out for re-use of data. The momentum around Linked Open Data is great and exciting — let us also make sure we make good use of the data.”

March 26th, 2008

illumin8-ing improvements for knowledge workers?

Posted by Paul Miller @ 6:54 am

Categories: Semantic Web

Tags: Journal, Elsevier, Illumin8, Productivity, Product Development, Research & Development, Business Operations, Paul Miller

The printing press was a pretty pivotal invention, challenging artificial limitations on the dissemination of ideas maintained by scriptoria and opening flood gates to the vibrant philosophical, social and technological innovations of the Renaissance, Reformation, and beyond.

From those early innovations, we entered a long period in which more and more of the advances in thought and practice were reported via papers printed in scholarly or professional journals; journals that were, for most people, too specialised and expensive to be accessed anywhere other than in a library. The number of journals grew, and various tools emerged to help us find the papers we needed. These tools tended to divide along subject or publisher lines, forcing the searcher to have prior knowledge of those journals (and their publisher) most likely to be of use. Various attempts were made to offer solutions capable of searching across more than one of these databases, but these were usually hampered by an unwillingness from the publishers to share sufficient data to drive any really useful searches. We only need to glance at the rather daunting lists of resources maintained by a University Library to see how far from ideal the current model is, with its emphasis upon the container (the journal) rather than the content (the article).

Having spent much of my own time at University finding excuses to avoid dealing with the wilfully (well, so it seemed!) obtuse way in which e-resources were carved up, I was of course interested when offered the opportunity to learn more about a ‘better way.’

Rafael Sidi, VP Product Development in the engineering and technology division at scientific publisher Elsevier, and Jens Tellefsen, VP Marketing & Product Strategy at semantic indexing company NetBase spent some time on the phone, introducing me to their new joint venture; illumin8.

March 25th, 2008

Semantic Web Gang forms, debates Semantic Web ‘readiness’

Posted by Paul Miller @ 2:43 pm

Categories: Podcasts, Semantic Web, Semantic Web Gang

Tags: Semantic Web, Internet, Paul Miller

With the increasing cacophony from ‘semantic’ players in the technology space, it can be extremely difficult to work out what’s important, to identify the trends, and to make informed decisions about how any of this affects you and your business. As part of our contribution to bringing some clarity to the proceedings, I am delighted to announce the first (virtual) meeting of the new Semantic Web Gang. We’ll be recording a show each month, gathering the regular Gang and the occasional special guest to talk about the issues of the day, and taking a step back in order to consider the news in light of broader trends.

The first tranche of Gang members comprises;

I shall be adding additional members to this pool of regulars over the next few episodes, expanding the range of experience and insight represented here even further.

In this first meeting of the Gang, we talked about the current perception that the Semantic Web is ready for mainstream adoption, drawing upon recent statements from Sir Tim Berners-Lee, the announcement from Yahoo! of support for a number of Semantic Web specifications, and the SemanticHacker challenge that TextWise announced the day before our call.

We will be talking on the third Thursday of each month, so episode 2 will be recorded on April 17. I wonder what newsworthy items will come to our attention between now and then?

The audio for our conversation is available here, along with pointers to some of the resources mentioned during the call.

Update: ReadWriteTalk syndicates the Semantic Web Gang.

March 20th, 2008

Jim Hendler shares AI’s lessons for the Semantic Web

Posted by Paul Miller @ 2:34 am

Categories: Podcasts, Research, Standards, Semantic Web, Semantic Web People, W3C, Talking Semantics

Tags: Web, Vision, Hendler, Semantic Web, Internet, Paul Miller

Professor James A. Hendler goes by the daunting title of ‘Tetherless World Senior Constellation Professor’ at Rensselaer Polytechnic Institute (RPI) in Troy, New York. Behind the title stands a man who has been closely involved with Artificial Intelligence (AI) research for many years, and someone recognised as amongst the progenitors of the Semantic Web ideal. Hendler is also Associate Director of the Web Science Research Initiative (WSRI), an activity that is being pushed hard by Sir Tim Berners-Lee (a Director) and others.

I spoke to Jim recently, and in a wide-ranging conversation we touched upon early hype around the promise of Artificial Intelligence, conflicting aspirations for the Semantic Web…

March 19th, 2008

TextWise offers $1 million for an American semantic hack

Posted by Paul Miller @ 6:31 am

Categories: Commercialisation, Semantic Web, Semantic Web Companies

Tags: Concept, API, Paul Miller

Erick Schonfeld at TechCrunch draws my attention to SemanticHacker. The site details an invitation from Rochester, NY-based TextWise to suggest compelling applications powered by their API, in return for a guaranteed payment of $100,000 and up to $900,000 in revenue from subsequent commercialisation of the winning idea.

The challenge starts today, and runs until 18 June 2008. Entrants must be based in the United States, which rather unfortunately excludes the Semantic Web research powerhouses in Europe and Asia.

Quoting from the site;

“What will make you a winner in the SemanticHacker Innovators’ Challenge?

  • Develop a software prototype, business plan or both that will have demonstrable commercial viability and the potential for significant financial impact on the application space to which it is applied.
  • Focus your submission on a vertical market. Areas such as finance, health and pharmaceuticals are just a few of the industries that might be a good place to start.”

At the heart of TextWise’s technological offer is an API they describe as “the world’s first open API for Semantic Discovery.” That strikes me as an assertion that probably needs to be ringed with caveats and footnotes if it’s to stand up to closer scrutiny. This API enables developers to draw upon SemanticSignatures, defined as;

“a representation of ALL concepts covered in a block of text. Each block of text contains semantic dimensions (“concepts”) with associated weights. The dimensions capture the strength of each concept in the text.

Semantic Signatures provide a weighted representation of the concepts contained in a piece of text. The weight of each concept represents the strength of that concept in the text. The Semantic Signatures for two pieces of text that both address the same subject will share many common concepts with high weights. Our technology can therefore recognize that these two pieces of text are related even though they share no common keywords.”
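The quoted description amounts to comparing weighted concept vectors. As a hedged illustration only, the sketch below assumes a signature can be represented as a sparse mapping from concept to weight, and uses cosine similarity as the comparison; the quoted text does not say how TextWise actually compares signatures, this is not their API, and the concepts and weights are made up. It shows why two texts with no keywords in common can still score as closely related.

```python
# A minimal sketch of comparing weighted concept vectors. This is NOT the
# TextWise API; the concept names, weights and the choice of cosine
# similarity are illustrative assumptions.
import math

Signature = dict[str, float]  # concept -> weight (strength in the text)


def cosine(a: Signature, b: Signature) -> float:
    """Cosine similarity of two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(concept, 0.0) for concept, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def shared_keywords(text_a: str, text_b: str) -> set[str]:
    """Naive keyword overlap, for contrast."""
    return set(text_a.lower().split()) & set(text_b.lower().split())


text_a = "cardiologists trial statins"
text_b = "heart doctors test cholesterol drugs"

# Hypothetical signatures, as if produced by some concept extractor.
sig_a: Signature = {"cardiology": 0.9, "pharmaceuticals": 0.7, "clinical research": 0.5}
sig_b: Signature = {"cardiology": 0.8, "pharmaceuticals": 0.8, "clinical research": 0.4}
sig_c: Signature = {"finance": 0.9, "equities": 0.6}

print(shared_keywords(text_a, text_b))  # no words in common
print(round(cosine(sig_a, sig_b), 2))   # high: same concepts, similar weights
print(cosine(sig_a, sig_c))             # 0.0: no concepts in common
```

Keyword matching finds nothing linking the first two texts, while their signatures are almost parallel; the unrelated finance text scores zero.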

It will be interesting to see what sort of entries this competition attracts, and the fact that TextWise have taken this course speaks volumes about the increasingly crowded semantic text analysis market. There’s a lot of consolidation to come in this market segment, and the current players are working hard to draw attention to themselves. I, for one, would welcome some more effort devoted to explaining why they’re different.

At Talis, Paul Miller is active in raising awareness of new trends and possibilities arising from wider adoption of the Semantic Web. See his full profile and disclosure of his industry affiliations.
