
Category: Research

August 10th, 2009

Moving Data.gov towards the Semantic Web

Posted by Paul Miller @ 3:46 am

Categories: Open Data, Podcasts, Research, Semantic Web, Semantic Web People, Standards, Talking Semantics, Web 3.0

Tags: Government, Semantic Web, Podcasts, RDF, XML, Internet, Software/Web Development, Web Development, Paul Miller

Government transparency in all its forms would appear to be very much in vogue at present, spanning everything from the Obama administration’s Data.gov portal and Prime Ministerial pronouncements in the UK Parliament to municipal proclamations of openness in Vancouver and compelling grass-roots demonstrations by activists and even newspapers.

At the heart of many of today’s initiatives lie programmes to surface Government data for use and re-use by third parties. The ‘open’ in ‘Open Data’ is, of course, a very loaded term, and I’ve looked before at some of the ways in which data might become ‘open’ whilst remaining effectively useless. Nevertheless, Governments’ current enthusiasm for being seen to embrace transparency should certainly be both welcomed and encouraged, and there are real opportunities to work with Government in ensuring that today’s transparency fervour is not diminished, whether by omission or commission.

Given the complex and varied nature of the data involved, and the obvious linkages between the entities (you and I, our communities, our schools, our hospitals) described in numerous different databases, there’s a clear opportunity for technologies and approaches from the Semantic Web community to play a significant role in simplifying the whole process of moving these legacy databases online.

Already interested in Open Government from previous roles, and (obviously!) committed to encouraging real-world adoption of semantic technologies, I’ve spent some time recently talking to a number of those involved. A number of those conversations are now available as podcasts, and I’ll continue to seek out fresh examples and perspectives to share.

My most recent podcast conversation, released today, is with Professor Jim Hendler and Dr Li Ding of the Tetherless World Constellation at Rensselaer Polytechnic Institute in Troy, NY. The team at Rensselaer have been working with some of the US Federal Government’s data sets on Data.gov, and so far they’ve converted sixteen data sets from their original form, resulting in 2,927,398,352 freely available RDF triples and a number of demonstration applications.
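To make the idea of 'converting a data set to RDF triples' a little more concrete, here is a minimal sketch of the general technique, not the Rensselaer team's actual pipeline. It assumes a simple CSV input, and the `http://example.org/dataset/` namespace, the column names and the function names are all hypothetical. Each non-empty cell becomes one subject, predicate, object statement serialised in N-Triples syntax.

```python
import csv
import io
from typing import List
from urllib.parse import quote

# Hypothetical namespace; a real conversion would use the publisher's own URIs.
BASE = "http://example.org/dataset/"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text: str, key_column: str) -> List[str]:
    """Turn each non-empty cell of a CSV table into one N-Triples statement."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The key column identifies the record and becomes the subject URI.
        subject = "<{}record/{}>".format(BASE, quote(row[key_column], safe=""))
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and any empty cells
            predicate = "<{}property/{}>".format(BASE, quote(column, safe=""))
            triples.append('{} {} "{}" .'.format(subject, predicate, escape_literal(value)))
    return triples


SAMPLE = "id,name,state\n1,Troy,NY\n2,San Jose,CA\n"
triples = rows_to_ntriples(SAMPLE, "id")
for line in triples:
    print(line)
```

Every value is emitted as a plain string literal here. Linking records to one another, which is where much of the value of Linked Data lies, would mean emitting URIs rather than literals for columns that identify other entities.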

Other conversations already released in the series include;

  • David Eaves, talking about Vancouver’s commitment to Open Data
  • John Sheridan, Head of e-Services at the UK Government’s Office of Public Sector Information, talking about his Department’s efforts to get Government data online
  • Mark Birbeck, talking about work with the UK Government’s Central Office of Information to embed lightweight RDFa into workflows and web pages

Each offers an example of ways in which ‘open data’ contributes to Government transparency, or to increasing the value of the massive sunk investment in collecting, managing and curating the data upon which Governments depend. The Semantic Web’s notion of Linked Data (whether actually in RDF or not! :-) ) offers a means to increase the utility of the data we have, without a massive programme of reengineering the systems used to manage it. The examples we see today, and the work of the individuals and teams with whom I have been speaking, will teach us a lot about how to make this work at Government scale.

June 17th, 2009

Semantic Search Round Table at the Semantic Technology Conference

Posted by Paul Miller @ 9:40 am

Categories: Research, Semantic Web, Semantic Web Companies, Web 3.0

Tags:

Wednesday’s opening Keynote here in San Jose sees Guidewire’s Carla Thompson joined on stage by senior representatives from many of the more interesting players in the Semantic Search space; Tomasz Imielinski from Ask, Peter Norvig from Google, Riza Berkan of Hakia, Scott Provost from Microsoft, William Tunstall-Pedoe of the UK’s True Knowledge, and Andrew Tomkins of Yahoo.

Carla asks each panellist to describe the differentiating aspects of their product in ‘one or two sentences;’

Tomasz; “we receive about three times as many questions as other search companies. We want to answer questions the best we can from multiple sources… using structured and unstructured data.”

Scott; “Bing really focusses on understanding the intent behind queries, and organising the page to help people get to their answer much faster.”

Peter; “We focus on being comprehensive, accurate and fast… so we have to keep on innovating in crawling, ranking, systems engineering. One thing that differentiates us… most companies decide whether to focus on marketing or sales. We focus on engineering.”

Riza; “We are a complete semantic search engine, from the bottom up. We don’t even have an index. We’ve optimised the entire process for semantic operations. We focus on credible and dynamic content, and offer users a new perspective.” Instead of popularity, they focus on credibility.

William; “True Knowledge is a platform that does direct question answering. There’s a knowledge base and an inference engine to answer questions we haven’t seen before.” True Knowledge tries to ‘help when it can, and stay quiet when it can’t,’ as can be seen demonstrated in their recently released Firefox plugin.

Andrew; “Yahoo! is very aggressive about semantic annotation… SearchMonkey is about acquiring semantic information and surfacing it in search results on the page.”

Carla mentions Tom Tague’s keynote from yesterday, where he suggested that ‘semantic search is an answer to a question no one is asking’… so “why do we need to change search?”

Tomasz responds, suggesting that users don’t necessarily demand new products that subsequently become successful. eg; no one was asking for the iPod before it launched. “When they see it, they will want it.”

Turning to Google and Yahoo!, Carla asks them “why do we need to change search?”

Peter… “as an industry, satisfaction is very high… but that is just because that’s what people know [now]… People don’t like technology… people like solutions. When we deliver it, people will want it.”

Andrew; “Does search need to change? It already is… Today, on any major search engine, if you search for a restaurant, you’ll see structured information about that restaurant; reviews, phone number, etc… This has been accelerating over the last 3-4 years… When we put this information up, and trigger it correctly, we see far higher levels of engagement from our users than anything else.”

Carla; “it may be a stupid question, but it has to be asked; what is semantic search?”

Scott; “it means a lot of different things. At Powerset we focussed on understanding the meaning in web pages, so we could present them, rank them…”

Carla; “Has Powerset’s focus been diluted by the [Microsoft] acquisition?”

Scott; “No.”

Carla asks Riza; “Someone from Hakia that I spoke to last year said you were the only one doing ‘true semantic search.’ Is that true?”

Riza; “No… Semantic Search can enrich search results… Semantic Search can improve precision/disambiguation… Semantic Search can organise results better. In the future, search will move to more conversational systems, and for that you really need semantic technology.”

Carla; “How do you measure the ‘semanticity’ of a search engine?”

Tomasz; “That’s my favourite question… We took a sample of ‘equivalent’ queries from the logs, and ran it to evaluate ranking etc; does the search engine give similar answers to questions like ‘Top 10 songs’ and ‘Top Ten songs,’ etc. Should they?”

Andrew; “It’s incredibly hard to understand what a user will like… if you mess with the logo, it changes the perception of the results… if you make tiny changes, it can have a big impact on perception… When it comes to understanding semantic content in search, we should identify the task the user is trying to solve… and have a metric that’s aligned to that use case… We can break search queries today into different classes; how do we do when a user is trying to book dinner, or a vacation? Semantic Technology should be judged on its impact based on these task metrics rather than any underlying notions of entity resolution, etc… SearchMonkey, for instance, lets users inject structured data into the process… The information can be incorporated in any way… and change how the results are presented. We have about 15,000 people in our development community, changing the way those results are presented every day.”

Tomasz; “I would expect a semantic search engine to deliver equivalent results to queries that would appear similar to a human being; ‘Top 10 songs’ and ‘Top Ten Songs’ should deliver the same answer. Today in most mainstream search engines they don’t.”
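One way to picture the test Tomasz describes is to compare the results an engine returns for pairs of queries that a human would treat as equivalent. The sketch below is an illustration only, not Ask's methodology. It assumes that overlap between the top results (Jaccard similarity) is an acceptable stand-in for 'similar answers', and the `fake_search` engine and its canned results are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Overlap between two result lists, ignoring order (1.0 means identical sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def equivalence_score(
    search: Callable[[str], List[str]],
    query_pairs: Sequence[Tuple[str, str]],
    k: int = 10,
) -> float:
    """Average top-k overlap across pairs of supposedly equivalent queries."""
    if not query_pairs:
        raise ValueError("at least one query pair is required")
    scores = [jaccard(search(q1)[:k], search(q2)[:k]) for q1, q2 in query_pairs]
    return sum(scores) / len(scores)


# Hypothetical engine with canned results, standing in for a real search API.
FAKE_INDEX = {
    "top 10 songs": ["a", "b", "c", "d"],
    "top ten songs": ["a", "b", "e", "f"],
}


def fake_search(query: str) -> List[str]:
    return FAKE_INDEX.get(query.lower(), [])


score = equivalence_score(fake_search, [("Top 10 songs", "Top Ten songs")])
print(round(score, 3))
```

A score near 1.0 would suggest the engine treats the two phrasings as the same question. Whether it should do so for every such pair is exactly the "Should they?" that Tomasz leaves open.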

Carla; “Search v. Answers. True Knowledge is billed as ‘the Internet Answer Engine;’ is it necessary to move search to an answer-based format, or has Google trained users to think in keywords?”

William; “We support both keyword search and full-text questions. It’s important to answer users’ questions.”

Peter; “Different types of answers are appropriate for different types of questions; sometimes the answer is a fact, or a page, or a series of results to support a process of study. To say there’s going to be one technology or one type of answer doesn’t make sense.”

Riza; “You could be asking a ‘where,’ ‘why,’ ‘how’ type of question. Questions are important, and the search engine needs to be able to interpret the mode of the question and return results appropriately.”

Carla; “You mentioned talking about the credibility of search results. How do you define a ‘credible’ search result, and how much of a need is there really? I’m not hearing users question the credibility of search results they see today.”

Riza; “Practically, credibility is important in ‘serious’ subjects; medical information, etc. You want to know where the results come from and how credible they are. When it comes to credible content, you can’t really do a statistical search or have a ‘popularity vote,’ because much credible content isn’t ‘popular.’”

Scott; “People’s expectations for credibility are different depending upon the query. If you ask an ‘instant answer’ type query you expect the answer to be credible. If you do a broader search, you expect a mix of results to be returned.”

William; “If a system understands structured knowledge, it can understand when different sources contradict one another.”

Riza; “A system doesn’t need to know what’s credible; we can go to a librarian for that. Hakia doesn’t decide whether a resource is credible or not; we use librarians for that.”

Tomasz; “If you ask for the capital of Japan we expect a single answer. If you ask about taxes, maybe the IRS is the best source but there are others. If you ask ‘how to get rid of acne’ you expect a lot of results.”

Carla; “We’ve seen three news-making launches in the past month; Wolfram Alpha, Bing, Siri. Is Wolfram the first step towards 2001? How is this engine valuable to those of us who don’t need to solve complex maths?”

Scott; “it’s not the first step… we’ve been working on these problems for a long time. There are a lot of questions people want to ask about the types of data that Wolfram aggregates… We see these things as part of full-search services. Powerset has moved along this path as well, pulling structured data in response to full-text queries.”

William; “Wolfram is a tremendous effort. An interesting example of question answering with structured data. I think people will find uses for it in particular use cases; I spoke to someone who’d used it to calculate when his visa expired, because it could do date calculation. I think there will be use cases in various scenarios; maths, nutrition information, etc… if you remember that it has that sort of information and remember to go to it… However one thing it doesn’t have is a decent back-fill. If it doesn’t have the data, or doesn’t understand the way you asked the query, it gives you nothing. We try to keep quiet and fail over to standard internet search in that sort of circumstance.”

Carla; “Does a semantic search engine know how not to answer a question?”

William; “that’s absolutely fundamental. You need the ability to reliably keep quiet when you don’t have the answer… and fail over reliably to other search services.” [True Knowledge does try to do this...] “That requires very high quality semantics.”
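The 'keep quiet and fail over' behaviour William describes can be pictured as a confidence threshold in front of a conventional search. This is a rough sketch under stated assumptions, not True Knowledge's implementation. The tiny `KNOWLEDGE` table, the confidence values, the threshold and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical knowledge base: normalised question -> (answer, confidence).
KNOWLEDGE: Dict[str, Tuple[str, float]] = {
    "capital of japan": ("Tokyo", 0.98),
}


def direct_answer(question: str) -> Optional[Tuple[str, float]]:
    """Look up a direct answer; returns None when nothing is known."""
    return KNOWLEDGE.get(question.strip().lower().rstrip("?"))


def respond(
    question: str,
    fallback: Callable[[str], List[str]],
    threshold: float = 0.9,
) -> dict:
    """Give a direct answer only when confident; otherwise fail over to search."""
    candidate = direct_answer(question)
    if candidate is not None and candidate[1] >= threshold:
        return {"kind": "answer", "value": candidate[0]}
    # Stay quiet about direct answers and hand the query to ordinary search.
    return {"kind": "results", "value": fallback(question)}


def keyword_search(query: str) -> List[str]:
    """Stand-in for a standard internet search service."""
    return ["result for: " + query]


print(respond("Capital of Japan?", keyword_search))
print(respond("how to get rid of acne", keyword_search))
```

The hard part, as William notes, is not the branching but producing a confidence signal reliable enough to trust.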

Andrew; “One way to characterise the approach of Wolfram Alpha is that it’s a centralised approach. The Wolfram Alpha team goes out to find data and bring it in-house to convert to a standard form. A different approach is to have an ecosystem contributing data in the public eye… It’s not clear yet how much of a value-add is going to come from this centralised knowledge mapping approach. Yahoo! is focussed on the ecosystem approach, and helping people with knowledge to make it available.”

Peter; “Our inclination would be that we don’t want a closed walled garden. We want all the information available to combine in different ways. We want the information to be open, and the tool set to be open for mashing up in different ways.”

Scott; “If Wolfram Alpha hadn’t taken a walled garden approach they might never have launched a product.”

Tomasz; “Wolfram Alpha is great, but it’s not a search engine.”

Carla; “Siri… caused a lot of buzz, uses True Knowledge… what are your thoughts?”

Andrew; “To be counter-cultural… the notion of getting much deeper and assisting a user with a task is spot on. We’re going to see much more of that. Search has tended to be stateless. Each query you enter is more or less processed without context. Yahoo! is rolling out more stateful search tools, and other companies will do the same. We expect people to use these tools on lots of devices. Would we expect people to come to the same place for purchase, navigation, etc? Do we expect one interface? There are going to be virtual assistants… I just don’t know if they’re going to be embedded into a search box.”

Scott; “Conversation is the ultimate user interface… but it’s not clear that I want to have a conversation with my laptop during the working day. How do I display the results? But there’s a huge role for conversation and dialogue in refining search and getting a user to their results faster.”

Tomasz; “What is the goal of Siri? If you try to go too broad you become a search engine.”

Scott; “When people have a conversational interface, they won’t speak in keywords.”

Carla; “What are the larger goals for Bing?”

Scott; “Bing is trying to simplify key tasks that people do when they come to a search engine. In travel, health, shopping, we can understand what people are trying to do, and get them to better results faster. The thinking has evolved from ten blue links to the whole page, and organising things to help the user by understanding their tasks.”

Carla; “Peter; what did you think of Bing?”

Peter; “I like the idea of innovation in the user interface. There’s a lot of room for that. There’s been a lot of emphasis on getting the ranking right. You still need to do that, but other things are important too. I’m usually happy with results on my big screen. On a mobile device, I’m usually not happy with the results I get.”

March 4th, 2009

Dame Wendy Hall talks about Web Science

Posted by Paul Miller @ 6:17 am

Categories: Podcasts, Research, Semantic Web, Semantic Web People

Tags: Web, Computer Science, Computer, Dame Wendy Hall, Channel Management, Productivity, Marketing, Paul Miller

I must admit that I’ve tended to be rather sceptical about the whole topic of ‘Web Science,’ as proposed by the University of Southampton and MIT through their shared Web Science Research Initiative (WSRI).

My initial view was that we really don’t need yet another academic subject just to ‘permit’ us to study the Web, and that we’re perfectly well served by the Computer Scientists, Anthropologists, Sociologists, Economists, Psychologists and Neuroscientists that already seek to understand both the Web and its impact upon all of us.

Dame Wendy Hall is Professor of Computer Science at the University of Southampton, and currently President of the Association for Computing Machinery (ACM). Together with Sir Tim Berners-Lee, Professor Nigel Shadbolt and Daniel J. Weitzner she is a Founding Director of the Web Science Research Initiative.

I spoke with Wendy last week to learn more about Web Science and her views on the Semantic Web, and the result has just been released as a podcast.

During the conversation she speaks persuasively of the need to bring researchers from diverse disciplines together in a space that is not labelled ‘Computer Science,’ and to find the hooks that will appeal to groups and individuals put off by the nature of gatherings where Computer Science - and Computer Scientists - tend to dominate.

So maybe Web Science isn’t an unnecessary ‘new’ subject, but a label for something that’s already happening; a label that provides institutional credibility to an area of research whilst simultaneously allowing the Anthropologist working to understand our use of Twitter to reassure her friends that she’s really not doing Computer Science!

December 8th, 2008

Mark Greaves of Vulcan sees business opportunities in the Semantic Web

Posted by Paul Miller @ 11:35 am

Categories: Commercialisation, Investment, Open Data, Podcasts, Research, Semantic Web, Semantic Web Companies, Semantic Web People, Standards, Talking Semantics, W3C

Tags: Knowledge, Vulcan, Mark Greaves, Semantic Web, Podcasts, Strategy, Aerospace & Defense, Internet, Management, Manufacturing

Vulcan shares many traits with its reclusive founder, Paul Allen, yet behind the scenes the company is responsible for philanthropic support to research and community-building activities, as well as investing commercially in the likes of Radar Networks (the company behind Twine) and Evri.

Last week, I had the opportunity to talk with Mark Greaves, Vulcan’s Director of Knowledge Systems Research, and the resulting podcast was released earlier today.

Drawing upon a background that includes the likes of Boeing and DARPA, Greaves is persuaded of the benefits to be found in applying semantic technologies to existing business problems and processes.

Greaves identifies four broad areas ripe for development;

  • Search
  • Enterprise Information
  • Social Semantic Web Applications
  • Web-scale Knowledge Publishing

It will be interesting to see the extent to which Vulcan - and others - invest in these areas next year.

October 23rd, 2008

Hans Rosling helps us hear the music with beautiful data

Posted by Paul Miller @ 7:24 am

Categories: Open Data, Podcasts, Research

Tags: Data, Music, Hans Rosling, Semantic Web, Podcasts, Internet, Paul Miller

It’s not strictly ‘about’ the Semantic Web, which isn’t even mentioned in this 45 minute podcast. Nevertheless, Prof. Hans Rosling’s point of view chimed so strongly with almost everything that interests me about the Semantic Web’s potential that I thought it worth sharing here to see if you agree.

Many of you have probably seen the recordings of Rosling’s TED Talks from 2006 and 2007, and although I had my concerns going into it, I’m glad to report that his energy and enthusiasm come through just as strongly when channelled via the spoken word.

All the data, all the URIs, all the links in the world are no more than a rather pointless researcher’s plaything if we cannot provide the means to allow those data and connections to speak in ways we understand.

October 15th, 2008

Can the Semantic Web help education?

Posted by Paul Miller @ 1:56 am

Categories: Podcasts, Research, Semantic Web

Tags: Education, Semantic Web, Internet, Paul Miller

Dr Jason Ohler certainly thinks so, and discusses his views in a recent paper and during a conversation that we recorded earlier this week.

Both Jason and I struggled to think of examples where educationalists and educational technologists are already working with these technologies, or seriously considering doing so.

Can anyone think of examples?

October 7th, 2008

Whisky, Whiskey, and the Semantic Web?

Posted by Paul Miller @ 2:06 am

Categories: Open Data, Research, Semantic Web, Standards

Tags: Tim O'Reilly, Ontology, Semantic Web, Strategy, Internet, Management, Paul Miller

Huge ontologies and taxonomies that attempt to boil the ocean and describe ‘the sum of human knowledge’ tend to make me deeply uncomfortable. At the other end of the scale, though, there is clearly a place for reaching some shared understanding on how we describe things. Where does the line lie between ‘deeply uncomfortable’ and ‘clearly a place’? I’m not sure… but I know the extremes when I see them.

It was in this vein that my colleague, Tom Heath, recently devoted his time to working with like-minded peers from across the Semantic Web community to organise, host, and participate in the first VoCamp.

As Tom writes in his report on the Nodalities blog;

“We need more vocabularies because people are increasingly motivated to share their data online, and need some way of describing the data itself in a structured fashion. If people use the same vocabularies when describing data of the same type, or at least some of the same terms, it makes sharing and integrating those data sets much easier.”

And yes, one of those upon which the group chose to focus their attention was an ontology for whisky. Not, perhaps, fully in the spirit of Tim O’Reilly’s recent call for us to tackle the problems that ‘matter,’ but a useful learning exercise for this disparate group of technologists.

VoCamp moves to Ireland next month, and I look forward to seeing whether the group can move beyond the inevitable disputes over spelling the name of their favourite tipple to embrace O’Reilly’s call.

Whisky glass image (c) Kyle May, 2007.

May 8th, 2008

Peter Mika offers bananas at Yahoo! Research

Posted by Paul Miller @ 6:30 am

Categories: Podcasts, Research, Semantic Web, Talking Semantics

Tags: Yahoo! Inc., Semantic Web, Internet, Paul Miller

Yahoo! are certainly being a lot more open than competitors such as Google and Microsoft when it comes to talking about their use of semantic technologies. They’ve been active for several years in recruiting stalwarts of the Semantic Web community such as Dave Beckett, and there is a long tradition of the company’s employees actively contributing to the research side of the Semantic Web world.

More recently, at least part of that sometimes-esoteric research has begun to make the transition toward Yahoo!’s consumer-facing properties. FireEagle and, most recently, SearchMonkey are obvious examples of this transition. SearchMonkey, for example, has real potential to compellingly demonstrate the case for the Semantic Web and is likely to drive a scramble toward structured markup within the SEO sector.

It’s hard to believe that Microsoft and Google are not also actively engaged in this area, although their reticence in speaking about it creates a perfect opportunity for Yahoo! to make the most of the attention… for now.

SearchMonkey came out of hiding late last month, when Yahoo! CTO Ari Balogh introduced it to attendees at the Web 2.0 Expo in San Francisco. There’s a Developer event in Sunnyvale next week, and Yahoo! looks likely to be pretty visible at the Semantic Technology Conference in San Jose, 18-22 May.

It was in the context of the Semantic Technology Conference that I found myself in conversation with Peter Mika of Yahoo! Research earlier today. We talked about the potential for SearchMonkey before considering some of the issues posed by moving Semantic Web specifications such as RDF out of the relatively well behaved academic sphere and onto the open Web where honesty is not everyone’s priority. Peter is speaking at Semantic Technology, a few days after Yahoo!’s own SearchMonkey Developer event. I look forward to seeing how much more Yahoo! shares on those occasions. Peter did suggest that public access to SearchMonkey will be ‘sooner than [we] think.’ Next week, maybe?

As Peter enjoins listeners at the end of our conversation, ‘Follow the Monkey!’ It will be interesting to see where it leads. I will also be intrigued to see whether Google and Microsoft quietly join the followers… or are found hiding behind a tree waiting for us when we get where we’re going.

March 20th, 2008

Jim Hendler shares AI's lessons for the Semantic Web

Posted by Paul Miller @ 2:34 am

Categories: Podcasts, Research, Semantic Web, Semantic Web People, Standards, Talking Semantics, W3C

Tags: Web, Vision, Hendler, Semantic Web, Internet, Paul Miller

Professor James A. Hendler goes by the daunting title of ‘Tetherless World Senior Constellation Professor’ at Rensselaer Polytechnic Institute (RPI) in Troy, New York. Behind the title stands a man who has been closely involved with Artificial Intelligence (AI) research for many years, and someone recognised as amongst the progenitors of the Semantic Web ideal. Hendler is also Associate Director of the Web Science Research Initiative (WSRI), an activity that is being pushed hard by Sir Tim Berners-Lee (a Director) and others.

I spoke to Jim recently, and in a wide-ranging conversation we touched upon early hype around the promise of Artificial Intelligence, conflicting aspirations for the Semantic Web…

March 17th, 2008

Looking for a dominant Semantic Web search engine

Posted by Paul Miller @ 4:27 am

Categories: Podcasts, Research, Semantic Web, Standards

Tags: Search Engine, Semantic Web, Search, Internet, Paul Miller

Despite the continuing efforts of Microsoft, Yahoo! and others, Google remains the dominant horizontal search engine for most people, most of the time. In the United States, comScore reports 58.5% of searches during January were via a Google property. In the Semantic Web space, search is far less established and a number of much smaller sites offer their own solutions to the problem of locating appropriate semantic content from across the open web. Whether those sites are complementary or competitive seems to depend upon one’s perspective, and it is also interesting to ponder the extent to which Yahoo!’s recent announcement is an attempt to position themselves as the search engine of choice for the growing web of semantically enriched content.

Paul Miller provides consultancy and analysis services at the interface between the worlds of Cloud Computing and the Semantic Web. See his full profile and disclosure of his industry affiliations.

