You are planning a trip for spring break with five of your high school buddies who are spread out all over the country. One person in the group is allergic to excessive sunlight while another cannot handle cold climates well. You would all like to fly in to a common meeting point and then drive from there. And all of it must be accommodated on a student budget.
As matters stand today, this trip would involve hours or even days of planning – poring over travel web sites, car rental sites, vacation spots, airline bookings and many, many hours spent on the phone on a five-way conference call to figure out what is best for the group. The information that is required for this effort is available but spread out among the billions of documents that constitute the “world wide web.”
Enter the “semantic web,” often referred to as “Web 3.0.” The idea, according to a recent New York Times article by John Markoff, is to find a way to mine this vast treasure-house of human intelligence to get answers that actually make sense.
The World Wide Web Consortium defines the semantic web as “the web of data with meaning in the sense that a computer program can learn enough about what the data means to process it.”
According to Markoff, the world wide web can currently be seen as a catalog of information at best. Users need to know exactly what they are looking for before they can get reasonable responses to their queries – this usually means that they get results for string-based searches. They type in a string or a sequence of strings, and search engines like Google return the closest match to that sequence of strings. Hardly any meaning is assigned to what the user is searching for.
Hence, the goal for researchers is to add a layer of meaning on top of the existing web that would make it more of a guide and create a foundation for systems that can reason in a human fashion. This would involve a level of artificial intelligence where machines do some thinking instead of just following simple commands.
N.C. State’s computer science department is involved in research related to the semantic web. The lead is taken by the “Multiagent Systems and Service-Oriented computing” lab headed by Dr. Munindar Singh, a professor of computer science-engineering.
Nirmit Desai, a PhD student in the computer science department, is one of the students associated with this lab.
“The Semantic Web, a term coined by Sir Tim Berners-Lee, is used to denote the next evolution step of the Web,” he said. “Associating meaning with content or establishing a layer of machine understandable data would allow automated agents, sophisticated search engines and inter-operable services – will enable higher degree of automation and more intelligent applications.”
“The ultimate goal of the Semantic Web is to allow machines the sharing and exploitation of knowledge in the Web way, i.e. without central authority, with few basic rules, in a scalable, adaptable, extensible manner.”
Markoff’s article states that big names in industry, such as IBM and Google, have been working towards such ideas. Their initial focus seems to be on simple applications, ranging from creating complete vacation packages to predicting the next hit movie or record album.
The idea behind “Web 3.0” is that of looking for meaningful results to queries.
For example, a user might type in, “I’m planning a vacation package for five people during spring break. Two from Raleigh, one from San Francisco and one each from Boston and Chicago. Not too much sunlight and not too cold. We would all like to fly in to a common location and rent a car. Total budget is $1000 per person.” The semantic web would then find and create a complete vacation package in meticulous detail, as if it had been prepared by a human being.
There is no way to handle this level of automation using the current setup of the web.
“The most direct application that comes to mind is a search application,” Sarat Kocherlakota, a PhD student in the computer science department, said. “The semantic web is like a linked network, where each word or topic leads to another one that it is related to based on some underlying meaning and relevance.
“For example, one of the links to the word ‘Swiss’ could be ‘physicist’, hence forming the concept of ‘Swiss physicist’. So, when someone looks up ‘Swiss physicist’, the name Albert Einstein would show up, even if the page about Einstein did not contain the phrase ‘Swiss Physicist’ in it.”
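Kocherlakota’s example can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: the facts, the `TRIPLES` set and the `find_subjects` helper are all hypothetical, and real semantic web systems store such links in far richer formats. The point is simply that the search matches linked facts about a person rather than a phrase on a page.

```python
# Hypothetical facts stored as (subject, property, value) links.
# None of these entries contains the phrase "Swiss physicist".
TRIPLES = {
    ("Albert Einstein", "profession", "physicist"),
    ("Albert Einstein", "nationality", "Swiss"),
    ("Marie Curie", "profession", "physicist"),
    ("Marie Curie", "nationality", "Polish"),
    ("Roger Federer", "profession", "tennis player"),
    ("Roger Federer", "nationality", "Swiss"),
}


def find_subjects(**wanted):
    """Return the subjects that match every requested property/value pair."""
    subjects = {subject for subject, _, _ in TRIPLES}
    return sorted(
        subject
        for subject in subjects
        if all((subject, prop, value) in TRIPLES for prop, value in wanted.items())
    )


# Combining two separate links forms the concept "Swiss physicist".
print(find_subjects(nationality="Swiss", profession="physicist"))
```

A plain string search over these entries would find nothing for “Swiss physicist”; following the two links finds Einstein.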
Kocherlakota believes that building this network of links and topics is a dynamic process and that the network will grow over time.
“The need for such a technology arises from the fact that the web is an open and heterogeneous environment,” Desai said. “Open because anyone can contribute to it, and heterogeneous because the world does not completely agree with each other on a common meaning of information being contributed to it.
“How can a computer infer that a ‘stock quote’ is the same as ‘share price’ and neither of these have anything to do with inventories or sharing?” he asked. “These are the kinds of questions that the semantic web technology addresses.”
Desai states that tools and languages are already available to build the foundation for Web 3.0. OWL, the Web Ontology Language, allows the description of concepts based on their properties and relationships to other concepts.
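One simple way to picture an answer to Desai’s question is a shared vocabulary that maps different phrases to the same underlying concept. The sketch below is a minimal Python illustration, not how any particular semantic web tool works; the `CONCEPT_OF` table, its concept names and the `same_meaning` helper are assumptions made up for this example.

```python
# Hypothetical vocabulary: each phrase is mapped to the concept it denotes.
CONCEPT_OF = {
    "stock quote": "SharePrice",
    "share price": "SharePrice",
    "stock level": "InventoryCount",
    "file sharing": "FileSharing",
}


def same_meaning(term_a, term_b):
    """Two phrases mean the same thing if they map to the same known concept."""
    concept_a = CONCEPT_OF.get(term_a.lower())
    concept_b = CONCEPT_OF.get(term_b.lower())
    return concept_a is not None and concept_a == concept_b


print(same_meaning("stock quote", "share price"))  # True
print(same_meaning("stock quote", "stock level"))  # False
```

The words “stock” and “share” appear on both sides, but only the mapping to a concept decides whether two phrases agree.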
“‘Protege’ is a tool developed at Stanford University, which allows specification, validation and inferencing based on OWL ontologies,” he said.
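The kind of inferencing Desai mentions can be sketched in a few lines. The Python below is a stand-in, not OWL syntax and not Protege’s interface: the `SUBCLASS_OF` table and the class names in it are hypothetical. It shows only the basic idea that once concepts are related to one another, a program can derive facts nobody stated directly.

```python
# Hypothetical mini-ontology: each concept points to its direct parent concept.
SUBCLASS_OF = {
    "Physicist": "Scientist",
    "Scientist": "Person",
    "Musician": "Person",
}


def ancestors(concept):
    """Follow parent links upward, collecting every inferred broader concept."""
    found = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        found.append(concept)
    return found


def is_a(concept, target):
    """True if the concept is the target or can be inferred to fall under it."""
    return concept == target or target in ancestors(concept)


print(ancestors("Physicist"))         # ['Scientist', 'Person']
print(is_a("Physicist", "Person"))    # True
print(is_a("Musician", "Scientist"))  # False
```

Nothing in the table says a physicist is a person; the program infers it by chaining two stated relationships.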
The “Multiagent Systems and Service-Oriented computing” lab contributes to the semantic web community by addressing applications in the business process management area. The aim of its research is to produce tools and techniques that allow organizations to dynamically inter-operate on the web.
Singh also offers a course on service-oriented computing that teaches the fundamentals of the semantic web.
Kocherlakota points towards existing web-based applications, such as the Pandora music player, that probably already use rudimentary versions of semantic web concepts. Pandora is able to play music that a person likes based on music she heard before and tagged as being good. It infers her musical tastes from what she preferred earlier.