Research approaches to keyword suggestion have been around for quite some time. The need to help users choose keywords for tagging, Web search and similar tasks led to the development of a number of ways to suggest relevant keywords. Today, with the advent of Web advertising, finding relevant keywords has taken on a completely new dimension: suggesting keywords no longer means just helping the user navigate the Web, but also driving relevant visitors to your Web page. More and more services offer to suggest relevant keywords that cost less in advertising campaigns and can pull in more traffic. However, there is an important dimension that those approaches have been missing, and that significantly improves the way we discover new relevant keywords: their meaning. In this blog post I talk about how we use this dimension for our keyword discovery needs at hypios, and report on the interesting results we have had.
hyProximity
The existing keyword suggestion approaches rely on (a) co-occurrence of terms in text corpora; (b) co-occurrence in search results; (c) controlled taxonomies such as the Open Directory Project (ODP), and controlled vocabularies such as Wordnet. Approaches (a) and (b) both offer quite limited potential for discovering unknown keywords, as they are based on co-occurrence. In other words, they look at terms that someone else has already used in combination with your initial terms, and suggest them. This does not allow the discovery of terms that are rarely used in combination with your initial terms but are very close in meaning. This is important, as the language we use on the Web is highly dependent on our own community of practice or thought. Going beyond the terms used by people similar to us is very difficult if we rely solely on co-occurrence. Approaches of type (c) have more potential, as they do not use co-occurrence-based statistics but rely on taxonomies and vocabularies. However, ODP is a Web directory, and thus the relations between terms are defined by Web browsing practice. There might be semantic relations between terms that are not commonly browsed together, and that thus would not appear in ODP. Wordnet, on the other hand, is more oriented towards finding synonyms, and remotely related terms fall outside its scope.
For these reasons, we have turned to a Semantic Web-based approach, using DBPedia, a Semantic Web version of Wikipedia, to discover relevant terms. In DBPedia, terms (concepts) are grouped in categories by their meaning. As such, this source of encyclopedic knowledge should enable the discovery of keywords that are semantically related, but that an average user might not even know about.
Our system, developed by Milan Stankovic, uses the distance between two terms in the graph of DBPedia semantic concepts to calculate their semantic relatedness, which we call hyProximity. The shorter the distance in the graph, the higher the hyProximity. Likewise, the more links the two concepts share, the higher the hyProximity.
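To make the idea concrete, here is a minimal sketch in Python. It is not the actual hyProximity formula, which this post does not spell out. The toy graph, the concept names and the scoring function `toy_proximity` are all assumptions made for illustration; the only properties carried over from the description above are that a shorter graph distance and a larger number of shared links both raise the score.

```python
from collections import deque

# Hypothetical toy graph: each concept maps to the concepts it links to.
# The names are illustrative and are not taken from DBPedia.
GRAPH = {
    "Solar_cell": {"Photovoltaics", "Semiconductor"},
    "Photovoltaics": {"Solar_cell", "Renewable_energy", "Semiconductor"},
    "Semiconductor": {"Solar_cell", "Photovoltaics", "Transistor"},
    "Renewable_energy": {"Photovoltaics", "Wind_power"},
    "Wind_power": {"Renewable_energy"},
    "Transistor": {"Semiconductor"},
}


def graph_distance(graph, start, goal):
    """Return the number of links on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def shared_links(graph, a, b):
    """Count the concepts that both a and b link to directly."""
    return len(graph.get(a, set()) & graph.get(b, set()))


def toy_proximity(graph, a, b):
    """Assumed scoring: grows with shared links, shrinks with distance."""
    dist = graph_distance(graph, a, b)
    if dist is None:
        return 0.0  # unconnected concepts are treated as unrelated
    if dist == 0:
        return float("inf")  # a concept is maximally close to itself
    return (1 + shared_links(graph, a, b)) / dist


if __name__ == "__main__":
    for other in ("Photovoltaics", "Transistor", "Wind_power"):
        print(other, toy_proximity(GRAPH, "Solar_cell", other))
```

In this toy graph, "Photovoltaics" scores higher against "Solar_cell" than "Wind_power" does, because it is one link away and shares a neighbour, while "Wind_power" is three links away and shares none. Ranking candidate concepts by such a score and keeping the top ones is one plausible way to turn relatedness into keyword suggestions.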
Case Study
We have used hyProximity in our own use case at hypios, and have obtained very interesting results. Our standard procedure when we have a new innovation problem on hypios is to take the keywords related to the problem and look for experts in our giant, cross-domain base of 900,000 experts, aggregated by the SolverSurfer, a system developed by Werner Breitfuss. Finding keywords that are relevant to the problem but do not appear in the problem text is important in order to reach relevant experts in the most diverse domains, who might be able to bring an innovative solution. We have used hyProximity to obtain additional keywords for expert search, and compared those keywords with what we get from AdWords KeywordTool for the same inputs.
The SolverSurfer yielded 1802 experts using the keywords directly present in the problem text, 2849 experts with hyProximity keywords, and 2061 experts using the keywords from AdWords KeywordTool. The most interesting phenomenon is that the overlap between the experts identified by hyProximity and AdWords keywords is very low. Finally, we measured the interest expressed by the identified experts (through their response to our e-mails). The response rate obtained in the hyProximity group was 10% greater than with the AdWords keywords, and 19% greater than with the keywords present directly in the text.
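As a quick check on what these counts mean in relative terms, the short Python sketch below computes the gain of each keyword source over the problem-text baseline. The three counts are the ones reported above; the helper name `relative_gain` and the dictionary layout are just illustrative choices.

```python
# Expert counts reported in this post for each keyword source.
experts = {"problem text": 1802, "hyProximity": 2849, "AdWords": 2061}


def relative_gain(count, baseline):
    """Percentage by which count exceeds baseline."""
    return (count - baseline) / baseline * 100


baseline = experts["problem text"]
gains = {name: round(relative_gain(n, baseline), 1) for name, n in experts.items()}

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:+.1f}% experts relative to problem-text keywords")
```

With the figures above, hyProximity keywords reach roughly 58% more experts than the problem-text keywords alone, against roughly 14% more for the AdWords keywords.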
This result leads to the conclusion that there is a significant number of semantically related keywords that fall completely outside the scope of co-occurrence-based keyword suggestion approaches. If you trust that non-semantic keyword suggestion approaches are giving you all the relevant keywords, then you are missing out on a lot of relevant traffic.
We are preparing a research publication and a public beta version of our tool, and will soon be sharing more of our experience with using semantic technologies for keyword discovery.
ontologies.hypios.com
Our website ontologies.hypios.com is back online. You may find all the ontologies used to describe hypios problem data there.
site downtime
Dear visitors, we apologize for the recent downtime of our research website due to a server migration.
Our website ontologies.hypios.com is still undergoing migration and will be back online soon.
Linked Data Related to Competence
Evidence of users’ competence in the Linked Data Cloud
By now many datasets have been published in the Linking Open Data community project. Interlinked with each other, those datasets form an enormous and growing Web of Data that hosts much valuable information. The Linked Data Cloud, maintained by Richard Cyganiak and Anja Jentzsch, provides a visual overview of the datasets in the cloud and their connections.
The purpose of mapping the evidence of competence on the Linked Data Web is to extract a subset of Linked Data sets that is relevant for determining a user’s interests and competence in a particular domain, and eventually his or her capability to solve particular problems.
Hypios VoCamp Paris, 13-14 May
Dear fellow researchers, Semantic Web enthusiasts, citizens of the Web,
it is my pleasure to invite you to the second VoCamp in France, and the first ever in Paris, which will take place on 13th and 14th May. The VoCamp is generously sponsored by Hypios.com, a young and innovative company that runs a marketplace for problems and innovative solutions. Our research department is working on Semantic Web technologies to support problem-solving networks, and it is our great pleasure to host this VoCamp and gather with fellow Semantic Web researchers.
For those unfamiliar with it, VoCamp is a series of free, informal events where people can spend some time creating and maintaining lightweight vocabularies/ontologies/thesauruses for the Semantic Web/Web of Data/Linked Open Data (see http://vocamp.org/). The VoCamp idea is influenced by BarCamp, but is oriented towards hands-on technical work and practical outputs: publishing new vocabularies. The emphasis of the events is not on creating the perfect ontology in a particular domain, but on creating vocabularies that are good enough for people to start using for publishing data on the Web. VoCamps are free for participants.
HypiosVoCamp Paris is organized by Alexandre Monnin [1] and myself [2].
Hurry up and register while there are still places left:
http://vocamp.org/wiki/HypiosVoCampParisMay2010
Please feel free to distribute this announcement further.
[1] http://ceppa.univ-paris1.fr/spip.php?article67
[2] http://milstan.net

