Interview with Lorenzo Thione, co-founder and product architect of Powerset, a few days after the launch of the Powerset semantic search engine
2008 at 10,30
published by ciaopeople
A few days after the launch of Powerset, the semantic search engine for Wikipedia (and not only Wikipedia), we interviewed Lorenzo Thione, co-founder and product architect of Powerset, the Californian company behind this important innovation in the field of semantic search engines.
Before going into the interview, in which Lorenzo Thione explains in detail the work behind Powerset, how it works and how it can improve the experience of online search users, I would like to briefly introduce his compelling career, for very specific reasons.
First, Lorenzo Thione, as you can guess from the name, is Italian and therefore doubly deserves our attention: his story shows what our compatriots are capable of when the circumstances allow them to develop their full potential. Second, his career, built on research in a field as cutting-edge as natural language processing and semantics, is truly remarkable.
Lorenzo Thione left Milan, the city where he grew up, to move to San Francisco. He immediately focused on natural language processing and computational linguistics, and in 2002 he was hired at the laboratory born from the joint venture between Xerox and Fuji, one of the most prestigious laboratories working on automatic text processing and document retrieval.
His work at the laboratory led him to study software that summarizes text and analyzes parts of speech, and he developed a strong interest in question answering and in semantic search engines. (source: DataManager)
In short, Powerset, the company he co-founded, was created with the intention of changing the technology of online information search.
The launch of Powerset, the semantic search engine that for the moment covers only Wikipedia, suggests that the company is on the right track and that, thanks to the hard work and dedication of researchers in the cutting-edge field of semantic search, this product may well change the current balance among web search engines.
How will this change occur? Let’s find out from Lorenzo Thione in this very interesting interview.
Could you briefly describe the technologies that Powerset develops?
Powerset is developing a completely new technology, based on language analysis, to improve the quality of the results users get when they search the Internet. In practice, Powerset goes beyond the elements of traditional search engines, which are based only on keywords and on analysis of the link graph between web pages, and introduces a new set of elements related to the semantic connection between the query that the user enters into the search engine and the text of the documents indexed by Powerset. In addition to this new technology, which is based on years of research and innovation at Xerox PARC (the laboratory which, among other things, created the mouse, the graphical user interface and the Ethernet protocol), Powerset is creating a new and more immersive experience for using high-quality content, as is the case with Wikipedia.
As we read on the site, Powerset can “understand” our language and “answer” our questions. How is it possible for a search engine to develop human skills such as “understanding” and “answering”?
It is important to put these words into perspective. The technology we are developing does not “understand” or “answer” questions at the level at which a person can. It will take years, perhaps decades, before software can really achieve complete mastery of any human language. That said, the level of “understanding” of current search engines is pretty low. Search engines that rely only on keywords can exploit the benefits keywords offer (in terms of efficiency and simplicity), but they are now reaching a plateau in the quality of their results. Users become more “sophisticated” every day, and it is precisely as users’ needs grow that traditional search engines run into trouble. Powerset is building a new search engine that combines the lessons of the past (such as keywords and web link-graph analysis) with the emerging technology of language analysis (natural language processing). In practice, when the Powerset spider, the part of the search engine that crawls the web looking for text to index, finds a document containing English text (for now Powerset is available only in English, but our technology extends to many other languages, which we will eventually add), it analyzes the text in detail by breaking it down into sentences and analyzing each one. Basically, the software does what Italian teachers do when they teach their middle school students the logical analysis of a sentence: breaking it down into subject, verb, predicate and complements, working out the role of each complement in context, and recognizing the different types of complements. Once the software has carried out this analysis, some semantic components (called features) are extracted and fed into the algorithm that selects and orders the results, alongside the other, less linguistic components, such as PageRank or keyword matching.
It is important to distinguish between the human ability to comprehend the meaning of language and the level of sophistication that technology has reached today. If we approach the Powerset technology expecting the same level of competence as a person, we are in for a big disappointment. But if we look at Powerset as an improved search engine, one that makes the extra effort to use the content of both the query and the documents to extract higher-quality connections, then as “internet surfers” we can be satisfied. For this reason we invite our users, or anyone who wants to try our product, to experiment with our search box (the text field where keywords are normally entered) and to enter not only questions but also simple phrases, such as “movies with Dennis Quaid” or “protein content of bananas”, and even keywords, such as the name of a famous person, a place or a book they particularly enjoyed. Our “improved” version of Wikipedia yields very useful and easy-to-understand results for many of these searches.
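The idea of blending linguistic features with keyword matching and a link-based score can be illustrated with a toy example. This is only a sketch under stated assumptions, not Powerset's actual algorithm: the `Doc` class, the hand-written (subject, verb, object) triples, the `link_score` field (a stand-in for a PageRank-like value) and the weights are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    title: str
    text: str
    link_score: float                          # hypothetical PageRank-like value in [0, 1]
    triples: set = field(default_factory=set)  # (subject, verb, object), assumed extracted at index time

def keyword_score(query, doc):
    """Fraction of query words that also appear in the document text."""
    q = set(query.lower().split())
    d = set(doc.text.lower().split())
    return len(q & d) / len(q) if q else 0.0

def matches(pattern, triple):
    """A pattern slot set to None acts as a wildcard."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def semantic_score(pattern, doc):
    """1.0 if any stored triple fits the query pattern, else 0.0."""
    return 1.0 if any(matches(pattern, t) for t in doc.triples) else 0.0

def rank(query, pattern, docs, weights=(0.4, 0.4, 0.2)):
    """Order document titles by a weighted sum of the three scores."""
    wk, ws, wl = weights
    def score(doc):
        return (wk * keyword_score(query, doc)
                + ws * semantic_score(pattern, doc)
                + wl * doc.link_score)
    return [d.title for d in sorted(docs, key=score, reverse=True)]

DOCS = [
    Doc("Battle of Waterloo", "Wellington defeated Napoleon at Waterloo",
        0.3, {("wellington", "defeat", "napoleon")}),
    Doc("Napoleon", "Napoleon defeated the Austrian army at Austerlitz",
        0.9, {("napoleon", "defeat", "austrian army")}),
]

QUERY = "who defeated napoleon"
PATTERN = (None, "defeat", "napoleon")  # "someone defeated Napoleon"

with_semantics = rank(QUERY, PATTERN, DOCS)
keywords_only = rank(QUERY, PATTERN, DOCS, weights=(0.8, 0.0, 0.2))
```

Both documents contain the same query keywords, so with the semantic weight set to zero the better-linked "Napoleon" page ranks first. Once the subject and object roles are taken into account, the page in which Napoleon is the one being defeated moves to the top.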
How can Powerset improve the experience of online searching?
Powerset is using its semantic technology on three different levels. First, as I have explained so far, Powerset improves the quality of results for the normal searches that users do on the Internet. The Powerset index is currently restricted to Wikipedia. In the future a full web index will be accessible, but our technology is more costly in terms of computation, and we wanted to offer public access as soon as possible. While we continue to develop our algorithms and apply them to larger sets of content available on the Internet, we decided to create a search engine for general information that could outperform the competition for all those users who regularly use Wikipedia.
The reason why we chose Wikipedia does not depend on the quality of the text or the structure of the documents. Our technology is efficient enough to be used on the rest of the web, but since we wanted to confine it initially to a restricted set of documents, we chose Wikipedia because it ranges over a wide spectrum of topics, from history to biology, from entertainment to literature, and so forth. Also, since the contents of Wikipedia can be republished, Powerset provides a version of Wikipedia that, while being continually updated and synchronized with the original source, provides new functions as well.
For example, on Powerset all Wikipedia pages present a side panel that scrolls along with the text as the user reads and is always visible. It provides a summary of the text, a bar for semantic searches within the document, and the so-called Powerset Factz Summary, a kind of synopsis made of short sentences or fragments of text that the search engine has extracted and aligned with the summary, to facilitate navigation and let the user see at a glance where the information they are looking for is found in the document. In addition to the above, Powerset has integrated the Freebase database, from another company located here in San Francisco, which promises to build a structured source of data created and maintained directly by its users, as in the case of Wikipedia. If you go to Powerset and search for the name of a famous person or a place, Powerset presents a summary of the Freebase information on that specific subject. An example? Just try typing in Henry VIII (Henry the Eighth, the British monarch).
The same wealth of information allows us to answer requests such as all films produced by Steven Spielberg, or the height of the Eiffel Tower.
What is the main difference between previous semantic search engine experiments and Powerset?
In the past, other companies have tried to create search engines based on semantic structure. We differ from all of those past experiments, as well as from those still in progress. First of all, we are the first to carry out such a detailed analysis of the textual content of documents and to extract components that are indexed together with keywords, so that semantic results are returned in a uniform and organized manner along with those originating, for example, from the keywords.
In the past, engines such as BrainBoost tried to expand the search with keywords that the engine added automatically to increase the number of results generated (recall), and then filtered the results with a technology similar to the one Powerset uses to index documents. This produced less than brilliant results, usually after a lengthy wait, because most of the necessary work was carried out after the user had pressed the key to start the search.
Powerset technology, on the other hand, can be scaled to work with billions of pages and millions of users, returning results with the same speed that we are all used to from Google, Yahoo and other engines. Also, unlike other experiments of the past, such as the START project (MIT) or the initial version of Ask Jeeves, the aim of Powerset is not to answer any question, but to yield the best selection of results and content, and to connect that content to the request in the best possible way. This means that although Powerset may sometimes identify an actual answer (in the text or in Freebase), this is not always the case. What is important, however, is that the relevance and quality of our results are better than those of other search engines that have access to the same content.
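The contrast between doing the analysis after the user presses the search key and doing it when documents are indexed can be sketched as follows. This is an illustration only, under loud assumptions: `analyze` is a hypothetical placeholder for the costly linguistic analysis (here it merely lowercases and strips punctuation), and `FeatureIndex` is an invented name, not a Powerset component.

```python
from collections import defaultdict

def analyze(text):
    """Hypothetical placeholder for expensive linguistic analysis.

    Here it only turns words into lowercase 'features'.
    """
    return {word.strip(".,?!").lower() for word in text.split()} - {""}

class FeatureIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # feature -> ids of documents containing it
        self.docs_analyzed = 0            # how many documents went through analyze()

    def add(self, doc_id, text):
        """Index time: pay the analysis cost once per document."""
        self.docs_analyzed += 1
        for feature in analyze(text):
            self.postings[feature].add(doc_id)

    def search(self, query):
        """Query time: analyze only the short query, then look up stored sets."""
        sets = [self.postings.get(f, set()) for f in analyze(query)]
        return set.intersection(*sets) if sets else set()

index = FeatureIndex()
index.add("eiffel", "The Eiffel Tower is a landmark in Paris.")
index.add("spielberg", "Steven Spielberg produced many films.")

hits = index.search("Eiffel tower")
```

After the search, `index.docs_analyzed` is still 2: no document was re-analyzed to answer the query, which is why this layout keeps response time independent of how expensive the per-document analysis is.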
The semantic search engine has always been considered suitable for specific fields of information with well-organized structures. In this sense Powerset has made important steps forward. Can it then be used for the whole web?
The technology is mature enough to be used across the web, but for now we are focusing on Wikipedia. As I mentioned earlier, it is not a matter of quality, but of cost. The technology is computationally expensive, and applying it to the entire web requires greater financial resources (to purchase new computers to index the web) and a few extra months.
After the launch, some experts predicted that Powerset will mainly be used in the field of business. What do you think?
Business is a field we are keeping an eye on, and the possibility remains open. But what we find really exciting is the opportunity to build a better search engine for everyone’s daily requests. Innovation in search technology is far from over; indeed it has just started, and to dismiss this technology as useful only for enterprises is to underestimate its potential.
Powerset is a great innovation compared to traditional search engines such as Google and Yahoo! which are based solely on keywords. Do you think that this innovative technology can influence and change the current balance on the web?
Absolutely. As search engines have become more popular, users have become increasingly familiar with these tools, and their needs have grown. In the beginning, the demand was mainly for search engines that gave access to a growing number of documents, and speed gradually increased too. But for some years now we have been reaching a plateau in terms of both index size and response speed. What users have looked for, harder and harder, is an engine that helps them choose the best keywords for their search. Moreover, the expectation that “whatever your interest, you can find it on the web” has created the Long Tail phenomenon: many searches on the Internet refer to very obscure niche content and material, and are therefore harder to satisfy accurately with today’s algorithms. The length of the average Internet search, in number of keywords, has continued to grow, and we are confident that, when given the chance, users will appreciate the flexibility of a search that can start with just a keyword or with a sentence or question. If users begin to understand that this way of searching is more powerful, easier to use and produces better results, then their behavior and needs will begin to change irreversibly. Looking back on this moment five years from now, we will ask ourselves how on earth we managed with keyword searches, and were satisfied with them!
Simona Fiore
Qbr Magazine Staff