The 15th edition of the International Conference on Web Engineering was held in Rotterdam and centered on the theme of “Engineering the Web in the Big Data Era”, thereby highlighting the impact Big Data has on Web engineering research. Big Data promises new uses of data that contribute to a change in our business practices. As the Web is a valuable producer and consumer of Big Data, it is imperative to analyze the implications of the Big Data phenomenon for Web engineering.
On the second day of the main conference, I presented the paper “Generating Semantic Snapshots of Newscasts using Entity Expansion”, a joint work between EURECOM and CWI in the Netherlands.
In this paper, we presented an approach for automatically generating Newscast Semantic Snapshots. By following an entity expansion process that retrieves additional event-related documents from the Web, we were able to enlarge the niche of initial newscast content. The bag of retrieved documents, together with the newscast transcript, is analyzed with the objective of extracting named entities referring to persons, organizations, and locations. By increasing the size of the document set, we increased the completeness of the context and the representativeness of the list of entities, reinforcing relevant entities and finding new ones that are potentially interesting in the context of that news item. In particular, we analyzed different ranking algorithms in order to verify which ones bring in more of the entities contained in a News Items gold standard, which was also proposed and published in this paper. The evaluation showed the strength of this approach, which outperformed the two studied baselines. The slides are available on my Slideshare account.
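To make the idea of "reinforcing relevant entities and finding new ones" a bit more concrete, here is a minimal toy sketch in Python. It is not the implementation from the paper: the function name `rank_entities`, the plain document-frequency score, and the placeholder entity names are all assumptions made for illustration, and entity extraction is assumed to have already happened upstream.

```python
from collections import Counter


def rank_entities(transcript_entities, expanded_docs_entities):
    """Rank entities by the number of documents that mention them.

    Hypothetical helper: a plain document-frequency ranking, used here
    only as a stand-in for the ranking algorithms compared in the paper.
    """
    counts = Counter()
    for doc_entities in [transcript_entities, *expanded_docs_entities]:
        # Count each entity at most once per document.
        counts.update(set(doc_entities))
    # Highest document frequency first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up placeholder entities, not data from the paper.
transcript = ["Person A", "Location C"]
expanded = [
    ["Person A", "Org B", "Location C"],
    ["Person A", "Org B", "Location D"],
]

ranking = rank_entities(transcript, expanded)
for entity, doc_count in ranking:
    print(entity, doc_count)
```

In this toy run, "Person A" is reinforced because it appears in the transcript and in both retrieved documents, while "Org B" is a new entity that only surfaces thanks to the expansion.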
There were very interesting keynotes from Peter Mika and Enrique Alfonseca. Enrique's talk was especially related to the subject of my PhD. Under the title "News Understanding for Knowledge Graph Freshness", he described the work he has conducted at Google on understanding news, including two different systems and architectures for learning paraphrases of event patterns, which they use for news understanding and headline generation from news collections. From a web-scale corpus of English news, they have mined syntactic patterns that a generative model generalizes into event descriptions. At inference time, they query the model with the patterns observed in an unseen news collection, identify the best-matching event, automatically produce updates for the knowledge graph, and retrieve the most appropriate pattern to generate a headline. He gave some good hints about the main challenges they see in moving forward with news understanding.
The poster and demo session was very interesting as well, with more than 20 different works showcasing the latest trends in Web engineering applied to data sharing, education, user experience, and business.