The amount of published information is growing rapidly. Much of this information comes in unstructured textual form that cannot easily be searched, mined, visualized and, ultimately, acted upon. The principal goal of our group is to build machines that can read and "understand" this textual information, converting it into interpretable structured knowledge to be leveraged by humans and other machines alike.
To achieve our goal we work at the intersection of Natural Language Processing, Machine Learning, Cognitive Science and Information Retrieval. We rely heavily on statistical methods of various flavours.
Our group is part of the UCL Computer Science department, affiliated with CSML and based in the London Media Technology Campus. We are organizing the South England Stat NLP Meetup. Get in touch if you're interested in attending.
Sebastian works in NLP and Machine Learning. He is particularly interested in helping machines to read more accurately by leveraging knowledge gathered through reading more accurately.
I am working on learning interpretable models, such as decision trees and Bayesian networks, from Matrix Factorization models. I'm interested in probabilistic graphical models. I'm funded by CONACYT.
Marzieh is interested in urban knowledge extraction and in particular spatial analysis of language data extracted from social media.
Matko's interests include the theory and application of machine learning (ML) methods, data, graph and text mining, ML in bioinformatics, graph theory, information retrieval and natural language processing.
Tim is interested in representation learning for NLP and automated knowledge base completion, and how such methods can take advantage of symbolic background knowledge.
I am working on Multi-Instance Text Regression and learning weakly supervised word embeddings. I am interested in structured prediction, distributional semantics, neural models and optimisation. My secondary supervisor is Steffen Petersen and I am funded by the Farr Institute of Health Informatics Research.
Jason is working in collaboration with Google on leveraging knowledge bases of semantic relations and the web's vast quantity of unstructured text to guide the learning of latent variable models for NLP. He is interested in joint inference, graphical models, and natural language acquisition.
Pontus works somewhere at the intersection of Natural Language Processing and Machine Learning. He is particularly interested in representation learning and is currently funded by a machine reading grant from the Allen Foundation.
After a PhD in Machine Learning and ten years as a researcher at Xerox Research Centre Europe, Guillaume recently joined the Machine Reading Group to pursue his long-term research direction: teaching machines to understand language rather than waiting for them to learn it by themselves.
Gerasimos is working on natural language generation from structured representations of weather data in collaboration with the BBC. His research interests revolve around natural language generation, question answering, text summarization, and generally machine learning and global optimization approaches in Natural Language Processing.
The prime minister said: “Our government has halved youth unemployment!” True or False? Fact checking is one of the main tasks performed by journalists, especially in an era in which information sources abound. In order to automate it, the main challenges are the open-domain nature of the task and the importance of context (temporal, geographical, conversational). Andreas is looking at how to perform fact checking on statistical claims using matrix factorisation methods and distributed representations.
In the text regression problem, given a document we want to learn how to predict a continuous quantity. The challenge is that not all parts of a document will be equally informative regarding the variable of interest. George is taking a multi-instance approach for this problem, where some of the instances are irrelevant and thus should not influence the overall prediction. There are different possible domains of application, such as aspect ratings, sentiment analysis and clinical datasets.
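As a rough illustration of the multi-instance idea only (not George's actual model), the sketch below treats a document as a bag of sentences, gives each sentence its own prediction and a relevance score, and combines them with softmax weights so that irrelevant sentences contribute almost nothing. The function names, the softmax weighting and all numbers are assumptions made up for this example.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into non-negative weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_document(instance_predictions, relevance_scores):
    """Aggregate per-instance predictions into one document-level value.

    Instances with low relevance scores receive weights close to zero,
    so they barely influence the overall prediction.
    """
    weights = softmax(relevance_scores)
    return sum(w * p for w, p in zip(weights, instance_predictions))


# Hypothetical document with three sentences: two informative, one off-topic.
instance_predictions = [4.0, 5.0, 1.0]  # per-sentence rating estimates (made up)
relevance_scores = [2.0, 2.0, -10.0]    # the third sentence is judged irrelevant

document_rating = predict_document(instance_predictions, relevance_scores)
print(round(document_rating, 2))
```

In a trained model both the per-instance predictions and the relevance scores would be learned from document-level labels alone; here they are fixed by hand to show how the aggregation suppresses the off-topic sentence.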
Currently, weather sites host data (temperature, wind speed, humidity) concerning a multitude of cities and areas around the world. However, only a few locations feature textual weather forecasts as well. To automatically generate forecasts from the data itself, Gerasimos aims to develop a domain-independent Natural Language Generation (NLG) framework by imitating generation policies from unaligned corpora. This will be applied to a variety of domains, as well as weather reports and data obtained from the Met Office.