Machine Reading

An NLP research group at the UCL Computer Science department teaching machines how to read.

The amount of published information is growing rapidly. Much of this information comes in unstructured textual form that cannot easily be searched, mined, visualized and, ultimately, acted upon. The principal goal of our group is to build machines that can read and "understand" this textual information, converting it into interpretable structured knowledge to be leveraged by humans and other machines alike.

To achieve our goal, we work at the intersection of Natural Language Processing, Machine Learning, Cognitive Science and Information Retrieval. We rely heavily on statistical methods of various flavours.

Our group is part of the UCL Computer Science department, affiliated with CSML and based in the London Media Technology Campus. We are organizing the South England Stat NLP Meetup. Get in touch if you're interested in attending.

  • Paper on Injecting Logical Background Knowledge into Embeddings for Relation Extraction accepted for oral presentation at NAACL!
  • Tutorial on Matrix and Tensor Factorization Methods for NLP accepted for presentation at ACL in Beijing!
  • EPSRC grant with the Interaction Lab accepted! If you are interested in applying imitation learning to natural language generation, apply here.
  • UCLMR is co-organizing this year's AKBC workshop at NIPS 2014 in Montreal. Expect a great set of speakers!
  • Sebastian Riedel Lecturer

    Sebastian works in NLP and Machine Learning. He is particularly interested in helping machines to read more accurately by leveraging the knowledge they have already gathered through reading.

  • V. Ivan Sanchez 3rd year PhD Student

    I am working on learning interpretable models, such as decision trees and Bayesian networks, from Matrix Factorization models. I'm interested in probabilistic graphical models. I'm funded by CONACYT.

  • Marzieh Saeidi 3rd year PhD Student

    Marzieh is interested in urban knowledge extraction and in particular spatial analysis of language data extracted from social media.

  • Matko Bosnjak 2nd year PhD Student

    Matko's interests include the theory and application of machine learning (ML) methods, data, graph and text mining, ML in bioinformatics, graph theory, information retrieval and natural language processing.

  • Tim Rocktäschel 2nd year PhD Student

    Tim is interested in representation learning for NLP and automated knowledge base completion, and how such methods can take advantage of symbolic background knowledge.

  • George Spithourakis 2nd year PhD Student

    I am working on Multi-Instance Text Regression and learning weakly supervised word embeddings. I am interested in structured prediction, distributional semantics, neural models and optimisation. My secondary supervisor is Steffen Petersen and I am funded by the Farr Institute of Health Informatics Research.

  • Andreas Vlachos PostDoc

    I am working on automated fact checking in collaboration with the BBC. I am broadly interested in natural language understanding (e.g. information extraction, semantic parsing) and in machine learning approaches that would help us towards this goal.

  • Jason Naradowsky PostDoc

    Jason is working in collaboration with Google on leveraging knowledge bases of semantic relations and the web's vast quantity of unstructured text to guide the learning of latent variable models for NLP. He is interested in joint inference, graphical models, and natural language acquisition.

  • Luke Hewitt Master's student

    A final year 'Mathematical Computation' undergraduate, Luke is helping to build Wolfe. He has a particular interest in machine learning as a tool for neuroscience research.

  • Fact Checking

    The Prime Minister said: "Our government has halved youth unemployment!" True or false? Fact checking is one of the main tasks performed by journalists, especially in an era in which information sources abound. The main challenges in automating it are the open-domain nature of the task and the importance of context (temporal, geographical, conversational). Andreas is looking at how to perform fact checking on statistical claims using matrix factorisation methods and distributed representations.

  • Multi-Instance Text Regression

    In this task, we try to predict a continuous variable (e.g. the rating of a video) from a collection of descriptive texts (e.g. textual reviews). The challenge lies in i) the existence of off-topic texts ("why does he have oreos in his ears?") and ii) modeling the variation within relevant texts, especially those commenting on different aspects of the item ("I love his looks" vs. "I hate his voice"). George is taking a multi-instance approach to this problem, where some of the instances are irrelevant and thus should not influence the overall prediction. There are different possible domains of application, such as item reviews, sentiment analysis and clinical datasets.
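
The multi-instance idea described above can be illustrated with a toy sketch. This is not George's actual model, only one simple way to let irrelevant instances have little influence: each text in a bag gets a per-instance prediction and a relevance weight, and the bag-level prediction is the relevance-weighted average. The function name `predict_bag`, the per-instance scores and the relevance logits are all hypothetical and hard-coded here; in a real system they would presumably come from learned models.

```python
from math import exp


def sigmoid(x: float) -> float:
    """Squash a real-valued logit into a weight in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def predict_bag(instance_scores, relevance_logits):
    """Relevance-weighted average of per-instance predictions.

    instance_scores:  one predicted rating per text in the bag (assumed given).
    relevance_logits: one relevance logit per text; low values mark
                      texts that look off-topic (assumed given).
    """
    if not instance_scores or len(instance_scores) != len(relevance_logits):
        raise ValueError("need a non-empty bag with one logit per instance")
    weights = [sigmoid(r) for r in relevance_logits]
    weighted_sum = sum(w * s for w, s in zip(weights, instance_scores))
    return weighted_sum / sum(weights)


# Hypothetical bag of three comments on one video:
# two on-topic comments and one off-topic comment with a misleading score.
scores = [4.5, 1.0, 4.0]
logits = [3.0, -6.0, 2.0]  # the second comment is judged off-topic

weighted = predict_bag(scores, logits)
unweighted = sum(scores) / len(scores)
print(f"weighted: {weighted:.2f}, plain mean: {unweighted:.2f}")
```

With these made-up numbers, the off-topic comment pulls the plain mean down noticeably, while the weighted prediction stays close to the two relevant comments.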

  • wolfe is a framework for building rich machine learning models, based on functional programming, factor graphs, optimization and composition.
  • ucleed is a biomedical event extractor that ranked first in several tracks of the BioNLP 2011 shared task.
  • thebeast is a Markov Logic inference and learning engine.
  • What's Wrong With My NLP? is a visualizer for NLP problems.