Auto-Sizing

Neural Networks

Before training a neural network, one must decide how large the network should be. The right size is frequently non-obvious and hard to select. Instead of choosing it in advance, auto-sizing uses group regularizers to prune neurons during training. Our toolkit (PyPI, GitHub) integrates with any PyTorch model with only 3 lines of code.

"Auto-Sizing Neural Networks: With Applications to n-gram Language Models" Kenton Murray and David Chiang. EMNLP 2015
"Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation" Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, and David Chiang. WNGT 2019
"Efficiency through Auto-Sizing: Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task" Kenton Murray, Brian DuSell, and David Chiang. WNGT 2019

GitHub
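To illustrate how a group regularizer can prune neurons, here is a minimal sketch, not the toolkit's API. It assumes an ℓ2,1 (group lasso) penalty applied as a proximal step after each ordinary gradient update, with each row of a weight matrix holding the weights of one hidden unit. A row whose norm falls below the threshold is set exactly to zero, and that unit can then be removed. The function names and the toy layer are hypothetical.

```python
import math


def prox_group_l21(weights, step, strength):
    """Proximal step for strength * sum_i ||w_i||_2 (group lasso over rows).

    Assumption: each row of `weights` holds the weights attached to one hidden
    unit, so a row that becomes all zeros marks a unit that can be pruned.
    """
    threshold = step * strength
    shrunk = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        if norm <= threshold:
            # The whole group is driven to exactly zero.
            shrunk.append([0.0] * len(row))
        else:
            # Otherwise the row is scaled toward zero but keeps its direction.
            scale = 1.0 - threshold / norm
            shrunk.append([scale * w for w in row])
    return shrunk


def surviving_units(weights):
    """Indices of rows (hidden units) with at least one nonzero weight."""
    return [i for i, row in enumerate(weights) if any(w != 0.0 for w in row)]


# Hypothetical 3-unit layer after an ordinary gradient step.
layer = [[3.0, 4.0], [0.1, 0.2], [0.0, 1.0]]
layer = prox_group_l21(layer, step=0.5, strength=1.0)
print(surviving_units(layer))  # [0, 2]
```

Because the penalty acts on whole rows rather than individual weights, it zeroes out entire units instead of producing scattered zeros, which is what makes the network actually smaller.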

Beam Search

Length and Exposure Bias

Neural Machine Translation systems favor shorter translations, often even when those translations are incorrect. I am interested in both the theoretical and the experimental reasons why.

"Correcting Length Bias in Neural Machine Translation" Kenton Murray and David Chiang. WMT 2018

Paper
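Part of the reason is arithmetic: every token adds a log-probability of at most 0, so a hypothesis's score can only drop as it grows, and search tends to prefer ending early. The minimal sketch below uses made-up probabilities to show this effect and one common correction, a constant per-word reward added to the score. The reward value 0.5 is arbitrary, and the function names are hypothetical.

```python
import math


def hypothesis_score(token_logprobs, word_reward=0.0):
    """Sum of token log-probabilities plus a constant reward per output token."""
    return sum(token_logprobs) + word_reward * len(token_logprobs)


def pick_best(candidates, word_reward=0.0):
    """Return the label of the highest-scoring (label, token_logprobs) candidate."""
    return max(candidates, key=lambda c: hypothesis_score(c[1], word_reward))[0]


# Made-up candidates; each list includes the end-of-sentence token.
candidates = [
    ("short", [math.log(0.5)] * 2),  # fewer, less confident tokens
    ("long", [math.log(0.6)] * 4),   # more, individually more confident tokens
]

print(pick_best(candidates))                   # short
print(pick_best(candidates, word_reward=0.5))  # long
```

Without the reward, the short hypothesis wins even though each of the long hypothesis's tokens is more probable. With the reward, the length penalty built into the model's probabilities is offset.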

Domain Adaptation

Neural networks often require large amounts of training data to perform well. I am also interested in how out-of-domain data can be leveraged to improve performance in low-resource settings.

"Gradual Fine-Tuning for Low-Resource Domain Adaptation" Haoran Xu, Seth Ebner, Mahsa Yarmohammadi, Aaron Steven White, Benjamin Van Durme, and Kenton Murray. Adapt-NLP 2021 (to appear)
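One way to picture gradual fine-tuning is as a sequence of training stages that keep all in-domain data while mixing in progressively less out-of-domain data. The sketch below only builds such stage datasets. The linear shrinking schedule, the number of stages, and the function name are illustrative assumptions, not the paper's exact recipe.

```python
import random


def gradual_stages(in_domain, out_of_domain, num_stages, seed=0):
    """Build one training set per stage.

    Every stage keeps all in-domain examples; the share of out-of-domain
    examples shrinks linearly (an assumed schedule) until the last stage
    contains in-domain data only.
    """
    if num_stages < 2:
        raise ValueError("need at least two stages")
    pool = list(out_of_domain)
    random.Random(seed).shuffle(pool)
    stages = []
    for stage in range(num_stages):
        keep = len(pool) * (num_stages - 1 - stage) // (num_stages - 1)
        stages.append(list(in_domain) + pool[:keep])
    return stages


# Hypothetical data: 2 in-domain and 9 out-of-domain examples.
stages = gradual_stages(["in1", "in2"], [f"out{i}" for i in range(9)], num_stages=4)
print([len(s) for s in stages])  # [11, 8, 5, 2]
# In practice, fine-tune the same model on stages[0], then stages[1], and so on.
```

The model is never trained on the small in-domain set alone until the final stage, so the shift in training distribution happens in small steps rather than all at once.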

Decompositional Semantics

Joint Semantics and Syntax

Decompositional Semantics is a semantic representation where annotation is accomplished through simple questions. In work with Elias Stengel-Eskin and others, we have shown how semantic representations can improve syntactic parsing and vice versa.

"Joint Universal Syntactic and Semantic Parsing" Elias Stengel-Eskin, Kenton Murray, Sheng Zhang, Aaron Steven White, and Benjamin Van Durme. TACL 2021 (to appear)

Dissertation and Theses

Masters

A Semantic Scan Statistic for Novel Disease Outbreak Detection

Advisor: Daniel Neill
Committee: Daniel Neill, Chris Dyer, Roni Rosenfeld

Anomalous pattern detection is a popular subfield of computer science that aims to detect anomalous items and groupings of items in a dataset using methods from machine learning, data mining, and statistics. For anomaly detection tasks on geospatially and temporally labeled data, spatial scan statistics have been successfully applied to numerous spatiotemporal data mining and pattern detection problems, such as predicting crime waves or disease outbreaks [12, 7, 14, 15]. However, spatial scan statistics can only scan over a structured set of data streams. When spatiotemporal data sets contain unstructured free text, spatial scan statistics require the data to be preprocessed into structured categories. Manually labeling and annotating text can be time-consuming or infeasible, while automatic classification methods that assign text fields to a pre-defined set of event types can obscure the occurrence of novel events, such as a disease outbreak with a previously unseen pattern of symptoms, potentially drowning out the signal of the very outliers the method is attempting to detect.

In this thesis, we propose the Semantic Scan Statistic, which integrates spatial scanning with unsupervised topic modeling to enable timely and accurate detection of novel disease outbreaks. We discuss some of the inherent challenges of working with free text in an anomalous pattern detection framework, and we present novel approaches to the problem that specifically adapt topic modeling algorithms to enable anomaly detection. We evaluate our approach on two years of free-text Emergency Department chief complaint data from Allegheny County, PA, demonstrating the efficacy of the Semantic Scan Statistic and the benefits of incorporating unstructured text into spatial event detection. Using semi-synthetic disease outbreaks, a common evaluation method in the disease surveillance field, we show that outbreaks are detected over 25% faster than with current state-of-the-art methods that do not use textual information.
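The core of a spatial scan is scoring each candidate region by how far its observed count exceeds its expected count. A minimal sketch follows, assuming the expectation-based Poisson likelihood ratio that is common in this literature; the exact statistic used in the thesis, how topic models turn free text into counts, and the spatial search over regions are not shown. Here the counts stand for hypothetical complaints assigned to one topic, and the regions are a hand-written list.

```python
import math


def ebp_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio: C log(C/B) + B - C if C > B, else 0.

    Assumes baseline > 0.
    """
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def best_region(regions, counts, baselines):
    """Score every candidate region by its aggregate count and baseline; return the best."""
    best_name, best_score = None, 0.0
    for name, locations in regions.items():
        c = sum(counts[loc] for loc in locations)
        b = sum(baselines[loc] for loc in locations)
        score = ebp_score(c, b)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical daily counts of complaints assigned to one topic, with expected counts.
counts = {"a": 10, "b": 3, "c": 2}
baselines = {"a": 4.0, "b": 3.0, "c": 2.5}
regions = {"a_only": ["a"], "north": ["a", "b"], "south": ["b", "c"]}
print(best_region(regions, counts, baselines)[0])  # a_only
```

Adding location "b" to the region dilutes the excess at "a", so the smaller region scores higher. This is the kind of trade-off a scan statistic resolves automatically.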

PhD

Learning Hyperparameters for Neural Machine Translation

Advisor: David Chiang
Committee: Walter J. Scheirer, Meng Jiang, Alexander Rush

Machine Translation, the subfield of Computer Science that focuses on translating between two human languages, has greatly benefited from neural networks. However, these neural machine translation systems have complicated architectures with many hyperparameters that must be chosen manually. These are frequently selected either through a grid search over values or by reusing values common in the literature. However, such choices are not theoretically justified, and the same values are not optimal for all language pairs and datasets.

Fortunately, the innate structure of the problem allows these hyperparameters to be optimized during training. Traditionally, the hyperparameters of a system are chosen first, and then a learning algorithm optimizes all of the parameters within the model. In this work, I propose three methods to learn the optimal hyperparameters during the training of the model, allowing for one step instead of two. First, I propose using group regularizers to learn the number and size of the hidden layers. Second, I demonstrate how to use a perceptron-like tuning method to solve the known problems of undertranslation and label bias. Finally, I propose an Expectation-Maximization-based method to learn the optimal vocabulary size and granularity. Using various techniques from machine learning and numerical optimization, this dissertation shows how to learn the hyperparameters of a Neural Machine Translation system while training the model itself.
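To give a flavor of perceptron-like tuning, the sketch below adjusts a single per-word reward. It treats output length as a feature and nudges the reward by the difference between total reference length and total hypothesis length, which is a perceptron update on that one feature. This is an assumed, simplified form for illustration, not the dissertation's exact procedure. The "decoder" is a hypothetical stand-in that picks from a fixed candidate list, and all probabilities are made up.

```python
import math


def score(token_logprobs, word_reward):
    """Sum of token log-probabilities plus a constant per-token reward."""
    return sum(token_logprobs) + word_reward * len(token_logprobs)


def decode(candidates, word_reward):
    """Toy stand-in for beam search: pick the best of a fixed candidate list."""
    return max(candidates, key=lambda toks: score(toks, word_reward))


def tune_word_reward(dev_set, learning_rate=0.1, epochs=20, reward=0.0):
    """Perceptron-like update: raise the reward when outputs are too short
    in total, lower it when they are too long."""
    for _ in range(epochs):
        hyp_len = sum(len(decode(cands, reward)) for cands, _ in dev_set)
        ref_len = sum(ref for _, ref in dev_set)
        reward += learning_rate * (ref_len - hyp_len) / len(dev_set)
    return reward


# Hypothetical dev set: (candidate token log-prob lists, reference length).
dev_set = [([[math.log(0.5)] * 2, [math.log(0.6)] * 4], 4)]
reward = tune_word_reward(dev_set)
print(round(reward, 2))  # 0.4
```

Once the decoder's outputs match the reference lengths, the update is zero and the reward stops changing. No grid search over reward values is needed.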

Bachelors

Summarization by Latent Dirichlet Allocation: Superior Sentence Extraction through Topic Modeling

Advisor: David Blei
Committee: David Blei, Andrea LaPaugh

Latent Dirichlet allocation, or LDA, is a successful generative probabilistic model of text corpora that has performed well on many tasks across Natural Language Processing. Despite being perfectly suited to Automatic Summarization tasks, it had never been applied to them. In this paper, I introduce Summarization by LDA, or SLDA, which better models the subtopics of a document, leading to more pertinent, relevant, and concise summaries than other summarization methods. This new approach is competitive with the leading methods in the field and even outperforms them in many respects. In addition to SLDA, I introduce a novel, paradigm-shifting evaluation technique for summarization that does not rely on gold standards. It overcomes many of the challenges posed by people's inherent disagreements about what makes a good summary by collecting evaluations from large numbers of people through the commercial service Mechanical Turk. Overall, this paper lays the groundwork for transforming the conventions of the Automatic Summarization field by challenging many of its definitions.
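As a generic illustration of topic-based sentence extraction, and not SLDA itself, the sketch below assumes that an LDA model has already inferred a topic mixture for the whole document and for each sentence. It then greedily adds the sentence that brings the summary's average topic mixture closest, in L1 distance, to the document's. The distance measure, the greedy procedure, and the toy mixtures are all assumptions made for the example.

```python
def l1_distance(p, q):
    """L1 distance between two topic distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def extract_summary(sentence_topics, doc_topics, k):
    """Greedily add the sentence that brings the summary's average topic
    mixture closest to the document's; return chosen indices in order."""
    chosen, remaining = [], list(range(len(sentence_topics)))
    for _ in range(min(k, len(remaining))):
        best = min(
            remaining,
            key=lambda i: l1_distance(
                average([sentence_topics[j] for j in chosen + [i]]), doc_topics
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


# Hypothetical two-topic mixtures, as an LDA model might infer them.
doc = [0.5, 0.5]
sentences = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9]]
print(extract_summary(sentences, doc, k=2))  # [1, 2]
```

Sentence 0 is never selected, even though it is individually about as close to the document as sentence 2. Sentence 1 already covers the first topic, so sentence 2 adds more coverage of the document's subtopics.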

Research Interests

For a complete list of my publications, please check out Google Scholar.

Also check out my lab's website here.


Contact Me

merci

sulpayki

gracias

شكرا لك

tack så mycket