Knowledge-Augmented Methods for Natural Language Processing and Beyond
The advent of pre-trained language models (PLMs) has revolutionized the field of natural language processing (NLP). Before their emergence, NLP research revolved predominantly around feature extraction and architecture engineering. PLMs instigated a paradigm shift toward pre-training and fine-tuning approaches, which have more recently evolved into prompt-based methodologies. Language models pre-trained on a broad spectrum of web data have shown an exceptional capacity to internalize a range of parametric knowledge, including factual and commonsense knowledge.
Despite the substantial progress in the domain of PLMs, they are not immune to certain limitations. Notably, they struggle to memorize infrequent information, are susceptible to hallucination, and suffer from temporal degradation as the world changes after training. Furthermore, PLMs are inherently unable to capture the entirety of continuously evolving world knowledge within their fixed parameter budget. The NLP community has therefore seen a surge of interest in enriching language models with non-parametric knowledge, yielding state-of-the-art results across diverse benchmarks. Unlike conventional PLMs that rely solely on their parametric knowledge, these methods draw directly on relevant external non-parametric knowledge, such as documents retrieved from Wikipedia, to enhance the language model's understanding of the input. Moreover, the non-parametric knowledge is acquired explicitly in a plug-and-play manner, without extensive retraining, which makes these methods highly scalable. At the same time, it is crucial to acknowledge that scaling model parameters has substantially improved the parametric knowledge contained within very large PLMs such as GPT-3. This facet of PLMs offers unique benefits, including reasoning and deductive capabilities grounded in knowledge acquired from varied sources, which cannot be dismissed. Additionally, language models that forgo retrieving non-parametric knowledge are more efficient at inference time.
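The plug-and-play augmentation described above can be sketched in a few lines: retrieve evidence relevant to the query from an external store and prepend it to the model's input, leaving the model's parameters untouched. The following is a minimal toy sketch, not the thesis's actual system; the three-document corpus, the word-overlap scorer (a stand-in for a real retriever such as BM25 or a dense dual encoder), and the function names are all hypothetical.

```python
# Toy sketch of plug-and-play non-parametric augmentation.
# Corpus, scorer, and names are hypothetical illustrations.
CORPUS = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "Python is a programming language created by Guido van Rossum.",
]

def tokenize(text):
    return [w.strip(".,?").lower() for w in text.split()]

def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query -- a crude
    stand-in for BM25 or a dense retriever."""
    q = set(tokenize(query))
    ranked = sorted(corpus,
                    key=lambda d: len(q & set(tokenize(d))),
                    reverse=True)
    return ranked[:k]

def augment_prompt(query, corpus):
    """Prepend retrieved evidence to the query; the language model
    itself requires no retraining."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(augment_prompt("When was the Eiffel Tower completed?", CORPUS))
```

Because the knowledge lives outside the model, updating it is as simple as editing the corpus, which is precisely the scalability advantage the abstract notes.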
This thesis addresses limitations of contemporary PLMs through a two-fold contribution. The first part examines issues arising from excessive dependency on parametric knowledge, including ineffective memorization of infrequent information, hallucination, and lack of language diversity. To overcome these limitations, the thesis introduces approaches that integrate non-parametric knowledge resources, such as dictionaries, knowledge graphs, and unstructured text, into language models through carefully crafted pre-training objectives and fine-tuning methods. The second part tackles the issue of irrelevant retrievals during non-parametric knowledge integration. In response, the thesis introduces a novel pipeline that generates context documents instead of retrieving them from the web. To address low-diversity context generation and unrealistic responses, the thesis further introduces clustering-based prompt strategies and feedback mechanisms. Moreover, the thesis provides an extensive qualitative and quantitative comparison of parametric and non-parametric knowledge, illuminating their impacts and interactions across different tasks. In conclusion, the thesis paves the way for future research, contributing significantly to the advancement of this crucial field.
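The generate-then-read pipeline with clustering-based prompts can be illustrated schematically: sample one context document per prompt cluster, read a candidate answer from each, and aggregate by majority vote to improve diversity and robustness. The sketch below is a toy simulation under loud assumptions: `generate_context` returns canned strings in place of a large language model, the "reader" is a trivial heuristic rather than a trained model, and all names and prompt-cluster labels are hypothetical.

```python
from collections import Counter

def generate_context(question, prompt_style):
    """Hypothetical stand-in for sampling a context document from a
    large LM with a cluster-specific prompt; returns canned text here."""
    canned = {
        "encyclopedic": "The capital of France is Paris, a city on the Seine.",
        "qa_style": "If asked about the capital of France, the answer is Paris.",
        "narrative": "Travelers to France often start in its capital, Paris.",
    }
    return canned[prompt_style]

def read_answer(question, context):
    """Toy 'reader': returns the first non-sentence-initial capitalized
    token absent from the question (placeholder for a trained reader)."""
    q_words = set(question.lower().replace("?", "").split())
    for sentence in context.split("."):
        for raw in sentence.split()[1:]:  # skip sentence-initial caps
            word = raw.strip(".,:?")
            if word and word[0].isupper() and word.lower() not in q_words:
                return word
    return None

def generate_then_read(question, prompt_styles):
    """One generated context per prompt cluster, then majority vote
    over the per-context answers."""
    answers = [read_answer(question, generate_context(question, s))
               for s in prompt_styles]
    return Counter(a for a in answers if a).most_common(1)[0][0]

print(generate_then_read("What is the capital of France?",
                         ["encyclopedic", "qa_style", "narrative"]))
# prints "Paris"
```

The design point is that prompts drawn from different clusters yield stylistically diverse contexts, so agreement across them is stronger evidence for an answer than any single generated document.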
Date Modified: 2023-08-04
Defense Date: 2023-07-05
CIP Code: 40.0501
Research Director(s): Meng Jiang
Committee Members: Nitesh Chawla, David Chiang, Heng Ji, Scott Yih
Degree: Doctor of Philosophy
Degree Level: Doctoral Dissertation
Alternate Identifier: 1392285960
OCLC Number: 1392285960
Additional Groups: Computer Science and Engineering
Program Name: Computer Science and Engineering