Towards Explainable and Trustworthy Traceability
Software traceability establishes associations between diverse software artifacts such as requirements, design, code, and test cases. Because manually creating and maintaining links is costly, many researchers have proposed automated approaches to recover the underlying links. The challenge of achieving precise, complete, and trustworthy traceability has been extensively investigated and discussed in the traceability community. As a result, various approaches built upon Information Retrieval, Machine Learning, and Deep Learning have been proposed. However, most of these methods focus on mining statistical features from the artifacts, and few studies have incorporated semantics into automated traceability models.
The objectives and contributions of our study focus on two aspects. First, we aim to deliver higher-quality trace links by bridging the semantic gap during link generation. Second, we establish an explainable trace model in which the rationale for accepting or rejecting a candidate link can be directly visualized to aid human comprehension.
In this dissertation, we therefore explored the possibility of combining conventional traceability techniques with cutting-edge automated concept model construction techniques to create semantically enhanced traceability. In our methods, we constructed concept models using unsupervised learning and text mining to extract semantic knowledge from a domain corpus. The trace model then leveraged these concept models as an external knowledge base to analyze the semantic relevance among software artifacts. We adopted this method to address two open challenges: generating trace links in bilingual project environments and supporting domain-specific traceability.
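As a rough illustration of how a concept model can serve as an external knowledge base during link generation, the following Python sketch expands each artifact's terms with related concepts before computing a simple set-overlap score. It is a minimal sketch under stated assumptions, not the dissertation's method: the `CONCEPT_MODEL` dictionary, the function names, the sample artifacts, and the use of Jaccard similarity are all hypothetical choices made for illustration.

```python
import re

# Hypothetical, hand-written concept model. In practice such a model would be
# mined from a domain corpus; here it simply maps a term to related concepts.
CONCEPT_MODEL = {
    "login": {"authentication", "credential"},
    "password": {"credential", "authentication"},
    "auth": {"authentication"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(tokens, concept_model):
    """Add the concepts related to each token to the token set."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= concept_model.get(token, set())
    return expanded


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_score(source, target, concept_model=None):
    """Score a candidate link, optionally using concept expansion."""
    s, t = tokenize(source), tokenize(target)
    if concept_model:
        s, t = expand(s, concept_model), expand(t, concept_model)
    return jaccard(s, t)


requirement = "The user shall login with a password"
code_comment = "verify auth token"
plain = link_score(requirement, code_comment)
enhanced = link_score(requirement, code_comment, CONCEPT_MODEL)
```

In this toy case the two artifacts share no surface terms, so the plain score is zero, while concept expansion connects them through the shared concept "authentication". This is the kind of semantic gap the approach above is meant to bridge.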
To address our second goal, we adopted data mining and natural language processing techniques to build a fully automated pipeline that constructs a knowledge base for generating trace link rationales. We integrated the concept model into our visualization tool in order to highlight related concepts in source and target artifacts. We then used this tool to explain why two specific artifacts were linked together as a result of a trace query. We quantitatively evaluated our approach using project artifacts from three different domains by reporting coverage, correctness, and potential utility of the generated definitions. From a qualitative perspective, we report results from a user study that was conducted to evaluate the effectiveness of the explanation interface. Results showed that the explanations presented in the interface helped non-experts understand the underlying semantics of a trace link and improved their ability to vet the correctness of the link.
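The idea of highlighting related concepts in source and target artifacts can be sketched as follows. This is an illustrative sketch only: the `TERM_TO_CONCEPT` and `DEFINITIONS` tables, the function names, and the sample artifacts are hypothetical and stand in for a knowledge base that would be built automatically.

```python
import re

# Hypothetical knowledge base: surface terms mapped to a shared concept,
# plus a short definition for each concept.
TERM_TO_CONCEPT = {
    "ecg": "electrocardiogram",
    "electrocardiogram": "electrocardiogram",
    "heartbeat": "heart rate",
    "pulse": "heart rate",
}
DEFINITIONS = {
    "electrocardiogram": "recording of the heart's electrical activity",
    "heart rate": "number of heartbeats per minute",
}


def concepts_in(text):
    """Map each known concept to the terms in `text` that mention it."""
    found = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        concept = TERM_TO_CONCEPT.get(token)
        if concept:
            found.setdefault(concept, set()).add(token)
    return found


def explain_link(source, target):
    """Return the shared concepts that could justify a trace link."""
    src, tgt = concepts_in(source), concepts_in(target)
    rationale = []
    for concept in sorted(src.keys() & tgt.keys()):
        rationale.append({
            "concept": concept,
            "source_terms": sorted(src[concept]),   # terms to highlight in the source
            "target_terms": sorted(tgt[concept]),   # terms to highlight in the target
            "definition": DEFINITIONS.get(concept, ""),
        })
    return rationale


rationale = explain_link(
    "The monitor shall display the ECG and pulse",
    "read_electrocardiogram updates heartbeat counter",
)
```

Each entry pairs the terms to highlight on both sides with a definition, which is the kind of information a non-expert would need to vet a link. An empty list signals that the knowledge base offers no shared concept to support the link.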
History
Date Modified
2021-12-16
Defense Date
2021-10-01
CIP Code
- 40.0501
Research Director(s)
Jane Cleland-Huang
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Alternate Identifier
1288578654
Library Record
6155072
OCLC Number
1288578654
Program Name
- Computer Science and Engineering