Making Tacit Knowledge in Requirements Explicit

This research investigates techniques for analysing natural language requirements in order to discover and manage tacit knowledge and to mitigate its negative effects. Tacit knowledge is knowledge that we know we have but cannot articulate, or knowledge that we do not know we have but nevertheless use. As engineers concerned with the development of software and systems, however, we are taught to make our assumptions explicit; any knowledge that is left implicit makes systems analysis more difficult and error-prone.

The research adopts an empirical approach to characterise and elicit tacit knowledge, and a constructive, theoretically-grounded but user-driven approach to develop practical techniques and tools to guide analysts concerned with the development of precise requirements for software-intensive systems.

For more information, see the MaTREx project.

Managing Legacy Biodiversity Literature

Legacy biodiversity literature, spanning more than 250 years, provides a vast quantity of valuable content for taxonomic research. However, the information contained in most biodiversity publications is unstructured and readable only by humans, and is therefore hard for machines to harvest.

Using information extraction techniques and machine learning approaches, we are developing automated tools that convert unstructured biodiversity literature into semantically enabled, machine-readable structured data. Such data supports the automated recombination and repurposing of biodiversity knowledge, e.g., species-specific document retrieval, linking biodiversity data across diverse sources, and taxonomic database curation.

For more information, see the ComTax project.

Healthcare and Medical Information Processing

This research aims to extract clinical information, such as named entities (e.g., diseases, drugs), events, and relations (e.g., treatment_of, test_for), from the free-text documents in Electronic Health Records (EHRs) to support searching, summarization, decision support, or statistical analysis.

Several different techniques were used to extract information, ranging from simple pattern- or rule-based matching to more complex statistical and machine learning methods based on various types of clinical features.
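As an illustration of the simplest end of that range, the following minimal sketch extracts a treatment_of relation with a single hand-written pattern backed by two small dictionaries. The term lists, the pattern, and the function name extract_treatments are assumptions made for this example only; they are not the rules or resources used in the project.

```python
import re

# Tiny illustrative dictionaries (assumed for this sketch, not project resources).
DRUGS = {"metformin", "aspirin"}
DISEASES = {"type 2 diabetes", "headache"}

# One hand-written pattern: "<drug> (was|is) prescribed/given/administered for <disease>".
TREATMENT = re.compile(
    r"(?P<drug>\w+) (?:was |is )?(?:prescribed|given|administered) for "
    r"(?P<disease>[\w\s]+?)(?=[.,;]|$)",
    re.IGNORECASE,
)

def extract_treatments(text):
    """Return (relation, drug, disease) triples found by the pattern."""
    relations = []
    for match in TREATMENT.finditer(text):
        drug = match.group("drug").lower()
        disease = match.group("disease").strip().lower()
        # Keep the match only if both arguments are known dictionary terms.
        if drug in DRUGS and disease in DISEASES:
            relations.append(("treatment_of", drug, disease))
    return relations

note = "Metformin was prescribed for type 2 diabetes. Patient reports no headache."
print(extract_treatments(note))
```

The dictionary check keeps the pattern from firing on arbitrary words; a statistical or machine learning approach would replace both the fixed pattern and the fixed term lists with models learned from annotated clinical text.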

For more information, see the MIP project.

Information Extraction (IE) and Text Mining in Biomedical Literature

This line of research is concerned with developing novel text mining techniques to extract useful information and knowledge from unstructured text, particularly in the bioinformatics field. The goal of the research is to extract potentially useful term associations from biomedical literature for supporting biological knowledge discovery.

We investigated various text mining approaches to tackle different biomedical scenarios. Two Bio-NLP systems, one for biological entity identification and one for protein/gene name normalization, were developed separately using machine learning approaches, rule- and pattern-based matching, and term-based exact and approximate matching.
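To make the idea of term-based exact and approximate matching concrete, here is a minimal sketch of name normalization against a small lexicon. The lexicon entries, the similarity cutoff, and the function name normalize are assumptions chosen for illustration, and the standard-library difflib similarity stands in for whatever matching measure the actual system used.

```python
import difflib

# Illustrative lexicon (assumed): surface form -> canonical gene symbol.
LEXICON = {
    "tp53": "TP53",
    "p53": "TP53",
    "tumor protein p53": "TP53",
    "brca1": "BRCA1",
    "breast cancer 1": "BRCA1",
}

def normalize(mention, lexicon=LEXICON, cutoff=0.85):
    """Map a gene/protein mention to a canonical symbol, or None if no match."""
    key = " ".join(mention.lower().split())  # lowercase, collapse whitespace
    # Step 1: exact lookup.
    if key in lexicon:
        return lexicon[key]
    # Step 2: approximate lookup, keeping only the best candidate above the cutoff.
    close = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[close[0]] if close else None

for mention in ["BRCA1", "BRCA-1", "tumour protein p53", "insulin"]:
    print(mention, "->", normalize(mention))
```

Trying the exact lookup first keeps the common case cheap and unambiguous; the approximate step only handles spelling and punctuation variants, and the cutoff trades recall against the risk of mapping a mention to the wrong gene.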

For more information, see the Bio-MITA project.

Distributed Information Retrieval (DIR) and Web Search

The main thread running through my PhD research was to investigate advanced search engine infrastructure for information source selection in distributed information environments. In particular, my research was concerned with developing a topic-based database selection system using a topic hierarchy.

In the topic-based search framework, distributed textual databases are first categorised into a topic hierarchy for convenience of access and management. Secondly, two-stage database language models are used to perform topic-based database selection within the context of the topic hierarchy. Finally, the initial selection result is further refined by a set of topic-based association rules. Moreover, to overcome the drawbacks of keyword-based search, we propose a concept-based search mechanism that searches distributed web databases using domain-specific ontologies.
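The selection step can be sketched roughly as follows. This is only one possible reading of language-model-based selection over a topic hierarchy: each database's unigram model is smoothed first with the model of its topic and then with a global model, and databases are ranked by query log-likelihood. The toy collection, the interpolation weights, and the function names are assumptions for this example, and the association-rule refinement is not shown.

```python
import math
from collections import Counter

# Toy collection (illustrative only): database name -> (topic, sample documents).
DATABASES = {
    "bio_db": ("biology", ["gene expression in yeast",
                           "protein folding and gene regulation"]),
    "med_db": ("biology", ["clinical trial of drug treatment",
                           "patient disease outcomes"]),
    "sports_db": ("sports", ["football match results",
                             "tennis tournament scores"]),
}

def language_model(docs):
    """Return (term counts, total term count) for a list of documents."""
    counts = Counter(w for d in docs for w in d.lower().split())
    return counts, sum(counts.values())

def prob(word, model):
    """Maximum-likelihood probability of a word under a unigram model."""
    counts, total = model
    return counts[word] / total if total else 0.0

def rank_databases(query, databases, lam_db=0.6, lam_topic=0.3):
    """Rank database names by smoothed query log-likelihood, best first."""
    all_docs = [d for _, docs in databases.values() for d in docs]
    global_model = language_model(all_docs)
    # Pool the documents of all databases that share a topic.
    topic_docs = {}
    for topic, docs in databases.values():
        topic_docs.setdefault(topic, []).extend(docs)
    topic_models = {t: language_model(d) for t, d in topic_docs.items()}

    scores = {}
    for name, (topic, docs) in databases.items():
        db_model = language_model(docs)
        score = 0.0
        for w in query.lower().split():
            # Interpolate database, topic, and global estimates.
            p = (lam_db * prob(w, db_model)
                 + lam_topic * prob(w, topic_models[topic])
                 + (1 - lam_db - lam_topic) * prob(w, global_model))
            score += math.log(p) if p > 0 else math.log(1e-12)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

print(rank_databases("gene expression", DATABASES))
```

In this toy setting the topic-level model lets a database with no direct evidence for the query terms still outrank databases from unrelated topics, which is the intuition behind selecting within a hierarchy rather than over a flat list of databases.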

For more information, see the DIR project.