Interactive biomedical ontology matching

Xingsi Xue; Zhi Hang; Zhengyi Tang

doi:10.1371/journal.pone.0215147

Abstract

Due to the continuous evolution of biomedical data, biomedical ontologies are becoming larger and more complex, which leads to a large amount of overlapping information. To support semantic interoperability between ontology-based biomedical systems, it is necessary to identify the correspondences between these pieces of information, a task commonly known as biomedical ontology matching. However, matching biomedical ontologies is challenging because (1) biomedical ontologies often possess tens of thousands of entities, and (2) biomedical terminologies are complex and ambiguous. To efficiently match biomedical ontologies, this paper proposes an interactive biomedical ontology matching approach, which uses the Evolutionary Algorithm (EA) to implement the automatic matching process and involves a user in the evolving process to improve the matching efficiency. In particular, we propose an Evolutionary Tabu Search (ETS) algorithm, which improves EA's performance by introducing the tabu search algorithm as a local search strategy into the evolving process. On this basis, we further make the ETS-based ontology matching technique cooperate with the user in a reasonable amount of time to efficiently create high-quality alignments, and use EA's survival of the fittest to eliminate the wrong correspondences introduced by erroneous user validations. The experiment is conducted on the Anatomy track and the Large Biomedic track provided by the Ontology Alignment Evaluation Initiative (OAEI), and the experimental results show that our approach efficiently exploits user intervention to improve on its non-interactive version and outperforms the state-of-the-art semi-automatic ontology matching systems.


Citation: Xue X, Hang Z, Tang Z (2019) Interactive biomedical ontology matching. PLoS ONE 14(4): e0215147. https://doi.org/10.1371/journal.pone.0215147

Editor: Xiangtao Li, Northeast Normal University, CHINA

Received: November 6, 2018; Accepted: March 27, 2019; Published: April 17, 2019

Copyright: © 2019 Xue et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are contained within the manuscript.

Funding: This work was supported by Natural Science Foundation of Fujian Province (No. 2016J05145), Open Fund of Key Laboratory of Hunan Province for Mobile Business Intelligence (No. 2015TP1002), Program for New Century Excellent Talents in Fujian Province University (No. GY-Z18155), Program for Outstanding Young Scientific Researcher in Fujian Province University (No. GY-Z160149) and Scientific Research Foundation of Fujian University of Technology (No. GY-Z17162).

Competing interests: The authors have declared that no competing interests exist.

Introduction

Ontologies have gained much importance in the past two decades, especially in the biomedical domain. Various biomedical ontologies such as the Gene Ontology (GO) [1], the National Cancer Institute (NCI) Thesaurus [2], the Foundational Model of Anatomy (FMA) [3], and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) [4] have emerged and have been maintained, and they are widely used in medical record annotation [5], medical data format standardization [6], medical and clinical knowledge representation and integration [7], and medical decision making [8]. Due to the continuous evolution of biomedical data, biomedical ontologies are becoming larger and more complex, which leads to a large amount of overlapping information. For example, the NCI ontology defines the concept “Myocardium”, which corresponds to the concept “Cardiac Muscle Tissue” in the FMA ontology; both describe the muscles surrounding the human heart. Since exploiting this overlapping information is necessary for the integration, aggregation, and interoperability of ontology-based biomedical systems, it is necessary to find the correspondences between these pieces of information, a task commonly known as biomedical ontology matching. However, matching biomedical ontologies is a computationally intensive task with quadratic computational complexity [9], which arises from their characteristics: (1) biomedical ontologies often possess tens of thousands of classes, and (2) biomedical terminologies are complex and ambiguous: the same biomedical concept frequently has several names, or the same term can be applied to two different entities. Although this challenge has attracted the interest of the community, e.g. the Ontology Alignment Evaluation Initiative (OAEI) includes specific tracks on matching biomedical ontologies, research on it is still in its infancy.

To efficiently match biomedical ontologies, it is critical to reduce the search space, which can improve both the matching efficiency and the quality of the resulting alignment. Researchers have recently proposed various solutions to reduce the search space, mainly based on clustering and blocking strategies [10–13]. Although such divide-and-conquer strategies are a feasible solution to the large-scale ontology matching problem, they have two main issues: (1) the ontology partitioning algorithm cannot control the size of the blocks, which may be too small or too large for matching, and (2) the partitioning process causes the elements on the block boundaries to lose some semantic information, which directly affects the quality of the alignment. Moreover, since none of the existing similarity measures can distinguish biomedical concepts in all contexts, user knowledge should be utilized in the automatic ontology matching process to ensure the quality of the final matching results [14]. To this end, a number of interactive ontology matching methods have been developed, presenting various strategies for exploiting user interaction. AgreementMakerLight (AML) [15] employs an interactive selection algorithm, which uses the alignments returned by various ontology matchers to detect suspicious mappings. Above a threshold of 70%, AML queries the user about suspicious mappings; otherwise, it rejects all suspicious mappings. AML keeps the user's workload reasonable by limiting the queries to 45% of the determined correspondences for small-scale ontology matching tasks and to 15% for the others. ALIN [16] generates an initial set of candidate correspondences and asks the user to validate them. If the user judges a candidate mapping to be correct, it is moved to the final alignment. ALIN then removes all candidate mappings that are inconsistent with the approved correspondences, and the interactions continue until no candidate correspondences are left. LogMap [17] presents problematic mappings to the user for validation, and the validated results are used to detect conflicts with already found mappings. LogMap allows the user to pause the interaction and continue the validation work later. XMap [18] cooperates with the user in the post-matching step to filter the final alignment. It uses two thresholds to implement this procedure: mappings with similarity values higher than the upper threshold are added directly to the final alignment, and mappings with similarity values lower than the lower threshold are presented to the user for validation. The above interactive ontology matching systems exploit user involvement in either the pre-matching or the post-matching phase and do not take the errors made by users into consideration, so they cannot ensure the quality of the ontology alignments.

Due to the complexity of the ontology matching problem (a large-scale optimization problem with many local optima), the Evolutionary Algorithm (EA) is a good methodology for determining ontology alignments [19]. The most notable EA-based matcher is GOAL (Genetics for Ontology ALignments) [20], which determines the optimal weights for aggregating the alignments determined by various similarity measures. Alexandru et al. [21] further propose to optimize both the aggregation weights and the threshold for filtering the final alignment to improve the alignment's quality. GAOM (Genetic Algorithm based Ontology Matching) [22] directly optimizes the ontology alignment through its fitness function. However, slow convergence and premature convergence are two main shortcomings of these EA-based matchers, which make them unable to effectively search for the optimal solution of biomedical ontology matching problems. To improve the efficiency of EA-based matchers, this paper proposes an Evolutionary Tabu Search (ETS) algorithm, which improves EA's performance by introducing the tabu search algorithm as a local search strategy into the evolving process. This marriage of global and local search keeps solution diversity high via EA (reducing the possibility of premature convergence) and increases the convergence speed via local search (improving the solution quality and thus making the solutions approach the optimum more quickly). On this basis, we further propose an interactive biomedical ontology matching technique, which makes the ETS-based ontology matching technique cooperate with a user in a reasonable amount of time to efficiently create high-quality alignments, and uses EA's survival of the fittest to eliminate the wrong correspondences introduced by erroneous user validations. In particular, the contributions of this paper are as follows:

An interactive framework is proposed to match biomedical ontologies in an iterative way,
An ETS-based ontology matching technique is presented to implement an efficient automatic matching process, which adaptively determines when to involve the user,
A hierarchy-based approach is presented, which can make use of partial biomedical concept mappings to reduce the algorithm’s search space.

The rest of the paper is organized as follows: Section 1 presents the framework of interactive biomedical ontology matching; Section 2 shows the automatic biomedical ontology matching technique based on ETS; Section 3 presents the interactivity during the evolving process of ETS; Section 4 presents the experimental studies and analysis; finally, Section 5 draws the conclusions and presents the future work.

1 Interactive biomedical ontology matching framework

The proposed interactive biomedical ontology matching framework is shown in Fig 1. As can be seen from the figure, the three working phases, i.e. initialization, ETS-based ontology matching, and user interaction, are outlined by dotted-line boxes. A rectangle inside a dotted-line box represents a working step, and a rectangle with a picture outside the dotted-line boxes indicates input or output data, e.g. the source and target ontologies, the reference alignment and the evaluation result. The three working phases are described as follows:

Initialization: before matching the biomedical ontologies, the anchors (high-confidence concept correspondences) are presented to a user for validation to initialize the Partial Reference Alignment (PRA),
ETS-based matching process: the ETS algorithm is used to match the biomedical ontologies iteratively, and when the evolving process gets stuck, the algorithm involves the user,
User interaction: the candidate correspondences are presented to the user for validation, and the validated results are used to update the PRA and the elite, and to reduce the search space of ETS through the hierarchy-based approach.


Fig 1. Interactive biomedical ontology matching framework.

https://doi.org/10.1371/journal.pone.0215147.g001

2 Evolutionary tabu search algorithm based biomedical ontology matching

2.1 Biomedical ontology matching problem

A biomedical ontology O can be defined as a 4-tuple (C, P, I, A), where C, P, I and A refer to the sets of classes, properties, instances and axioms, respectively. In general, classes, properties and instances are referred to as entities. A correspondence can be defined as a 3-tuple (e₁, e₂, n), where e₁ and e₂ are entities from the two ontologies and n ∈ [0, 1] is the similarity value between e₁ and e₂. A set of correspondences is called an ontology alignment A, and a PRA is a set of correct correspondences provided by a domain expert [23]. Given a partial reference alignment PRA, the partial alignment A_p is the subset of A that contains all correspondences of A sharing at least one class with a correspondence in PRA [23]:

$$A_p = \{(e_1, e_2, n) \in A \mid \exists (e_1', e_2', n') \in PRA : e_1 = e_1' \lor e_2 = e_2'\} \quad (1)$$

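Given an alignment A′, its recall, precision and f-measure on PRA [24] are defined as follows:

$$recall_{pra} = \frac{|A' \cap PRA|}{|PRA|} \quad (2)$$

$$precision_{pra} = \frac{|A' \cap PRA|}{|A'_p|} \quad (3)$$

$$f\text{-}measure_{pra} = \frac{2 \times recall_{pra} \times precision_{pra}}{recall_{pra} + precision_{pra}} \quad (4)$$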

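On this basis, the optimization model of the biomedical ontology matching problem is defined as follows:

$$\max\; f\text{-}measure_{pra}(X), \quad X = (x_1, x_2, \cdots, x_{|O_1|}),\; x_i \in \{0, 1, \cdots, |O_2|\} \quad (5)$$

where |O₁| and |O₂| refer to the cardinalities of the two biomedical ontologies O₁ and O₂, respectively, and x_i, i = 1, 2, ⋯, |O₁|, is the i-th correspondence.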
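The PRA-based scores can be computed directly on sets of correspondences. The Python sketch below assumes recall_pra = |A′ ∩ PRA| / |PRA| and precision_pra = |A′ ∩ PRA| / |A′_p|, where A′_p keeps only the correspondences of A′ that share a source or target class with PRA. Correspondences are reduced to (source, target) pairs, and all class names are made up for illustration.

```python
from typing import Set, Tuple

# A correspondence is reduced here to (source class, target class); the
# similarity value n is irrelevant for counting and is omitted.
Correspondence = Tuple[str, str]


def partial_alignment(alignment: Set[Correspondence],
                      pra: Set[Correspondence]) -> Set[Correspondence]:
    """A_p: the correspondences of A sharing a source or target class with PRA."""
    pra_sources = {s for s, _ in pra}
    pra_targets = {t for _, t in pra}
    return {(s, t) for s, t in alignment if s in pra_sources or t in pra_targets}


def scores_on_pra(alignment: Set[Correspondence],
                  pra: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (recall_pra, precision_pra, f-measure_pra) of an alignment."""
    correct = len(alignment & pra)  # every element of A ∩ PRA also lies in A_p
    a_p = partial_alignment(alignment, pra)
    recall = correct / len(pra) if pra else 0.0
    precision = correct / len(a_p) if a_p else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    return recall, precision, 2 * recall * precision / (recall + precision)


pra = {("MA_heart", "NCI_Heart"), ("MA_lung", "NCI_Lung")}
found = {("MA_heart", "NCI_Heart"), ("MA_lung", "NCI_Trachea"), ("MA_tail", "NCI_Tail")}
print(scores_on_pra(found, pra))  # (0.5, 0.5, 0.5)
```

The pair ("MA_tail", "NCI_Tail") does not lower precision: PRA says nothing about either class, so it falls outside A′_p. This is what lets a small, user-validated PRA stand in for a full reference alignment when scoring candidate solutions.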

Next, the ETS algorithm that solves this problem and implements the automatic ontology matching process is presented in detail.

2.2 Partial reference alignment initialization

In this work, we use a HashMap (http://en.wikipedia.org/wiki/Hash_table) to determine the anchors, i.e. the entities with identical labels. In particular, each class of the source (or target) ontology is first stored in the source (or target) HashMap as a key, with its label as the associated value. Then, the values of the source HashMap are used to query the target HashMap to determine the highly similar mappings, which takes O(n) time. Finally, a user is asked to validate the anchors, and those judged correct are used to construct the PRA.
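The anchor step can be sketched in Python as follows. One assumption is worth flagging: to answer "which target class has this label?" in O(1) average time, the sketch keys the target map by label rather than by class, and it applies a light normalization (lower-casing, underscores to spaces) that the text does not specify. The user is modeled as a hypothetical `oracle` callable, and all class identifiers are illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple


def normalize(label: str) -> str:
    # Assumed normalization: case-folding and treating "_" as a space.
    return " ".join(label.lower().replace("_", " ").split())


def find_anchors(source: Dict[str, str], target: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair classes whose (normalized) labels are identical.

    `source` and `target` map class IDs to labels. The target side is
    re-keyed by label so each source lookup is an O(1) average-case hash probe.
    """
    by_label: Dict[str, List[str]] = {}
    for cls, label in target.items():
        by_label.setdefault(normalize(label), []).append(cls)
    return [(s, t) for s, label in source.items() for t in by_label.get(normalize(label), [])]


def build_pra(anchors: List[Tuple[str, str]],
              oracle: Callable[[str, str], bool]) -> Set[Tuple[str, str]]:
    # Keep only the anchors the user (here: a callable oracle) confirms.
    return {(s, t) for s, t in anchors if oracle(s, t)}


source = {"MA_1": "heart", "MA_2": "lung", "MA_3": "cardiac_muscle"}
target = {"NCI_1": "Heart", "NCI_2": "Lung", "NCI_3": "Myocardium"}
anchors = find_anchors(source, target)
print(anchors)  # [('MA_1', 'NCI_1'), ('MA_2', 'NCI_2')]
```

Building the index is linear in the number of target classes and each source lookup is a hash probe, which is consistent with the O(n) cost stated above. Note that "cardiac_muscle" and "Myocardium" are not paired: exact-label anchors are high-precision by design, and synonym-level matches are left to the later matching stages.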

2.3 Evolutionary tabu search algorithm

Since the biomedical ontology matching problem is complex (a nonlinear problem with many local optima) and time-consuming (a large-scale problem), particularly when the number of biomedical concepts is very large, EA can be an efficient approach for addressing it. However, slow convergence and premature convergence are two main shortcomings that make EA-based ontology matchers unable to effectively search for the optimal solution of the biomedical ontology matching problem. Starting from these considerations, this work proposes an ETS algorithm that combines EA (global search) and the tabu search algorithm (local search) to implement the automatic searching process, keeping population diversity high while increasing the convergence speed through local search. For clarity, the pseudo-code of the ETS algorithm is presented as follows:

Algorithm 1 Evolutionary Tabu Search Algorithm

Initialize the generation t = 0;
Initialize the population P_t;
Evaluate(P_t);
while t < MaxGeneration do
  Selection(P_t);
  Crossover(P_t);
  Mutation(P_t);
  Evaluate(P_t);
  localSearch(best individual of P_t);
  saveElite();
  t = t + 1;
end while

Next, the three key components of the ETS algorithm, i.e. the encoding mechanism, the genetic operators and the local search process, are presented in detail.

2.3.1 Encoding mechanism.

Let |C₁| and |C₂| be the cardinalities of the source concept set C₁ and the target concept set C₂, respectively. Each chromosome in the population is a one-dimensional array with |C₁| elements, denoted as N₁ N₂ ⋯ N_|C₁|, where N_i ∈ ω_i = {0, 1, ⋯, |C₂|} means that the ith concept in C₁ is mapped to the N_i-th concept in C₂. In particular, when N_i = 0, the ith concept is not mapped to any concept in C₂.
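A small Python sketch of this encoding follows. It keeps an explicit list of admissible values per gene (ω_i) instead of a single shared range, an assumption made so that the hierarchy-based pruning of Section 3.2 can later shrink each gene's space independently. Concept numbering starts at 1 so that 0 keeps its "unmapped" meaning; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Chromosome = List[int]


def full_search_spaces(n_source: int, n_target: int) -> List[List[int]]:
    # Initially every gene i may take any value of omega_i = {0, 1, ..., |C2|}.
    return [list(range(n_target + 1)) for _ in range(n_source)]


def random_chromosome(spaces: Sequence[Sequence[int]], rng: random.Random) -> Chromosome:
    # One gene per source concept; each gene is drawn from its own search space.
    return [rng.choice(space) for space in spaces]


def decode(chromosome: Chromosome) -> Set[Tuple[int, int]]:
    # Gene i (0-based) holding N_i = j > 0 means: source concept i+1 -> target concept j.
    # N_i = 0 means the source concept is left unmapped, so it yields no pair.
    return {(i + 1, n) for i, n in enumerate(chromosome) if n != 0}


rng = random.Random(42)
spaces = full_search_spaces(n_source=4, n_target=3)
chrom = random_chromosome(spaces, rng)
print(chrom, decode(chrom))
```

For example, the chromosome [2, 0, 1] decodes to {(1, 2), (3, 1)}: the second source concept is unmapped.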

2.3.2 Genetic operators.

In this work, we evaluate the population by f-measure_pra and then use roulette wheel selection, where each individual is selected with a probability proportional to its fitness value; in this way, the fittest individuals have more opportunities to reproduce, while less fit individuals still have a chance. After choosing two individuals (the parents), we use the one-cut-point crossover operator to produce their offspring: first, a cut position is randomly determined, which cuts each parent into two parts, the left part and the right part; then, the right parts of the two parents are swapped to form two children. With respect to the mutation operator, for each gene bit N_i, we check whether the mutation is applied according to the mutation probability, and if so, the value of N_i is randomly changed to a value in its corresponding search space ω_i.
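The three operators can be sketched in Python as below. This is an illustrative implementation of standard roulette wheel selection, one-cut-point crossover and gene-wise mutation as described above, not the authors' code; the uniform fallback for all-zero fitness and the function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]


def roulette_select(population: Sequence[Chromosome], fitness: Sequence[float],
                    rng: random.Random) -> Chromosome:
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:  # all-zero fitness: fall back to a uniform choice
        return rng.choice(population)
    pick, acc = rng.uniform(0, total), 0.0
    for individual, fit in zip(population, fitness):
        acc += fit
        if pick < acc:
            return individual
    return population[-1]  # guards against floating-point round-off


def one_point_crossover(p1: Chromosome, p2: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Cut both parents at the same random point and swap the right parts."""
    cut = rng.randint(1, len(p1) - 1)  # keeps both parts non-empty (needs len >= 2)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chromosome: Chromosome, spaces: Sequence[Sequence[int]],
           pm: float, rng: random.Random) -> Chromosome:
    """Independently reset each gene N_i, with probability pm, to a random value of omega_i."""
    return [rng.choice(spaces[i]) if rng.random() < pm else gene
            for i, gene in enumerate(chromosome)]
```

With parents [1, 1, 1, 1] and [2, 2, 2, 2], each child is a prefix of one parent followed by the suffix of the other, and because the cut never falls at either end, neither child equals a parent.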

2.3.3 Local search process.

A local search process iteratively searches for the optimal solution in the neighborhood of a candidate. To balance local and global search, the local search process in our work is designed according to the following rules:

the local search is applied within each evolutionary cycle,
the local search is executed after crossover and mutation,
the local search is applied to the best individual of population,
the local search method is the tabu search algorithm.
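Putting the pieces of Algorithm 1 together, the following self-contained Python skeleton shows the order of operations implied by these rules: selection, crossover and mutation, then local search on the best individual only, then elite preservation. It is a sketch under stated assumptions: the elite is re-inserted over the worst individual (one common reading of saveElite), the local search is passed in as a callable (the tabu search of Algorithm 2 would go there), and the toy fitness merely counts agreement with a hidden vector instead of computing f-measure_pra.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Chromosome = List[int]


def ets(fitness: Callable[[Chromosome], float],
        spaces: Sequence[Sequence[int]],
        pop_size: int = 20, max_gen: int = 100,
        pc: float = 0.85, pm: float = 0.02,
        local_search: Optional[Callable[[Chromosome], Chromosome]] = None,
        seed: int = 0) -> Tuple[Chromosome, List[float]]:
    """Skeleton of Algorithm 1; returns the elite and its fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.choice(s) for s in spaces] for _ in range(pop_size)]
    elite = list(max(pop, key=fitness))
    history = [fitness(elite)]

    def roulette(fits: List[float]) -> Chromosome:
        total = sum(fits)
        if total <= 0:
            return rng.choice(pop)
        pick, acc = rng.uniform(0, total), 0.0
        for individual, f in zip(pop, fits):
            acc += f
            if pick < acc:
                return individual
        return pop[-1]

    for _ in range(max_gen):
        fits = [fitness(c) for c in pop]
        children: List[Chromosome] = []
        while len(children) < pop_size:
            # Selection, then one-cut-point crossover with probability pc.
            a, b = list(roulette(fits)), list(roulette(fits))
            if len(a) > 1 and rng.random() < pc:
                cut = rng.randint(1, len(a) - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Gene-wise mutation inside each gene's own search space.
            for child in (a, b):
                children.append([rng.choice(spaces[i]) if rng.random() < pm else g
                                 for i, g in enumerate(child)])
        pop = children[:pop_size]
        # Local search is applied only to the best individual, after crossover and mutation.
        best = max(range(pop_size), key=lambda k: fitness(pop[k]))
        if local_search is not None:
            pop[best] = local_search(pop[best])
        # saveElite: keep the best-so-far and (assumed) re-insert it over the worst individual.
        if fitness(pop[best]) > fitness(elite):
            elite = list(pop[best])
        worst = min(range(pop_size), key=lambda k: fitness(pop[k]))
        pop[worst] = list(elite)
        history.append(fitness(elite))
    return elite, history


# Toy usage: fitness counts agreement with a hidden vector (illustration only).
target = [3, 1, 4, 1, 5, 0, 2, 6]


def toy_fitness(c: Chromosome) -> float:
    return sum(x == y for x, y in zip(c, target)) / len(target)


elite, history = ets(toy_fitness, [list(range(7))] * len(target), max_gen=50, seed=1)
print(elite, history[-1])
```

Because the elite is only replaced by a strictly better individual, `history` is non-decreasing, which is the property elite preservation is meant to guarantee.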

Tabu search imposes restrictions that guide a search process through otherwise difficult regions, where the restrictions operate by directly excluding search alternatives classed as “forbidden”. The implementation of tabu search uses an array to record the visited solutions: if a potential solution has been visited within a certain short-term period, it is marked as “tabu” (forbidden) so that the algorithm does not consider that possibility repeatedly. Given a tabu matrix TM = [TV₁, TV₂, ⋯, TV_|C₁|], where the ith tabu list is TV_i = (tv₁, tv₂, ⋯, tv_tLength)^T, i = 1, 2, ⋯, |C₁|, and tv_j ∈ {0, 1, 2, ⋯, |C₂|}, the pseudo-code of the tabu search algorithm is given as follows:

Algorithm 2 Local Search Process

iterNum = 0;
while iterNum < maxIterNum do
  neighborhood = ∅;
  for n = 0; n < neighborScale; n++ do
    solution_new = solution_elite.copy();
    for i = 0; i < solution_new.length; i++ do
      if random(0, 1) < mutationProbability_LS then
        solution_new[i] = a random value in ω_i \ TV_i;
      end if
    end for
    append solution_new to neighborhood;
  end for
  solution_localElite = elite in neighborhood;
  winner = compete(solution_elite, solution_localElite);
  if winner == solution_localElite then
    solution_elite = solution_localElite;
    for each TV_i ∈ TM do
      if TV_i is not full then
        append solution_localElite[i] to TV_i;
      else
        replace tv_j ∈ TV_i, whose corresponding class in C₂ has the lowest similarity value with c_i, with solution_localElite[i];
      end if
    end for
    iterNum++;
  else
    break;
  end if
end while

During the evolving process, if solution_elite remains unchanged for a certain number of generations, the entry tv_j of each tabu list TV_i whose corresponding class in C₂ has the highest similarity value with c_i is removed.
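A minimal Python sketch of Algorithm 2 and of this stagnation rule follows, under several assumptions: `sim(i, v)` is a caller-supplied similarity between source concept c_i and target value v, `compete` is taken to be a strict fitness comparison (ties keep the current elite), and duplicate tabu entries are skipped. The parameters t_length, max_iter, neighbor_scale and pm_ls stand in for tLength, maxIterNum, neighborScale and mutationProbability_LS; all names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Chromosome = List[int]
Sim = Callable[[int, int], float]  # sim(i, v): similarity of source concept c_i and target value v


def tabu_local_search(elite: Chromosome, fitness: Callable[[Chromosome], float],
                      spaces: Sequence[Sequence[int]], sim: Sim,
                      tabu: List[List[int]], t_length: int = 5,
                      max_iter: int = 30, neighbor_scale: int = 10,
                      pm_ls: float = 0.5,
                      rng: Optional[random.Random] = None) -> Chromosome:
    """Sketch of Algorithm 2; `tabu[i]` plays the role of the tabu list TV_i."""
    rng = rng if rng is not None else random.Random(0)
    for _ in range(max_iter):
        neighborhood = []
        for _ in range(neighbor_scale):
            new = list(elite)
            for i in range(len(new)):
                if rng.random() < pm_ls:
                    allowed = [v for v in spaces[i] if v not in tabu[i]]
                    if allowed:  # every value tabu: leave the gene unchanged
                        new[i] = rng.choice(allowed)
            neighborhood.append(new)
        local_elite = max(neighborhood, key=fitness)
        # compete(): assumed strict fitness comparison; ties keep the current elite.
        if fitness(local_elite) <= fitness(elite):
            break
        elite = local_elite
        for i, tv in enumerate(tabu):
            value = local_elite[i]
            if value in tv:  # small deviation: skip duplicates so slots are not wasted
                continue
            if len(tv) < t_length:
                tv.append(value)
            else:  # full list: overwrite the entry least similar to c_i
                j = min(range(len(tv)), key=lambda k: sim(i, tv[k]))
                tv[j] = value
    return elite


def release_tabu(tabu: List[List[int]], sim: Sim) -> None:
    """On stagnation, drop from each TV_i the entry most similar to c_i."""
    for i, tv in enumerate(tabu):
        if tv:
            tv.remove(max(tv, key=lambda v: sim(i, v)))
```

Releasing the most similar tabu entry first is the natural counterpart of the replacement rule: promising target classes are the ones worth revisiting once the search has stalled.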

3 User interaction

Since biomedical ontology matching is a complex task, ETS-based matching results need to be validated by a user to ensure the alignment's quality and improve the algorithm's efficiency [25]. However, it is impractical to require a user to validate all the correspondences at once, which is both time-consuming and error-prone. Thus, how to reduce the user's workload is the first question we need to answer when implementing an effective user interaction. In addition, how to effectively exploit the limited user intervention to improve the efficiency of the matching process is the second question. In this work, we involve the user only when ETS gets stuck, and present only the most problematic correspondences (those with low similarity values) for validation, to reduce the user's workload. Once the user has validated the presented correspondences, the validated results are further used to reduce each gene bit's search space through a hierarchy-based approach, which improves the efficiency of the subsequent matching process.
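The two decisions described above, when to involve the user and what to show them, could be sketched as follows. The text does not say how "stuck" is detected or how many correspondences are presented per interaction, so both the patience-based stagnation test and the cap k are hypothetical; the similarity function is supplied by the caller.

```python
from typing import Callable, Iterable, List, Set, Tuple

Correspondence = Tuple[str, str]


class StagnationDetector:
    """Reports that ETS is 'stuck' once the elite's fitness has not improved
    for `patience` consecutive generations (patience is a hypothetical knob)."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.stall = 0

    def update(self, elite_fitness: float) -> bool:
        if elite_fitness > self.best:
            self.best, self.stall = elite_fitness, 0
        else:
            self.stall += 1
        return self.stall >= self.patience


def select_queries(alignment: Iterable[Correspondence],
                   sim: Callable[[str, str], float],
                   already_validated: Set[Correspondence],
                   k: int) -> List[Correspondence]:
    """Return at most k not-yet-validated correspondences with the lowest similarity."""
    pending = [c for c in alignment if c not in already_validated]
    return sorted(pending, key=lambda c: sim(*c))[:k]


detector = StagnationDetector(patience=3)
print([detector.update(f) for f in [0.50, 0.60, 0.60, 0.60, 0.60]])
# [False, False, False, False, True]
```

Skipping already-validated correspondences keeps the user from being asked the same question twice, which directly serves the first goal of limiting the workload.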

3.1 Biomedical concept similarity measure

A similarity measure is a function that takes two concepts as input and outputs a score between 0, meaning that the two concepts are completely different, and 1, meaning that they are identical. In particular, we first construct a profile for each biomedical concept by collecting the label, comment, and property labels of the concept itself and of all its direct descendants. Then the similarity value between two biomedical classes c₁ and c₂ is measured by calculating the similarity of their corresponding profiles P₁ and P₂, which is defined in Eq 6. (6) where:

|P₁| is the number of elements of P₁ and |P₂| is the number of elements of P₂,
p_1i is the ith element of P₁ and p_2j is the jth element of P₂, e.g. the label or comment in the concept's profile,

Here, sim′(p_1i, p_2j) calculates the similarity value of two profile elements by combining the N-gram distance [26], reported as the best-performing string-based similarity measure for the biomedical ontology matching problem, with a linguistic measure that calculates a synonymy-based similarity through the Unified Medical Language System (UMLS) [27]. Specifically, given two words w₁ and w₂, their similarity sim(w₁, w₂) equals 1 when the two words are synonyms, and N-gram(w₁, w₂) otherwise.
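The following Python sketch shows one way to compute sim′ and a profile similarity. Several parts are assumptions: the N-gram variant is a Dice coefficient over character trigrams (the exact variant of [26] may differ), UMLS is replaced by a hypothetical in-memory synonym dictionary, and because the aggregation of Eq 6 is not reproduced here, profiles are combined with a symmetric best-match average.

```python
from collections import Counter
from typing import Dict, FrozenSet, Sequence

Synonyms = Dict[str, FrozenSet[str]]  # hypothetical stand-in for a UMLS synonym lookup


def ngrams(text: str, n: int = 3) -> Counter:
    text = text.lower()
    if len(text) < n:  # short strings count as a single gram
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_sim(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-grams (one common N-gram variant)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    return 2 * sum((ga & gb).values()) / total if total else 0.0


def element_sim(a: str, b: str, synonyms: Synonyms) -> float:
    """sim'(p1i, p2j): 1 for identical or synonymous strings, otherwise N-gram similarity."""
    a, b = a.lower(), b.lower()
    if a == b or b in synonyms.get(a, frozenset()) or a in synonyms.get(b, frozenset()):
        return 1.0
    return ngram_sim(a, b)


def profile_sim(p1: Sequence[str], p2: Sequence[str], synonyms: Synonyms) -> float:
    """Assumed aggregation: symmetric average of each element's best match."""
    if not p1 or not p2:
        return 0.0
    forward = sum(max(element_sim(x, y, synonyms) for y in p2) for x in p1)
    backward = sum(max(element_sim(x, y, synonyms) for x in p1) for y in p2)
    return (forward + backward) / (len(p1) + len(p2))


umls_like = {"myocardium": frozenset({"cardiac muscle tissue"})}
print(profile_sim(["Myocardium"], ["Cardiac Muscle Tissue"], umls_like))  # 1.0
```

The "Myocardium" / "Cardiac Muscle Tissue" pair from the introduction shows why the synonym check matters: the two labels share almost no trigrams, so a purely string-based score would miss the correspondence.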

3.2 Improve the efficiency of matching process

The large search space is what makes it difficult for EA-based ontology matchers to match biomedical ontologies; thus, reducing the search space is critical for a biomedical ontology matching technique. In this work, we propose a hierarchy-based approach that exploits the validated results to effectively reduce the ETS algorithm's search space. Our proposal is based on two observations [28]: (1) a biomedical ontology is often composed of hierarchies organized by the “is-a” relationship, and a correct alignment should be consistent with such hierarchies, and (2) an alignment between two biomedical ontologies has locality, i.e. most classes in a region of one ontology are matched to classes in a region of the other ontology. The search space reduction process is as follows:

if a user judges a source concept c_i and a target concept c_j to be identical, the sub-concepts (or super-concepts) of c_i and the super-concepts (or sub-concepts) of c_j should not match, i.e. the indexes of c_j's super-concepts are removed from the search space ω′ of the gene bit corresponding to each sub-concept of c_i, and the indexes of c_j's sub-concepts are removed from the search space ω″ of the gene bit corresponding to each super-concept of c_i,
if a user judges a source concept c_i and a target concept c_j to be different, the neighbors of c_i do not match c_j either, i.e. c_j's index is removed from the search space ω‴ of the gene bit corresponding to each neighbor of c_i. In particular, c_i's neighborhood includes c_i's direct super-concepts, direct sub-concepts and siblings.

By omitting dissimilar correspondences, the search space of the ETS algorithm can be significantly reduced after each user interaction, which can also improve the alignment's quality.
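The two pruning rules can be sketched in Python as below, under these assumptions: search spaces are stored as sets of target concept names rather than gene indexes (the unmapped value 0 is not modeled), ω′, ω″ and ω‴ are simply the entries of `spaces` for the affected source concepts, and "sub-concepts" and "super-concepts" are taken transitively while the neighborhood uses direct links only. The toy hierarchies are invented for illustration.

```python
from typing import Dict, Iterable, Set

Parents = Dict[str, Set[str]]  # concept -> its direct "is-a" super-concepts


def closure(start: Iterable[str], edges: Dict[str, Set[str]]) -> Set[str]:
    """All concepts reachable from `start` by repeatedly following `edges`."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(edges.get(c, ()))
    return seen


def children(parents: Parents) -> Dict[str, Set[str]]:
    kids: Dict[str, Set[str]] = {}
    for c, ps in parents.items():
        for p in ps:
            kids.setdefault(p, set()).add(c)
    return kids


def prune(spaces: Dict[str, Set[str]], ci: str, cj: str, correct: bool,
          src: Parents, tgt: Parents) -> None:
    """Shrink candidate target sets in place after the user validates (ci, cj)."""
    src_kids, tgt_kids = children(src), children(tgt)
    if correct:
        cj_supers = closure(tgt.get(cj, ()), tgt)
        cj_subs = closure(tgt_kids.get(cj, ()), tgt_kids)
        for s in closure(src_kids.get(ci, ()), src_kids):  # sub-concepts of ci
            if s in spaces:
                spaces[s] -= cj_supers
        for s in closure(src.get(ci, ()), src):  # super-concepts of ci
            if s in spaces:
                spaces[s] -= cj_subs
    else:
        direct_sup = src.get(ci, set())
        direct_sub = src_kids.get(ci, set())
        siblings = {k for p in direct_sup for k in src_kids.get(p, set())} - {ci}
        for s in direct_sup | direct_sub | siblings:
            if s in spaces:
                spaces[s].discard(cj)


src = {"heart": {"organ"}, "myocardium": {"heart"}, "valve": {"heart"}}
tgt = {"Heart": {"Organ"}, "Cardiac Muscle": {"Heart"}}
spaces = {c: {"Organ", "Heart", "Cardiac Muscle"} for c in ["organ", "heart", "myocardium", "valve"]}
prune(spaces, "heart", "Heart", True, src, tgt)  # user confirms heart = Heart
print(sorted(spaces["organ"]), sorted(spaces["myocardium"]))
# ['Heart', 'Organ'] ['Cardiac Muscle', 'Heart']
```

A single confirmed correspondence removes "Cardiac Muscle" from the candidates of "organ" and "Organ" from those of "myocardium" and "valve", illustrating how one validation can shrink several genes' search spaces at once.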

4 Experimental studies and analysis

In this work, we use the Anatomy (http://oaei.ontologymatching.org/2016/anatomy/index.html) and Large Biomed (http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2016/) tracks provided by OAEI 2016 (http://oaei.ontologymatching.org/2016) to study the effectiveness of our approach. The experiment allows the matching approaches to ask an oracle, which then tells the matcher whether a correspondence is right or wrong. Tables 1, 2 and 3 show the mean f-measure of the alignments obtained by our approach over thirty independent runs, together with the results obtained by the OAEI participants. The symbols r, p and f in the tables stand for recall, precision and f-measure, respectively, and the three remaining symbols respectively stand for the f-measure, recall and precision of the matcher's non-interactive version. In this experiment, we use three metrics, i.e. f-measure, runtime and mean improvement per request, to evaluate the performance of the interactive biomedical ontology matchers. In particular, f-measure and runtime measure the effectiveness of a semi-automatic ontology matching technique, and the mean improvement per request measures the efficiency of the user involvement.


Table 1. Comparison between OAEI 2016’s participants and our approach on interactive anatomy track.

https://doi.org/10.1371/journal.pone.0215147.t001


Table 2. Comparison between OAEI 2016’s participants and our approach on the interactive Large Biomedic track: FMA-NCI.

https://doi.org/10.1371/journal.pone.0215147.t002


Table 3. Comparison between OAEI 2016’s participants and our approach on the interactive Large Biomedic track: SNOMED-NCI.

https://doi.org/10.1371/journal.pone.0215147.t003

The EA in our work is configured according to the following principles:

Since the EA in our work relies mainly on the crossover operator, aided by the mutation operator, the crossover probability should be large and the mutation probability small. However, if the crossover probability is too high, excess solutions appear, which may increase the computational cost. Therefore, the suggested range of the crossover probability is [0.2, 1], and preliminary experiments show that a crossover probability of 0.85 and a mutation probability of 0.02 give acceptable results for the various heterogeneous problems in all test cases.
Since the local search process needs to produce a local search population with high diversity, the mutation probability of local search should be higher than that of the genetic algorithm. However, if this value is too large, the produced individuals might not be “neighbors” of the local search target. Therefore, the suggested range of this mutation probability is [0.2, 0.8], and preliminary experiments show that a mutation probability of 0.5 works best.
The population size, local search intensity and maximum number of generations for termination depend on the scale of the problem; their suggested ranges are [50, 120], [10, 40] and [1500, 3500], respectively. Since the problem scale in our work is relatively large, we set the population size, local search intensity and maximum number of generations to 100, 30 and 3000, respectively.

We use the following parameters, which represent a trade-off setting obtained empirically to achieve the highest average alignment quality on all test cases of the exploited datasets. The experiments show that the parameters chosen in this way are robust for all the heterogeneous problems presented in the benchmarks, and we expect them to be robust for common heterogeneous situations in the real world as well.

Numerical accuracy = 0.01,
Population size = 100,
Crossover probability = 0.85,
Mutation probability = 0.02,
Local search's mutation probability = 0.5,
Local search intensity = 30 iterations,
Maximum number of generations = 3000.
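The settings above could be captured, for instance, in a small configuration object that also checks them against the suggested ranges and the orderings stated in the configuration principles (EA mutation below crossover, local search mutation above EA mutation). This is an illustrative sketch; the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ETSConfig:
    # Values listed above; the field names themselves are hypothetical.
    numerical_accuracy: float = 0.01
    population_size: int = 100
    crossover_probability: float = 0.85
    mutation_probability: float = 0.02
    ls_mutation_probability: float = 0.5
    ls_intensity: int = 30  # local search iterations
    max_generations: int = 3000


# Suggested ranges quoted in the configuration principles above.
SUGGESTED_RANGES: Dict[str, Tuple[float, float]] = {
    "population_size": (50, 120),
    "crossover_probability": (0.2, 1.0),
    "ls_mutation_probability": (0.2, 0.8),
    "ls_intensity": (10, 40),
    "max_generations": (1500, 3500),
}


def check(cfg: ETSConfig) -> List[str]:
    """List violated guidelines; an empty list means the configuration follows them."""
    problems = [f"{name} outside [{lo}, {hi}]"
                for name, (lo, hi) in SUGGESTED_RANGES.items()
                if not lo <= getattr(cfg, name) <= hi]
    if cfg.mutation_probability >= cfg.crossover_probability:
        problems.append("mutation_probability should be smaller than crossover_probability")
    if cfg.ls_mutation_probability <= cfg.mutation_probability:
        problems.append("local search mutation should exceed the EA mutation probability")
    return problems


print(check(ETSConfig()))  # []
```

Freezing the dataclass keeps a run's configuration immutable once created, so every component of the search sees the same values.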

In addition, in order to compare with the participants of OAEI 2016, we run our approach on Conference and Anatomy tracks on a server with Intel Xeon E5-2643 CPU @ 3.46 GHz x 6 cores and 8GB RAM, and Large Bio track and Phenotype track on a laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4 and allocated 15GB RAM. The operating system of both machines is Linux.

4.1 Anatomy track

The anatomy track is a large ontology matching task about matching the Adult mouse anatomy (2744 classes) and a part of the NCI Thesaurus (3304 classes) which describes the human anatomy. Adult mouse anatomy is a structured controlled vocabulary describing the anatomical structure of the adult mouse, whereas NCI depicts the human anatomy for the purpose of cancer research.

As can be seen from Table 1, our approach's f-measure is the highest. In particular, compared with the non-interactive version of our approach, recall and precision are improved by 20% and 15%, respectively, which shows that our approach can effectively exploit user intervention to improve the alignment quality. In addition, because of the high efficiency brought by the hierarchy-based approach, our approach takes only 23 seconds to obtain the ontology alignment, the lowest runtime among all matching systems. Our approach's mean improvements per request are all higher than those of the other systems, which illustrates that our approach efficiently utilizes the value of user involvement. With the introduction of an erroneous oracle and increasing error rates, each system's performance starts to deteriorate compared with the all-knowing oracle. To sum up, our approach can efficiently exploit user involvement to achieve a substantial improvement.

4.2 Large Biomedic track

This track aims at finding alignments between the large and semantically rich biomedical ontologies FMA, SNOMED, and NCI, which contain 78,989, 306,591 and 66,724 classes, respectively.

On the first task of the Large Bio track (FMA-NCI), as can be seen from Table 2, our approach improves on its non-interactive version by 5.43% in terms of f-measure, compared with 2.4% for ALIN, 2.15% for AML, 2.17% for LogMap and 1.0% for XMap. Therefore, our approach's exploitation of user validation is effective, enabling it to deal efficiently with the large-scale ontology matching problem and to improve the alignment's quality. Since our approach effectively reduces the number of user interactions and exploits the value of each user validation, its mean improvement per request is much higher than that of the other systems. Last but not least, due to the efficiency brought by the search space reduction approach, the average runtime of our approach is also lower than that of the other systems.

As shown in Table 3, our approach obtains the highest f-measure. Compared with the non-interactive version, our approach's recall and precision are improved by 19.44% and 6.09%, respectively, which shows that our approach can effectively use user intervention to improve the alignment quality. In addition, the mean improvement per request of our approach is higher than that of the other systems, and its mean runtime is the lowest under all the user error rates. To sum up, our approach is able to efficiently exploit user involvement to obtain high-quality ontology alignments when solving large-scale biomedical ontology matching problems.

To conclude, through the comparison with OAEI’s participants in the interactive ontology matching tracks with different scales, our approach is able to more effectively exploit the user validation to improve the performance of its non-interactive version, and the qualities of the alignments obtained by our approach with three user error rates ranging from 0.1 to 0.3 are all better than the state-of-the-art interactive biomedical ontology matching techniques.

5 Conclusion and future work

To efficiently match biomedical ontologies, this work proposes an interactive biomedical ontology matching approach, which effectively uses the user's knowledge to guide the search direction of the ETS-based ontology matcher and improves its efficiency by reducing the algorithm's search space. The experimental results show that our approach efficiently exploits user validation to improve on its non-interactive version and outperforms the state-of-the-art interactive biomedical ontology matching techniques. In the future, we are interested in strategies that reuse a user's validation results to further reduce the search space of the algorithm. In addition, we are interested in decreasing the user's error rate by warning the user when contradictory validations are made.

References

1. Consortium GO. The Gene Ontology (GO) database and informatics resource. Nucleic acids research. 2004;32(suppl_1):D258–D261.
2. Golbeck J, Fragoso G, Hartel F, Hendler J, Oberthaler J, Parsia B. The National Cancer Institute’s thesaurus and ontology. Web Semantics: Science, Services and Agents on the World Wide Web. 2011;1(1):1–5.
3. Rosse C, Mejino JL Jr. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Journal of biomedical informatics. 2003;36(6):478–500. pmid:14759820
4. Schulz S, Cornet R, Spackman K. Consolidating SNOMED CT’s ontological commitment. Applied ontology. 2011;6(1):1–11.
5. Lopez-Fernandez H, Reboiro-Jato M, Glez-Pena D, Aparicio F, Gachet D, Buenaga M, et al. BioAnnote: A software platform for annotating biomedical documents with application in medical learning environments. Computer methods and programs in biomedicine. 2013;111(1):139–147. pmid:23562645
6. Cimino JJ, Zhu X. The practical impact of ontologies on biomedical informatics. Yearbook of medical informatics. 2006;15(1):124–135.
7. Isern D, Sánchez D, Moreno A. Ontology-driven execution of clinical guidelines. Computer methods and programs in biomedicine. 2012;107(2):122–139. pmid:21752487
8. De Potter P, Cools H, Depraetere K, Mels G, Debevere P, De Roo J, et al. Semantic patient information aggregation and medicinal decision support. Computer methods and programs in biomedicine. 2012;108(2):724–735. pmid:22640816
9. Amin MB, Khan WA, Hussain S, Bui DM, Banos O, Kang BH, et al. Evaluating Large-Scale Biomedical Ontology Matching Over Parallel Platforms. IETE Technical Review. 2016;33(4):415–427.
10. Nasir SAM, Noor NLM. Analysing the effectiveness of COMA++ on the mapping between traditional Malay textile (TMT) knowledge model and CIDOC CRM. 2010 International Symposium in Information Technology (ITSim). IEEE; 2010. p. 1–6.
11. Jauro F, Junaidu S, Abdullahi S. Falcon-AO++: An improved ontology alignment system. International Journal of Computer Applications. 2014;94(2):1–7.
12. Xue X, Pan JS. A segment-based approach for large-scale ontology matching. Knowledge and Information Systems. 2017;52(2):467–484.
13. Faria D, Pesquita C, Mott I, Martins C, Couto FM, Cruz IF. Tackling the challenges of matching biomedical ontologies. Journal of biomedical semantics. 2018;9(1):4. pmid:29335022
14. Dragisic Z, Ivanova V, Lambrix P, Faria D, Jimenez-Ruiz E, Pesquita C. User validation in ontology alignment. International Semantic Web Conference. Springer; 2016. p. 200–217.
15. Faria D, Pesquita C, Balasubramani BS, Martins C, Cardoso J, Curado H, et al. OAEI 2016 Results of AML. Ontology Matching. 2016; p. 138–145.
16. Da Silva J. ALIN Results for OAEI 2016. Ontology Matching. 2016; p. 130–137.
17. Jimenez-Ruiz E, Grau BC, Cross V. LogMap family participation in the OAEI 2016. Ontology Matching. 2016; p. 185–189.
18. Eddine Warith D, Tarek Mohamed K, Yahia SB. XMap Results for OAEI 2016. Ontology Matching. 2016; p. 222–226.
19. Xue X, Pan JS. A Compact Co-Evolutionary Algorithm for sensor ontology meta-matching. Knowledge and Information Systems. 2018;56(2):335–353.
20. Martinez-Gil J, Alba E, Montes JFA. Optimizing ontology alignments by using genetic algorithms. Proceedings of the First International Conference on Nature Inspired Reasoning for the Semantic Web-Volume 419. CEUR-WS.org; 2008. p. 1–15.
21. Alexandru-Lucian G, Iftene A. Using a genetic algorithm for optimizing the similarity aggregation step in the process of ontology alignment. 2010 9th RoEduNet IEEE International Conference (RoEduNet); 2010. p. 118–122.
22. Wang J, Ding Z, Jiang C. Gaom: Genetic algorithm based ontology matching. 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC’06). IEEE; 2006. p. 617–620.
23. Xue X, Wang Y, Ren A. Optimizing ontology alignment through memetic algorithm based on partial reference alignment. Expert Systems with Applications. 2014;41(7):3213–3222.
24. Rijsbergen CJV. Information Retrieval. Butterworths, London: University of Glasgow; 1975.
25. Xue X, Wang Y. Optimizing ontology alignments through a Memetic Algorithm using both MatchFmeasure and Unanimous Improvement Ratio. Artificial Intelligence. 2015;223:65–81.
26. Kondrak G. N-gram similarity and distance. International symposium on string processing and information retrieval. Springer; 2005. p. 115–126.
27. Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic acids research. 2004;32(suppl 1):D267–D270. pmid:14681409
28. Wang P. Lily-lom: An efficient system for matching large ontologies with non-partitioned method. CEUR Workshop Proceedings. Citeseer; 2010. p. 69–72.

