MULON - MULtilingual Ontology mergiNg

Overview

MULON is an approach for merging monolingual ontologies written in different natural languages into a single multilingual ontology. In a multilingual ontology, resources (classes and properties) are published in a language-independent way and associated with language-dependent (linguistic) information, which supports access across natural languages. Merging means creating a single ontology that provides a unified view of the input ontologies while maintaining all the information they contain. There are two types of ontology merging: (a) symmetric merging, which integrates all resources (classes and properties) of the input ontologies into a single ontology, i.e. preserves all input resources (this is the type MULON uses), and (b) asymmetric merging, which treats one of the input ontologies as the target ontology and merges the remaining non-redundant resources of the other input ontologies into it.
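To make the distinction concrete, here is a minimal Scala sketch that treats each ontology as a plain set of resource names. The object MergeKinds, its two functions, and the matches map (source resource to matched target resource) are hypothetical names chosen for illustration; they are not part of MULON's code, and the sketch does not model how MULON combines matched resources.

```scala
object MergeKinds {
  // Symmetric merging: every resource of both inputs is preserved.
  def symmetric(o1: Set[String], o2: Set[String]): Set[String] =
    o1 ++ o2

  // Asymmetric merging: the target is kept as is, and only those source
  // resources that have no match in the target (non-redundant) are added.
  def asymmetric(target: Set[String],
                 source: Set[String],
                 matches: Map[String, String]): Set[String] =
    target ++ source.filterNot(matches.contains)
}
```

With a target of Conference and Paper, a source of Konferenz and Gutachter, and a single match from Konferenz to Conference, the asymmetric variant adds only Gutachter, while the symmetric variant keeps all four names.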
Creating such an ontology is a complex task that requires considerable adaptation and rigorous techniques to control the various steps, especially when the input ontologies are in different natural languages. Identifying mappings between the multilingual input ontologies is the first step of the merging process and plays a vital role in it. For example, MULON first identifies cross-lingual matches between the input ontologies using cross-lingual matching techniques, then adds them to the merged multilingual ontology with one rdfs:label per language (using language-tagged strings), as shown in Figure 1. Cross-lingual matching helps to lower redundancy in the merged ontology.

Fig.1. An illustration of generating a multilingual ontology (with double-line nodes) from merging German and English ontologies.
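The labeling step described above can be sketched in plain Scala. This is only an illustration under stated assumptions: the object LabelMergeSketch, the Label case class, and the functions below are hypothetical and are not MULON's API; the sketch simply shows what one rdfs:label per language looks like as N-Triples lines. Escaping of special characters in the label text is not handled.

```scala
object LabelMergeSketch {
  val RdfsLabel = "http://www.w3.org/2000/01/rdf-schema#label"

  // A language-tagged label, e.g. Label("Konferenz", "de").
  final case class Label(text: String, lang: String)

  // Serialize one label of a resource as an N-Triples line.
  def labelTriple(iri: String, label: Label): String =
    s"""<$iri> <$RdfsLabel> "${label.text}"@${label.lang} ."""

  // Attach the labels of two matched resources to one merged resource,
  // dropping an exact duplicate if both inputs carry the same label.
  def mergeMatch(mergedIri: String, a: Label, b: Label): Seq[String] =
    Seq(a, b).distinct.map(labelTriple(mergedIri, _))
}
```

For a matched pair labeled "Conference" in English and "Konferenz" in German, mergeMatch yields two rdfs:label triples on the same resource, one tagged @en and one tagged @de.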

Architecture

The following figure shows MULON’s architecture:

Fig.2. MULON Architecture

The input to MULON is two ontologies in two different natural languages. The output is a multilingual merged ontology, together with an assessment sheet presenting the quality of the merged ontology.

MULON is comprised of three modules:

Source code and documentation

The latest code is available in a public repository on GitHub. A description of each configurable parameter and function can be found here.

Installation

The implementation is based on Scala 2.11.11 and Apache Spark 2.3.1. After installing them, download and build MULON using:

git clone https://github.com/shmkhaled/MULON.git
cd MULON
mvn clean package

After completing the steps above, you can open the project. The following figure shows MULON in IntelliJ.

Fig.3. MULON in IntelliJ

Example

The following example describes merging two ontologies: the Conference ontology in German (O1), from the MultiFarm dataset, and the Scientific Events Ontology in English (O2). First, MULON reads the two input ontologies in Turtle (the paths O1 and O2), converts them into N-Triples format, and generates an RDD representation of each (O1triples and O2triples). MULON uses the SANSA-RDF library with the Apache Jena framework to parse and manipulate the input ontologies (as RDF triples) in a distributed manner. SANSA supports different RDF serialization formats (e.g. NTRIPLES/N3, XML/RDF, TURTLE, QUAD). Second, MULON generates the multilingual merged ontology (the MergeOntologies call) and gets its statistics (the GetStatistics call), such as the number of classes, object properties, annotation properties, and datatype properties. Finally, MULON creates the assessment sheet for the input and output ontologies (the GetQualityAssessmentSheet call).

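    import net.sansa_stack.rdf.spark.io._
    import org.apache.jena.graph
    import org.apache.jena.riot.Lang
    import org.apache.spark.rdd.RDD

    // spark, sparkSession1 and ontoMerge are assumed to be created earlier in the program
    // Paths to the two input ontologies
    val O1 = ".../conference-de.ttl"
    val O2 = ".../SEO.ttl"

    // Load each ontology as an RDD of Jena triples via SANSA-RDF
    val lang = Lang.NTRIPLES
    val O1triples: RDD[graph.Triple] = spark.rdf(lang)(O1)
    val O2triples: RDD[graph.Triple] = spark.rdf(lang)(O2)

    // Merge the two ontologies into one multilingual ontology
    val multilingualMergedOntology: RDD[graph.Triple] = ontoMerge.MergeOntologies(O1triples, O2triples)

    // Print statistics of the merged ontology
    val ontStat = new OntologyStatistics(sparkSession1)
    println("Statistics for merged ontology")
    ontStat.GetStatistics(multilingualMergedOntology)

    // Assessment sheet
    val quality = new QualityAssessment(sparkSession1)
    quality.GetQualityAssessmentSheet(O1triples, O2triples, multilingualMergedOntology)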
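The kind of statistics mentioned above (number of classes, object properties, annotation properties, and datatype properties) can be illustrated with a small, Spark-free sketch. The object StatsSketch, its Triple case class, and the statistics function are hypothetical and stand in for the real OntologyStatistics class, whose internals are not shown here; the sketch assumes resources are declared with rdf:type and the corresponding OWL type.

```scala
object StatsSketch {
  // Simplified triple with string terms (a stand-in for Jena triples).
  final case class Triple(s: String, p: String, o: String)

  val RdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
  val Owl = "http://www.w3.org/2002/07/owl#"

  // Count the distinct subjects declared with each OWL type of interest.
  def statistics(triples: Seq[Triple]): Map[String, Int] = {
    val kinds = Seq("Class", "ObjectProperty", "AnnotationProperty", "DatatypeProperty")
    kinds.map { kind =>
      val n = triples.collect {
        case Triple(s, RdfType, o) if o == Owl + kind => s
      }.distinct.size
      kind -> n
    }.toMap
  }
}
```

On an RDD the same idea would be expressed with filter, map, distinct, and count, but that variant is left out to keep the sketch runnable without a Spark session.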

A sample output for the merged ontology, in N-Triples format, is presented in the following script, where every resource has English and German labels. For example, "ConferenceProceedings" is a class with the German label "Fortschritte der konferenz".

The quality assessment sheet for the input and merged ontologies is shown below, where O1 is the Conference ontology in German, O2 is the SEO ontology in English, and Om is the multilingual merged ontology.

Relationship richness for O1 is 0.48
Relationship richness for O2 is 0.51
Relationship richness for Om is 0.5
==============================================
Attribute richness for O1 is 0.78
Attribute richness for O2 is 0.58
Attribute richness for Om is 0.65
==============================================
Inheritance richness for O1 is 0.83
Inheritance richness for O2 is 0.55
Inheritance richness for Om is 0.66
==============================================
Readability for O1 is 0.99
Readability for O2 is 0.63
Readability for Om is 2.22
==============================================
Isolated Elements for O1 is 0.11
Isolated Elements for O2 is 0.44
Isolated Elements for Om is 0.21
==============================================
Missing Domain Or Range for O1 is 0.02
Missing Domain Or Range for O2 is 0.29
Missing Domain Or Range for Om is 0.19
==============================================
Redundancy for O1 is 0.03
Redundancy for O2 is 0.00
Redundancy for Om is 0.00
==============================================
Class coverage for merged ontology Om is 1.00
Property coverage for merged ontology Om is 0.99
Compactness for merged ontology Om is 0.97
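
For readers unfamiliar with the first three metrics in the sheet, the following sketch computes them using the commonly cited OntoQA-style definitions. Whether MULON uses exactly these formulas is an assumption, and the object RichnessSketch and its function names are hypothetical; the numbers in the sheet above were not produced by this code.

```scala
object RichnessSketch {
  private def ratio(num: Int, den: Int): Double =
    if (den == 0) 0.0 else num.toDouble / den

  // Share of relationships that are not subclass (inheritance) links.
  def relationshipRichness(nonInheritance: Int, subclassLinks: Int): Double =
    ratio(nonInheritance, nonInheritance + subclassLinks)

  // Average number of attributes (datatype properties) per class.
  def attributeRichness(attributes: Int, classes: Int): Double =
    ratio(attributes, classes)

  // Average number of subclass links per class.
  def inheritanceRichness(subclassLinks: Int, classes: Int): Double =
    ratio(subclassLinks, classes)
}
```

Under these definitions, an ontology with 3 non-inheritance relationships and 1 subclass link has a relationship richness of 0.75.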