Language resource management -- Lexical markup framework (LMF)

Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF)

Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 5. del: Serializacija leksikalne osnovne izmenjave (LBX)

General Information

Status
Published
Current Stage
4060 - Close of voting
Start Date
16-Mar-2021
Completion Date
15-Mar-2021

Draft
ISO/DIS 24613-5:2021
English language
36 pages
Standards Content (sample)

SLOVENIAN STANDARD
oSIST ISO/DIS 24613-5:2021
01-March-2021
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 5. del:
Serializacija leksikalne osnovne izmenjave (LBX)
Language resource management -- Lexical markup framework (LMF) - Part 5: Lexical
base exchange (LBX) serialization

Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 5:

Sérialisation de l’échange de bases lexicales (LBX)
This Slovenian standard is identical to: ISO/DIS 24613-5
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/DIS 24613-5:2021 en,fr

2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

DRAFT INTERNATIONAL STANDARD
ISO/DIS 24613-5
ISO/TC 37/SC 4 Secretariat: KATS
Voting begins on: 2020-12-21
Voting terminates on: 2021-03-15
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
ICS: 01.020
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.

IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.

This document is circulated as received from the committee secretariat.
Reference number: ISO/DIS 24613-5:2020(E)
ISO 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation for MRD
6.2 Implementing Form representations for the Form subclasses
6.3 Classes derived from the Form class
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism to refer to external media files
8 Implementing the classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 Referencing forms in an etymon
8.2.2 Representing the meaning of an etymon
8.2.3 Representing the language of an etymon
8.2.4 Dating an etymon
8.2.5 Providing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 XML feature structure implementation
9.3 Representing various labels with
9.4 Providing rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) LBX feature structure implementation
Annex C (informative) LBX examples for applying LBX serialization
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-5, together with ISO 24613-1 to -4, cancels and replaces ISO 24613:2008, which has been technically revised.

The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivisions.

A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
DRAFT INTERNATIONAL STANDARD ISO/DIS 24613-5:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope

This document describes the serialization of the LMF model defined as an XML model derived from the LBX schema and compliant with the W3C XML schema. This serialization covers the classes, data categories, and mechanisms of ISO 24613-1 (Core model), ISO 24613-2 (Machine-readable dictionary (MRD) model), and ISO 24613-3 (Etymological extension).
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

BCP 47 Tags for Identifying Languages. A. Phillips; M. Davis. IETF. September 2009. IETF Best Current Practice. URL: https://tools.ietf.org/html/bcp47

ISO 15924, Information and documentation — Codes for the representation of names of scripts

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model

ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and in ISO 24613-3 apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
4 General requirements

This document aims at providing constructs for each LMF class from ISO 24613-1 (Core model), ISO 24613-2 (MRD extension), and ISO 24613-3 (Etymological extension). It shall be compliant with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in the respective parts. LBX extends the original models by means of data category selections and precise value lists, the creation of new subclasses, and the definition of new constraints. In addition, this document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3. The LBX serialization is richer in detail than LMF, in order to meet specific design objectives. Still, this document does not elaborate on the meta-data aspects from LMF, since the LBX schema is by essence much richer for the representation of all the aspects related to the creation, content, versioning and database implementation of lexical content at large. Occasionally, slightly equivalent constructs to explicit requirements from the LMF standard will be mentioned.

The XML examples in this document are simplified by omitting namespaces. Except where otherwise stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within the scope of the following XML namespace declaration:

xmlns="http://www.lbx.org/2020/schema"
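
For readers processing LBX documents programmatically, the following Python sketch illustrates what the default namespace declaration means for a parser. It is an illustration only: the two-element document is hypothetical, and the `lbx` prefix is an arbitrary local alias chosen for the query.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the declaration above.
LBX_NS = "http://www.lbx.org/2020/schema"

# Hypothetical minimal document: the default namespace applies to
# every unprefixed element inside <Lexicon>.
doc = f'<Lexicon xmlns="{LBX_NS}"><Entry><Lemma/></Entry></Lexicon>'

root = ET.fromstring(doc)

# ElementTree exposes namespaced names in "{uri}local" form.
root_tag = root.tag

# Queries must therefore name the namespace, here through a prefix map.
entries = root.findall("lbx:Entry", {"lbx": LBX_NS})
print(root_tag, len(entries))
```

A query for the bare name `Entry` would find nothing in this document, which is why examples that omit the namespace need it restored before they are processed.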
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class

The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element (see Table 1), which groups together one to many lexicons in a single collection. This level may be omitted in cases where the lexical resource contains only one lexicon so that the resource starts directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).

Table 1 — LexicalResource class
LMF class LBX construct
/LexicalResource/ <LexicalResource>
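
Because the collection level may be omitted, a consumer has to accept two possible root elements. The Python sketch below shows one way to handle both. It assumes the element names <LexicalResource> and <Lexicon>, and it omits namespaces in the same way as the examples in this document; `lexicons` is a hypothetical helper, not part of LBX.

```python
import xml.etree.ElementTree as ET

def lexicons(root):
    """Return the lexicon elements whether or not a collection level is present."""
    if root.tag == "Lexicon":
        # Single-lexicon resource: the lexicon itself is the root.
        return [root]
    # Collection level present: lexicons are its direct children.
    return root.findall("Lexicon")

# Hypothetical documents covering both layouts.
single = ET.fromstring("<Lexicon><Entry/></Lexicon>")
multi = ET.fromstring("<LexicalResource><Lexicon/><Lexicon/></LexicalResource>")

print(len(lexicons(single)), len(lexicons(multi)))
```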
5.2 Implementing the GlobalInformation class

The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element (see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety of administrative, technical, documentary, and bibliographic information attached to the corresponding lexical resource.

Table 2 — GlobalInformation class
LMF class LBX construct
/GlobalInformation/ <GlobalInformation>

Since the LBX serialization is based on the W3C recommendation for XML, it implements the @xml:lang attribute to indicate language information corresponding to the content of specific elements. According to the W3C recommendation, @xml:lang content shall be compliant with BCP 47. There is no need for a specific implementation of the /language coding/ data category or the /script coding/ data category in order to ensure compliance of this document with ISO 24613-1. LBX does allow the inclusion of these data categories in the <GlobalInformation> element in order to support the validation of equivalent metadata found in the <LexiconInformation> elements of one or more lexicons (see 5.4). When included, the /script coding/ shall use the codes from ISO 15924. The /character encoding/ data category is implemented in the XML declaration of an LBX conformant document using the @encoding attribute. For instance, an XML-LBX document encoded as UTF-8 according to the Unicode standard shall begin with the following declaration:

<?xml version="1.0" encoding="UTF-8"?>
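
As an illustration of the two mechanisms just described, the Python sketch below writes an element with an @xml:lang attribute together with a UTF-8 XML declaration, then reads the language tag back. The <Entry> element and the tag "fr" are placeholders. Note that ElementTree writes the declaration with single quotes, which is equivalent in XML, and that it does not check the tag against BCP 47; such a check would need separate tooling.

```python
import xml.etree.ElementTree as ET

# Expanded name under which ElementTree stores the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Placeholder element carrying a language tag.
entry = ET.Element("Entry", {XML_LANG: "fr"})

# Serialize as UTF-8 bytes with an explicit XML declaration (Python 3.8+).
payload = ET.tostring(entry, encoding="UTF-8", xml_declaration=True)
first_line = payload.split(b"\n", 1)[0]

# Round trip: the parser honours the declared encoding and the xml: prefix.
parsed = ET.fromstring(payload)
lang = parsed.get(XML_LANG)
print(first_line, lang)
```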

A non-exclusive list of sub-elements, simple types indexed by value, follows:

— “ISO639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO15924”, a simple type enumerating the set of scripts used across all lexicons;
— globalNotationType, a simple type enumerating the set of notations used across all lexicons;
— globalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across all lexicons;
— subjectFieldType, a simple type enumerating the set of <SubjectField> values used across all lexicons.

Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
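
The validation role mentioned above (global value sets checked against the values declared for individual lexicons, see 5.4) amounts to a subset test. The Python sketch below shows the idea on plain sets. The value lists and lexicon names are invented for illustration, `undeclared_values` is a hypothetical helper, and a real implementation would read the enumerations from the schema documents.

```python
# Hypothetical global enumeration of part-of-speech values.
global_pos = {"noun", "verb", "adjective"}

# Hypothetical per-lexicon enumerations keyed by lexicon identifier.
lexicon_pos = {
    "fr-lexicon": {"noun", "verb"},
    "es-lexicon": {"noun", "adverb"},
}

def undeclared_values(global_values, per_lexicon):
    """Return, per lexicon, the values missing from the global enumeration."""
    return {
        name: sorted(values - global_values)
        for name, values in per_lexicon.items()
        if values - global_values
    }

problems = undeclared_values(global_pos, lexicon_pos)
print(problems)
```

An empty result would mean that every lexicon-level value is also present in the global set.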

5.3 Implementing the Lexicon class

The Lexicon class is implemented in LBX by means of the <Lexicon> element (see Table 3), which is a direct child of the <LexicalResource> element when <LexicalResource> is used. If the <LexicalResource> element is not used, <Lexicon> becomes the root element. In cases where a lexical resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see 5.1). In the case of a virtual lexical resource, where the <LexicalResource> element is not part of the same XML document as the <Lexicon> element, the lexicon can use an include statement to reference a relevant element. Other information within the <Lexicon> element should be qualified through the following child element(s) and attributes as direct children of the <Lexicon> element or, optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID as a unique identifier for the lexicon; as a best practice, the id should be a URI and be unique within a language resource; @xml:ID can be used in place of @lexiconID when there is a design intent to make the entry accessible on the web;
— @lexicalResourceID of datatype xs:ID as a unique identifier for the lexical resource; as a best practice, the ID should be a URI for global scope; in addition, @xml:ID can be used in place of @lexicalResourceID when there is a design intent to make the entry accessible on the web;
— @lexiconType, of datatype "xs:string"; the type of lexicon, e.g. bilingual dictionary, monolingual dictionary;
— @sourceLanguage, of datatype "xs:string"; the language of the <Lemma> element or its inflected forms;
— @targetLanguage, of datatype "xs:string"; the language the Lemma is translated to, principally represented in the <Translation> element.

Table 3 — Lexicon class
LMF class LBX construct
/Lexicon/ <Lexicon>

5.4 Implementing the LexiconInformation class

The LexiconInformation class is implemented by means of the LBX <LexiconInformation> element (see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include> element or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a variety of administrative, technical, documentary, and bibliographic information attached to the corresponding lexical entry.

Table 4 — LexiconInformation class
LMF class LBX construct
/LexiconInformation/ <LexiconInformation>

When not included in the <Lexicon> element, information qualifying the lexicon should be included as elements and attributes in the <LexiconInformation> element. These include (see 5.3):

— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.

The <LexiconInformation> can also include elements and data categories that further qualify information in the lexicon and can be used to support the validation of the XML document (lexicon). These elements and data categories should also be included in the global set of elements and data categories found in the <GlobalInformation> element (see 5.2) and a comparison of the corresponding values in <GlobalInformation> and <LexiconInformation> should be part of the validation process. A non-exclusive list of these sub-elements, simple types indexed by value, follows:

— notationType, a simple type enumerating the set of notations used in a lexicon;
— partOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— subjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.

Examples can be found in the LBX reference schema, LexiconInformation document (see B.1).

NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon information for a subset of lexicons grouped by language by referencing a named language data schema (e.g. ArabicLanguageData.xsd) (see B.1).

5.5 Implementing the LexicalEntry class

The LexicalEntry class should be implemented by means of the <Entry> element in LBX (see Table 5). Lexical information inside <Entry> elements should be encoded through the following child elements:

— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.

Attributes used for the <Entry> element can include:

— @entryID of datatype xs:ID as a unique identifier for an entry; as a best practice, the id should be a URI and be unique within a language resource; @xml:ID can be used in place of @entryID when there is a design intent to make the entry accessible on the web;
— @lexiconID of datatype xs:ID as a unique identifier for the parent lexicon; as a best practice, the id should be a URI and be unique within a language resource; @xml:ID can be used in place of @entryID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection when there is more than one lexicon.

Table 5 — LexicalEntry class
LMF class LBX construct
/LexicalEntry/ <Entry>

The following example in French illustrates the encoding of a simple dictionary entry with two senses.

EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7); anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>
  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est très appréciée.</DefRep>
    </Def>
  </Sense>
  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>

NOTE 1 The style in the above example would be appropriate for use in a lexical resource that contains a collection of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler style could be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> could be used in place of the equivalent <FormRep> elements and the <Def> element could directly contain the text content rather than employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of simplification using the <Orth> and <Pron> elements.

NOTE 2 The @notation value “French” is short for “Canonical French”.

5.6 Implementing the OrthographicRepresentation class

Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition classes. LBX typically implements orthographic representations by means of elements corresponding to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (Machine-readable dictionary (MRD) model). Some of those elements are introduced in 5.7.2 and 5.10 in association with classes introduced in ISO 24613-1 (Core model). Those classes (and classes introduced in ISO 24613-2 (MRD)) are
...
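
To show how the French entry in 5.5 can be consumed, the Python sketch below parses an abridged copy of it and extracts the part of speech, the written form, and the sense definitions. The abridgement (etymology and note removed, first definition shortened) is made here for brevity only, and namespaces are omitted as in the examples of this document.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the EXAMPLE in 5.5 (shortened for illustration).
ENTRY = """<Entry xml:lang="fr">
  <Lemma>
    <GramFeats><POS>noun</POS><Gender>fem</Gender></GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
  </Lemma>
  <Sense senseNR="1"><Def><DefRep xml:lang="fr">Grand crustacé marin</DefRep></Def></Sense>
  <Sense senseNR="2"><Def><DefRep xml:lang="fr">Femme, maîtresse</DefRep></Def></Sense>
</Entry>"""

entry = ET.fromstring(ENTRY)

# Grammatical information attached to the lemma.
pos = entry.findtext("Lemma/GramFeats/POS")

# Form representations keyed by their @notation value.
forms = {rep.get("notation"): rep.text for rep in entry.iter("FormRep")}

# Definitions keyed by sense number.
senses = {
    sense.get("senseNR"): sense.findtext("Def/DefRep").strip()
    for sense in entry.findall("Sense")
}
print(pos, forms, senses)
```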
