Internet Working Group                                          YP. Chen
Internet-Draft                                                    H. Xia
Intended status: Informational                                  ZM. Wang
Expires: May 29, 2018                                            P. Yang
                                                                CW. Tang
           Shaanxi Key Laboratory of Network Data Intelligent Processing
                        Xi'an University of Posts and Telecommunications
                                                       November 25, 2017


                             INTERNET-DRAFT
             A Unified Description Method for Data Service
                      draft-chen-ds-description-00

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 29, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Chen                      Expires May 29, 2018                  [Page 1]

Internet-Draft      Data Service Unified Description       November 2017

Abstract

   The rapid development of the Internet has driven more and more
   enterprises and individuals to encapsulate operations on key data
   entities as what we call data services (DS). Because these
   enterprises and individuals work in different fields, the
   descriptions of their data services are semantically heterogeneous.
   In this document, we propose a more principled approach to the
   problem of heterogeneous data services on the Web. We start by
   pre-processing data service description documents. Finally, we
   propose a unified description language model for data services, the
   Unified Description Language for Data Service (UDL4DS).

Table of Contents

   1.  Introduction
       1.1.  Background
   2.  Conventions Used in This Document
   3.  Data Service Description
       3.1.  Data Service Overview
       3.2.  Data Service Preprocessing
             3.2.1.  Data Service Acquisition
             3.2.2.  Feature Word Extraction for Data Service
       3.3.  Data Service Classification
       3.4.  Data Service Description Language Design
             3.4.1.  Semantic Annotation of Data Service
       3.5.  Data Service Description Model
   4.  Security Considerations
   5.  IANA Considerations
   6.  Conclusions
   7.  References
       7.1.  Normative References
       7.2.  Informative References
   8.  Acknowledgments
   Authors' Addresses

1.  Introduction

1.1.  Background

   With the development of the Internet and cloud computing, data in
   many forms has been generated. Because this data is described and
   accessed with different standards and technologies on the Web, there
   is no common data model or access method, which makes it difficult
   to share information between heterogeneous data sources. To address
   this problem, large amounts of heterogeneous data are published on
   the Internet in the form of services, so that service users can
   consume them as data services. In essence, a data service uses
   network service protocols and standards such as the Hypertext
   Transfer Protocol (HTTP), the Web Services Description Language
   (WSDL), the Extensible Markup Language (XML), the Simple Object
   Access Protocol (SOAP), and Universal Description, Discovery and
   Integration (UDDI) to encapsulate heterogeneous data sources on the
   Internet behind an agent or access interface that provides data
   services to users. However, as data in more and more fields is
   encapsulated as services, data services are invoked more and more
   frequently, and the requirements placed on them keep rising. The
   release and invocation of data services expose the following
   critical problems of data service description:

   o  The publishers of data services come from different industries
      or fields and lack unified data standards and norms, so data
      service descriptions are semantically heterogeneous.

   o  As data services develop and the demands of service consumers
      become more complex, a single service cannot satisfy complex
      demands accurately and quickly. How to integrate data services
      effectively to meet the actual demands of customers has become
      an urgent problem.

   o  Existing methods for classifying and semantically annotating
      data services are not good enough.

   In this document, we propose a data service description language
   model named UDL4DS, based on XML Schema. The approach covers the
   classification of data services, the construction of a domain
   ontology, and semantic annotation, and it addresses the semantic
   heterogeneity between data services in different fields. In
   addition, an XML Schema description of the key elements of the
   language model is designed to form a common specification for the
   unified description of data services.

2.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying significance described in RFC 2119.

3.  Data Service Description

   At present, data service descriptions are generally based on the XML
   specification and describe the access interface and other
   information of a data service. As user needs keep changing, data
   service description is moving from the syntactic level to the
   semantic level, which addresses the difficulty computers have in
   interpreting the meaning of a description and helps provide the best
   data service more quickly and intelligently. However, data service
   providers come from different industries, each with its own
   standards for the services it publishes, so their services can
   neither be shared nor interoperate with each other. In this
   document, by classifying data services and addressing the
   heterogeneity of data services published in different fields, we
   propose a unified description language for data services (UDL4DS)
   based on the XML Schema specification. We achieve a unified
   description in two ways. On the one hand, we propose a new semantic
   annotation method for data services, based on a domain ontology
   library, to resolve the semantic heterogeneity between data
   services. On the other hand, we design a unified description
   language model and describe each data service according to the
   designed description language.

3.1.  Data Service Overview

   The meaning of "data service" differs considerably between fields.
   Manu MR and Richard Manning hold that the data service layer applies
   the SOA architecture and plays an important role in data
   integration. Carey M J holds that a data service is a software
   service that provides a unified data model and various access
   operations on data resources. WS. Zhang holds that a data service is
   a Web service with an XML access interface that can access a
   database and return the result set in XML format. Zhang Peng holds
   that a data service only encapsulates the data resources of an
   information system: before and after invocation, a data service does
   not change the state of the outside world, and it has no business
   logic of its own.

   Following the principles of Web architecture
   [W3C.REC-webarch-20041215], a data service directly encapsulates the
   data of the underlying data source and opens an access interface for
   data service requesters to invoke, which reduces the cost of
   updating and maintaining the system. In addition, it lets users
   easily discover and transparently access the data in the data
   source. As a result, data services are becoming more and more
   popular as a way to encapsulate data.

3.2.  Data Service Preprocessing

   Data services exist on the Web in the form of XML documents. A
   service requester accesses a published data service by calling the
   open interface of the data service publisher.
   However, data service publishers come from different industries or
   fields, and their service descriptions are semantically
   heterogeneous, so data service requesters cannot accurately and
   quickly find the data service that best satisfies their needs. To
   make data services easier to discover and invoke, we preprocess
   them. By analyzing the basic information in the data service
   description document and extracting the attribute values of its key
   tags, we obtain a feature word text that represents the data
   service. We then classify data services by their feature word text,
   assign each service to the field to which it belongs, and provide
   keywords that represent the data service. Figure 1 illustrates the
   preprocessing of data services.

   +----------------------------------+
   |               Web                |
   |                                  |
   +----------------------------------+ <---------+
   |                                  |           |
   |  WSDL Document for Data Service  | Obtain    |
   |                                  |           |
   +----------------------------------+ <---------+
   | Feature Vector of WSDL Document  | Extract   |
   |                                  |           |
   +----------------------------------+ <---------+
   |                                  |           |
   |     Domain Ontology Library      | Construct |
   |                                  |           |
   +----------------------------------+ <---------+
   |                                  |           |
   |      Subset of Data Service      | Classify  |
   |         (Weather) (News)         |           |
   +----------------------------------+ <---------+

                 Figure 1: Data Service preprocessing

3.2.1.  Data Service Acquisition

   In this document, we mainly study data services described by WSDL.
   We find that such description documents appear on the Web as WSDL
   and ASMX documents. We obtain these data services by writing a
   crawler. First, we set matching rules according to our own needs.
   Second, starting from a given URL, we crawl the Web for documents
   that match the rules. Finally, we stop crawling when the number of
   crawled documents reaches the set threshold. Figure 2 shows the
   crawling process.

             +------------------------+
             |          URL           |
             +------------------------+
                         |
                         | Regular Expression
                         |
      +--->  +------------------------+  <----+
      |      |    Extract Web link    |       |
      |      +------------------------+       |
      |                  |                    |
      |                  |              +-----------+
      |      +------------------------+ |   WSDL    |
      |      |       Link queue       | |Description|
      |      +------------------------+ | Document  |
      |                  |              |  (Match)  |
      |      +------------------------+ +-----------+
      |      |     Length of link     |
      |Less  |                        |
      +----  +------------------------+
                         | (more)
             +------------------------+
             |          End           |
             +------------------------+

                   Figure 2: The process of Crawling

3.2.2.  Feature Word Extraction for Data Service

   Each data service corresponds to a WSDL description document that
   describes the basic information of the data service, such as "what
   does the data service do", "where is the data service", and "how to
   invoke the data service". To represent a data service more simply,
   we extract some of the more representative tags in the description
   document as attributes of the document. For example, (WSDL: service)
   gives the name of the data service, and (WSDL: operation) describes
   the functions that the data service can perform. For a data service
   named "Weather Service", the method name "Get Weather By IP" clearly
   shows that the service obtains the weather information of the city
   or region identified by an IP address.

   Each element in the WSDL description document carries a certain
   meaning. To extract the attributes that uniquely represent the data
   service, the elements in the document need to be parsed. In this
   document, the contents of the name attribute of the (WSDL: service)
   and (WSDL: operation) tags are extracted as the document's unique
   attribute values.

3.3.  Data Service Classification

   At present, ontology construction generally consists of requirements
   analysis, information collection, terminology recognition, formal
   coding and assessment, as shown in Figure 3.4. Many ontology
   libraries have been built with these steps. However, because fields
   and projects differ, building an ontology requires adapting this
   general process to the actual situation.

   To construct a domain ontology suitable for this study, we cluster
   the feature words of the WSDL description documents of the collected
   data services and build a Vector Space Model (VSM) over all feature
   words. The feature words of each WSDL description document form one
   column of a word-document matrix D, so that D represents N WSDL
   documents and makes it easy to calculate the weight of each feature
   word in any document. Starting from a prototype model of the domain
   ontology, we model the ontology in the OWL ontology description
   language, combining the result of clustering the feature words of
   the WSDL documents with the K-center algorithm, domain information,
   and the tool developed by Stanford University.

   We implement the classification of data services based on the domain
   ontology in three steps. First, we parse the WSDL description
   document of each collected data service, extract the feature words
   that represent the basic information of the data service, and
   construct the feature word vector according to the vector space
   model. Second, we use WordNet to calculate the semantic distance
   between the feature word vector and the vector formed from the
   domain ontology. Finally, we select an appropriate threshold and
   assign the document to its field.

   Extracting the feature words of a data service and constructing the
   feature word vector space model yields a data service feature vector
   (SFV).
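
   The extraction of feature words and the construction of an SFV can
   be illustrated with a minimal Python sketch. It assumes that WSDL
   1.1 documents are parsed with the standard xml.etree.ElementTree
   module, that feature words are obtained by splitting the name
   attributes on letter-case boundaries, and that the SFV holds raw
   word counts. The sample WSDL fragment and the helper names
   split_words, extract_feature_words, and feature_vector are
   hypothetical and are not defined by this document.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, hand-written WSDL fragment used only for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             name="WeatherService">
  <portType name="WeatherSoap">
    <operation name="GetWeatherByIP"/>
    <operation name="GetWeatherByCity"/>
  </portType>
  <service name="WeatherService"/>
</definitions>
"""


def split_words(name):
    """Split an identifier such as 'GetWeatherByIP' into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def extract_feature_words(wsdl_text):
    """Collect words from the name attribute of service and operation tags."""
    root = ET.fromstring(wsdl_text)
    words = []
    for tag in ("service", "operation"):
        for elem in root.iter(f"{{{WSDL_NS}}}{tag}"):
            name = elem.get("name")
            if name:
                words.extend(split_words(name))
    return words


def feature_vector(words, vocabulary):
    """Build a count vector over a fixed vocabulary (assumed SFV layout)."""
    counts = Counter(words)
    return [counts[word] for word in vocabulary]


words = extract_feature_words(SAMPLE_WSDL)
vocabulary = sorted(set(words))
sfv = feature_vector(words, vocabulary)
print(vocabulary)
print(sfv)
```

   Under these assumptions, the sample yields the vocabulary by, city,
   get, ip, service, weather and the count vector 2, 1, 2, 1, 1, 3. A
   real pipeline would probably also drop uninformative words such as
   "get" and "by" before building the vector.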

   To make it easier to calculate the similarity between the feature
   word vector of a data service and a domain, the domain ontology can
   be generalized to a domain vector (DV). We can then decide which
   field a data service belongs to according to the similarity between
   the two vectors.

3.4.  Data Service Description Language Design

   In this section, we first improve the formula for calculating the
   similarity of feature words in the WSDL description document. Then,
   we present an approach to calculating similarity based on the domain
   ontology to complete the semantic processing of data services. On
   the basis of semantic annotation, we propose a unified description
   language model for data services and complete the design of the
   description language.

3.4.1.  Semantic Annotation of Data Service

   To describe data services uniformly, it is necessary to resolve the
   semantic differences between heterogeneous data services. In this
   document, we propose a new semantic annotation method for data
   services that uses the domain ontology library constructed above.
   The idea of the method is as follows. First, we extract feature
   words from the WSDL description document of a data service to form a
   feature word set that represents the description document. Second,
   we cluster the feature word set with the K-center algorithm and
   construct the domain ontology library by combining the clusters with
   domain information. Finally, we calculate the weight of each feature
   word with reference to the domain ontology, and store the set of
   feature words and their weights according to the ontology vector
   space model (VSM). Each WSDL document that contains these feature
   words is associated with the corresponding feature words, which
   forms the mapping between the data service description document and
   the domain ontology concepts.
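
   The weighting and association steps can be sketched as follows. The
   text does not give the weight formula, so this sketch assumes
   TF-IDF, a common choice for vector space models. The document
   names, word lists, and function names are hypothetical, and the
   link from feature words to ontology concepts is left out.

```python
import math

# Hypothetical feature word lists for three WSDL documents.
documents = {
    "weather.wsdl": ["get", "weather", "by", "ip", "weather", "city"],
    "news.wsdl": ["get", "news", "by", "topic", "news"],
    "forecast.wsdl": ["get", "forecast", "weather", "city"],
}


def tf_idf_matrix(docs):
    """Return vocabulary, document names, and a word-document matrix.

    Rows are feature words and columns are documents. TF-IDF is an
    assumption of this sketch, not a formula taken from the document.
    """
    names = list(docs)
    vocabulary = sorted({word for words in docs.values() for word in words})
    matrix = []
    for word in vocabulary:
        # Number of documents that contain the word at least once.
        doc_freq = sum(1 for name in names if word in docs[name])
        idf = math.log(len(names) / doc_freq)
        row = []
        for name in names:
            words = docs[name]
            tf = words.count(word) / len(words)
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, names, matrix


def annotate(vocabulary, names, matrix):
    """Map each document to its non-zero feature words, heaviest first."""
    mapping = {}
    for col, name in enumerate(names):
        weighted = [
            (vocabulary[row], matrix[row][col])
            for row in range(len(vocabulary))
            if matrix[row][col] > 0
        ]
        mapping[name] = sorted(weighted, key=lambda item: -item[1])
    return mapping


vocabulary, names, matrix = tf_idf_matrix(documents)
mapping = annotate(vocabulary, names, matrix)
for name, weighted in mapping.items():
    print(name, [word for word, _ in weighted])
```

   With this weighting, a word that occurs in every document, such as
   "get" in the sample, receives weight zero and drops out of the
   mapping, while words that are specific to one document rank first.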

   Because an ontology is a detailed description of the related
   concepts of a field, their attributes, the constraints on them, and
   the hierarchies among them, semantic annotation of data services
   based on a domain ontology not only reflects the relationship
   between service description documents and the semantic relevance of
   categories, but also exposes the implicit semantic information of
   data service description documents. In this way, data service
   description documents become semantically related to one another,
   which addresses the heterogeneity of data services, makes it
   possible to provide more accurate and comprehensive data services,
   and lays the foundation for a unified description of data services.

3.5.  Data Service Description Model

   At present, the data service description methods and standards
   published on the Web differ. To enable the sharing of heterogeneous
   service resources, it is necessary to resolve the semantic
   heterogeneity between data service resources, so that the resources
   receive a unified semantic description and the service access
   mechanism can be determined automatically when a service is
   executed. In this document, we present a unified data service
   description language model (UDL4DS). Figure 3 illustrates the UDL4DS
   model.
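
   As an illustration only, the following Python sketch serializes one
   data service description using the element names that appear in
   Figure 3. It assumes that those names are XML elements nested under
   a root element named DS and that each leaf holds plain text. The
   function name build_description and all sample values are
   hypothetical, and the spelling ClassfyTime is copied from the
   figure.

```python
import xml.etree.ElementTree as ET

# Element names taken from Figure 3; the nesting is an assumption.
LAYOUT = {
    "Execute": ["DSExecute", "JDSExecute"],
    "BaseInfo": ["DSID", "DSName"],
    "Semantic": ["ClassifyName", "ClassifyMethod", "ClassfyTime"],
}


def build_description(values):
    """Build a DS element tree from a flat dict of leaf values."""
    ds = ET.Element("DS")
    for group, fields in LAYOUT.items():
        parent = ET.SubElement(ds, group)
        for field in fields:
            ET.SubElement(parent, field).text = values[field]
    return ds


# Hypothetical sample values; the meaning of DSExecute and JDSExecute
# is not specified in the text, so placeholders are used.
sample = {
    "DSExecute": "placeholder-ds-execute",
    "JDSExecute": "placeholder-jds-execute",
    "DSID": "ds-0001",
    "DSName": "WeatherService",
    "ClassifyName": "Weather",
    "ClassifyMethod": "K-center",
    "ClassfyTime": "2017-11-25",
}

doc = build_description(sample)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

   Keeping the layout in one table makes it easy to adjust the sketch
   if the XML Schema for UDL4DS defines a different nesting or
   additional elements.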

       +--> +-----------+ <--+-------------------------+
       |    | Execution |    |Execute                  |
       |    |Information|    | DSExecute /DSExecute    |
       |    +-----------+    | JDSExecute /JDSExecute  |
    +--+--+                  |/Execute                 |
    |     |                  +-------------------------+
    |     |
    | DS  |+--> +-----------+ <--+--------------------+
    |     |     |   Basic   |    |BaseInfo            |
    |     |     |Information|    | DSID /DSID         |
    |     |     +-----------+    | DSName /DSName     |
    |     |                      |/BaseInfo           |
    +--+--+                      +--------------------+
       |
       |    +-----------+ <-+-------------------------------+
       |    | Semantic  |   |Semantic                       |
       |    |Information|   |ClassifyName /ClassifyName     |
       +--> +-----------+   |ClassifyMethod /ClassifyMethod |
                            |ClassfyTime /ClassfyTime       |
                            |/Semantic                      |
                            +-------------------------------+

                    Figure 3: UDL4DS Language Model

4.  Security Considerations

   In this document, we mainly focus on the unified description of
   heterogeneous data services described in existing WSDL documents.
   However, heterogeneous data sources such as text or web page data,
   and other forms of data services, are not yet covered
   comprehensively by this study.

5.  IANA Considerations

   There are no IANA considerations related to this document.

6.  Conclusions

   This document proposes a unified description method for
   heterogeneous data services, which allows data services to be shared
   in order to meet the complex needs of users. First, we pre-process
   the data service description documents. Second, we propose a unified
   description language model for data services, the Unified
   Description Language for Data Service (UDL4DS). Finally, we
   implement a Web-based description system for data services.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

7.2.  Informative References

   [W3C.REC-webarch-20041215]
              Jacobs, I. and N. Walsh, "Architecture of the World Wide
              Web, Volume One", World Wide Web Consortium
              Recommendation REC-webarch-20041215, December 2004.

8.  Acknowledgments

   Thanks to H. Wang for comments and suggestions.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   YP Chen
   Shaanxi Key Laboratory of Network Data Intelligent Processing
   Xi'an University of Posts and Telecommunications
   China

   Email: CHENYP@XUPT.edu.cn


   H Xia
   Shaanxi Key Laboratory of Network Data Intelligent Processing
   Xi'an University of Posts and Telecommunications
   China

   Email: XIAHONG@XUPT.edu.cn


   ZM Wang
   Shaanxi Key Laboratory of Network Data Intelligent Processing
   Xi'an University of Posts and Telecommunications
   China

   Email: ZMWANG@XUPT.edu.cn


   P Yang
   Xi'an University of Posts and Telecommunications
   China

   Email: YANGPING@163.com


   CW Tang
   Xi'an University of Posts and Telecommunications
   China

   Email: 1316904833@qq.com