
XML to SQL Conversion Tool

Student: RicardoVilaca

Project

Introduction

In this project I have to develop a tool for XML <-> SQL conversion based on Haskell/Strafunski. The conversion scheme is based on the refinement calculus proposed in link

Objectives

  1. Define the datatypes of XMLSchema in Haskell
  2. Convert XML Schema datatypes to VDM-SL type definitions, using the datatypes defined in VooDooMFront.

XMLSchema

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD. An XML Schema:

  • defines elements that can appear in a document
  • defines attributes that can appear in a document
  • defines which elements are child elements
  • defines the order of child elements
  • defines the number of child elements
  • defines whether an element is empty or can include text
  • defines data types for elements and attributes
  • defines default and fixed values for elements and attributes
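As a rough illustration of the list above, the following Haskell sketch models a small fragment of what a schema can state: elements, attributes, ordered children, occurrence counts, empty versus text content, datatypes, and default/fixed values. All names here (`Occurs`, `ValueConstraint`, `Attribute`, `Content`, `Element`, `allowsCount`, `book`) are hypothetical and chosen only for this example; they are not the datatypes generated for the project.

```haskell
-- Hypothetical, simplified model of what an XML Schema can declare.

-- Number of allowed occurrences; Nothing as the upper bound means "unbounded".
data Occurs = Occurs { minOccurs :: Int, maxOccurs :: Maybe Int }
  deriving (Eq, Show)

-- Default or fixed value of an element or attribute.
data ValueConstraint = NoConstraint | Default String | Fixed String
  deriving (Eq, Show)

data Attribute = Attribute
  { attrName  :: String
  , attrType  :: String            -- name of a datatype, e.g. "string"
  , attrValue :: ValueConstraint
  } deriving (Eq, Show)

data Content
  = Empty                          -- the element is empty
  | Text String                    -- text content of the named datatype
  | Sequence [(Element, Occurs)]   -- ordered child elements with their counts
  deriving (Eq, Show)

data Element = Element
  { elemName    :: String
  , elemAttrs   :: [Attribute]
  , elemContent :: Content
  } deriving (Eq, Show)

-- Does an occurrence constraint allow n repetitions?
allowsCount :: Occurs -> Int -> Bool
allowsCount (Occurs lo hi) n = n >= lo && maybe True (n <=) hi

-- Example: a book with exactly one title and one or more authors.
book :: Element
book = Element "book" [Attribute "isbn" "string" NoConstraint]
  (Sequence
    [ (Element "title"  [] (Text "string"), Occurs 1 (Just 1))
    , (Element "author" [] (Text "string"), Occurs 1 Nothing)
    ])
```

A real XML Schema has many more constructs (choice groups, wildcards, derivation, and so on); this sketch only mirrors the bullet points above.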

Purpose of XML Schemas

XML has many schema languages. These languages specify the structure of instance documents (e.g. this element contains these elements, which contain these other elements, etc.) and the datatype of each element/attribute (e.g. this element shall hold an integer in the range 0 to 12,000).

Motivation for XML Schemas

The main motivation for XML Schema is that people are dissatisfied with DTDs:
  • DTDs use a different syntax. The XML (instance) document is written in one syntax and the DTD in another.
  • Limited datatype capability.
    • DTDs support a very limited capability for specifying datatypes.
    • People want a set of built-in datatypes compatible with those found in databases. DTDs support only 10 datatypes, whereas XML Schema supports 44+ datatypes.

VDM-SL

VDM-SL Data Types

  • Basic Data Types
    • The Boolean Type
    • The Numeric Types: The five numeric types denote a hierarchy where real is the most general type followed by rat, int, nat and nat1.
      • Reals
      • Rationals
      • Integers
      • Naturals
      • Positive Naturals
    • The Character Type
    • The Quote Type: The quote type corresponds to enumerated types in a typical programming language.
  • Compound Types
    • Set Types: A set is an unordered collection of values, all of the same type, which is treated as a whole. All sets are finite, i.e. they contain only a finite number of elements. The elements of a set type can be arbitrarily complex, they could for example be sets themselves.
    • Sequence Types: A sequence value is an ordered collection of elements of some type indexed by 1, 2, ..., n; where n is the length of the sequence. A sequence type is the type of finite sequences of elements of a type, either including the empty sequence (seq0 type) or excluding it (seq1 type). The elements of a sequence type can be arbitrarily complex; they could e.g. be sequences themselves.
    • Map Types: A map type from a type A to a type B is a type that associates with each element of A (or a subset of A) an element of B. A map value can be thought of as an unordered collection of pairs. The first element in each pair is called a key, because it can be used as a key to get the second element (called the information part) in that pair. All key elements in a map must therefore be unique. All maps in VDM-SL are finite. The domain and range elements of a map type can be arbitrarily complex, they could e.g. be maps themselves. A special kind of map is the injective map. An injective map is one for which no element of the range is associated with more than one element of the domain. For an injective map it is possible to invert the map.
    • Product Types: The values of a product type are called tuples. A tuple is a fixed length list where the i'th element of the tuple must belong to the i'th element of the product type.
    • Composite Types: Composite types correspond to record types in programming languages. Thus, elements of this type are somewhat similar to the tuples described in the section about product types above. The difference between the record type and the product type is that the different components of a record can be directly selected by means of corresponding selector functions. In addition records are tagged with an identifier which must be used when manipulating the record.
    • Union and Optional Types: The union type corresponds to a set-theoretic union, i.e. the type defined by means of a union type will contain all the elements from each of the components of the union type. The optional type [T] is a kind of shorthand for a union type T | nil, where nil is used to denote the absence of a value.
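The type constructors listed above can be captured as a Haskell datatype. The sketch below is only an illustration: the names `VdmType`, `numericRank`, `isNumericSubtype`, `render` and `atom` are assumptions made for this example and are not the datatypes defined in VooDooMFront. It also encodes the numeric hierarchy (nat1, nat, int, rat, real) and prints types in VDM-SL concrete syntax.

```haskell
import Data.List (intercalate)

-- Hypothetical abstract syntax for the VDM-SL types described above.
data VdmType
  = TBool
  | TReal | TRat | TInt | TNat | TNat1
  | TChar
  | TQuote String                        -- quote literal, e.g. <Red>
  | TSet VdmType
  | TSeq VdmType                         -- may be empty
  | TSeq1 VdmType                        -- non-empty
  | TMap VdmType VdmType
  | TInMap VdmType VdmType               -- injective map
  | TProduct [VdmType]
  | TRecord String [(String, VdmType)]   -- tag and named fields
  | TUnion [VdmType]
  | TOptional VdmType                    -- [T]
  deriving (Eq, Show)

-- Position in the numeric hierarchy, from most specific to most general.
numericRank :: VdmType -> Maybe Int
numericRank TNat1 = Just 0
numericRank TNat  = Just 1
numericRank TInt  = Just 2
numericRank TRat  = Just 3
numericRank TReal = Just 4
numericRank _     = Nothing

-- True when both types are numeric and the first is at most as general
-- as the second.
isNumericSubtype :: VdmType -> VdmType -> Bool
isNumericSubtype a b = case (numericRank a, numericRank b) of
  (Just i, Just j) -> i <= j
  _                -> False

-- Print a type in VDM-SL concrete syntax.
render :: VdmType -> String
render TBool           = "bool"
render TReal           = "real"
render TRat            = "rat"
render TInt            = "int"
render TNat            = "nat"
render TNat1           = "nat1"
render TChar           = "char"
render (TQuote q)      = "<" ++ q ++ ">"
render (TSet t)        = "set of " ++ atom t
render (TSeq t)        = "seq of " ++ atom t
render (TSeq1 t)       = "seq1 of " ++ atom t
render (TMap a b)      = "map " ++ atom a ++ " to " ++ atom b
render (TInMap a b)    = "inmap " ++ atom a ++ " to " ++ atom b
render (TProduct ts)   = intercalate " * " (map atom ts)
render (TUnion ts)     = intercalate " | " (map atom ts)
render (TOptional t)   = "[" ++ render t ++ "]"
render (TRecord tag fs) =
  "compose " ++ tag ++ " of "
    ++ unwords [f ++ " : " ++ render t | (f, t) <- fs] ++ " end"

-- Parenthesise products and unions when they appear as components.
atom :: VdmType -> String
atom t@(TProduct _) = "(" ++ render t ++ ")"
atom t@(TUnion _)   = "(" ++ render t ++ ")"
atom t              = render t
```

For instance, `render (TSeq TChar)` gives `seq of char`, the target used for XML Schema strings in the conversion below.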

Datatype Definition of XMLSchema in Haskell

To accomplish the first step, I tried the following alternatives:

  1. UUXML (A Type-Preserving XML Schema Haskell Data Binding): not a real tool (no implementation), just a theoretical conversion defined by Frank Atanassow, Dave Clarke and Johan Jeuring.
  2. Haifa: a mapping of XML Schema datatypes to Haskell by Simon Foster. This is a very simple datatype mapper that handles only very simple XML Schema datatypes. It also needs GHC 6.3/6.4 with full Generics.
  3. DtdToHaskell (HaXml): I downloaded the DTDs that define XML Schema from W3C (http://www.w3.org/2001/XMLSchema.dtd and http://www.w3.org/2001/datatypes.dtd). DtdToHaskell has a bug in its parsing of parameter entities, so I edited the original DTDs with the Altova XMLSpy DTD editor, replacing each entity by its definition to obtain an equivalent version of the DTDs. Then, using the DtdToHaskell tool from HaXml, I generated the datatypes from the modified DTDs (XMLSchema.dtd and datatypes.dtd).

Conversion

Conversion of the Built-in Datatypes

  • string: A regular string. It is converted to seq of char
  • normalizedString: A string without tabs, line feeds, or carriage returns. It is converted to seq of char
  • token: A string without tabs, line feeds, leading/trailing spaces, or consecutive spaces. It is converted to seq of char
  • integer: A regular integer. It is converted to int
  • int: -2147483648 to 2147483647. It is converted to int
  • unsignedInt: 0 to 4294967295. It is converted to nat
  • long: -9223372036854775808 to 9223372036854775807. It is converted to int
  • unsignedLong: 0 to 18446744073709551615. It is converted to nat
  • positiveInteger: 1 to infinity. It is converted to nat1
  • nonNegativeInteger: 0 to infinity. It is converted to nat
  • time: format: hh:mm:ss.sss Conversion ToDo
  • dateTime: format: CCYY-MM-DDThh:mm:ss Conversion ToDo
  • duration: Example P1Y2M3DT10H30M12.3S Conversion ToDo
  • date: format: CCYY-MM-DD Conversion ToDo
  • gMonth: format: --MM-- Conversion ToDo
  • gYear: format: CCYY Conversion ToDo
  • gYearMonth: format: CCYY-MM Conversion ToDo
  • gDay: format: ---DD Conversion ToDo
  • gMonthDay: format: --MM-DD Conversion ToDo
  • Name: Conversion ToDo
  • byte: -128 to 127 Conversion ToDo
  • unsignedByte: 0 to 255 Conversion ToDo
  • base64Binary: a base64 string Conversion ToDo
  • hexBinary: a hex string Conversion ToDo
  • nonPositiveInteger: negative infinity to 0 Conversion ToDo
  • negativeInteger: negative infinity to -1 Conversion ToDo
  • QName: a namespace-qualified name Conversion ToDo
  • NCName: Conversion ToDo
  • anyURI: A URI. Conversion ToDo
  • language: Any valid xml:lang value, e.g., EN, FR, .. Conversion ToDo
  • ID: must be used only with attributes. Conversion ToDo
  • IDREF: must be used only with attributes. Conversion ToDo
  • IDREFS: must be used only with attributes. Conversion ToDo
  • ENTITY: must be used only with attributes. Conversion ToDo
  • ENTITIES: must be used only with attributes. Conversion ToDo
  • NOTATION: a NOTATION from the XML spec. Conversion ToDo
  • NMTOKEN: must be used only with attributes. Conversion ToDo
  • NMTOKENS: must be used only with attributes. Conversion ToDo
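The conversions already decided in the list above can be sketched as a simple lookup table. This is a minimal illustration, not the project's implementation: the names `builtinTable`, `convertBuiltin` and `typeDef` are hypothetical, and the VDM-SL types are kept as plain text rather than as VooDooMFront values. Datatypes still marked ToDo are deliberately absent, so they yield `Nothing`.

```haskell
-- Built-in XML Schema datatypes whose VDM-SL target is stated in the list
-- above, paired with the VDM-SL type expression as plain text.
builtinTable :: [(String, String)]
builtinTable =
  [ ("string",             "seq of char")
  , ("normalizedString",   "seq of char")
  , ("token",              "seq of char")
  , ("integer",            "int")
  , ("int",                "int")
  , ("unsignedInt",        "nat")
  , ("long",               "int")
  , ("unsignedLong",       "nat")
  , ("positiveInteger",    "nat1")
  , ("nonNegativeInteger", "nat")
  ]

-- Look up the VDM-SL type for a built-in datatype. Nothing marks a
-- conversion that is still ToDo, or an unknown name.
convertBuiltin :: String -> Maybe String
convertBuiltin name = lookup name builtinTable

-- Build a VDM-SL type definition "NewName = type" for a named use of a
-- built-in datatype.
typeDef :: String -> String -> Maybe String
typeDef newName builtin =
  fmap (\t -> newName ++ " = " ++ t) (convertBuiltin builtin)
```

Note that the range restrictions of the XML Schema types (for example the upper bound of unsignedInt) are lost in this table; keeping them would require an invariant on the VDM-SL side.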

Every component has an associated type, and some components introduce new type definitions.

Conversion of the components of XML Schema

  • Primary Schema Components
    • XML namespaces: ToDo
    • Simple Types Definitions:
    • Complex Type Definitions:
    • Element Declarations:
    • Attribute Declarations:
  • Secondary Schema Components: ToDo
    • Model Group Definitions
    • Attribute Group Definitions
    • Identity-constraint Definitions - Similar to ID / IDREF
    • Notation Declarations
    • Wildcards - Similar to a DTD with “ANY”
    • Annotations

Presentation of the studied papers

On 2003.12.15 I presented the studied papers:

  1. Haskell and XML: Generic Combinators or Type-Based Translation?, Malcolm Wallace and Colin Runciman, 1999
  2. Comparative Analysis of Six XML Schema Languages, Dongwon Lee and Wesley W. Chu, 2001
  3. UUXML: A Type-Preserving XML Schema-Haskell Data Binding, Frank Atanassow, Dave Clarke, and Johan Jeuring, 2004

r7 - 12 Feb 2007 - 19:32:09 - JoseBacelarAlmeida