...collaborate on
Spreadsheet Under Scrutiny - Project Report

Under construction

Introduction

Spreadsheet tools can be viewed as programming environments for non-programmers. Their language of cell references and formulas offers only low-level expressiveness compared to regular programming languages. Still, huge numbers of large and often complex spreadsheets are created and maintained every day in all companies around the world.

In this project i have developed a few tools and respective API for Spreadsheet Analysis.

Project Objectives

  • Completely read and create a data structure to Spreadsheet understanding.
  • Add more capabilities to Data-flow graph
  • Add slicing and chopping analysis
  • Implement a type inference system using the already implemented algorithm.

Data Structure

Basic Notions

At the beginning of the project there already exist a Schema for the Gnumeric XML file, and a little of this was already been converted to DTD. With this DTD file was already being generated (by DTDtoHaskell) a Haskell Structure for representation of the Gnumeric content. In this structure the cell content was represented as a String, and for that reason it was converted to a type Formula.

data Formula
  = Var String                      -- ^ Variables
  | Fun String [Formula]            -- ^ Function applications
  | Ref CellRef (Maybe CellRef)     -- ^ Cell or cell block reference

This modification to the data structure is made in the Makefile that generates the file with all the structure and that instances for XmlContent? (Syntax.hs). Equally in the make file is possible to generate the file SyntaxTermInstances.hs that contains instances for Term and Typeable.

For reading gnumeric files there is the function

readGnumeric :: (Term a, XmlContent a) => FilePath -> IO a
that given the file name it returns the Gmr'Workbook (root element) with cells content as Formulas This conversion is made by a type unifying strategic function that applies to every cell content the function 'parseFormula'.

In this part of the project i have totally (expected) convert the Schema to DTD and now every possible element of the Gnumeric file is represented in the data structure.

All this processed can be reverted so we can pretty print all the structure. For that already HaXML? already has the function showXML and Formula is instance of Show.

Dataflow Graph

The data-flow graph was one of the things that were previously made.

This graph is generated by a traversal function that for every Reference it creates a relation between the cell that contains the formula and the element referenced.

This was the first approach to data-flow graph, but it has a few problems because it doesn't support References to elements in diferent Sheets. I have resolved the problem by adding Sheet notion to

data CellRef
  = CellRef {
      sheet :: String,     -- ^ Sheet name
      colNr :: Int,        -- ^ Column index
      rowNr :: Int         -- ^ Row index
    }
and creating a new traversal function that for every Gmr'Sheet runs the previous explained data-flow function.

For the next slicing and chopping was important to have a expanded data-flow graph. For that reason i generalised the previous explained functions to accept as parameter the following functions

collectBlockRefs :: Term a => a -> [CellBlock] collectCellRefs :: Term a => a -> [CellRef]

that will be applied in all the formulas generating a Gph CellBlock? or a Gph CellRef? respectively.

Results

(under Construction)

Slicing and Chopping

For the slicing and chopping i already had generic functions and the only thing i have implemented was the filtering functions. This functions

filterExpandedDataflow :: Set CellRef -> Gmr'Workbook -> Gmr'Workbook
filterCells :: Set CellBlock -> Gmr'Workbook -> Gmr'Workbook
were created to filter Spreadsheets by some list of cells. This two functions were created so it would be possible to filter by CellRef? and CellBlock? for expanded and normal graphs respectively.

  • Slicing
    • Forward - group of cells that contribute to a specified group of cells.
    • Backward - group of cells that have value contribution from a specified group of cells.
  • Chopping - Is the interception between a Forward and Backward Slice.

For building the slicing/chopping already exist the sliceOrChopWith function that given a list of elements for sources and sinks and the dataflow graph it returns the list of elements that represent the slice/chop. This function was already defined in the Haskell Uminho Library.

Results

(under construction)

Type Inference

Future Work

Conclusion

-- CupertinoMiranda - 04 Mar 2005

r5 - 05 Mar 2005 - 16:45:42 - CupertinoMiranda
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback
Syndicate this site RSSATOM