Lectures

[T11]: SPARQL Query Language

SPARQL Query Language; Query forms; Variables, blank nodes, datatypes; Graph provenance; Graph Patterns; Union and Optional Patterns; Filters; Modifiers, Aggregates; Negation; Returning Graphs. SPARQL Semantics. SPARQL Protocol.
Recommended Readings: (i) A Semantic Web primer from page 103 to 108
To Know:
  • Why using RDF is not appropriate for querying RDF graphs
  • Why XML query languages are not appropriate for querying RDF graphs
  • SPARQL is both a language and a protocol for querying the Web of Data
  • The different query forms and how they differ
  • The basic mechanism of patterns and variables
  • The difference between variables and blank nodes
  • Why no FROM clause is required in SPARQL?
  • What is the main reason to use FROM in a SPARQL query
  • In which cases the query results may contain unbound variables
  • How to use UNION and OPTIONAL. In which cases OPTIONAL may be useful
  • How to use Filters. Different types of filters
  • Modifying the results: modifiers (ORDER BY, DISTINCT, LIMIT)
  • Aggregates
  • Subqueries. How they are integrated in a query
  • Forms of Negation in SPARQL
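
The pattern-and-variable mechanism, and the way OPTIONAL leaves variables unbound, can be illustrated with a small sketch. This is not a SPARQL engine: it is a toy matcher over an in-memory set of triples, and every name in it (GRAPH, match_bgp, optional, the ex: and foaf: terms) is an assumption made for illustration. It mimics the query SELECT ?name ?mbox WHERE { ?p foaf:name ?name . OPTIONAL { ?p foaf:mbox ?mbox } }.

```python
# Toy RDF graph: a set of (subject, predicate, object) triples.
# All resources and values below are hypothetical.
GRAPH = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
}


def is_var(term):
    """A term starting with '?' is treated as a variable."""
    return term.startswith("?")


def match_pattern(pattern, binding, graph):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in graph:
        new = dict(binding)
        ok = True
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # Bind the variable, or check consistency with an earlier binding.
                if new.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield new


def match_bgp(patterns, graph, binding=None):
    """Match a basic graph pattern: all triple patterns must match together."""
    solutions = [binding or {}]
    for pattern in patterns:
        solutions = [s2 for s in solutions
                     for s2 in match_pattern(pattern, s, graph)]
    return solutions


def optional(solutions, patterns, graph):
    """Left join: a solution survives unchanged when the optional part fails."""
    out = []
    for s in solutions:
        extended = match_bgp(patterns, graph, s)
        out.extend(extended if extended else [s])
    return out


people = match_bgp([("?p", "foaf:name", "?name")], GRAPH)
rows = optional(people, [("?p", "foaf:mbox", "?mbox")], GRAPH)
# None stands for an unbound ?mbox.
result = sorted((r["?name"], r.get("?mbox")) for r in rows)
```

In this sketch Bob has no mailbox triple, so his row is kept with ?mbox unbound; without OPTIONAL the row would be dropped.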

[T10]: RDFS Semantics

RDFS semantics. Simple, RDF and RDFS-entailment. Interpolation lemma. Inference systems for RDF and RDFS-entailment. Minimal deductive systems. Complexity of reasoning.
Recommended Readings: (i) A Semantic Web primer from page 94 to 103
To Know: soon

[T09]: RDF and RDF Schema

The Resource Description Framework (RDF) and Resource Description Framework Schema (RDFS).
Recommended Readings: (i) A Semantic Web primer from page 65 to 94
To Know:
  • The RDF data model components: resources, URIs, properties, statements
  • The different views of a statement: A triple (Object, Property, Value) or (Subject,Predicate,Object); an arc connecting two nodes in a graph; a piece of XML code, representing the triple
  • Available syntaxes: Turtle; XML; JSON
  • What can be a Subject, a Predicate and an Object
  • Literals and data-types
  • How to represent N-ary predicates
  • The role of blank nodes
  • Containers: what they are, which types exist, and what they are for
  • Reification: what it is and how to apply it
  • The differences between RDF Schema and XML Schema. The purpose of RDF Schema
  • The main components of RDF Schema
  • Core Classes and Core properties of RDF schema
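
The triple view of statements, the use of a blank node for an n-ary predicate, and reification can be sketched in a few lines. This is only an illustration with plain tuples rather than an RDF library: the helper names (new_blank_node, nary, reify) and all ex: terms are assumptions, while rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard reification vocabulary.

```python
from itertools import count

_bnode_ids = count(1)


def new_blank_node():
    """Return a fresh blank-node label such as '_:b1' (only meaningful locally)."""
    return f"_:b{next(_bnode_ids)}"


def nary(relation_type, **roles):
    """Encode an n-ary relation as a blank node with one binary property per role."""
    node = new_blank_node()
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, f"ex:{role}", value) for role, value in roles.items()]
    return triples


def reify(triple):
    """Describe a statement with the RDF reification vocabulary."""
    s, p, o = triple
    stmt = new_blank_node()
    return stmt, [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    ]


# Hypothetical ternary relation: who teaches what, and in which term.
teaching = nary("ex:Teaching", lecturer="ex:smith",
                course="ex:semanticWeb", term="2024-1")

# A statement about a statement: record who asserted the claim.
claim = ("ex:semanticWeb", "ex:taughtBy", "ex:smith")
stmt, reified = reify(claim)
reified.append((stmt, "ex:assertedBy", "ex:department"))
```

Note that the reified triples describe the claim without asserting it; the original triple has to be added to the graph separately if it should also hold.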

[T08]: Semantic Web

The World Wide Web; Daily use of the WWW; Querying the Web; A Web of Data; The Semantic Web; Automatic data integration; Semantic Web Principles; The Semantic Web Cake; The Semantic Web at work.
Recommended Readings: (i) A Semantic Web primer up to page 24
To Know:
  • The current World Wide Web. Its main content: documents.
  • Querying the Web. Search engines and their limitations
  • The vision of a Semantic Web
  • Semantic Web: definitions and approach
  • Understand the example of data integration
  • The Semantic Web Principles
  • The Semantic Web Cake: its layers and the role of each one
  • Linked Data Project

[T07]: Graph Databases

Graph Databases. Limitations of Relational Databases and Aggregate Stores in representing interconnected data. Data Modeling for Graphs. Guidelines and antipatterns. Cypher language.
Recommended Readings: (i) Graph Databases, from Ian Robinson, Jim Webber & Emil Eifrem - pages 1 - 64
To Know:
  • The components of Labeled Property Graphs, their roles and relations
  • The major issues RDBMSs face in dealing with the typical relations between entities
  • The major issues document-based databases face in dealing with the typical relations between entities
  • The main guidelines for graph modeling
  • Understand the fine-Grained versus Generic Relationships issue
  • Understand the guideline to model Facts as Nodes
  • Understand how the graph data model can accommodate Cross-Domains models
  • Understand the guideline to Represent Complex Value Types as Nodes
  • Understand the discussed Patterns for Geospatial and Time data
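
The components of a labeled property graph, and the guideline of modeling a fact as a node, can be made concrete with a minimal sketch. The classes below are assumptions for illustration, not the data model of any particular graph database, and the review example is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                      # relationships are always directed
    type: str                        # and have exactly one type (a name)
    end: Node
    properties: dict = field(default_factory=dict)


def neighbours(node, rel_type, relationships):
    """Follow outgoing relationships of a given type from `node`."""
    return [r.end for r in relationships
            if r.start is node and r.type == rel_type]


alice = Node({"Person"}, {"name": "Alice"})
laptop = Node({"Product"}, {"name": "Laptop"})

# The fact "Alice reviewed the laptop" becomes a node of its own, so that it
# can carry properties and be connected to further nodes later on.
review = Node({"Review"}, {"rating": 4, "date": "2024-05-01"})
rels = [
    Relationship(alice, "WROTE", review),
    Relationship(review, "OF", laptop),
]

reviewed = [p for rv in neighbours(alice, "WROTE", rels)
            for p in neighbours(rv, "OF", rels)]
```

A single REVIEWED relationship between Alice and the laptop would be shorter, but it could not be linked to other nodes, which is the usual argument for the fact-as-node guideline.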

[T06.3]: Complementary notes on MD modeling

Order Transactions; Fact Normalization; Dimension Role-Playing; Product Dimension Revisited; Customer Ship-To Dimension; Deal Dimension; Degenerate Dimension for Order Number; Junk Dimensions; Multiple Currencies; Header and Line Item Facts with Different Granularity; Invoice Transactions; Accumulating Snapshot for the Order Fulfillment Pipeline, Fact Table Comparison; Designing Real-Time Partitions.
Recommended Readings: (i) (Kimball, 2002) - from page 165 to 199
To Know:
  • The differences between the three types of transaction tables and their main concerns
  • The following concepts: Dimension Role-Playing; Product Dimension Revisited; Customer Ship-To Dimension; Deal Dimension; Degenerate Dimension for Order Number; Junk Dimensions; Multiple Currencies; Header and Line Item Facts with Different Granularity
  • The need for real-time partitions and the proposed approach

[T06.2]: Slowly Changing Dimensions

Discussion of Slowly Changing Dimensions: the need, the motivation and the techniques. Discussion of problems arising with Large Changing Customer Dimensions and the use of mini-dimensions.
Recommended Readings: (i) (Kimball, 2002) - from page 154 to 160
To Know:
  • Understand the kinds of changes that may occur in dimension attribute values and their impact on analytical answers
  • Understand the proposed techniques to cope with change in dimensions and their applicability
  • How to apply these techniques including the hybrid ones
  • Understand how the analytical requirements influence the choice of an appropriate technique
  • Understand the problems that arise on very large dimensions (namely the customer dimension in some business)
  • Understand the concept of mini-dimension and their use to manage the change on large dimensions
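
The difference between overwriting an attribute (Type 1) and adding a new dimension row (Type 2) can be shown with a small sketch. The column names used here (surrogate_key, natural_key, valid_from, valid_to, is_current) are assumptions that follow a common convention; they are not prescribed by the readings.

```python
def apply_type1(rows, natural_key, attribute, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attribute] = new_value


def apply_type2(rows, natural_key, attribute, new_value, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["natural_key"] == natural_key and r["is_current"])
    current["is_current"] = False
    current["valid_to"] = change_date
    new_row = dict(current, **{attribute: new_value})
    new_row.update(
        surrogate_key=max(r["surrogate_key"] for r in rows) + 1,
        valid_from=change_date,
        valid_to=None,
        is_current=True,
    )
    rows.append(new_row)
    return new_row


# Hypothetical customer dimension with a single row.
customer_dim = [{
    "surrogate_key": 1, "natural_key": "C-100", "name": "Ana", "city": "Lisbon",
    "valid_from": "2020-01-01", "valid_to": None, "is_current": True,
}]

apply_type1(customer_dim, "C-100", "name", "Ana Silva")            # fix a spelling
new_row = apply_type2(customer_dim, "C-100", "city", "Porto", "2024-03-01")
```

With Type 2, facts loaded before the change keep pointing at surrogate key 1 (Lisbon) and later facts point at surrogate key 2 (Porto), which is what preserves history in analytical answers.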

[T06.1]: Multidimensional Modeling for procurement

Introduction to the procurement process (in different businesses), its activities and common analytic requirements. Discussion of approaches with one or more transaction tables.
Recommended Readings: (i) (Kimball, 2002) - from page 89 to 105
To Know:
  • To understand the factors to take into account when deciding between multiple and single transaction tables: different transactions viewed or not as separate processes; different source systems; different transactions with the same or different dimensionality; the existence of multiple control numbers associated with some transactions

[T05]: Multidimensional models for inventory

Introducing the concept of Value Chain and its relevance for Multidimensional Modeling. Introduction to the inventory and the possible multidimensional models for inventory: Inventory Periodic Snapshot; Inventory Transactions; Inventory Accumulating Snapshot. Fact tables Sparsity and growth rates. Semi-additive Facts.
Recommended Readings: (i) (Kimball, 2002) - from page 67 to 88
To Know:
  • To understand the concept of Value Chain and its relevance for Multidimensional Modeling.
  • To understand the multidimensional models for inventory: Inventory Periodic Snapshot; Inventory Transactions; Inventory Accumulating Snapshot. To understand their differences and when one is more appropriate than another. To understand that they have different ways of loading new data
  • To estimate the Fact table growth rate and how to deal with very large fact tables
  • To understand the concept of semi-additive facts and to know how to check additivity. Why metrics that record static levels are non-additive across the date dimension (and possibly other dimensions). Possible aggregations and the SQL AVG trap.
  • How to derive new metrics (based on inventory example)
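
The semi-additivity of inventory levels and the AVG trap can be checked with a few lines of arithmetic. The snapshot data below is made up for illustration. Summing quantity on hand across products for one date is meaningful, summing it across dates is not, and the average over time has to divide by the number of dates rather than by the number of fact rows.

```python
# Hypothetical periodic snapshot: (date, product, quantity_on_hand).
snapshot = [
    ("2024-01-01", "A", 10), ("2024-01-01", "B", 30),
    ("2024-01-02", "A", 20), ("2024-01-02", "B", 40),
]


def naive_average(rows):
    """What a plain AVG over the fact rows computes: total / number of rows."""
    return sum(q for _, _, q in rows) / len(rows)


def average_daily_level(rows):
    """Average total inventory per day: total / number of distinct dates."""
    n_days = len({d for d, _, _ in rows})
    return sum(q for _, _, q in rows) / n_days


def level_on(rows, day):
    """Additive across products for a single date."""
    return sum(q for d, _, q in rows if d == day)


avg_rows = naive_average(snapshot)        # 100 / 4 rows
avg_days = average_daily_level(snapshot)  # 100 / 2 days
```

Here the daily totals are 40 and 60, so the average daily inventory is 50; the row-based average of 25 answers a different question (the average level per product per day).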

[T04]: The process for multidimensional modeling

The MDM process and the necessary elements. Presentation of a case study for retail. Modeling a MDM for the presented case: the metrics, the dimensions and their attributes. Extending the model. Discussion on snowflaking.
Recommended Readings: (i) (Kimball, 2002) - from page 29 to 65
To Know:
  • To understand the basic steps of MD modeling. Which elements should be considered: available data. The impact of a chosen granularity. The granularity and the basic dimensionality.
  • The choice of appropriate additive facts and how to derive new metrics.
  • How to model dimensions: the particular cases of date and time of day dimensions; attributes and hierarchies on dimensions; surrogate keys; non applicable records on dimensions.
  • The discussion on promotion dimension. The use of a fact-less table to represent the products covered by a promotion.
  • Degenerate dimensions. Adding new dimensions (check the granularity, and the importance of non-applicable records); new attributes on existing dimensions; new facts.

[T03]: From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design

The paper "From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design" by Daniel L. Moody and Mark A. R. Kortink presents a methodology for building different analytical models from OLTP models. This lecture presents and discusses the proposed methodology in detail. Methodology overview. Entity classification. Hierarchy and the concepts of minimal and maximal entities. The basic operations: collapse and aggregation. Different types of models and the appropriate procedures to derive them.
Recommended Readings: (i) (Moody, 2000)
To Know:
  • To understand the entity classification in terms of transactional, component and classification entities. The justification for this classification and the heuristics to detect the appropriate entity classification. How to relate these entities to fact tables and dimension tables
  • To understand the concepts: hierarchy, maximal hierarchy, minimal and maximal entities. The importance and role of minimal entities.
  • To understand the basic operations used to build the analytical models: collapse and aggregation. Their impact on the model and on the data should be clear.
  • To get a basic understanding of the possible analytical models: Flat Schema; Terraced Schema; Star Schema. To understand that in some models the information is aggregated and some losses are present. The attention required when the flat schema is used.
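
The two basic operations can be sketched on rows held as dictionaries. This is a loose illustration, assuming that collapse means folding the attributes of a higher-level entity into the lower-level entity that references it, and that aggregation means summarizing a transaction entity by a set of grouping attributes. The function names and the sample entities are hypothetical and are not taken from the paper.

```python
def collapse(child_rows, parent_rows, foreign_key, parent_key):
    """Fold parent attributes into each child row (introduces redundancy)."""
    parents = {p[parent_key]: p for p in parent_rows}
    collapsed = []
    for child in child_rows:
        parent = parents[child[foreign_key]]
        merged = dict(child)
        merged.update({k: v for k, v in parent.items() if k != parent_key})
        collapsed.append(merged)
    return collapsed


def aggregate(rows, group_by, measure):
    """Summarize a measure by the given attributes (detail is lost)."""
    totals = {}
    for row in rows:
        key = tuple(row[c] for c in group_by)
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


# Hypothetical hierarchy: Sale -> Customer -> Region.
regions = [{"region_id": 1, "region_name": "North"},
           {"region_id": 2, "region_name": "South"}]
customers = [{"customer_id": 10, "region_id": 1},
             {"customer_id": 11, "region_id": 2},
             {"customer_id": 12, "region_id": 1}]
sales = [{"sale_id": 1, "customer_id": 10, "amount": 5},
         {"sale_id": 2, "customer_id": 11, "amount": 7},
         {"sale_id": 3, "customer_id": 12, "amount": 3}]

flat_customers = collapse(customers, regions, "region_id", "region_id")
flat_sales = collapse(sales, flat_customers, "customer_id", "customer_id")
by_region = aggregate(flat_sales, ["region_name"], "amount")
```

Collapsing all the way down to the transaction rows gives a single wide table, and the aggregated result no longer allows individual sales to be recovered, which matches the warning about information loss in some of the models.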

[T02]: Introduction to DW

Introduction to Decision Support Systems: evolution, fields and approaches. DW and OLAP in the context of DSS. Patterns of analytical activities of OLAP users. Typical iteration process of OLAP users. Data Warehouse: a historical perspective. The growing need to analyze organizational performance based on the data produced by the operational systems. The "extract" program, its proliferation and the issues with it. The need for a different approach.
The Data Warehouse architecture. Corporate Information Factory Architecture: Data Acquisition - (ETL); Data Delivery - (ETL); Data Warehouse; Operational Data Store; Data Mart; Metadata Management; Information feedback; Information Workshop. Role and Purposes of the Data Warehouse. DW Design (Inmon school).
Quick overview of OLAP cube concepts: Multidimensional Cube; Basic operations: slice; aggregation. Basics of Multidimensional Modeling: Star Schema (Dimensions and Fact tables): Querying a star schema and its relation to a typical result of an OLAP query. Drill-Down and Drill-Up.

Recommended Readings: (i) on Wikipedia: Data Warehouse, OLAP, OLTP, DSS; (ii) [Inmon, 2002], pages 1 to 30; (iii) [Claudia Imhoff, 2003], Chapter 1, pages 1 to 27; (iv) [Kimball, The Data Warehouse Toolkit, 2002], pages 16 to 27.
To Know:
  • The general concept of Decision Support Systems, their evolution and the different approaches.
  • Data Warehouse and OLAP as "data-driven" DSS. Understand their current relevance and importance
  • Basic understanding of the DW reference architecture.
  • Fundamental differences between OLTP and OLAP systems, models, use, and users.
  • Most common Patterns of analytical activities of OLAP users
  • Understand the Corporate Information Factory (CIF): the different roles of the DW, the ODS and the Data Marts (especially the OLAP data marts). The fundamental aspect of feedback from the knowledge and information gathered at DSS systems into the architecture (operational systems and the DW).
  • Understand the fundamental differences between OLTP and the analytical activities developed on the DW or on the Data Marts: data, access, users, etc.
  • Understand the concept of Multidimensional Cube and the two basic operations of slice and aggregate.
  • Basic understanding of the relational implementation of a multidimensional cube: dimension and fact tables (and their roles).
  • The typical OLAP query and its relational implementation
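
The relational implementation of a typical OLAP query can be tried out with an in-memory SQLite database. The schema and data below are a made-up minimal star schema (one fact table, two dimensions); the table and column names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, sales_amount REAL);
INSERT INTO date_dim VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO product_dim VALUES (1, 'Apple', 'Fruit'), (2, 'Pear', 'Fruit'),
                               (3, 'Milk', 'Dairy');
INSERT INTO sales_fact VALUES (1, 1, 10.0), (1, 3, 5.0),
                              (2, 1, 7.0), (2, 2, 3.0), (2, 3, 4.0);
""")

# Typical OLAP query: join the fact table to its dimensions, slice with WHERE
# on a dimension attribute, and aggregate with GROUP BY.
BY_MONTH_AND_CATEGORY = """
SELECT d.month, p.category, SUM(f.sales_amount)
FROM sales_fact f
JOIN date_dim d ON d.date_key = f.date_key
JOIN product_dim p ON p.product_key = f.product_key
WHERE d.year = 2024
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""

# Drill-up: remove month from the grouping to get a coarser result.
BY_CATEGORY = """
SELECT p.category, SUM(f.sales_amount)
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
GROUP BY p.category
ORDER BY p.category
"""

detailed = conn.execute(BY_MONTH_AND_CATEGORY).fetchall()
rolled_up = conn.execute(BY_CATEGORY).fetchall()
```

Adding a grouping column drills down and removing one drills up; the fact table supplies the numbers and the dimension tables supply the row headers and filters.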

[T01]: Course overview

Course Organization and Overview: Syllabus; Bibliography; Evaluation rules; important dates, etc.
Big data and challenges. The NoSQL movement, CAP and PACELC theorems. Types of NoSQL database systems. Labeled Property Graphs.
Recommended Activities: (i) watch the videos "DT&SC 7-3: What is Big Data?" and "DT&SC 7-4: Digital Big Data Footprint" from Martin Hilbert; (ii) visit the various sections of this site.
To Know:
  • A first understanding of what Big Data is, its main characteristics, and the main challenges
  • Understand the examples of digital footprint from the second Martin Hilbert video
  • What the Semantic Web vision is
  • What the CAP theorem is about and its implications
  • Understand the NoSQL quadrants and the motivation for each one