Lectures

[T11]: SPARQL Query Language

22 Dec 2021, 12:00 PM

SPARQL Query Language; Query forms; Variables, blank nodes, datatypes; Graph provenance; Graph Patterns; Union and Optional Patterns; Filters; Modifiers, Aggregates; Negation; Returning Graphs. SPARQL Semantics. SPARQL Protocol.

Recommended Readings: (i) A Semantic Web primer from page 103 to 108

To Know:

Why using RDF is not appropriate for querying RDF graphs
Why XML query languages are not appropriate for querying RDF graphs
SPARQL is a language and a Protocol to make queries on the Web of data.
The different query forms and how they differ
The basic mechanism of patterns and variables
The difference between variables and blank nodes
Why no FROM clause is required in SPARQL?
What is the main reason to use FROM in a SPARQL query
In which cases the query result may contain unbound variables
How to use UNION and OPTIONAL. In which cases OPTIONAL may be useful
How to use Filters. Different types of filters
Modify the results: modifiers (ORDER BY, DISTINCT, LIMIT)
Aggregates
Subqueries. How they are integrated in a query
Forms of Negation in SPARQL
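
The mechanism of patterns and variables, and the way OPTIONAL can leave a variable unbound, can be sketched with a toy in-memory matcher. This is only an illustration under stated assumptions: the triples, the prefixed names and the helper functions (`match_pattern`, `match_bgp`, `optional`) are hypothetical, and the code is not a SPARQL engine.

```python
# Toy matcher illustrating SPARQL basic graph patterns and OPTIONAL.
# All data and function names are hypothetical, chosen for illustration only.

TRIPLES = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
}


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL syntax."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, binding, triples):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                if new.setdefault(p_term, t_term) != t_term:
                    break  # variable already bound to a different value
            elif p_term != t_term:
                break  # constant term does not match
        else:
            yield new


def match_bgp(patterns, triples, binding=None):
    """Match a list of triple patterns jointly (shared variables must agree)."""
    bindings = [binding or {}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(pattern, b, triples)]
    return bindings


def optional(bindings, patterns, triples):
    """Left join: a solution is kept unchanged when the optional part fails."""
    out = []
    for b in bindings:
        extended = match_bgp(patterns, triples, b)
        out.extend(extended if extended else [b])
    return out


names = match_bgp([("?p", "foaf:name", "?name")], TRIPLES)
rows = optional(names, [("?p", "foaf:mbox", "?mbox")], TRIPLES)
result = sorted((r["?name"], r.get("?mbox")) for r in rows)
print(result)
```

In this sketch Bob has no mailbox, so his solution survives the OPTIONAL part with `?mbox` left unbound (shown as `None`).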

[T10]: RDFS Semantics

22 Dec 2021, 11:00 AM

RDFS semantics. Simple, RDF and RDFS-entailment. Interpolation lemma. Inference systems for RDF and RDFS-entailment. Minimal deductive systems. Complexity of reasoning.

Recommended Readings: (i) A Semantic Web primer from page 94 to 103

To Know: soon
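
As a small illustration of what an inference system for RDFS-entailment does, the sketch below applies two of the standard RDFS rules, rdfs11 (rdfs:subClassOf is transitive) and rdfs9 (an instance of a class is an instance of its superclasses), until nothing new is derived. The example triples and the function name `rdfs_closure` are assumptions made for illustration; a full RDFS reasoner covers more rules than these two.

```python
# Fixpoint application of two RDFS entailment rules (rdfs9 and rdfs11).
# The data and the function name are hypothetical.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new triple is derived."""
    closed = set(triples)
    while True:
        derived = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:    # rdfs11: subClassOf is transitive
                    derived.add((s, SUBCLASS, o2))
                elif p == TYPE:      # rdfs9: instances inherit superclasses
                    derived.add((s, TYPE, o2))
        if derived <= closed:
            return closed
        closed |= derived


graph = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
closure = rdfs_closure(graph)
print(sorted(closure - graph))
```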

[T09]: RDF and RDF Schema

15 Dec 2021, 11:00 AM

The Resource Description Framework (RDF) and Resource Description Framework Schema (RDFS).

Recommended Readings: (i) A Semantic Web primer from page 65 to 94

To Know:

The RDF data model components: resources, URIs, properties, statements
The different views of a statement: A triple (Object, Property, Value) or (Subject,Predicate,Object); an arc connecting two nodes in a graph; a piece of XML code, representing the triple
Available syntaxes: Turtle; XML; JSON
What can be a Subject, a Predicate and an Object
Literals and data-types
How to represent N-ary predicates
The role of blank nodes
Containers: what they are, which types exist, and what they are for
Reification: what it is and how to apply it
What are the differences between RDF Schema and XML Schema. What is the purpose of RDF Schema
The main components of RDF Schema
Core Classes and Core properties of RDF schema
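
Reification can be sketched in a few lines: a statement is itself described as a resource of type rdf:Statement, with rdf:subject, rdf:predicate and rdf:object pointing at its three parts, so that other statements can be made about it. The function name `reify`, the `ex:` names and the blank node label `_:s1` are assumptions made for illustration.

```python
# Reifying an RDF statement as four triples (rdf:Statement vocabulary).
# The helper name and the ex: resources are hypothetical.

def reify(statement, statement_id):
    """Return the four triples that describe `statement` as a resource."""
    subject, predicate, obj = statement
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


claim = ("ex:alice", "ex:teaches", "ex:DataModeling")
reified = reify(claim, "_:s1")

# A statement about the statement: who asserted the claim.
reified.append(("_:s1", "ex:assertedBy", "ex:bob"))

for triple in reified:
    print(triple)
```

Note that the reified description does not assert the original triple; the claim has to be added to the graph separately if it is meant to hold.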

[T08]: Semantic Web

24 Nov 2021, 11:00 AM

The World Wide Web; Daily use of the WWW; Querying the Web; A Web of Data; The Semantic Web; Automatic data integration; Semantic Web Principles; The Semantic Web Cake; The Semantic Web at work.

Recommended Readings: (i) A Semantic Web primer up to page 24

To Know:

The current World Wide Web. Its main content: documents.
Querying the Web. Search engines: their limitations
The vision of a Semantic Web
Semantic Web: definitions and approach
Understand the example of data integration
The Semantic Web Principles
The Semantic Web Cake: its layers and the role of each one
Linked Data Project

[T07]: Graph Databases

17 Nov 2021, 11:10 AM

Graph Databases. Limitation of Relational Databases and Aggregate Stores to represent interconnected data. Data Modeling for Graphs. Guidelines and antipatterns. Cypher language.

Recommended Readings: (i) Graph Databases, from Ian Robinson, Jim Webber & Emil Eifrem - pages 1 - 64

To Know:

The components of Labeled Property Graphs, their roles and relations
What are the major issues of RDBMSs in dealing with the typical relations between entities
What are the major issues of document-based databases in dealing with the typical relations between entities
The main guidelines for graph modeling
Understand the fine-Grained versus Generic Relationships issue
Understand the guideline to model Facts as Nodes
Understand how the graph data model can accommodate Cross-Domains models
Understand the guideline to Represent Complex Value Types as Nodes
Understand the discussed Patterns for Geospatial and Time data
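
The components of a Labeled Property Graph can be made concrete with a minimal in-memory sketch: nodes carry labels and properties, and relationships are directed, typed, and may carry properties of their own. The class names, node ids and example data are assumptions made for illustration; this is not how any particular graph database stores its data.

```python
# Minimal Labeled Property Graph: labeled nodes, typed directed relationships,
# properties on both. All names and data are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                       # roles of the node, e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str                        # id of the start node
    rel_type: str                     # exactly one type, plus a direction
    end: str                          # id of the end node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(set(labels), properties)

    def relate(self, start, rel_type, end, **properties):
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def out_neighbours(self, node_id, rel_type):
        """Ids of nodes reached from `node_id` over relationships of `rel_type`."""
        return [r.end for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


g = PropertyGraph()
g.add_node("alice", ["Person"], name="Alice")
g.add_node("bob", ["Person"], name="Bob")
g.add_node("carol", ["Person"], name="Carol")
g.relate("alice", "KNOWS", "bob", since=2019)
g.relate("bob", "KNOWS", "carol", since=2020)

# Friends of friends of Alice, roughly what this Cypher pattern expresses:
#   MATCH (:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof) RETURN fof.name
fof = sorted(
    g.nodes[f].properties["name"]
    for friend in g.out_neighbours("alice", "KNOWS")
    for f in g.out_neighbours(friend, "KNOWS")
)
print(fof)
```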

[T06.3]: Complementary notes on MD modeling

03 Nov 2021, 11:10 AM

Order Transactions; Fact Normalization; Dimension Role-Playing; Product Dimension Revisited; Customer Ship-To Dimension; Deal Dimension; Degenerate Dimension for Order Number; Junk Dimensions; Multiple Currencies; Header and Line Item Facts with Different Granularity; Invoice Transactions; Accumulating Snapshot for the Order Fulfillment Pipeline, Fact Table Comparison; Designing Real-Time Partitions.

Recommended Readings: (i) (Kimball, 2002) - from page 165 to 199

To Know:

The differences between the three types of transaction tables and their main concerns
The following concepts: Dimension Role-Playing; Product Dimension Revisited; Customer Ship-To Dimension; Deal Dimension; Degenerate Dimension for Order Number; Junk Dimensions; Multiple Currencies; Header and Line Item Facts with Different Granularity
The need for real-time partitions and the proposed approach

[T06.2]: Slowly Changing Dimensions

03 Nov 2021, 11:10 AM

Discussion of Slowly Changing Dimensions: the need, the motivation and the techniques. Discussion of problems arising with Large Changing Customer Dimensions and the use of mini-dimensions.

Recommended Readings: (i) (Kimball, 2002) - from page 154 to 160

To Know:

Understand the kinds of changes that may occur in dimension attribute values and their impact on analytical answers
Understand the proposed techniques to cope with change in dimensions and their applicability
How to apply these techniques, including the hybrid ones
Understand how the analytical requirements influence the choice of an appropriate technique
Understand the problems that arise in very large dimensions (namely the customer dimension in some businesses)
Understand the concept of mini-dimensions and their use to manage change in large dimensions
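
One of the techniques, usually called Type 2, keeps history by adding a new dimension row with a new surrogate key instead of overwriting the changed attribute. The sketch below illustrates the idea on a tiny customer dimension. The column names (`customer_key`, `customer_id`, `valid_from`, `valid_to`, `is_current`) and the function `apply_type2_change` are assumptions made for illustration, not a prescribed design.

```python
# Type 2 slowly changing dimension: add a row, keep the old one for history.
# Column names and the helper are hypothetical.
from datetime import date

customer_dim = [
    {"customer_key": 1, "customer_id": "C042", "city": "Lisbon",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]


def apply_type2_change(dimension, customer_id, changes, effective):
    """Expire the current row for `customer_id` and append a new version."""
    current = next(row for row in dimension
                   if row["customer_id"] == customer_id and row["is_current"])
    new_row = {
        **current,                    # unchanged attributes are copied
        **changes,                    # changed attributes get the new values
        "customer_key": max(row["customer_key"] for row in dimension) + 1,
        "valid_from": effective,
    }
    current["valid_to"] = effective   # old facts keep pointing at the old row
    current["is_current"] = False
    dimension.append(new_row)
    return new_row


new_row = apply_type2_change(customer_dim, "C042", {"city": "Porto"}, date(2021, 11, 3))
for row in customer_dim:
    print(row)
```

Because facts loaded before the change still reference surrogate key 1, analyses by city attribute them to Lisbon, while new facts reference key 2 and are attributed to Porto. A Type 1 change would instead overwrite `city` in place and lose that distinction.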

[T06.1]: Multidimensional Modeling for procurement

03 Nov 2021, 11:10 AM

Introduction to the procurement process (in different businesses), its activities and common analytic requirements. The discussion of approaches with one or more transaction tables.

Recommended Readings: (i) (Kimball, 2002) - from page 89 to 105

To Know:

To understand the factors to take into account when deciding between multiple and single transaction tables: whether different transactions are viewed as separate processes; different source systems; different transactions with the same or different dimensionality; the existence of multiple control numbers associated with some transactions.

[T05]: Multidimensional models for inventory

27 Oct 2021, 11:10 AM

Introducing the concept of Value Chain and its relevance for Multidimensional Modeling. Introduction to the inventory and the possible multidimensional models for inventory: Inventory Periodic Snapshot; Inventory Transactions; Inventory Accumulating Snapshot. Fact tables Sparsity and growth rates. Semi-additive Facts.

Recommended Readings: (i) (Kimball, 2002) - from page 67 to 88

To Know:

To understand the concept of Value Chain and its relevance for Multidimensional Modeling.
To understand the multidimensional models for inventory: Inventory Periodic Snapshot; Inventory Transactions; Inventory Accumulating Snapshot. To understand their differences and when one is more appropriate than another. To understand that they have different ways of loading new data
To estimate the Fact table growth rate and how to deal with very large fact tables
To understand the concept of semi-additive facts and to know how to check additivity. Why metrics that record static levels are non-additive across the date dimension (and possibly other dimensions). Possible aggregations and the SQL AVG trap.
How to derive new metrics (based on inventory example)
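
The semi-additivity of inventory levels and the AVG trap can be checked with a few numbers. In the sketch below the snapshot rows are hypothetical (one store, two products, three days): quantity on hand adds up across products for a single date, but summing it across dates is meaningless, and a plain average over rows divides by the number of rows rather than by the number of days.

```python
# Semi-additive fact: quantity on hand in a periodic snapshot.
# Rows are (date, product, quantity_on_hand); the data is hypothetical.
snapshot = [
    ("2021-10-25", "A", 10), ("2021-10-25", "B", 20),
    ("2021-10-26", "A", 12), ("2021-10-26", "B", 18),
    ("2021-10-27", "A", 8), ("2021-10-27", "B", 22),
]

# Additive across products for one date: total stock on 26 Oct.
on_hand_26 = sum(q for d, _, q in snapshot if d == "2021-10-26")

# Not additive across dates: this total is not a stock level of anything.
total_over_rows = sum(q for _, _, q in snapshot)

# The AVG trap: averaging over rows divides by products x days.
naive_avg = total_over_rows / len(snapshot)

# Average daily stock: divide by the number of distinct dates instead.
num_days = len({d for d, _, _ in snapshot})
avg_daily_on_hand = total_over_rows / num_days

print(on_hand_26, naive_avg, avg_daily_on_hand)
```

Here the store holds 30 units on each day, which is what the average over days reports; the row-based average reports 15, the average per product per day, which answers a different question.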

[T04]: The process for multidimensional modeling

20 Oct 2021, 11:10 AM

The MDM process and the necessary elements. Presentation of a case study for retail. Modeling a MDM for the presented case: the metrics, the dimensions and their attributes. Extending the model. Discussion on snowflaking.

Recommended Readings: (i) (Kimball, 2002) - from page 29 to 65

To Know:

To understand the basic steps of MD modeling. Which elements should be considered: available data. The impact of the chosen granularity. The granularity and the basic dimensionality.
The choice of appropriate additive facts and how to derive new metrics.
How to model dimensions: the particular cases of date and time of day dimensions; attributes and hierarchies on dimensions; surrogate keys; non applicable records on dimensions.
The discussion on promotion dimension. The use of a fact-less table to represent the products covered by a promotion.
Degenerate dimensions. Adding new dimensions (check the granularity, and the importance of non-applicable records); new attributes on existing dimensions; new facts.

[T03]: From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design

13 Oct 2021, 11:10 AM

The paper "From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design" by Daniel L. Moody and Mark A. R. Kortink presents a methodology for building different analytical models from OLTP models. This lecture presents and discusses the proposed methodology in detail. Methodology overview. Entity classification. Hierarchy and the concepts of minimal and maximal entities. The basic operations: collapse and aggregation. Different types of models and the appropriate procedures to derive them.

Recommended Readings: (i) (Moody, 2000)

To Know:

To understand the entity classification in terms of transactional, component and classification. The justification for this classification and the heuristics to detect the appropriate entity classification. How to relate these entities to fact tables and dimension tables
To understand the concepts: hierarchy, maximal hierarchy, minimal and maximal entities. The importance and role of minimal entities.
To understand the basic operations used to build the analytical models: collapse and aggregation. The impact on the model and on the data should be clear.
To get a basic understanding of the possible analytical models: Flat Schema; Terraced Schema; Star Schema. To understand that in some models the information is aggregated and some information is lost. The necessary attention when the flat schema is used.

[T02]: Introduction to DW

06 Oct 2021, 11:10 AM

Introduction to Decision Support Systems: evolution, fields and approaches. DW and OLAP in the context of DSS. Patterns of analytical activities of OLAP users. Typical iteration process of OLAP users. Data Warehouse: an historical perspective. The growing need of Analyzing the organization performance based on the data produced by the operational systems. The "EXtract" program and its proliferation and the issues with it. The need of a different approach.
The Data Warehouse architecture. Corporate Information Factory Architecture: Data Acquisition - (ETL); Data Delivery - (ETL); Data Warehouse; Operational Data Store; Data Mart; Metadata Management; Information feedback; Information Workshop. Role and Purposes of the Data Warehouse. DW Design (Inmon school).
Quick overview of OLAP cube concepts: Multidimensional Cube; Basic operations: slice; aggregation. Basics of Multidimensional Modeling: Star Schema (Dimensions and Fact tables): Querying a star schema and its relation to a typical result of OLAP query. Drill-Down and Drill-Up.

Recommended Readings: (i) at Wikipedia: Data Warehouse, OLAP, OLTP, DSS; (ii) [Inmon, 2002] - from page 1 to 30; (iii) [Claudia Imhoff, 2003] - Chapter 1: page 1 to 27; (iv) [Kimball - The Data Warehouse Toolkit, 2002] - page 16 to 27.

To Know:

The general concept of Decision Support Systems, their evolution and the different approaches.
Data Warehouse and OLAP as "Data-Driven" DSS. Understand their current relevance and importance
Basic understanding of the DW reference architecture.
Fundamental differences between OLTP and OLAP systems, models, use, and users.
Most common Patterns of analytical activities of OLAP users
Understand the Corporate Information Factory (CIF): the different roles of the DW, the ODS and the Data Marts (especially the OLAP data marts). The fundamental role of feedback from the knowledge and information gathered in DSS systems into the architecture (operational systems and the DW).
Understand the fundamental differences between OLTP and the analytical activities developed on the DW or on the Data Marts: data, access, users, etc.
Understand the concept of Multidimensional Cube and the two basic operations of slice and aggregate.
Basic understanding of the relational implementation of a multidimensional cube: Dimension and Fact tables (and their roles).
The typical OLAP query and its relational implementation
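
The relational implementation of a typical OLAP query can be sketched with Python's built-in sqlite3 module: the fact table is joined to its dimensions, a condition on a dimension attribute performs the slice, and GROUP BY performs the aggregation. The table names, column names and rows below are assumptions made for illustration.

```python
# A tiny star schema and a typical OLAP-style query over it.
# Table names, column names and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, sales_amount REAL);
INSERT INTO date_dim VALUES (1, '2021-09'), (2, '2021-10');
INSERT INTO product_dim VALUES (1, 'Food'), (2, 'Drinks');
INSERT INTO sales_fact VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.0), (2, 1, 3.0);
""")

query = """
SELECT d.month, p.category, SUM(f.sales_amount) AS total
FROM sales_fact AS f
JOIN date_dim AS d ON d.date_key = f.date_key
JOIN product_dim AS p ON p.product_key = f.product_key
WHERE d.month = '2021-10'        -- slice on the date dimension
GROUP BY d.month, p.category     -- aggregate the additive fact
ORDER BY p.category
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Dropping `p.category` from the SELECT and GROUP BY clauses would roll the result up to monthly totals, while adding a finer dimension attribute would drill down.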

[T01]: Course overview

29 Sep 2021, 11:10 AM

Course Organization and Overview: Syllabus; Bibliography; Evaluation rules; important dates, etc.
Big data and challenges. The NoSQL movement, CAP and PACELC theorems. Types of NoSQL database systems. Labeled Property Graphs.

Recommended Readings: (i) Consistency Tradeoffs in Modern Distributed Database System Design, from Daniel J. Abadi; (ii) the page at http://blog.nahurst.com/visual-guide-to-nosql-systems.

Recommended Activities: (i) see the following videos: "DT&SC 7-3: What is Big Data?" and "DT&SC 7-4: Digital Big Data Footprint", from Martin Hilbert; (ii) visit the various sections of this site.

To Know:

A first understanding of what Big Data is, its main characteristics, and the main challenges
Understand the examples of digital footprint from the second Martin Hilbert video
What is the Semantic Web Vision
What the CAP theorem is about and its implications
Understand the NoSQL quadrants and the motivation for each one

MD 21/22

(Advanced) Data Modeling
