Sunday, March 25, 2012

Dimensional Modeling

After taking a week's break from blogging (this is the first week of classes after spring break), I have decided to adopt a Q/A format for the rest of the semester. So I begin with dimensional modeling, in this format.

What is Dimensional Modeling?

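It is a set of techniques for designing the tables of a data warehouse. Each model has a set of dimension tables, which describe business entities such as products, customers or dates and are loosely analogous to the entity tables of an operational database, and a fact table, which records measurements of business events and is analogous to an associative entity in an ER model.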
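To make the analogy concrete, here is a minimal sketch of a tiny star schema built with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical, chosen only for illustration; a real warehouse would be designed around its own business processes.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a small, hypothetical star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: descriptive context about business entities.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,  -- e.g. 20120325
            month    TEXT
        );
        -- Fact table: one row per sale, pointing at each dimension by key,
        -- much like an associative entity links entities in an ER model.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"),
                      (3, "Mug", "Kitchen")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(20120303, "2012-03"), (20120325, "2012-03")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 20120303, 10, 5.0), (2, 20120303, 2, 6.0),
                      (3, 20120325, 1, 8.5), (1, 20120325, 4, 2.0)])
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_star_schema()
    print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])  # 4
```

The fact table holds only keys and numbers; everything descriptive lives in the dimensions, which is what lets queries slice the facts by any dimension attribute.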

How does Dimensional Modeling differ from ER Models?

Dimensional models are quite different from ER models in the sense that they do not necessarily involve objects from a relational database: the data feeding the dimensions can even come from flat files. Moreover, fact tables are typically loaded in batches at fixed intervals, when the operational systems are under maintenance or under the least load from customer transactions. Dimension tables are usually refreshed in the same batch, before the fact rows that reference them, rather than in real time as each transaction takes place. Also, data in a data warehouse (including its dimension tables) is seldom deleted, while rows in the operational tables/associative entities of an ER model may be deleted once they are regarded as obsolete under the business rules.


What are some of the properties of facts?

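Facts are designed to capture interesting patterns about your business that may not be evident in your transactional tables. They are usually numeric measurements, such as a sale amount or a quantity, that can be aggregated across dimensions (for example, summed by product category or by month) to provide knowledge valuable to the business.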
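A minimal sketch of what "aggregating across dimensions" means, in plain Python. The dimension lookup, the fact rows and the helper `total_by` are all hypothetical; in practice this would be a `GROUP BY` query in the warehouse.

```python
from collections import defaultdict

# Hypothetical product dimension: product_key -> category attribute.
dim_product = {1: "Stationery", 2: "Stationery", 3: "Kitchen"}

# Hypothetical fact rows: (product_key, month, amount).
fact_sales = [
    (1, "2012-02", 5.0),
    (2, "2012-02", 6.0),
    (3, "2012-03", 8.5),
    (1, "2012-03", 2.0),
]


def total_by(facts, key_fn):
    """Sum the additive 'amount' measure, grouped by whatever key_fn returns."""
    totals = defaultdict(float)
    for product_key, month, amount in facts:
        totals[key_fn(product_key, month)] += amount
    return dict(totals)


# Roll the same facts up along two different dimensions.
by_category = total_by(fact_sales, lambda p, m: dim_product[p])
by_month = total_by(fact_sales, lambda p, m: m)
print(by_category)  # {'Stationery': 13.0, 'Kitchen': 8.5}
print(by_month)     # {'2012-02': 11.0, '2012-03': 10.5}
```

Because the amount is additive, the same fact rows answer both "sales by category" and "sales by month" without any change to the facts themselves.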

Is it possible to normalize or denormalize dimensions as we do in transactional databases?

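Yes. Normalizing a dimension, i.e. splitting it into several related tables (which turns a star schema into what is called a snowflake schema), is possible. However, it adds extra joins to every query that uses that dimension, which makes queries more expensive, so dimensions are usually kept denormalized.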
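The cost difference is easiest to see side by side. This sketch, again using `sqlite3` with hypothetical table names, stores the same product dimension once denormalized (star) and once normalized (snowflake) and asks the same question of both: the snowflake version needs one more join to reach the category name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star (denormalized): the category name is repeated in every product row.
    CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY,
                                   name TEXT, category TEXT);
    -- Snowflake (normalized): categories move to their own table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, name TEXT,
                                   category_key INTEGER
                                       REFERENCES dim_category(category_key));
    CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

    INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');
    INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Kitchen');
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Mug', 20);
    INSERT INTO fact_sales VALUES (1, 5.0), (1, 2.0), (2, 8.5);
""")

# Star schema: a single join from the fact table to the dimension.
STAR_SQL = """
    SELECT p.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
"""
# Snowflake schema: the same question needs an extra join.
SNOW_SQL = """
    SELECT c.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
"""
star = conn.execute(STAR_SQL).fetchall()
snow = conn.execute(SNOW_SQL).fetchall()
print(star)  # [('Kitchen', 8.5), ('Stationery', 7.0)]
```

Both layouts give the same answer; the snowflake saves a little storage by not repeating category names, at the price of the extra join on every such query.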





Saturday, March 3, 2012

Properties of Networks

We live in an era dominated by online social networks. To leverage these networks, it helps to understand the science behind their success; that understanding is a first step towards using them for purposes such as advertising, promotions and campaigns. So here are some interesting properties of networks.

Degree : - It is the number of nodes to which a particular node is connected. In a directed graph it can be split into in-degree and out-degree: the number of incoming links to, and outgoing links from, a particular node.
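A minimal sketch of both definitions, computed from a plain edge list. The edge list and function names are hypothetical, and the undirected version assumes a simple graph (duplicate edges are counted once because neighbours are kept in a set).

```python
from collections import Counter


def undirected_degree(edges):
    """Degree of each node: the number of distinct neighbours it is connected to."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {node: len(adj) for node, adj in neighbours.items()}


def directed_degrees(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # outgoing link from src
        in_deg[dst] += 1   # incoming link to dst
    return in_deg, out_deg


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(undirected_degree(edges))  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
in_deg, out_deg = directed_degrees(edges)
print(in_deg["C"], out_deg["C"])  # 2 1
```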

Centrality
-----------------
In general, centrality is a measure of the importance of a node. However, simply stating that a node is important is vague; there has to be a criterion for importance. Hence we have the following types of centrality:

Betweenness Centrality : - It is the number of shortest paths, taken between every pair of other nodes, on which a particular node lies (when a pair is joined by several shortest paths, each one counts fractionally). A high betweenness centrality implies that a node acts as a mediator or bridge between two parts of a network. The node "Heather" in the graph below has a high betweenness centrality.
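One standard way to compute this is Brandes' algorithm, sketched below for an unweighted, undirected graph stored as an adjacency dictionary. The example graph is hypothetical (two small clusters joined only through node "H"); it is not the graph from the original figure. Scores are left unnormalized.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a list of its neighbours. For each node v the result
    is the sum, over unordered pairs (s, t) with s != v != t, of the fraction
    of shortest s-t paths that pass through v.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted twice (once from each endpoint).
    return {v: c / 2 for v, c in bc.items()}


adj = {
    "A": ["B", "H"], "B": ["A", "H"],
    "H": ["A", "B", "C", "D"],
    "C": ["D", "H"], "D": ["C", "H"],
}
print(betweenness_centrality(adj))
# {'A': 0.0, 'B': 0.0, 'H': 4.0, 'C': 0.0, 'D': 0.0}
```

Node "H" scores 4 because it sits on the only shortest path for each of the four cross-cluster pairs (A-C, A-D, B-C, B-D), which is exactly the "bridge" role described above.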



Closeness Centrality : - It is the inverse of the sum of the distances from a node to all other nodes. A higher value of this metric indicates that the node is closer to all other nodes in the network; it essentially measures the importance of a node in terms of how easily it can reach every other node. The largest node in the figure below has a high closeness centrality.
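A minimal sketch using breadth-first search for hop distances. It uses the common normalized form (n - 1) divided by the sum of distances, so a node adjacent to everyone scores 1.0; it assumes an unweighted, connected graph, and the example graph is the same hypothetical one as above.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """(n - 1) / (sum of distances to all other nodes); assumes a connected graph."""
    n = len(adj)
    scores = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        scores[v] = (n - 1) / total if total > 0 else 0.0
    return scores


adj = {
    "A": ["B", "H"], "B": ["A", "H"],
    "H": ["A", "B", "C", "D"],
    "C": ["D", "H"], "D": ["C", "H"],
}
scores = closeness_centrality(adj)
print(scores["H"])                   # 1.0
print(max(scores, key=scores.get))   # H
```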

Eigenvector Centrality : - It is a measure of the influence of a node in a network. It assigns relative scores to nodes on the assumption that connections to high-scoring (important) nodes contribute more to a node's score than connections to low-scoring nodes. Google's PageRank is a variant of eigenvector centrality.
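Eigenvector centrality is the principal eigenvector of the adjacency matrix, which can be approximated by power iteration. In this sketch each step adds a node's own score to the sum of its neighbours' scores (multiplying by A + I rather than A); this shift keeps the same eigenvectors but avoids oscillation on bipartite graphs. It assumes an undirected, connected graph, and the example graph is the same hypothetical one used above.

```python
import math


def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Power iteration on (A + I) for an undirected, connected graph."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # A node's new score: its own score plus its neighbours' scores.
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}  # rescale to unit length
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


adj = {
    "A": ["B", "H"], "B": ["A", "H"],
    "H": ["A", "B", "C", "D"],
    "C": ["D", "H"], "D": ["C", "H"],
}
scores = eigenvector_centrality(adj)
print(max(scores, key=scores.get))  # H
```

PageRank builds on the same idea but divides each node's vote among its outgoing links and adds a damping factor; that variant is not shown here.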