Introduction to Data Warehousing

by Thato Tshukudu, March 14, 2021


A data warehouse is a system that stores data from a company’s operations and from external sources. It stores historical information to allow for analysis of ... data over a period of time.

Because an organization often runs many disparate systems, its data exists in different formats across them; a data warehouse applies a uniform format to the data collected from all of these systems. Because they allow analysis of historical data, data warehouses are effective at presenting facts about the organization, which makes it easier for business leaders to make better decisions about the organization’s current market position and its future.
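To make the idea of uniform formatting concrete, here is a minimal Python sketch. The two source systems (a CRM and a point-of-sale system), their field names and the sample values are all hypothetical and are not taken from any real warehouse; the sketch only shows records with different layouts being mapped onto one shared layout.

```python
from datetime import datetime

# Hypothetical extracts from two source systems. Both describe a sale, but they
# use different field names, date formats and ways of storing the amount.
crm_rows = [{"CustomerName": "Acme Ltd", "SaleDate": "14/03/2021", "Amount": "1,250.50"}]
pos_rows = [{"customer": "acme ltd", "date": "2021-03-15", "total_cents": 99900}]


def from_crm(row):
    """Map a (hypothetical) CRM record onto the uniform warehouse layout."""
    return {
        "customer": row["CustomerName"].strip().title(),
        "sale_date": datetime.strptime(row["SaleDate"], "%d/%m/%Y").date().isoformat(),
        "amount": float(row["Amount"].replace(",", "")),
        "source": "crm",
    }


def from_pos(row):
    """Map a (hypothetical) point-of-sale record onto the same layout."""
    return {
        "customer": row["customer"].strip().title(),
        "sale_date": datetime.strptime(row["date"], "%Y-%m-%d").date().isoformat(),
        "amount": row["total_cents"] / 100,
        "source": "pos",
    }


# After transformation every row has the same keys, date format and units.
warehouse_rows = [from_crm(r) for r in crm_rows] + [from_pos(r) for r in pos_rows]

for row in warehouse_rows:
    print(row)
```

Once every source is mapped to the same layout, a question such as "what were total sales per customer per month" can be answered without knowing which system a row came from.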

As the volume of data grows across all departments, almost every business stands to benefit from adopting a data warehouse. SMEs also suffer from disparate information across their systems and would benefit too from having all their data integrated into one format. The main purpose of a data warehouse is to enhance decision making, and that applies to businesses of every size. Although the implementation will differ between a small business and a large one, there are different development approaches, and a business can choose the one most appropriate for its size.

Ralph Kimball proposed the bottom-up approach, in which data marts are developed first and then collectively integrated to produce a larger data warehouse. The benefit of such an approach is that it is less complex to develop, so a small start-up will not have to hire a large team to develop its data warehouse. Because a start-up’s resources are limited compared with those of larger businesses, it is essential that project development time is minimized. Data warehouses developed using the Kimball approach take less time and incur lower initial costs.

The availability of two approaches, Kimball’s bottom-up approach and the top-down approach usually attributed to Bill Inmon (in which the central data warehouse is built first and data marts are derived from it), can simplify the decision to implement a data warehouse, whether in a large, long-established organization or in a start-up. Small start-ups may also look into cheap and lightweight architectures for their business intelligence tools. As a result, no business, regardless of size, is excluded from taking full advantage of data warehousing’s organizational benefits.

Data Warehouse: Architecture & Metadata

by Thato Tshukudu, April 27, 2021


This blog post is written to satisfy the requirements of Portfolio Task 3 - INF 715 ...

Given your Covid-19 analysis data warehouse, which of the architectural types of a data warehouse will you implement and why?

Ponniah (2010) describes a centralized data warehouse as having no data marts, with information delivered from the data warehouse itself. An independent data mart architecture is seen by Ponniah (2010) as a collection of separate data marts that each serve a department. A federated architecture resembles the independent style, but its data elements and warehouses are integrated “physically or logically”. A hub-and-spoke architecture is one in which the data marts depend on the main data warehouse for their data feed (Ponniah, 2010).

My choice: Data-mart bus architecture
According to Poolet (2007), a data-mart bus architecture is made up of tightly integrated data marts which draw their essence from a “common set of conformed dimensions and facts”. I suggest this architecture because, unlike an independent data mart architecture, which is built from departmental data marts whose dimensions are unrelated, a data-mart bus lets stakeholders agree on dimensions that must be the same throughout all the data marts. This matters because the COVID-19 data warehouse captures the same COVID-related data from each province. Each province does not capture its own unique data but uses a set of dimensions, decided by scientists, that is applied to every province for gathering the COVID-19 data. Furthermore, a data-mart bus still maintains the principles of a centralized data warehouse, which the COVID-19 data warehouse is.
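The role of conformed dimensions can be sketched with Python’s built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_province, fact_cases, fact_tests), the placeholder province names and all the figures are invented for the example and are not the actual schema or data of the COVID-19 warehouse. Two data marts share one province dimension, so their facts can be compared province by province.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One conformed dimension shared by two hypothetical data marts (fact tables).
con.executescript("""
CREATE TABLE dim_province (
    province_key  INTEGER PRIMARY KEY,
    province_name TEXT UNIQUE
);
CREATE TABLE fact_cases (
    province_key INTEGER REFERENCES dim_province(province_key),
    report_date  TEXT,
    new_cases    INTEGER
);
CREATE TABLE fact_tests (
    province_key INTEGER REFERENCES dim_province(province_key),
    report_date  TEXT,
    tests_done   INTEGER
);
""")

# Made-up sample rows, for illustration only.
con.executemany("INSERT INTO dim_province VALUES (?, ?)",
                [(1, "Province A"), (2, "Province B")])
con.executemany("INSERT INTO fact_cases VALUES (?, ?, ?)",
                [(1, "2021-04-27", 120), (2, "2021-04-27", 45)])
con.executemany("INSERT INTO fact_tests VALUES (?, ?, ?)",
                [(1, "2021-04-27", 2000), (2, "2021-04-27", 900)])

# Because both marts use the same province keys, a query can combine them.
query = """
SELECT p.province_name, c.new_cases, t.tests_done
FROM dim_province AS p
JOIN fact_cases AS c ON c.province_key = p.province_key
JOIN fact_tests AS t ON t.province_key = p.province_key
                    AND t.report_date = c.report_date
ORDER BY p.province_name
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

If each mart had defined its own province codes, as in an independent data mart architecture, this join would first need a mapping between the two coding schemes.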

What will be the role of metadata in your warehouse?
According to Hamad & Jihad (2012), metadata serves “administrative, maintenance and usage” purposes. Metadata helps to ensure consistency of definitions and to clarify the different relationships that exist within the data.

List at least 5 types of meta-data that are important and relevant to your data warehouse.

  1. Back-end metadata: this will help the DBA with the ETL processes
  2. Front-end metadata: this will help end-users to work with the reports
  3. Process metadata: this will record measurements about the warehouse processes as facts, such as the time taken to load data into the data warehouse.
  4. Technical metadata: this will store technical details such as table attributes and primary and foreign key attributes.
  5. Business metadata: derived from the business rules of the COVID-19 DW and used for business terms for analysts and managers.
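
Two of the metadata types listed above can be sketched in a few lines of Python with the built-in sqlite3 module. The table names (fact_cases, etl_load_log), the helper load_with_metadata and the sample rows are hypothetical and exist only for this illustration. Process metadata is captured by logging how long a load took, and technical metadata is read back from the database catalogue.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fact_cases (province_key INTEGER, report_date TEXT, new_cases INTEGER)"
)
# Hypothetical process-metadata table: one row per load run.
con.execute(
    "CREATE TABLE etl_load_log (table_name TEXT, rows_loaded INTEGER, seconds_taken REAL)"
)


def load_with_metadata(table_name, rows):
    """Load three-column rows into a table and log how long the load took."""
    started = time.perf_counter()
    # Table names cannot be bound as SQL parameters; table_name is trusted here.
    con.executemany(f"INSERT INTO {table_name} VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - started
    con.execute(
        "INSERT INTO etl_load_log VALUES (?, ?, ?)", (table_name, len(rows), elapsed)
    )


# Made-up sample rows, for illustration only.
load_with_metadata("fact_cases", [(1, "2021-04-27", 120), (2, "2021-04-27", 45)])

# Technical metadata: column names and declared types from SQLite's catalogue.
columns = [
    (name, col_type)
    for _cid, name, col_type, *_rest in con.execute("PRAGMA table_info(fact_cases)")
]
# Process metadata: what was loaded (the elapsed time varies from run to run).
load_log = con.execute("SELECT table_name, rows_loaded FROM etl_load_log").fetchall()

print(columns)
print(load_log)
```

Business metadata would sit alongside this as plain-language definitions of each column, for example a glossary entry explaining what counts as a new case.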

References
  • Hamad, M.M. and Jihad, A.A. (2012). The role of metadata for effective data warehouse. Journal of University of Anbar for Pure Science, 6(2), pp. 1-6.
  • Ponniah, P. (2010). Data warehousing fundamentals for IT professionals. United States: Wiley.
  • Poolet, M.A. (2007). The data warehouse bus architecture. Available at: https://www.itprotoday.com/sql-server/data-warehouse-bus-architecture (Accessed: 27 April 2021).
