The concept of data warehousing was introduced in the 1980s as a non-volatile repository of historical data, used mainly for organizational decision making (Reddy, G, Srinivasu, R, Rao, M, & Rikkula, S 2010). While the data warehouse consists of information gathered from diverse sources, it maintains its own database, separate from operational databases, because it is structured for analytical rather than transactional processing (Chang-Tseh, H, & Binshan, L 2002).
Traditionally, data warehouses were used by medium and large organizations to “perform analysis on their data in order to more effectively understand their businesses” (Minsoo, L, Yoon-kyung, L, Hyejung, Y, Soo-kyung, S, & Sujeong, C 2007). They were designed as centralized databases used to store, retrieve and analyze information. Those systems were expensive, difficult to build and maintain, and in many cases made internal business processes more complicated.
With the wide adoption of the Web as a successful distributed environment, data warehouse architecture evolved into a distributed collection of data marts and metadata servers that describe the data stored in each individual repository (Chang-Tseh, H, & Binshan, L 2002). Moreover, the use of web browsers made deploying and accessing data warehouses less complicated and more affordable for businesses.
Furthermore, the Web is “the largest body of information accessible to any individual in the history of humanity where most data is unstructured, consisting of text (essentially HTML) and images” (Pérez, J, Berlanga, R, Aramburu, M, & Pedersen, T 2008). With the standardization of XML as a flexible semi-structured format for exchanging data on the Internet (the basis of formats such as XHTML and SVG), it became possible to “extract from source systems, clean (e.g. to detect and correct errors), transform (e.g. put into subject groups or summarized) and store” (Reddy, G, Srinivasu, R, Rao, M, & Rikkula, S 2010) web data in the data warehouse.
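The extract, clean, transform and store steps quoted above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers: the XML feed, its element and attribute names, the `sales_by_region` table and all function names are assumptions made for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML feed; element and attribute names are illustrative only.
SOURCE_XML = """
<sales>
  <sale region=" north " amount="120.50"/>
  <sale region="North" amount="79.50"/>
  <sale region="south" amount="not-a-number"/>
  <sale region="South" amount="200"/>
</sales>
"""


def extract(xml_text):
    """Extract: parse the XML and yield one raw record per <sale> element."""
    root = ET.fromstring(xml_text)
    for sale in root.iter("sale"):
        yield {"region": sale.get("region"), "amount": sale.get("amount")}


def clean(rows):
    """Clean: normalize region names and drop rows with a non-numeric amount."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # detected an error in the source data; skip the row
        region = (row["region"] or "").strip().lower()
        if region:
            yield {"region": region, "amount": amount}


def transform(rows):
    """Transform: summarize amounts per region (a simple subject grouping)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def store(totals, conn):
    """Store: write the summarized rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


def run_etl(xml_text, conn):
    """Run the four steps in order and return the stored rows."""
    store(transform(clean(extract(xml_text))), conn)
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_etl(SOURCE_XML, sqlite3.connect(":memory:")))
```

In this sketch the malformed row is silently dropped during cleaning; a production pipeline would more likely log or quarantine such rows so that data quality problems in the source remain visible.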
On the other hand, it is important to consider the “deep web”, which accounts for close to 80% of the Web (Chang-Tseh, H, & Binshan, L 2002); accessing, retrieving, cleaning and transforming its data could present further obstacles to overcome. In addition, as the information stored in data warehouses becomes more accessible through Internet browsers (as compared to corporate fat clients), the risk of data theft (through malicious attacks) and leakage grows as well. Chang-Tseh and Binshan (2002) further note that the security of the warehouse depends primarily on the quality and enforcement of the organizational security policy.
- Chang-Tseh, H, & Binshan, L 2002, 'WEB-BASED DATA WAREHOUSING: CURRENT STATUS AND PERSPECTIVE', Journal Of Computer Information Systems, 43, 2, p. 1, Business Source Premier, EBSCOhost, viewed 5 November 2011.
- Deitel, H, Deitel, P, & Goldberg, A 2004, Internet & World Wide Web How to Program, 3rd edn, Pearson Education, Upper Saddle River, New Jersey.
- Minsoo, L, Yoon-kyung, L, Hyejung, Y, Soo-kyung, S, & Sujeong, C 2007, 'Issues and Architecture for Supporting Data Warehouse Queries in Web Portals', International Journal Of Computer Science & Engineering, 1, 2, pp. 133-138, Computers & Applied Sciences Complete, EBSCOhost, viewed 5 November 2011.
- Pérez, J, Berlanga, R, Aramburu, M, & Pedersen, T 2008, 'Integrating Data Warehouses with Web Data: A Survey', IEEE Transactions On Knowledge & Data Engineering, 20, 7, pp. 940-955, Business Source Premier, EBSCOhost, viewed 5 November 2011.
- Reddy, G, Srinivasu, R, Rao, M, & Rikkula, S 2010, 'DATA WAREHOUSING, DATA MINING, OLAP AND OLTP TECHNOLOGIES ARE ESSENTIAL ELEMENTS TO SUPPORT DECISION-MAKING PROCESS IN INDUSTRIES', International Journal On Computer Science & Engineering, 2, 9, pp. 2865-2873, Academic Search Complete, EBSCOhost, viewed 5 November 2011.