The concept of data warehousing was introduced in
the 1980s as a non-volatile repository of historical data used mainly for
organizational decision making (Reddy, G, Srinivasu, R, Rao, M, &
Rikkula, S 2010). While the data warehouse consists of information
gathered from diverse sources, it maintains its own database,
separate from operational databases, because it is structured for
analytical rather than transactional processing (Chang-Tseh,
H, & Binshan, L 2002).
Traditionally, data warehouses were used by medium
and large organizations to “perform analysis on their data in order
to more effectively understand their businesses” (Minsoo, L,
Yoon-kyung, L, Hyejung, Y, Soo-kyung, S, & Sujeong, C 2007). Such a
warehouse was designed as a centralized database used to store, retrieve and
analyze information. Those systems were expensive, difficult to build
and maintain, and in many cases made internal business processes more
complicated.
With the wide adoption of the Web as a
successful distributed environment, data warehouse architecture
evolved into a distributed collection of data marts and metadata
servers that describe the data stored in each individual repository
(Chang-Tseh, H, & Binshan, L 2002). Moreover, the use of web
browsers made deploying and accessing data warehouses less
complicated and more affordable for businesses.
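The distributed arrangement described above can be sketched as a small catalog that records which data mart holds which subject area. This is only an illustration: the class names `DataMart` and `MetadataServer`, the `locate` method, and the example URLs are assumptions made for this sketch and do not come from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMart:
    """A single repository in the distributed warehouse (hypothetical model)."""
    name: str
    url: str              # illustrative address only
    subjects: frozenset   # subject areas this mart stores


class MetadataServer:
    """Keeps descriptions of the data held in each registered data mart."""

    def __init__(self):
        self._marts = []

    def register(self, mart):
        # Record the mart's description so that clients can discover it later.
        self._marts.append(mart)

    def locate(self, subject):
        # Return the names of the marts that hold data on the given subject.
        return [m.name for m in self._marts if subject in m.subjects]


server = MetadataServer()
server.register(DataMart("sales_mart", "https://marts.example.com/sales",
                         frozenset({"sales", "customers"})))
server.register(DataMart("hr_mart", "https://marts.example.com/hr",
                         frozenset({"employees"})))
```

A browser-based client would first ask the metadata server where a subject lives, for example `server.locate("sales")`, and then query only the matching mart.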
Furthermore, according to Pérez et al.
(2008), the Web is “the largest body of information accessible to
any individual in the history of humanity where most data is
unstructured, consisting of text (essentially HTML) and images”. With
the standardization of XML as a flexible semistructured data format
for exchanging data on the Internet (the basis of formats such as
XHTML and SVG), it became
possible to “extract from source systems, clean (e.g. to detect and
correct errors), transform (e.g. put into subject groups or
summarized) and store” (Reddy, G, Srinivasu, R, Rao, M, &
Rikkula, S 2010) the data in the data warehouse.
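The extract, clean, transform and store steps quoted above can be illustrated with a minimal sketch that uses only the Python standard library. The XML feed, its element and attribute names, the `sales_by_region` table and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML feed; the element and attribute names are illustrative only.
SAMPLE_XML = """
<sales>
  <sale region="North" amount="120.50"/>
  <sale region="north " amount="79.50"/>
  <sale region="South" amount="not-a-number"/>
  <sale region="South" amount="200"/>
</sales>
"""


def extract(xml_text):
    """Extract: parse the XML and yield raw (region, amount) string pairs."""
    root = ET.fromstring(xml_text.strip())
    for sale in root.iter("sale"):
        yield sale.get("region"), sale.get("amount")


def clean(rows):
    """Clean: normalise region names and drop rows with invalid values."""
    for region, amount in rows:
        if region is None:
            continue
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # detected an error in the source data; skip the row
        yield region.strip().title(), value


def transform(rows):
    """Transform: summarise the amounts per region."""
    totals = {}
    for region, value in rows:
        totals[region] = totals.get(region, 0.0) + value
    return totals


def store(totals, conn):
    """Store: write the summarised rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)",
        totals.items(),
    )
    conn.commit()


def run_etl(xml_text, conn):
    """Run the four steps in order and return the stored summary."""
    totals = transform(clean(extract(xml_text)))
    store(totals, conn)
    return totals
```

Calling `run_etl(SAMPLE_XML, sqlite3.connect(":memory:"))` loads the sample feed into a throwaway database. A real pipeline would read from actual source systems and apply far more thorough cleaning rules.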
On the other hand, it is important to consider the
“deep web”, which accounts for close to 80% of the web
(Chang-Tseh, H, & Binshan, L 2002). For this content, data access, retrieval,
cleaning and transformation could present further obstacles to
overcome. In addition, as the information stored in data
warehouses becomes more accessible through Internet browsers (as
compared to corporate fat clients), the risk of data theft
(through malicious attacks) and leakage grows as well. Chang-Tseh and
Binshan (2002) further note that the security of the warehouse depends
primarily on the quality and enforcement of the organizational security
policy.
Bibliography
- Chang-Tseh, H, & Binshan, L 2002, 'WEB-BASED DATA WAREHOUSING: CURRENT STATUS AND PERSPECTIVE', Journal Of Computer Information Systems, 43, 2, p. 1, Business Source Premier, EBSCOhost, viewed 5 November 2011.
- Deitel, H, Deitel, P, & Goldberg, A 2004, 'Internet & World Wide Web How to Program', 3rd edn, Pearson Education Inc., Upper Saddle River, New Jersey.
- Minsoo, L, Yoon-kyung, L, Hyejung, Y, Soo-kyung, S, & Sujeong, C 2007, 'Issues and Architecture for Supporting Data Warehouse Queries in Web Portals', International Journal Of Computer Science & Engineering, 1, 2, pp. 133-138, Computers & Applied Sciences Complete, EBSCOhost, viewed 5 November 2011.
- Pérez, J, Berlanga, R, Aramburu, M, & Pedersen, T 2008, 'Integrating Data Warehouses with Web Data: A Survey', IEEE Transactions On Knowledge & Data Engineering, 20, 7, pp. 940-955, Business Source Premier, EBSCOhost, viewed 5 November 2011.
- Reddy, G, Srinivasu, R, Rao, M, & Rikkula, S 2010, 'DATA WAREHOUSING, DATA MINING, OLAP AND OLTP TECHNOLOGIES ARE ESSENTIAL ELEMENTS TO SUPPORT DECISION-MAKING PROCESS IN INDUSTRIES', International Journal On Computer Science & Engineering, 2, 9, pp. 2865-2873, Academic Search Complete, EBSCOhost, viewed 5 November 2011.