Abstract Many corporations are experiencing significant business benefits of using data warehouse technology. Users report gains in market competitiveness through increased revenue and reduced costs through information management. Data warehousing is thus a major issue within most organizations, and thus the development of a data warehouse with a strong base is essential. This paper aims to present the important concepts of Data Warehousing such as Data Warehousing tools and the benefits of Data Warehousing, that a manager must understand in order to execute a successful Data Warehousing project in his/her company.
Keywords: Data Warehouse Technology, Market Competitiveness, Data Warehousing tools, benefits of Data Warehousing, Data warehousing applications, Data warehousing platforms. Data warehousing Introduction It has become a strategic imperative for corporations to know more about its customers and prospects than ever before. Corporations are competing in a world that is moving faster, and in more directions, than at any time in history. The need for information is growing at an increasing rate thus, the more we know, the more we need to know.
Where once it was the job of the information technologist to study customer data; now-a-days even the president of the company may need to sieve through the databases of the corporation to retrieve clues for better marketplace performance. (Rolleigh and Thomas, 2002) The importance of data warehousing in the commercial segment arises from the need for enterprises to gather all of their information into a single place for in-depth analysis, and the desire to decouple such analysis from online transaction processing systems. Wisdom, 1995) But this isn’t an easy endeavor. Before people can get ready access to data, and be able to make meaningful analysis of it, a lot of behind-the-scenes groundwork must be done. It is insane to just go out and have your information technology organization buy a data warehouse. If any corporation does take such an action it will undoubtedly join the ranks of many big-name corporations that have made humongous investments that have failed to provide any return on investment (ROI).
If a company does indeed want to succeed with data warehousing, it has to build cross-organizational consensus and support for a way of business that is empowered by real customer data. And then the warehouse has to be tailored for the specific requirements of the company, this means a lot of careful, tedious and time-consuming steps are required. (Rolleigh and Thomas, 2002) Let’s look at what goes into creating a rich data warehouse, and what we need to know about it.
This paper will first introduce the concept of data warehouse in a simple straightforward manner followed by the major components of a data warehouse and the various structures of a data warehouse. The paper will then follow on by presenting the data warehousing methodologies. And finally it will discuss the advantages and disadvantages of data warehousing ending with the conclusion. What is a data warehouse? Data warehousing is a concept. It is a set of hardware and software components that can be used to better analyze the massive amounts of data that companies are accumulating to make better business decisions.
Bill inmon widely considered as the ‘father’ of data warehousing describes it as: “a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision -making process” This implies that within a warehouse, the data will be organized by entity (such as customer or product) rather than application (sales or purchase), that both current and historical, time stamped data will be present and that once stored in the warehouse it cannot be changed (davis et al, 1999) (Figure. ) However, a simple definition of data warehouse, as Ralph Kimball puts it, is “(a) data warehouse is a copy of transaction data specifically structured for querying and analysis. ” (Kimball, 1996) But this definition does not encompass the entirety of data warehousing. Sometimes non-transaction data are stored in a data warehouse – though probably 95-99% of the data usually are transaction data.
Additionally, “querying and reporting” rather than “query and analysis” is a key when talking about the functionality of data warehousing because the main output from data warehouse systems are either tabular listings (queries) with minimal formatting or highly formatted “formal” reports. Queries and reports generated from data stored in a data warehouse may or may not be used for analysis. (Greenfield) Nevertheless, data warehousing doesn’t just make data available.
Data warehousing is the process of making your operational data available to your business managers and decision support applications. Proper warehousing focuses on efficient information access. Of course, this efficiency doesn’t happen magically. Corporations must first identify what it is that they require from the data and the decision support applications, and then they must evaluate the current operational data to determine how to transform that data into what adds value to the output provided by the corporations.
The tools that you choose for your warehousing solution will take data from your operational systems (extract it), convert your operational data into business information using your defined business rules (transform it), and create a data warehouse (load it). (www. sas. com/rnd/warehousing) Finally, we can explain data warehouse as it being analogous to a physical warehouse. Operational systems create data “parts” that are loaded into the warehouse. Some of those parts are summarized into information “components” and stored in the warehouse.
Data Warehouse users make requests and are delivered information “products” that are created from the components and parts stored in the warehouse. A Data Warehouse is typically a blending of technologies, including relational and multidimensional databases, client/server architecture, extraction/transformation programs, graphical user interfaces, and more. Data warehousing is one of the hottest industry trends – for good reason. A well-defined and properly implemented data warehouse can be a valuable competitive tool. (Perkins). The following describes the components of a data warehouse (Figure. ) Summarized data: There are two kinds of summarized data, lightly summarized data and highly summarized data. Lightly summarized data are the hallmark of a Data Warehouse. All departments in a corporation do not have the same information requirements, so effective Data Warehouse design provides for customized, lightly summarized data for every department. Highly summarized data are primarily for the executives. Highly summarized data can come from either the lightly summarized data used by enterprise elements or from current detail.
If executives require more detailed information they have the capability of accessing increasing levels of detail through a “drill down” process. Current Detail: The heart of a Data Warehouse is its current detail, where the bulk of data resides. Current detail comes directly from operational systems and may be stored as raw data or as aggregations of raw data. Every data entity in current detail is a snapshot, at a moment in time, representing the instance when the data are accurate. Current detail is typically two to five years old. Current detail refreshment occurs as frequently as necessary to support enterprise requirements.
System of Record: A system of record is the source of the data that feed the data warehouse. Data in a data warehouse differ from operational systems data in that they can only be read, not modified. Thus, it is necessary that a data warehouse be populated with the highest quality data available, i. e. , data that are most timely, complete, accurate, and have the best structural conformance to the data warehouse. Integration and Transformation Programs: Even the highest quality operational data cannot usually be copied, as is, into a data warehouse.
As operational data items pass from their systems of record to a data warehouse, integration and transformation programs convert them from application-specific data into enterprise data. These integration and transformation programs perform functions such as: Reformatting, recalculating, or modifying key structures; Adding time elements; Identifying default values; Supplying logic to choose between multiple data sources; Summarizing, tallying, and merging data from multiple sources. When either operational or Data Warehouse environments change, integration and transformation programs are modified to reflect that change.
Archives: Data Warehouse archives contain old data (normally over two years old) of significant, continuing interest and value to the enterprise. There is usually a massive amount of data stored in the Data Warehouse archives, with a low incidence of access. Archive data are most often used for forecasting and trend analysis. Archives include not only old data (in raw or summarized form); they also include the metadata that describes the old data’s characteristics. Metadata: One of the most important parts of a Data Warehouse is its metadata – or data about data.
Also called Data Warehouse architecture, metadata is integral to all levels of the Data Warehouse, but exists and functions in a different dimension from other warehouse data. Metadata that is used by Data Warehouse developers to manage and control Data Warehouse creation and maintenance resides outside the Data Warehouse. (Perkins) Along with the various components of a data warehouse there are various structures of a data warehouse too. Structures of a Data Warehouse There are various structures of a data warehouse that a corporation can adopt based on its needs.
The physical data warehouse, the logical data warehouse and the data mart. (Figure. 3) Physical Data Warehouse: It is a physical database in which all the data for the data warehouse are stored, along with metadata and processing logic for scrubbing, organizing, packaging and processing the detail data. Logical Data Warehouse: Like physical data warehouse, it also contains metadata, including enterprise rules and processing logic for scrubbing, organizing, packaging and processing the data, but does not contain actual data. Instead, it contains the information necessary to access the data wherever they reside.
This structure is effective only when there is a single source for the data and they are known to be accurate and timely. Data Mart: It is a subset of an enterprise-wide data warehouse, which typically supports an enterprise element (department, region, function, etc. ). As part of an iterative data warehouse development process, an enterprise builds a series of physical (or logical) data marts over time and links them via an enterprise-wide logical data warehouse or feeds them from a single physical warehouse. (Perkins) Data Warehousing Methods
Several warehousing methodologies are used throughout the warehousing community. All of these fall into one of two categories: the big bang approach or the iterative approach. Big Bang Approach: A big bang methodology tries to solve all known problems by creating a huge data warehouse before you release it for evaluation and testing. Many people believe that this process is necessary to deliver on your objectives. Based on your objectives, amount of data to be incorporated, and your intimate knowledge of your business and data, you may be able to accomplish your warehousing project with a big bang methodology.
But, there are some considerations to take heed of: To create a data warehouse, the corporation must plan its warehouse, evaluate and install the necessary software and hardware, collect business requirements, and become familiar with its corporate data. While these tasks are taking place: The business goals of the corporation can change due to changes in the market or technology. Management supporters can lose interest in this project if you don’t keep them involved and show rapid results. The corporate data could change (they may start collecting Web log data).
New releases of the firm’s chosen software may become available (warehousing is still an evolving market and even the best tools continue to improve and change). The items listed above are just a few of the business and technical changes that could impact your plans. If you are not plugged into the proper channels, any one of these changes could cause your project to fail because you cannot quickly respond the necessary changes. Iterative Approach: With an iterative methodology, the corporation breaks its warehousing project into small, manageable chunks, referred to as projects.
In the iterative approach, the same planning tasks are performed that are required in the big bang approach, but evaluation of all of your deliverables up front is not required. The corporation must design its overall architecture, but when entering the planning phase, it needs to concentrate only on its first project or iteration. After each project, review of the architecture, its development process, and the corporation’s business requirements is done. The value of smaller projects within the larger warehousing process is: It shows a faster return on the company investment because it delivers one solution quickly.
This keeps the management supporters involved and interested in the project. It can adjust to changes in the business requirements faster because the team is small. Manageable projects that have short delivery schedules. Early involvement by the corporation’s user community provides real-situation testing, which provides it with user needs and defect reports. Additionally, users are provided with better feedback when they can see the system than when they have to envision it from a slide presentation.
This feedback can improve the corporation’s goals and processes for the next iteration. (www. sas. com/rnd/warehousing) Although the business may ask for everything to be delivered by the warehouse at once, taking a “big bang” approach may not be a prudent step. Instead, breaking the project into parts or releases, and establishing a clear set of objectives and definition of success for each release of the project will be appropriate. Each release will need to go through the complete system development lifecycle of requirements, design, build, test, implement, and support.
By delivering functionality and business value with each short release, a corporation can integrate its data, create the proper roles, train its users, gain insight and assimilate lessons learned. At the same time, by using an iterative approach a corporation can adjust its data warehouse’s content to add new sources, which were not previously considered, as it gains insight into its business and analytics. Perhaps most important, an iterative approach enables the project team to demonstrate a few quick wins in terms of business value delivered.
Due to its iterative nature, a data warehouse is a journey not a destination. There is always more data which can be integrated. (Karakizis) As it is exhibited in Figure 4, the iteration methodology thus starts with the Initial Organization phase, identifying the corporation’s readiness for undertaking a data warehousing project. Followed by the Analysis phase; evaluation of the feasibility of the data warehouse, gathering business requirements, and getting agreement on the goals and purpose of the warehouse. Third is the Design phase; analyzing and designing the warehouse system architecture.
Followed by the Development phase; creation of the actual data warehouse structure and population of the data warehouse. Fifth is the Testing phase; data cleansing and parameter clarification (may send back to the design phase for another iteration). Followed by, the Implementation phase; the rolling out the production environment and providing user training. And finally the Maintenance phase; monitoring, updating and cleansing data. (Wierschem et al) (Figure. 4) Initial Organization Analysis Design Testing Implementation Maintenance Development
At this point, provisions for an easy path for user feedback should be established and review of the five steps must be done and necessary adjustments should be made. After the review of the process, the next iteration or project should be started from the assessment phase. The assessment and requirements phases should require less process time after successive iterations. Get the right tools to build a data warehouse. As in any endeavor selecting the correct tools is paramount for success. As the old Chinese adage says,” To accomplish a goal, make sure the proper tools are selected. Given the complexity of the data warehousing system and the cross-departmental implications of the project, it is easy to see why the proper selection of tools and personnel is very important. This section of the paper will present information on such selections. There are two steps that top management is concerned with when building a data warehouse. One step is choosing a vendor. However, in doing so there are certain basic but nonetheless critical issues that have to be evaluated. Support: What type of support is offered?
It is industry standard for vendors to charge an annual support fee that is 15-20% of the software product license. But the question is, will any software issues be handled promptly by the vendor or not? Professional Services: This encompasses consulting and education. What type of consulting proposal does the vendor give? Are the personnel requirements and consulting rates reasonable? It might be wise to speak with members of the consulting team before signing on the dotted line. On the education front, what type of training is available? And how much is the consulting team willing to do knowledge transfer?
Does the consulting team purposely hold off information so that either 1) you will need to send more people to vendor’s education classes, or 2) You will need to hire additional consulting to make any changes to the system. Stability: More than anything else, this is probably the most important measure. It may even be more important than the current functionalities that the tool itself provides, for the simple reason that questions whether the vendor is going to be around for a while or not, or will it be able to make enhancements to its tool?
The other step is selecting the right team to build the data warehouse. In this there are two possibilities; one is to use external consultants and the other, to hire permanent employees. The pros of hiring external consultants are: They are usually more experienced in data warehousing implementations. The fact of the matter is, even today, people with extensive data warehousing backgrounds are difficult to find. With that, when there is a need to ramp up a team quickly, the easiest route to go is to hire external consultants. The pros of hiring permanent employees are: They are less expensive.
With hourly rates for experienced data warehousing professionals running from $100/hr and up, and even more for Big-5 or vendor consultants, hiring permanent employees is a much more economical option. They are less likely to leave. With consultants, whether they are on contract, via a Big-5 firm, or one of the tool vendor firms, they are likely to leave at a moment’s notice. This makes knowledge transfer very important. Of course, the flip side is that these consultants are much easier to get rid of, too. However, management must understand that there are various entities that play important roles in a data warehouse project.
They are; Project Manager: This person will oversee the progress and be responsible for the success of the data warehousing project. DBA: This role is responsible to keep the database running smoothly. Additional tasks for this role may be to plan and execute a backup/recovery plan, as well as performance tuning. Technical Architect: This role is responsible for developing and implementing the overall technical architecture of the data warehouse, from the backend hardware/software to the client desktop configurations.
ETL Developer: This role is responsible for planning, developing, and deploying the extraction, transformation, and loading routine for the data warehouse. Front End Developer: This person is responsible for developing the front-end, whether it be client-server or over the web. OLAP (On-Line Analytical Processing) Developer: OLAP is the foundation for a range of essential business applications, including sales and marketing analysis, planning, budgeting, statutory consolidation, profitability analysis, balanced scorecard, performance measurement and data warehouse reporting. Pendse and Creeth) The role of the OLAP Developer is thus very crucial. She/he is responsible for the development of OLAP cubes. Trainer: A significant role is the trainer. After the data warehouse is implemented, a person on the data warehouse team needs to work with the end users to get them familiar with how the front end is set up so that the end users can get the most benefit out of the data warehouse system. Data Modeler: This role is responsible for taking the data structure that exists in the enterprise and model it into a schema that is suitable for OLAP analysis. Chuo-Han Lee, 2002) Advantages and Disadvantages of data warehousing. Data warehousing has been increasingly popular in many organizations around the world. It is not with blind belief that corporations are investing millions of dollars in data warehousing projects. A successful data warehouse project will provide numerous lasting benefits to a company. Data warehousing makes retrieving information so easy that when a user query is submitted to the warehouse, the needed information is already there, with inconsistencies and differences already resolved.
This makes it much easier and more efficient to run queries over data that originally came from different sources. (WHIPS) Along with its numerous ease of use benefits data warehousing provides other qualitative advantages too. It enables improved knowledge of relationships among products and services and their performances, ability to make quick and proper analysis that pave the way for better decision making can be gained from a successful data warehousing project and thus, give the company a strong competitive advantage over the competition. Smith) The benefits of the development of a data warehouse would include: More accurate predictions of customer demand based on the use of trends analysis. The response improvement in direct marketing campaigns through the use of household demographics and current customer analysis. The improvement in vendor relations and price reductions by targeting selected vendors with increased level of purchasing over the enterprise. The significant savings from improved data quality across the enterprise (Smith).
The ability to run complex queries easily and efficiently since query execution does not involve data translation and communication with remote sources. Convenience for end users since they can use a single data model and query language. Simplicity of the system design. For example, there is no need to perform query optimization over heterogeneous sources, a very difficult problem faced by other approaches. Information at the warehouse is under the control of the warehouse users; thus it can be stored safely and reliably for as long as necessary. WHIPS) From the management’s point of view the benefits and rewards are abounding for a company that builds and maintains a data warehouse correctly. The corporation will make dramatic cost savings and its revenues will soar. Furthermore, there will be increase in analysis of marketing databases to cross-sell products, less storage on the mainframe and the ability to identify and keep the most profitable customers while getting a better picture of who they are, and it’s easy to see why data warehousing is spreading faster.
For example, the telecom industry uses data warehouses to target customers who may want certain phone services rather than doing “blanket” phone and mail campaigns and aggravating customers with unsolicited calls during dinner. Some of the soft benefits of data warehousing come in the technology’s effect on users. When built and used correctly, a warehouse changes users’ jobs, granting them faster access to more accurate data and allowing them to give better customer service. A company must not forget, however, that the goal for any data warehousing project is to lower operating costs and generate revenue this is an nvestment, after all, and quantifiable ROI should be expected over time. (Wail gum, 2001) Even though the benefits of data warehousing by far outweigh the disadvantages there are certain disadvantages of data warehousing that companies must pay heed to. The most important disadvantages are: Expensive initial data warehouse set up. However, after the system is in place the cost should be low and cover only the maintenance and future modifications of the system. Also, there is high cost in getting data translated and copied to existing databases in time for being useful for the end user.
A data warehouse takes time to build and time should be given to the project and the difficulties in getting a data warehouse up and running and developed should not be underestimated. The re-education of the programmers often proves to be a disadvantage as change is often resisted until familiarity is gained with the new approach. (OCS Consulting) Cost and time is also borne to develop the required new skill-set for warehouse developers and end users. A data warehouse is complex to develop it cannot just be bought as an off-the-shelf product and is designed specific for an organization needs.
Choice of hardware, software and structure requires careful consideration and how they will progressively work together in the future. The data warehouse will require management, although in comparison to the management of the current environment it will mean that overall less time is actually required in the Data Warehousing approach. Data held in one place highlights data integrity problems and vulnerability from the public domain thus advanced security to prevent unwanted users, including competitors from accessing the data base will be of critical importance for the company. Davis et al, 1999) Some of the applications Data warehousing can be used for are Decision support, Trend analysis, Financial forecasting, Churn Prediction for Telecom subscribers, Credit Card users etc. And also in Insurance fraud analysis, Call record analysis, Logistics and Inventory management, Agriculture. Some Data Warehousing Platforms Teradata: Even though all the vendors included in the comparison offer efficient and trustworthy solutions, each one differs a bit from the rest. Teradata became – maybe a bit surprisingly – a leader of ranking, but no one can say it’s undeserved.
The first and also the most significant feature of Teradata solutions is their scalability. Undoubtedly, sales strategy was a key to the success. Teradata corporation attached weight to their products availability – price-competition brought on wider range of interest from customers representing different market segments. As a consequence, need of fulfilling extremely different expectations resulted in improved universality of the solutions. Teradata’s portfolio also focus attention not only for 550, 2550, and 550 enterprise data warehousing solutions, but also standalone licenses, databases, tools, and utilities.
In enterprise data warehousing area, as a main target for Teradata, a few important features let Teradata distinguish itself, overshadowing the rivals: While comparing to other vendors, appears the real strength of options offered by Teradata. EDW packaging, pricing, and licensing options make it one of the most adjustable and – therefore – suitable solution. Teradata designers understand a need of wide cooperation. Solutions are well-prepared for supporting and managing work of external application and middleware. Wide range of products and services made Teradata the best choice for customers of different size and requirements.
All of them have a possibility of adjusting solution to make it suit them best. Optimized especially for analysis, database management system attests to the strength of Teradata portfolio. Almost legendary scalability allows Teradata customers to scale data warehouses through massively parallel processors (MPP) to a few petabytes across over thousand nodes, and, finally, store them in different enterprise data warehouses or business intelligence topologies. Teradata enterprise data warehouses let customers manage diverse workloads – reporting, query, OLAP, inner analytics, and prime extract, transform, and load.
Finally, worth mentioning is Teradata’s functionality in areas of in-database analysis, workload managing, query optimizing, indexing, partitioning, compressing, and caching. In spite of all the praises Teradata truly deserves, its autonomy seems to have gone a bit over the top. Solutions by Teradata demands exclusively determined hardware and software – Teradata hardware units, Intel Xeon processors, and Bynet interconnect, while refusing working on other platforms. Furthermore, Teradata doesn’t provide SaaS on-demand EDW. First mentioned restriction seems particularly painful as it quite significantly limits solutions’ availability.
To sum up, Teradata’s market position is already established and company’s recent strategies apply especially to maximizing each solution best features. Incessantly proceeded scalability and modularity, along with cost-competition, might result only in even better sale and popularity. Oracle: Oracle Database 11g and Oracle Warehouse Builder (OWB) are the tools that have let Oracle reach its great position in the comparison, along with significant market share. Efficient partnership hosted by Oracle is also worth noticing, the same as solutions’ high performance and affordability.
Oracle solutions – unlike Teradata – might be equally well deployed on different hardware and software platforms, they’re not tied to standard configuration. Furthermore, there aren’t any complications while integrating solutions with Siebel, Hyperion or Fusion Middleware. In-database analytics by Oracle are especially worth noticing, providing well-worked out query optimizing, partitioning, compressing, and caching. Oracle solutions meet the expectations of customers representing different sizes companies and diversified requirements. Oracle gives its customers a possibility of choosing different Optimized Warehouse EDW appliances.
Although Oracle’s solutions have plenty of disadvantages, their weak points are noticeable, as well. First of all, there’s no possibility of deploying as a single-tier grid of notes. Furthermore, scaling out beyond 1 petabyte or to a grid of hundreds of nodes leave much to be desired. Finally, there’s also a matter of isolation – Oracle Enterprise Data Warehousing solutions are provided separately from the whole products’ family – BI, OLAP, MDM, and data integration software. Are these disadvantages significant enough to cause resignation from Oracle solutions? Not, especially because of the company’s plans.
Extending partnerships, facilitating maintenance and multiplying features are only a part of Oracle’s strategy. It states clearly that all the solutions’ imperfections might be quickly fixed. All in all, Oracle provides solutions really worth trusting and future prognosis announce only a growth of its market position. IBM: IBM is widely known for its customer care, providing solutions adjusted for clients of diverse sizes and requirements, therefore no one should be surprised with IBM’s high position in the ranking, as long as improving IOD (Information On Demand) portfolio remains its main goal.
Thereupon, all the new solutions are designed to even supplement, better, and broaden the portfolio, offering the wide range of services. These tendencies ensured IBM a place among market leaders. In a word, the aspect distinguishing IBM from other vendors is functionality of its solutions. All of them are prepared to suit even the most exacting customers. What features are especially worth mentioning? Among diverse IBM solutions, there is DB2, efficient enterprise database. IBM might be praised for its universality.
Information Server solutions meet extremely different expectations, providing services able to satisfy even the utmost requirements regardless of customers’ size or trade. EDW appliances by IBM might be scaled out in hundreds of terabytes and diverse topologies (EDW and BI). Full integration with Optim, Rational, FileNet, WebSphere, Cognos, and InfoSphere. Like Oracle’s, solutions by IBM might be implemented on different software and hardware platforms, thereupon they’re not tied to standard platforms, and – as a consequence – available for more customers.
IBM appliances are always well-prepared for supporting and managing mixed workloads – OLAP, ETL, in-database analytics, ad hoc query, and reporting. Furthermore, IBM ensures efficient database security systems, information governance tools, and life-cycle management. Even though the prognosis seem “IBM-friendly”, its solutions still might be improved, as long as a few bugs definitely should be fixed. InfoSphere Balanced Warehouse lacks for petabyte-scalability. That restrains IBM’s usage in enterprises and service providers (not fully satisfying very-high-end demands).
Microsoft: Among vendors making most significant headways, Microsoft is definitely on the first place. Important acquisitions made by Microsoft – especially DATAllegro – along with aggressive scalability push, let Redmond Giant become one of the most solid EDW vendors in the comparison. SQL Server 2008 is a platform definitely worth noticing, featuring: Microsoft prepared very well scaling into range of tens of terabytes. SQL Server, main enterprise database, is efficiently adjustable, so that might be used by customers representing companies of diverse size and trade.
Worth noticing is also Microsotf’s solutions’ flexibility – SQL Server might be easily deployed in different EDW and BI technologies. Furthermore, database management systems might be integrated with specified SOA architectures, platforms, middleware, BI, and other solutions. Microsoft SQL Server is also very well-prepared for supporting diversified OLAP, BI, query, and advanced analytics workloads. Finally, workload management functionality, cost-based query optimization, indexing, partitioning, compression, and caching are one of the best-of-breed.
Unfortunately, Microsoft’s solutions still have a lot to be improved – firstly, scaling. There’s no possibility of petabyte-scale massively parallel processing. Secondly, information life-cycle management is weak. Then, information governance, data quality, federation, and hierarchy managing tools seem a bit half-baked. Finally, Microsoft solutions for EDW work only on Windows platform, restraining its usage’s width. Also a few other lacks might be mentioned – not providing own EDW appliances or SaaS-based EDW services – but Microsoft declares to fix that quickly.
On the other hand, so far all the lacks are replenished by efficient partnerships – with Dell, HP, and SaaS. All in all, if believe Microsoft’s announcements, “Redmond Giant” is working hard on its own solutions, so the customers might expect strongly improved solution in a nearest future. SAP: What features might settle the choice on SAP? Both NetWeaver Business Intelligence 7. 1 and BI Accelerator make SAP provide extremely efficient EDW appliances. Impressive ability to persist data to a choice of database management systems (IBM DB2, Oracle Database, Microsoft SQL Server, and MaxDB).
Triple availability of SAP’s EDW offerings – as appliance (BIA), software (NetWeaver BI/BW), and SaaS components (Business ByDesign). Row-based storage supporting capabilities. Plus columnar and cache persistence through BIA. Fully integration with service oriented architecture capabilities, application platforms, BI, middleware, desktop software, and PM. Broad range of EDW services by SAP itself or efficient partnerships. Even though, SAP doesn’t seem to do its best for collecting new customers, it’s rather concentrated on already-existing users.
Furthermore, there are also a few significant lacks – no cost-based query optimizing capabilities, query predicate pushdown to storage layers, and compression by data types. Finally, the general universality of SAP’s enterprise data warehousing appliances is somehow weak – low-cost solutions for the midmarket have been totally omitted. Beside those few weak points, SAP is thought to get better and better as every new solutions’ edition gets released (and of course after acquisition of Business Objects). In a word, SAP is also a good choice, but – surely – might be significantly better, what’s only a matter of time.
Sybase: Among other vendors offerings, Sybase’s portfolio seems to be slightly different as it contains especially low-cost solutions, while the midmarket and budget-constrained enterprises are thought to be Sybase’s main target. What features argue for investing in Sybase’s solutions? EDW offerings might be fully integrated with IQ – columnar database by Sybase. Full support for integration and modeling tools. EDW solutions by Sybase are fully compatible with diverse platforms of software and hardware. They’re not tied to the default ones.
Extensibility options multiply EDW’s universality, broadening its usage capabilities. Weight’s been attached also to increasing flexibility of workload managing, cost-based query optimizing, indexing, partitioning, and caching. Sybase might be praised for its scalability options. Lastly, Sybase’s partnerships net looks pretty impressing, that guarantee efficient support in every situation. As a typically niche vendor, Sybase’s solutions have a few weak points disabling its wider usage. There’s no support for shared-nothing MPP, specialized in-database analytics beyond the pale of pre- or user-defined functions.
Furthermore, EDW offerings by Sybase might be integrated only with MDM, OLAP, BI, predictive analytics, and data mining solutions through the partnerships among vendors of the same market segment. Finally, Sybase has omitted providing own life-cycle management systems. To sum up, the solutions by Sybase in the very beginning distinguish themselves because of the price. Low-cost EDW offerings might have a really successful future if only Sybase do its best to extend capabilities and broaden functionality range. Netezza: The vendors listed above are the unquestionable leaders. Their solutions are proven, trustworthy, and services complex.
Then must have resulted in great popularity. Netezza, the representative of the “strong performers” group in this comparison doesn’t have to stand down in favor of market leaders that’s even more admirable because of its quick advance from total beginners to established solution provider. Netezza has many times proven functionality and capability of its solutions, while deploying mission-critical environments. What features make Netezza a good choice? First thing, that should be mentioned, is incessant growth of Netezza’s portfolio. Successive solutions get better and better, therefore all the expectations might be quickly satisfied.
The features distinguishing Netezza from the other vendors are hybrid shared-nothing massively parallel processing and symmetric multiprocessing approaches (storage and host tier). Worth noticing is also a unique model of physical data storage implementing. Netezza might be praised for its in-database analytic framework, offering wide range of powerful capabilities. BI, OLAP, query, and advanced analytic workloads seem to be very well-worked out. Cost-based query optimizing is possible and significantly flexible. Finally, workload management capability is also worth mentioning.
Those were the advantages. On the other hand, solutions offered by Netezza might be better, but their advance is restrained by company’s strategy itself. Generally, Netezza’s weak points might be divided into two categories – go-to-market, and technical standpoints. From the first point of view, Netezza doesn’t really seem to make efforts for advance into leader vendors group. Its appliances for the midmarket are still quite expensive and nothing announces its quick changes. From the second, technical, point of view, Netezza’s solutions disclose their freshness and inexperience.
They demand predefined hardware and software platforms, refusing to work with different configurations. Although almost all the lacks are replenished by external partnerships, that’s not the model appropriate for leaders group. Finally, high-speed interconnects, and MOLAP supports don’t exist, the same as database encryption functionality. All in all, Netezza is already offering a few interesting solutions, and successive ones might be expected to quickly appear. Even though there’s still a lot to do, Netezza guarantees great perspectives; therefore it’s a really worth-considering vendor. Source: bidw directory. Business Intelligence Data Warehousing, Data Warehouse Software) Conclusion Thus, we see that data warehousing does not have to be an enigma to the managers of companies. Even though it is an exhausting task a successful data warehousing project is crucial for the companies to run a successful business. Management must believe and understand that Data warehousing is of strategic importance to a company. Therefore, even if management has to invest a massive amount of capital to build a data warehouse it must do in hindsight of the myriad benefits that will crop up.
Any company that doesn’t see the importance and benefits of data warehousing and is blinded by the cost and daunted by the size of the task will feel the devastating impact of the competing business that have undertaken successful data. References Anna Marie Smith, Data Warehousing: A Short Overview. Alan Perkins, A Strategic Approach to Data Warehouse Engineering. White Papers, copyright, 1997-1998, Visible Systems Corporation. Bidw directory. Business Intelligence Data Warehousing, Data Warehouse Software. © 2008- 2010 GoliInfo All Rights Reserved, http://www. bi-dw. info/data-warehouse-software. tm Chuo-Han Lee (2002), Data Warehouse and Data Warehousing. Copyright © 2001, 2002 Davenport, Thomas H. and Harris, Jeanne G. Competing on Analytics: The New Science of Winning (2007) Harvard Business School Press. ISBN 978-1-4221-0332-6. David Wierschem, Ph. D. Jeremy McMillen, M. S. Randy McBroom, Ph. D. , Methodology for Developing an Academic Data Warehouse, Office of Institutional Resource and Planning. Texas A&M University-Commerce. Ganczarski, Joe. Data Warehouse Implementations: Critical Implementation Factors Study (2009) VDM Verlag ISBN 3-639-18589-7, ISBN 978-3-639-18589-8.
Jennifer Widom (1995), Research Problems in Data Warehousing, Proceedings of the 4th Int’l Conference on Information and Knowledge Management (CIKM), Nov. Kimball, Ralph and Ross, Margy. The Data Warehouse Toolkit Second Edition (2002) John Wiley and Sons, Inc. ISBN 0-471-20024-7 Larry Greenfield, A Definition of Data Warehousing. Copyright 1995-2002. Linstedt, Graziano, Hultgren. The Business of Data Vault Modeling Second Edition (2010) Dan linstedt, ISBN 978-1-4357-1914-9 Louis Rolleigh and Joe Thomas, Data Integration: The Warehouse Foundation. White Papers, copyright 2002 Acxiom Corporation.
M. Davis, Z. Galzie, M. Silcox (1999), Data Warehousing. Nigel Pendse and Richard Creeth, The OLAP Report. OCS Consulting, Data Warehousing Presentation To VIEWS Pharmaceutical SIG. Ralph Kimball (1996), The Data Warehouse Tool Kit, Wiley. Spyro D. Karakizis, How to Win With Your Data Warehouse: Advice from A Data Warehouse Expert, ©1996-2002 Accenture. The Data Warehousing Concept. Available at: http://www. sas. com/rnd/warehousing/concept. html Tom Wail gum (2001), What is a Data Warehouse? WHIPS (Warehouse Information Prototype at Stanford), Data Warehousing at Stanford.