Beginning Database Design- P15

Chia sẻ: Cong Thanh | Ngày: | Loại File: PDF | Số trang:20

lượt xem

Beginning Database Design- P15

Mô tả tài liệu
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

Beginning Database Design- P15:This book focuses on the relational database model from a beginning perspective. The title is, therefore, Beginning Database Design. A database is a repository for data. In other words, you can store lots of information in a database. A relational database is a special type of database using structures called tables. Tables are linked together using what are called relationships. You can build tables with relationships between those tables, not only to organize your data, but also to allow later retrieval of information from the database....

Chủ đề:

Nội dung Text: Beginning Database Design- P15

  1. Planning and Preparation Through Analysis The ADVERTISEMENT table is decided on being the fact table. The MUSICIAN table is a single layer fact table. The BAND, MERCHANDISE, SHOWS, and DISCOGRAPHY tables constitute a two-layer dimensional hierarchy. There are some gray areas between static and dynamic information in this situation. There are other ways of building this data warehouse database model. Project Management When it comes to planning and establishing timelines, there are some remarkably good project-planning software tools available. When planning any software project, including a database model design, there is often more than one person involved in a project. This excludes the planner. If not using a software tool, and multiple people are to perform multiple overlapping, interdependent tasks, you could end up with lots of drawings, and an awfully full garbage can. These things change. Plans can change. People get sick and go on vacation. They find new jobs. Sometimes people simply don’t show up at all! In other words, expect things to change. The fact is when it comes to planning, establishing timelines, and budgeting there is no enforceable or reasonably useful applicable set of rules and steps. It just doesn’t exist. In any planning process (including budgeting), for any type of project, there is no way to accurately predict to any degree of mathematical certainty. None of us can really read the future. Planning, establishing timelines, and budgeting entail a flexible process, which is almost entirely dependant on the expertise, and past experience, of the planner. And then again, if changes and unexpected snags or events are not anticipated, a project can very quickly spiral out of control. Project Planning and Timelines A software development company can meet with its financial demise by not allowing enough time in project plans. It is quite common in the software development field for project-tender budgets from multiple companies to be as much as 50 times different. In other words, the difference in dollar estimates between the lowest and highest bidders can be enormous. This indicates guesswork. It is often the case that the most inexperienced companies put in the lowest bids. The highest bids are probably greedy. Those between the average and the highest are probably the most realistic. Also, they are likely to make a profit, and thus be around to support very expensive software tools, say, 5 to 10 years from now. As already stated, there is much research into project planning in general. Unfortunately, few, if any quantifiable and useful results have been obtained. Expert level assessment based on past experience of experts, is usually the best measure of possibilities. There is an International Standards Institute (ISO) model called the “ISO 9000-3 Model.” This model is used more to give a method of quality assurance against the final product, of, say, an analysis of a database model. This ISO model does not give a set of instructions as to how to go about performing analysis process itself, but rather presents a method of validation after the fact. 253
  2. Chapter 9 The accuracy of a planned budget is dependent on the experience of the planner. Database designers, administrators, and programmers think that their project managers and planners do nothing. They are wrong. The planners take all the risk, make all the wild guesses; the programmers get to write all that mathematically precise programming code. So, the next time you see your project manager in a cold sweat, you know why. It’s not your job on the line — it’s theirs. Here are some interesting — and sometimes amusing — terms, often used to describe planning, budgeting, and project management: ❑ “Why did the chicken cross the road?” — This is a well-known quotation about a now unfortu- nately discontinued software development consultancy company. This company was famous for producing enormous amounts of paper. Lots of paper can help to make more profit for the software development company. The more paper produced, the more useless waffle. However, the more paper produced, the less likely something has been overlooked, and the more likely the plan and budget are accurate. The other problem with lots of paper is no one has time to read it, and doesn’t really want to either. ❑ “Get someone with a great, big, long signature and a serious title to sign off on it ASAP.” — This one is also called “passing the buck.” That buck has to stop somewhere, and the higher up, the better, and preferably not with the planner. The planner has enough things to think about without worrying about whether their wildest guesses will bear fruit, or simply crash and burn. ❑ “Don’t break it if it’s already fixed.” — If something is working properly, why change it? ❑ “Use information in existing systems, be they computerized or on paper.” — Existing structures can usually tell more about a company that its people can. There is, however, a danger that if a sys- tem is being replaced, there is probably something very wrong with the older system. ❑ “Try not to reinvent the wheel.” — When planning on building software or a new database model, use other people’s ideas if they are appropriate. Of course, beware of outdated ideas. Do thor- ough research. The Internet is an excellent source of freely available ideas, both old and new. ❑ “More resources do not mean faster work.” — The more people involved in a project, the more con- fusion. Throwing more bodies at a project can just as likely make a project impossible to man- age, as it can make it go faster. Figure 9-30 shows a pretty picture of what a project timeline might look like, containing multiple people, multiple skills levels, multiple tasks (both overlapping and interdependent). Project timelines can become incredibly complicated. Simplify, if possible. Too much interdependency can lead to problems if one area overruns on time limitations. Keep something spare in terms of people, available hours, and otherwise. Expect changes. Plan with some resources in reserve. Figure 9-30 shows five separate tasks in a simplistic project-timeline Gantt chart. Task 1 is assigned to Jim. There is no time conflict between Task 2 and Task 3, and can both be assigned to Joe (one person). Task 3 and Task 4 overlap between both Janet and Joe, in more ways than one. Joe is going to be very busy indeed. A project manager would allow overlap where the person doing the assigned tasks is known to be capable of multiple concurrent activities. 254
  3. Planning and Preparation Through Analysis Gantt charts were invented as a graphical tool for project management. Gantt charts allow for a pictorial illustration of a schedule, thus helping with the planning, coordination between, and tracking of multiple tasks. Tasks can be independent or interdependent. Many off-the-shelf software tools allow computerized project management with Gantt charts (and otherwise), such as Microsoft Project, Excel spreadsheet plug-ins, and even Visio. 5/23/2005 5/30/2005 6/6/2005 Task Name M T W Th F M T W Th F M T W Th F Task number 1 Jim Task number 2 Joe Task number 4 Joe Task number 3 Janet Task number 5 Janet and Joe Figure 9-30: An example project timeline Gantt chart. Budgeting When it comes to budgeting for a project, budgeting is part of planning, and is thus open to expert interpretation. Once again, there is research on how to apply formal methods to budgeting. Much of that research is unsuccessful. In the commercial world, most observations offer a lot of guesswork based on experience. Also, there is an intense reluctance on the part of the experts to justify and quantify the how and why of the results they come up with. Budgeting is, as already stated, just like planning. It is educated guesswork. There is nothing to really present in a book such as this one, telling someone like you, the reader, how to go about budgeting a project, such as a database modeling project. One big problem with software development (including database model development and design) is its intangibility. Software projects are difficult to quantify, partially because they are unpredictable, they can change drastically during development. The sheer complexity of software is another factor. Complexity is not only within each separate unit and step. One step could be the database model analysis and design. Another step could be writing of front-end application code. There is a complexity issue simply because of the need to account for all possibilities. There is also complexity because of the heavy interdependence between all parts of software development. This interdependence stems from the highest level (comparing, for example, database model and application coding) even down to the smallest piece (comparing two different fields in a single table, in a database model). In short, database model analysis and design is extremely difficult to budget for. The most common and probably successful practice is to guess at an estimated cost based on estimated time to be spent (work hours). Then take the result and add a percentage, even as much as 50 percent, and sometimes multiply- ing the first conservative estimate by much more. 255
  4. Chapter 9 A number of detail areas already have been discussed previously in this chapter: ❑ Hiring help costs money — Hired help in the form of software development expertise can be astro- nomically expensive. ❑ Hardware costs — Hardware can be cheap to incredibly expensive. Generally, more expensive hardware leads to the following scenarios: ❑ Expensive hardware can help to alleviate software costs, but usually in the form of hiding performance problems. Sloppy database design and software development can be overcome by upgrading hardware. If growth is expected, and the hardware selected is ever outgrown, costly problems could be the result. The temptation when using expensive hardware is to not worry too much about building database and software applications properly. Fast-performing hardware lets you get away with a lot of things. ❑ Expensive hardware is complicated. Hiring people to set up and maintain that complexity can be extremely expensive. Simplistic and cheap hardware requires less training and lower skills levels. Less-skilled labor is much cheaper. ❑ Maintenance — Maintenance is all about complexity and quality. The more complex something is, the more difficult it is to maintain. Somewhat unrelated, but just as important, poor quality results in a lot more maintenance. Indirectly, maintenance is related to the life of a database model, and ultimately applications. How long will they last, how long will something be useful, and help to turn a profit? There is a point where constant maintenance is outweighed by doing a complete rewrite. In other words, sometimes rebuilding databases and software applications from scratch can save money, rather than continuing to maintain older software that has seen so many changes that it is no longer cost-effective to maintain. ❑ Training — Training affects all levels, from technical staff to unknown huge quantities of end-users on the Internet. Obviously, you don’t want to have to train Internet users located at far-flung parts of the globe. Attempting to train Internet users is pointless. People lose interest in applications that are difficult to use. You might as well start again in this case. Training in-house staff is different. The higher the complexity, the more training is involved. Training costs money — sometimes a lot of money! Summar y In this chapter, you learned about: ❑ The basics of analysis and design of a database model ❑ A usable set of semi-formal steps, for the database modeling analysis process ❑ Some common problem areas and misconceptions that can arise ❑ How talking, listening, and following a paper trail can be of immense value ❑ How to create a database model to cover objectives for both an OLTP and a data warehouse database model 256
  5. Planning and Preparation Through Analysis ❑ How to refine a database model using the application of business rules, through basic analysis, creating both an OLTP and a data warehouse database model ❑ How to apply everything learned so far with a comprehensive case study, creating both an OLTP and a data warehouse database model This chapter has attempted to apply everything learned in previous chapters, by going through the motions of beginning with the process of creating a database model. An online auction company has been used as the case study example. Chapter 10 expands on the analysis process (simplistic in this chapter at best) and design with more detail provided for the OLTP and database warehouse database models, as presented analytically in this chapter. The idea is to build things step-by-step, with as much planning as possible. This chapter has begun that database model building process, as the first little step in the right direction. Exercises Use the ERDs in Figure 9-19 and Figure 9-29 to help you answer these questions. 1. Create scripts to create tables for the OLTP database model Figure 9-19. Create the tables in the proper order by understanding the relationships between the tables. 2. Create scripts to create tables for the data warehouse database model Figure 9-29. Once again, create the tables in the proper order by understanding the relationships between the tables. 257
  6. 10 Creating and Refining Tables During the Design Phase “Everything in the world must have design or the human mind rejects it. But in addition, it must have purpose or the human conscience shies away from it.” (John Steinbeck) Analysis is all about what needs to done. Design does it! This chapter builds on and expands on the basic analytical process and structure discovered during the case study approach in Chapter 9, which covered analysis. Analysis is the process of discovering what needs to be done. Analysis is all about the operations of a company, what it does to make a living. Design is all about how to implement that which was analyzed into a useful database model. This chapter passes from the analysis phase into the design phase by discovering through a case study example how to build and relate tables in a relational database model. By the end of this chapter, you will have a much deeper understanding of how database models are created and designed. This chapter teaches you how to begin the implementation of what was analyzed (discovered) in Chapter 9. In short, implementation is the process of creating tables, and sticking those tables together with appropriate relationships. In this chapter, you learn about the following: ❑ Database model design ❑ The difference between analysis and design ❑ Creating tables ❑ Enforcing inter-table relationships and referential integrity ❑ Normalization without going too far ❑ Denormalization without going too far
  7. Chapter 10 A Little More About Design When designing a database model, the intention is to refine database model structure for reasons of maintaining data integrity, good performance, and adaptations to specific types of applications, such as a data warehouse. A data warehouse database model requires a different basic table structure than an OLTP database model, mostly for reasons of acceptable reporting performance. Performance is not exclusively a programming or implementation construction or even a testing or “wait until it’s in production” activity. Performance should be included at the outset in both analysis and design stages. This is particularly important for database model analysis and design for two reasons: ❑ Redesigning a database model after applications are written will change applications (a little like a rewrite — a complete waste of time and money). ❑ There is simply no earthly reason or excuse to not build for performance right from the beginning of the development process. If this is not possible, you might want additional expertise in the form of specialized personnel. Hiring a short-term consultant at the outset of a development project could save enormous amounts of maintenance, rewriting, redevelopment costs, and time in the future. Business events and operations discovered in the analysis stage should be utilized to drive the design process, which consists of refining table pictures and ERDs already drawn. For larger projects, the design stage can also consist of detailed technical specifications. Technical specifications are used by programmers and administrators to create databases and programming code. Essentially, the beginning of the design process marks the point where the thought processes of analysts and programmers begin to mix. In the analysis stage, the approach was one of a business operation (a business-wide view) of a company. In the design stage, it starts to get more technical. When designing a database model, you should begin thinking about a database model from the perspective of how applications will use that database model. In other words, when considering how to build fact and dimensional table structures in a data warehouse database model, consider how reports will be structured. Consider how long those reports will take to run. For a data warehouse, not only are the table contents and relationships important, but factors such as reconstruction of data using materialized views and alternate indexing can help with the building and performance of data warehouse reporting. Relational database model design includes the following: ❑ Refine Database Models — At this stage of the game, most of this is about normalization and denormalization. ❑ Finalization and Approval — This includes finalization (and most especially, approval) of business and technical design issues. You need to get it signed off for two reasons: ❑ Software development involves large amounts of investment in both money and time. Management, and probably even executive management approval and responsibility, is required. A designer does not need this level of worry, but may need some powerful clout to back up the development process. Other departments and managers getting in the way of the development process could throw your schedule and budget for a complete loop. 260
  8. Creating and Refining Tables During the Design Phase Every project needs a sponsor and champion; otherwise, there is no point in progressing. ❑ You need to cover your back. You also need to be sure that you are going in the right direction because you may very well not be. There is usually a good reason why your boss is your boss, and it usually has a lot do with him or her having a lot more experience. Experience is always valuable. It is extremely likely that this person will help you get things moving when you need, such as when you need information and help from other departments. So, it’s not really about passing the buck up the ladder. It’s really about getting a job done. The stronger approval and support for a project, the better the chance of success — unless, of course, it’s your money. In that case, it’s your problem! So, pinch those pennies. Of course, some entrepreneurs say that the secrets to making money are all about cash flow, spending it, and not stashing it in the bank for a rainy day. Be warned, however, that software development is a very expensive and risky venture! So, the design stage is the next stage following the analysis stage. Design is a process of figuring out how to implement what was discovered during the analysis stage. As described in Chapter 9, analysis is about what needs to be done. Design is about how it should be done. The design stage deals with the following aspects of database model creation: ❑ More precise tables, including practical application of normalization. ❑ Establishment of primary and foreign key fields. ❑ Enforcement of referential integrity by establishing and quantifying precise relationships between tables, using primary and foreign key fields. ❑ Denormalization in the design stage (the sooner the better), particularly in the case of data warehouse table structures. ❑ Alternate and extra indexing in addition to that of referential integrity, primary and foreign keys; however, alternate indexing is more advanced (detailed) design, and is discussed in Chapter 11. ❑ Advanced database structures, such as materialized views, and some specialized types of indexing. Similar to alternate indexing, this is also more advanced (detailed) design, and is discussed in Chapter 11. ❑ Precise field definitions, structure, and their respective datatypes (again advanced design). The intention of this chapter is to focus on the firm and proper establishment of inter-table relationships, through the application of normalization and denormalization for both OLTP and data warehouse database models. This process is performed as a case study, continuing with the use of the online auction company introduced in Chapter 9. Let’s create some tables. 261
  9. Chapter 10 Case Study: Creating Tables In Chapter 9, tables were created on an analytical level, creating basic pictures. Following the basic pictures, simple ERDs were constructed. In this section, basic commands are used to create the initial simple tables, as shown in the analytical process of Chapter 9. The idea is to retain the step-by-step instruction of each concept layer, in the database modeling design process, for both OLTP and data warehouse database models. These models are created for the online auction house case study database models. The OLTP Database Model Figure 10-1 shows a simple analytical diagram, covering the various operational aspects of the online auction house, OLTP database model. Notice how the BIDS table is connected to both the LISTING and BUYER tables. This is the only table in this database structure that is connected to more than one table and not as part of a hierarchy. Category tables are part of a hierarchy. Primary Category Secondary Category Tertiary Category Buyer Seller Listing Bids Buyer Seller History History Figure 10-1: Analytical OLTP online auction house database model. Figure 10-2 shows the application of business rules to the simple analytical diagram shown in Figure 10-1. Once again, notice the dual links for the BID table (now called BID for technical accuracy because each record represents a single bid), to both the LISTING and the BUYER tables. This double link represents a 262
  10. Creating and Refining Tables During the Design Phase many-to-many relationship. The BID table is in actuality a resolution of the many-to-many relationship between the LISTING and BUYER tables. In other words, each buyer can place many bids on a single item listed for auction; however, each auction listed item can also have many buyers making those many bids. Seller Category_Primary seller popularity_rating primary join_date address Category_Secondary return_policy international payment_methods secondary Seller_History Listing buyer comment_date listing# listing# Category_Tertiary description comments image start_date tertiary listing_days Buyer Buyer_History currency starting_price reserve_price buyer seller buy_now_price popularity_rating comment_date number_of_bids join_date listing# winning_price address comments Bid bidder seller bid_price bid_date Figure 10-2: Business rules OLTP online auction house database model. The easiest way to create tables in a database model is to create them in a top-down fashion, from static to dynamic tables, gradually introducing more and more dependent detail. In others words, information that does not change very often is created first. Information changing frequently is created after static tables. Technically speaking, it’s all about catering to dependencies first. The first tables created are those with no dependencies. Subsequent tables are created when tables depended on have already been created. Begin by creating the three category static tables. The SECONDARY table depends on the PRIMARY table, and the TERTIARY table depends on both the SECONDARY, and thus the PRIMARY table as well; therefore you need to create PRIMARY, then SECONDARY, followed by TERTIARY tables: CREATE TABLE CATEGORY_PRIMARY(PRIMARY STRING); CREATE TABLE CATEGORY_SECONDARY(SECONDARY STRING); CREATE TABLE CATEGORY_TERTIARY(TERTIARY STRING); Some databases (if not many databases) do not allow use of keywords, such as PRIMARY, or even SECONDARY. PRIMARY could be reserved to represent a primary key and SECONDARY could be reserved to represent secondary (alternate) indexing. If you get an error, simply use another name. 263
  11. Chapter 10 The SELLER and BUYER tables are also static. According to the ERD shown in Figure 10-2 they are not dependencies. So you can create the SELLER and BUYER tables next: CREATE TABLE SELLER ( SELLER STRING, POPULARITY_RATING INTEGER, JOIN_DATE DATE, ADDRESS STRING, RETURN_POLICY STRING, INTERNATIONAL STRING, PAYMENT_METHODS STRING ); CREATE TABLE BUYER ( BUYER STRING, POPULARITY_RATING INTEGER, JOIN_DATE DATE, ADDRESS STRING ); In these table creation script sections, I have begun to use de-formalized datatypes, such as STRING (representing text strings of any length), plus DATE for dates, and INTEGER for whole numbers. Next, you can create the two history tables because the SELLER and BUYER tables are now available. Note how the SELLER_HISTORY table does not have a SELLER field, because this is implied by the direct parent relationship to the SELLER table. The same applies to the BUYER_HISTORY table, containing the SELLER field only. CREATE TABLE SELLER_HISTORY ( BUYER STRING, COMMENT_DATE DATE, LISTING# STRING, COMMENTS STRING ); CREATE TABLE BUYER_HISTORY ( SELLER STRING, COMMENT_DATE DATE, LISTING# STRING, COMMENTS STRING ); Next, create the LISTING table: 264
  12. Creating and Refining Tables During the Design Phase CREATE TABLE LISTING ( LISTING# STRING, DESCRIPTION STRING, IMAGE BINARY, START_DATE DATE, LISTING_DAYS INTEGER, CURRENCY STRING, STARTING_PRICE MONEY, RESERVE_PRICE MONEY, BUY_NOW_PRICE MONEY, NUMBER_OF_BIDS INTEGER, WINNING_PRICE MONEY ); This table introduces a new generic datatype called BINARY, used to store an image. That image could be a JPG, BMP, or any other type of graphic file format. Binary object datatypes allow storage of binary formatted data inside relational databases. The BINARY datatype is not really important to this book; however, storing images into text strings is awkward. Lastly, create the BID table (the BID table is dependent on the LISTING table): CREATE TABLE BID ( BIDDER STRING, SELLER STRING, BID_PRICE MONEY, BID_DATE DATE ); These two tables introduce a new generic datatype called MONEY. You can assume that the MONEY datatype, for most currencies, will have two fixed decimal places. Therefore, $100 will be represented as 100.00. There is a CURRENCY field in the LISTING table. Online auction companies can operate internationally, implying international sales and international bids from buyers in other countries. Different currencies may be involved. A few unusual currencies (such as some of the Arabian Peninsula currencies) actually use three rather than two decimal places. Thus, 100 “whatever’s” would be stored as 100.000. The Data Warehouse Database Model The previous section created very basic tables for the online auction house OLTP database model. Now do exactly the same thing for the data warehouse database model of the online auction house. Figure 10-3 shows a simple analytical diagram displaying the various operational sections for the online auction house data warehouse database model. All the fact information is shoved into a single table. Later on in this chapter, this information will be separated into separate fact tables (or separate star schemas). 265
  13. Chapter 10 Category Hierarchy Seller Listing -Bids Buyer -History Location Time Figure 10-3: Analytical data warehouse online auction house database model. Multiple star schemas within a single data warehouse are sometimes known as individual data marts. Figure 10-4 shows the application of business rules to the simple analytical diagram shown in Figure 10-3 for the data warehouse database model. Once again, you must take table dependencies into account. It is significant to observe how the three category tables, shown in Figure 10-2 (the OLTP database model), have been merged into a single hierarchical category table (CATEGORY_HIERARCHY) in the data warehouse model shown in Figure 10-4. This is a form of denormalization, used especially in data warehouse databases to simplify and compress dimensional information use. 266
  14. Creating and Refining Tables During the Design Phase Category_Hierarchy parent category Seller Buyer seller popularity_rating buyer join_date Listing_Bids_History popularity_rating address join_date return_policy address international listing# payment_methods listing_description listing_image listing_start_date listing_days listing_currency listing_starting_price listing_reserve_price listing_buy_now_price listing_number_of_bids listing_winning_price Time Location listing_winner_buyer bidder bidder_price month region bidder_date quarter country history_buyer year state history_buyer_comment_date city history_buyer_comments history_seller history_seller_comment_date history_seller_comments Figure 10-4: Business rules data warehouse online auction house database model. Now you create the tables for the data warehouse model shown in Figure 10-4. In a well-designed data warehouse star schema, there is only one layer of dependence between a single layer of unrelated dimensions and a single fact table. Create the dimensions: CREATE TABLE LOCATION ( REGION STRING, COUNTRY STRING, STATE STRING, CITY STRING ); CREATE TABLE TIME 267
  15. Chapter 10 ( MONTH INTEGER, QUARTER INTEGER, YEAR INTEGER ); CREATE TABLE CATEGORY_HIERARCHY ( PARENT STRING, CATEGORY STRING ); CREATE TABLE BUYER ( BUYER STRING, POPULARITY_RATING INTEGER, JOIN_DATE DATE, ADDRESS BIGSTRING ); CREATE TABLE SELLER ( SELLER STRING, POPULARITY_RATING INTEGER, JOIN_DATE DATE, ADDRESS BIGSTRING, RETURN_POLICY BIGSTRING, INTERNATIONAL STRING, PAYMENT_METHODS BIGSTRING ); In this table script, yet another datatype called BIGSTRING has been introduced. A BIGSTRING datatype is used to represent fields that may become multiple fields, or even a subsidiary table at a later stage, through normalization. For example, ADDRESS represents an address. An address can consist of many fields, such as STREET, ZIPCODE, CITY, STATE, and so on. The field PAYMENT_METHODS is named with plurality to indicate multiple possible acceptable methods of payment. Different sellers are likely to accept a number of a group of acceptable payment methods. Thus, the BIGSTRING datatype applies to the PAYMENT_METHODS field. For example, one seller may be willing to accept personal checks and cash. Another seller might only accept cash and credit cards, but not personal checks. Now, let’s create the fact table: CREATE TABLE LISTING_BIDS_HISTORY ( LISTING# STRING, LISTING_DESCRIPTION STRING, LISTING_IMAGE BINARY, LISTING_START_DATE DATE, LISTING_DAYS INTEGER, LISTING_CURRENCY STRING, LISTING_STARTING_PRICE MONEY, LISTING_RESERVE_PRICE MONEY, LISTING_BUY_NOW_PRICE MONEY, LISTING_NUMBER_OF_BIDS INTEGER, 268
  16. Creating and Refining Tables During the Design Phase LISTING_WINNING_PRICE MONEY, LISTING_WINNER_BUYER STRING, BIDDER MONEY, BIDDER_PRICE MONEY, BIDDER_DATE DATE, HISTORY_BUYER STRING, HISTORY_BUYER_COMMENT_DATE DATE, HISTORY_BUYER_COMMENTS BIGSTRING, HISTORY_SELLER STRING, HISTORY_SELLER_COMMENT_DATE DATE, HISTORY_SELLER_COMMENTS BIGSTRING ); This fact table shows the source of all fields as being listing, bidder, buyer history, and seller history. Tables have been created for the OLTP and data warehouse database models for the online auction house. The next step is to establish and enforce referential integrity. Case Study: Enforcing Table Relationships Referential integrity maintains and enforces the data integrity of relationships between tables. In other words, referential integrity ensures that where a child table record exists, the parent table record exists as well. Referential Integrity In Figure 10-2, you cannot delete a SELLER record without deleting all the seller’s listings first. If the seller is deleted, their listings become orphaned records. An orphaned record is term applied to a record not findable within the logical table structure of a database model. Essentially, the seller’s name and address details are stored in the SELLER table and not in the LISTING table. If the seller record was deleted, any of their listings are useless because you don’t know who is selling it. Similarly, if buyer information with winning bids were deleted, the seller wouldn’t know who to mail it to. Referential integrity, through the use of primary and foreign keys, acts to ensure that the following activities are prohibited: ❑ INSERT check — A child record cannot be added to a child table unless the parent record exists in the parent table. ❑ UPDATE check — Parent and child table records cannot have their linking key field values changed, unless both are changed simultaneously (“simultaneously” implies within the same transaction). ❑ DELETE check — A parent table record cannot be deleted when a child table record exists, unless all related child table records are deleted first. ❑ DELETE CASCADE check — A parent table record can be deleted if all child records are deleted first. This is known as a cascading delete. Cascade deletion is rarely implemented because it can result in serious data loss (usually through operator or programmer error). Additionally, cascade deletions can cause serious locking conflicts because large numbers of child records could have to be deleted, when deleting a parent record. This type of locking problem can occur when a parent record has many records in child tables, through a multiple level table structure. 269
  17. Chapter 10 Primary and Foreign Keys Primary and foreign keys are used to establish referential integrity relationships between parent and child tables. The parent table contains the primary key, and the child table the foreign key. The term primary implies most significant field for a table, and thus uniquely identifying. Each seller record would have a unique seller name (the name of the person or company selling the item at auction). Two sellers can’t have the same name, leading to the obvious silly result. How would you dissimilate between two different sellers? Impossible. The term foreign implies a key that is foreign to a child table, whose uniquely identifying value lies in another table (the parent table containing the primary key). Now demonstrate implementation of primary and foreign keys by re-creating the OLTP database model tables, as shown in Figure 10-2. The ERD in Figure 10-2 has been changed to the ERD shown in Figure 10-5. Seller Category_Primary seller primary popularity_rating join_date address return_policy international payment_methods Listing Category_Secondary Seller_History listing# primary (FK) seller (FK) primary (FK) secondary buyer secondary (FK) tertiary (FK) comment_date seller (FK) listing# buyer (FK) comments description image Category_Tertiary start_date primary (FK) listing_date currency Buyer Buyer_History secondary (FK) tertiary starting_price buyer buyer (FK) reserve_price popularity_rating seller buy_now_price join_date Bid comment_date number_of_bids address primary (FK) listing# winning_price secondary (FK) comments tertiary (FK) seller (FK) listing# (FK) buyer (FK) bidder bid_price bid_date Figure 10-5: Defining primary and foreign keys for the OLTP online auction house database model. The table contents in Figure 10-5 may appear somewhat confusing. All the dotted lines have changed to solid lines (non-identifying to identifying relationships) and the primary keys are largely composite key fields. Yuk! Use surrogate keys instead. Before explaining why relationships have been changed from non-identifying to identifying, first implement surrogate keys. 270
  18. Creating and Refining Tables During the Design Phase Using Surrogate Keys Figure 10-6 shows the same model as shown in Figure 10-5, except with surrogate keys added. The only table without a surrogate primary key field is the LISTING table, still using the LISTING# field as its pri- mary key. LISTING# is likely to be some type of auto counter anyway, and so a surrogate key is not nec- essary in this case. Figure 10-6 still has composite primary key fields. Seller seller_id Category_Primary seller primary_id popularity_rating primary join_date address return_policy international payment_methods Listing Category_Secondary Seller_History primary_id (FK) primary_id (FK) seller_id (FK) secondary_id (FK) secondary_id buyer tertiary_id (FK) secondary seller_id (FK) comment_date buyer_id (FK) listing# listing# comments description image Category_Tertiary start_date listing_date Buyer primary_id (FK) secondary_id (FK) currency buyer_id Buyer_History tertiary_id starting_price buyer buyer_id (FK) reserve_price popularity_rating tertiary seller buy_now_price join_date Bid comment_date number_of_bids address primary_id (FK) listing# winning_price secondary_id (FK) comments tertiary_id (FK) seller_id (FK) buyer_id (FK) listing# (FK) bidder bid_price bid_date Figure 10-6: Using surrogate keys as primary keys with identifying relationships. One important point about Figure 10-6 is that the three category tables, plus the buyer and seller table, have additional identifier fields as primary keys (for example, BUYER.BUYER_ID). However, these tables still have their original primary key fields, now no longer as the primary key (for example, BUYER.BUYER). The identifiers have become surrogates (replacements) for the original string datatype names. Identifying relationships in Figure 10-6 imply that a child table record is specifically identified by a parent table record, through the connection between parent table primary key, and child table foreign key. 271
  19. Chapter 10 Identifying versus Non-Identifying Relationships One more improvement that can be made is to change all the identifying relationships back to non-identifying relationships, where appropriate, as shown in Figure 10-7. Using non-identifying relationships (represented by dotted lines in Figure 10-7), child tables are no longer uniquely identified by parent table primary key values (by the parent table primary key). Notice how much simpler the ERD in Figure 10-7 has become. Also, notice how the BID, SELLER_HISTORY, and BUYER_HISTORY tables are still related to their parent tables using identifying relationships (solid, non-dotted lines in Figure 10-7). The two history tables do not have primary keys of their own. This means they can contain duplicated records for each. In other words, a buyer can contain many history entries, and, therefore, each buyer can have many BUYER_HISTORY records. The BID table does not have its own exclusive primary key field because it is a table resolving the many-to-many relationship between the LISTING and BID tables. Seller seller_id Category_Primary seller primary_id popularity_rating primary join_date address return_policy international payment_methods Listing Category_Secondary Seller_History listing# secondary_id seller_id (FK) buyer_id (FK) primary_id (FK) buyer seller_id (FK) secondary comment_date secondary_id (FK) listing# tertiary_id (FK) comments description image start_date Buyer Buyer_History Category_Tertiary listing_days buyer_id buyer_id (FK) currency tertiary_id buyer starting_price seller secondary_id (FK) reserve_price popularity_rating comment_date tertiary buy_now_price join_date listing# number_of_bids address comments winning_price Bid buyer_id (FK) listing# (FK) bidder bid_price bid_date Figure 10-7: Using surrogate keys as primary keys with some non-identifying relationships. Parent Records without Children A parent record without children is where a parent table can have records, such that child related records are not required, within child tables. For example, there is no reason why a secondary category should always be created for a primary category. In other words, a primary category can be created where no secondary or tertiary categories exist for that primary category. For the online auction house, it depends on how categories are organized. It may be that some types of secondary and tertiary categories sell extremely well. This could perhaps make them primary categories as well. This particular complication is ignored. 272
Đồng bộ tài khoản