Managing time in relational databases- P5

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:20

lượt xem

Managing time in relational databases- P5

Mô tả tài liệu
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

Tham khảo tài liệu 'managing time in relational databases- p5', công nghệ thông tin, cơ sở dữ liệu phục vụ nhu cầu học tập, nghiên cứu và làm việc hiệu quả

Chủ đề:

Nội dung Text: Managing time in relational databases- P5

  1. 62 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH the row representing that assertion will cease to be asserted on that date even if no correcting assertion is supplied to replace it. The last reason an assertion end date may be changed is to lock an assertion which has been updated or deleted by a deferred transaction, until the resulting deferred assertion becomes current. We will have more to say about deferred trans- actions, deferred assertions and locking in Chapter 13. Now() and UTC Keeping our notation DBMS agnostic, and keeping the clock tick granularity generic, we will refer to the current moment, to right now, as Now().7 SQL Server may use getdate(), and DB2 may use Current Timestamp or Current Date. Depending on our clock tick duration, we might need to use a date formatting func- tion to set the current granularity. In our examples, we generally use one month as our clock tick granularity. However for our purposes, Now() can take on values at whatever level of granular- ity we choose to use, including day, second or microsecond. Now() is usually assumed to represent the current moment by using local time. But local time may change because of entering or leaving Daylight Savings Time. And another issue is time zone. At any one time, data about to update a database may exist in a different time zone than the database itself. Users about to retrieve data from a database may exist in a different time zone than the database itself. And, of course, federated queries may attempt to join data from databases located in different time zones. So the data values returned by Now() can change for reasons other than the passage of time. Daylight Savings Time can change those values. At any one point in time, those values can differ because of time zones. Clearly, we need a reference frame- work, and a set of values, that will not change for any reason other than the passage of time, and that will be the same value, at any point in time, the world over and year around. This reference framework is Universal Coordinated Time (UTC).8 To make use of UTC, our Asserted Versioning Framework will convert local time to UTC on maintenance and queries, and 7 Now() is a function that returns the current date. It is not a value. However, we will often use it to designate a specific point in time. For example, we may say that a time period starts at Now() and continues on until 12/31/9999. This is a shorthand way of emphasizing that, whenever that time period was created, it was given as its begin date the value returned by Now() at that moment. 8 However, even in UTC, some variations in time values do not reflect the passage of time. We are referring here to the periodic adjustments in UTC made by adding or removing leap seconds, as we described in an earlier section of this chapter.
  2. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 63 will store Asserted Versioning temporal parameters, such as begin and end dates, in UTC. For example, with Policy_AV being an asserted version table of insurance policies, we would insert a policy like this: INSERT INTO Policy_AV (oid, asr_beg_dt . . . . .) VALUES (55, CURRENT TIMESTAMP - CURRENT TIMEZONE . . . . .) For queries, they will perform better if we do the time conver- sion before using the value as a selection predicate in the SQL itself. This is because most optimizers treat functions that appear in predicates as non-indexable. For example, in DB2, we should write: SET :my-cut ¼ TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE SELECT . . . . . FROM . . . . . WHERE oid ¼ 55 AND asr_beg_dt :my-cut rather than SELECT . . . . . FROM . . . . . WHERE oid ¼ 55 AND asr_beg_dt
  3. 64 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH academic work on bi-temporality. At that time, these version tables which also contained a row create date were state of the art in best practice methods for managing temporal data. We will discuss them in the next chapter. With a row creation date, of course, any query can be restricted to the rows present in a table as of any specific date by including a WHERE clause predicate that qualifies only those rows whose create date is less than or equal to the speci- fied date. With two effective dates, tables like these are also able to specify one of the two temporal dimensions that make up full bi-temporality. The standard temporal model uses the term “valid time” where we use the term “effective time”. But the difference is purely verbal. We have found no differences between how valid time works in the standard model, and how effective time works in Asserted Versioning. We use “effective time” because it is the preferred term among business IT professionals, and also because it readily adapts itself to other grammatical forms such as “becoming effective” and “in effect”. The standard model states that “(v)alid time . . . captur(es) the history of a changing reality, and transaction time . . . . . captur(es) the sequence of states of a changing table . . . . . A table supporting both is termed a “bi-temporal table” [2000, Snodgrass, p. 20]. But as we will see later, Asserted Versioning does not define bi-tempo- rality in exactly the same way. The difference lies primarily in the second of the two temporal dimensions, what computer scientists call “transaction time” and what we call “assertion time”. While a transaction begin date always indicates when a row is physically inserted into a table, an assertion begin date indicates when we are first willing to assert, or claim, that a row is a true statement about the object it represents, during that row’s effective (valid) time period, and also that the quality of the row’s data is good enough to base business decisions on. In the standard temporal model, the beginning of a transaction time period is the date on which the row is created. Obviously, once the row is created, that date cannot be changed. But in the Asserted Versioning temporal model, an assertion time period begins either on the date a row is created, or on a later date. Because an assertion begin date is not necessarily the same as the date on which its row is physically created, Asserted Versioning needs, in addition to the pair of dates that define this time period, an additional date which is the physical creation date of each row. That date serves as an audit trail, and as a means of reconstructing a table as it physically existed at any past point in time.
  4. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 65 What are these rows with future assertion begin dates? To take a single example, they might be rows for which we have some of the business data, but not all of it, rows which are in the process of being made ready “for prime time”. These rows—which may be assertions about past, present or future versions—are not yet ready, we will say, to become part of the production data in the table, not yet ready to become rows that we are willing to present to the world and of which we are will- ing to say “We stand behind the statements these rows make. We claim that the statements they make are (or are likely to become) true, and that the information these rows provide meets the standards of reliability understood (or explicitly stated) to apply to all rows in this table”. So the semantics of the standard temporal model are fully supported by Asserted Versioning. But Asserted Versioning adds the semantics of what we call deferred assertions, and which we have just briefly described. As we will see in later chapters, deferred assertions are just one kind of internalized pipeline dataset, and the internalization of pipeline datasets can eliminate a large part of the IT maintenance budget by eliminating the need to manage pipeline datasets as distinct physical objects. Allen Relationships Allen relationships describe all possible positional relationships between two time periods along a common time- line. This includes the special case of one or both time periods being a point in time, i.e. being exactly one clock tick in length. There are 13 Allen relationships in total. Six have a corresponding inverse relationship, and one does not. Standard treatments of the Allen relationships may be found in both [2000, Snodgrass] and [2002, Date, Darwen, Lorentzos]. We have found it useful to reconstruct the Allen relationships as a binary taxonomy. Our taxonomy is shown in Figure 3.4. In this diagram, the leaf nodes include a graphic in which there are two timelines, each represented by a dashed line. All the leaf nodes but one have an inverse, and that one is italicized; when two time periods are identical, they do not have a distinct inverse. Thus, this taxonomy specifies 13 leaf- node relationships which are, in fact, precisely the 13 Allen relationships. The names of the Allen relationships are standard, and have been since Allen wrote his seminal article in 1983. But those names, and the names of the higher-level nodes in our own tax- onomy of the Allen relationships, are also expressions in
  5. 66 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH Time Periods Relationships Along a Common Timeline Intersects Excludes Fills Overlaps Before Meets |----------| |-----| |-----| |-----|-----| |----------| Equals Occupies |-----| |-----| Aligns During |-----| |------------| Starts Finishes |-----| |-----| |------------| |-----------| Figure 3.4 The Asserted Versioning Allen Relationship Taxonomy. ordinary language. In order to distinguish between the ordinary language and the technical uses of these terms, we will include the names of Allen relationships and our other taxonomy nodes in brackets when we are discussing them. We will also underline the non-leaf node relationships in the taxonomy, to emphasize that they are relationships we have defined, and are not one of the Allen relationships. In the following discussion, the first time period in a pair of them is the one that is either earlier than the other, or not longer than the other. Given two time periods on a common timeline, either they have at least one clock tick in common or they do not. If they do, we will say that they [intersect] one another. If they do not, we will say that they [exclude] one another. If there is an [intersects] relationship between two time per- iods, then either one [fills] the other or each [overlaps] the other. If one time period [fills] another, then all its clock ticks are also in the time period it [fills], but not necessarily vice versa. If one time period [overlaps] another, then the latter also overlaps the former; but, being the later of the two time periods, we say that the latter time period has the inverse relationship, [overlapsÀ1]. In the overlaps cases, each has at least one clock tick that the
  6. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 67 other does not have, as well as having at least one clock tick that the other does have. If two time periods [exclude] one another, then they do not share any clock ticks, and they are either non-contiguous or con- tiguous. If there is at least one clock tick between them, they are non-contiguous and we say that one is [before] the other. Other- wise they are contiguous and we say that one [meets] the other. If one time period [fills] the other, then either they are [equal], or one [occupies] the other. If they are [equal], then neither has a clock tick that the other does not have. If one [occupies] the other, then all the clock ticks in the occupying time period are also in the occupied time period, but not vice versa. If one time period [occupies] the other, then either they share an [aligns] relationship, or one occurs [during] the other. If they are aligned, then they either start on the same clock tick or end on the same clock tick, and we say that one either [starts] or [finishes] the other. Otherwise, one occurs [during] the other, beginning after the other and ending before it. Note that if two time periods are aligned, one cannot both [start] and [finish] the other because if it did, it would be [equal] to the other. If one time period [starts] another, they both begin on the same clock tick. If one [finishes] the other, they both end on the same clock tick. If one time period [occupies] another, but they are not aligned, then one occurs [during] the other. Now let’s consider the special case in which one of the two time periods is a point in time, i.e. is exactly one clock tick in length, and the other one contains two or more clock ticks. This point in time may either [intersect] or [exclude] the time period. If the point in time [intersects] the time period, it also [fills] and [occupies] that time period. If it [aligns] with the time period, then it either [starts] the time period or [finishes] it. Otherwise, the point in time occurs [during] the time period. If the point in time [excludes] the time period, then either may be [before] the other, or they may [meet]. Finally, let’s consider one more special case, that in which both the time periods are points in time. Those two points in time may be [equal], or one may be [before] the other, or they may [meet]. There are no other Allen relationships possible for them. As we will see later, four of these Allen relationship categories are especially important. They will be discussed in later chapters, but we choose to mention them here. (i) The [intersects] relationship is important because for a tem- poral insert transaction to be valid, its effective time period cannot intersect that of any episode for the same object which is already in the target table. By the same token, for
  7. 68 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH a temporal update or delete transaction to be valid, the tar- get table must already contain at least one episode for the same object whose effective time period does [intersect] the time period designated by the transaction. (ii) The [fills] relationship is important because violations of the temporal analog of referential integrity always involve the failure of a child time period to [fill] a parent time period. We will be frequently discussing this relationship from the parent side, and we would like to avoid having to say things like “. . . . . failure of a parent time period to be filled by a child time period”. So we will use the term “includes” as a synonym for “is filled by”, i.e. as a synonym for [fillsÀ1]. Now we can say “. . . . . failure of a parent time period to include a child time period”. (iii) The [before] relationship is important because it distinguishes episodes from one another. Every episode of an object is non-contiguous with every other episode of the same object, and so for each pair of them, one of them must be [before] the other. (iv) The [meets] relationship is important because it groups versions for the same object into episodes. A series of vers- ions for the same object that are all contiguous, i.e. that all [meet], fall within the same episode of that object. Advanced Indexing Strategies Indexes are one way to improve performance. And it should be clear that it would be a serious performance handicap if we could not define indexes over either or both of the two time per- iods of a bi-temporal table. But this proves to be more complex than it might at first sight appear to be. The issue is that traditional indexes contain pointers to rows, pointers which are based on discrete values, while the two time periods of rows in bi-temporal tables are not discrete values, but rather an unbroken and non-overlapping sequence of such values. Such rows occupy points in effective (valid) time or in assertion (transaction) time only as a limit case. What they really occupy are intervals along those two timelines. That’s the reason we need two dates to describe each of them. Traditional balanced-tree indexes work well with discrete values, including such discrete values as dates. But they don’t work well with intervals of time, i.e. with time periods. But indexing methods which manage intervals are being developed. Specifically, some bi-temporal indexing methods manage the two intervals for a bi-temporal object as a single
  8. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 69 unit, which would appear as a rectangle on a Cartesian graph in which one temporal dimension is represented by the X-axis and the other by the Y-axis. Another approach is to manage each of the two temporal dimensions separately. One reason for taking this approach is that, for the standard temporal model, the two temporal dimensions behave differently. Specifically, for the standard model, transac- tion time always moves forwards, whereas valid time can move forwards or backwards. This means that a bi-temporal row can be inserted into a table proactively in valid time, but can never be inserted into a table proactively in transaction time. Asserted Versioning, as we have already pointed out, supports both forwards and backwards movement in both temporal dimensions. So for Asserted Versioning, there is no difference in behavior which would justify separating the two temporal dimensions for indexing purposes. Specifically, Asserted Versioning supports both proactive (future-dated) versions and proactive assertions (i.e. deferred assertions) and also both retro- active versions and an approval transaction which can move deferred assertions backwards in time, but not prior to Now(). In Chapter 15, we will describe certain indexing strategies that will improve performance using today’s DBMS index designs. Temporal Extensions to SQL Following [2000, Snodgrass], we will refer to a future release of the SQL language that will contain temporal extensions as SQL3. A more detailed discussion may be found in that book, although we should note that the book is, at the time of publica- tion of this book, 10 years old. Temporal Upward Compatibility One issue related to changes in the SQL standard is temporal upward compatibility. In describing SQL3, Snodgrass states that “(t)emporal upward compatibility at its core says this: ‘Take an application that doesn’t involve time, that concerns only the cur- rent reality . . . . . Alter one or more of the tables so that they now have temporal support . . . . . The application should run as before, without changing a single line of code’” [2000, Snodgrass, p. 449]. This cannot be an objective for Asserted Versioning, because we are limited to current SQL, not to a future release of SQL that builds temporal extensions into the language itself. But we can come close. We can achieve this objective for queries by using a view which filters out all but current data, and by redirecting
  9. 70 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH existing queries to that view. We can achieve this objective for temporal inserts, updates and deletes by defining default effec- tive and assertion dates for these transactions. These will be default dates that cause these transactions, as written by their authors, and as parsed and submitted to the DBMS by the AVF, to physically insert and update current assertions of the current versions of the objects they reference. The PERIOD Datatype A second issue related to changes in the SQL standard is the need for a PERIOD datatype. This new datatype will not change the semantics of temporal data management, but it will simplify the expression of those semantics. For one thing, a single col- umn will replace a pair of dates. This will simplify the way that Allen relationships are specified in SQL statements. For example, it will change the expression with which we ask whether a point in time is or is not contained in a period of time. Where P and T are, respectively, a point in time and a period of time, and T1 and T2 are dates delimiting T using the closed-open representation, we currently must ask whether P is a clock tick within T like this: WHERE T1
  10. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 71 But if we had a PERIOD datatype, and SQL extensions and indexing methods that could recognize and manage that datatype, then all the Allen relationships among time periods could be easily expressed, and the very important [excludes] relationship could be enforced by means of a unique index. Lacking that future technology, and the standards needed to insure interoperability across different vendor implementations, the AVF contains its own code that effectively turns pairs of dates into a user-defined PERIOD datatype. Temporal Primary Keys A third issue related to changes in the SQL standard is sup- port for temporal primary keys. With those temporal extensions, we will be able to declare a temporal primary key to the DBMS and, by the same token, declare temporal foreign keys as well. But what is it we will be declaring? Temporal tags added to phys- ically unique identifiers of rows of otherwise non-temporal tables? Or something more? If a SQL standard for bi-temporality, when we eventually have one, is a standard for adding two temporal tags to rows in other- wise non-temporal tables, and providing a PERIOD data type and Allen relationship operators to manage the data thus tagged, then most of the semantics of bi-temporality will have been left out of that standard, and left up to developers. The managed objects of temporal data management are not physical rows in physical tables. They are collections of one or more rows which represent temporally delimited claims about temporally delimited statements about what real-world persistent objects were like, are like, or will be like. As long as every database table contains one and only one row for each instance of the type indicated by the table, it is easy to forget about the semantics and concentrate on the mechan- ics. Primary key uniqueness is mechanics; its role in eliminating row-level synonyms—and its failure to address the problem of row-level homonyms—is the semantics that are easily and almost always overlooked. Foreign key referential integrity is mechanics; its role in expressing existence dependencies among the objects represented by those rows is the semantics that are easily and almost always overlooked. It has been this one-to-one correlation between rows and the objects they represent that has made it easy to give short shrift to semantics, and to then get right down to what really fascinates engineering types—the mechanics of making things work. But once we attempt to manage both the epistemological
  11. 72 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH and the ontological temporal dimensions of data, i.e. both asser- tion time and effective time, we must give up this comfortable one-to-one correlation. We must recognize that rows are no lon- ger an adequate representation of the semantic objects that the bi-temporality of data reveals to us. From the point of view of the conceptual foundations of Asserted Versioning, the declaration of a bi-temporal primary key must specify (i) a unique identifier for an object, (ii) an assertion time period, and (iii) an effective time period. In par- ticular, that declaration must not specify simply (i) a unique identifier for an otherwise non-temporal row, (ii) a valid time period, and (iii) a transaction time period, one whose begin date is always the date on which the row was physically created. Temporal Logic A fourth issue related to changes in the SQL standard is tem- poral logic. SQL implements a subset of propositional and first- order predicate logic (FOPL). Temporal extensions to SQL are not a matter of simply adding temporal features to SQL until we’re pleased with the result. The logical foundations of those temporal extensions must be explored so that, as SQL is extended, we can remain confident that it is logically sound. Examples of work in temporal logic include first-order temporal logic (FOTL). Just as FOPL uses the quantifiers for all X and for some X, FOTL adds the quantifiers at some time in the past, at every time in the past, at some time in the future and at every time in the future, and also adds the binary predicates SINCE and UNTIL. Glossary References Glossary entries whose definitions form strong inter- dependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be sev- eral other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered. We note, in particular, that none of the nodes in our Allen relationships taxonomy are included in this list. In general, we leave taxonomy nodes out of these lists, but recommend that the reader look them up in the Glossary.
  12. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 73 clock tick granularity PERIOD datatype point in time time period episode managed object object pipeline dataset inflow pipeline dataset internalization of pipeline datasets outflow pipeline dataset replace supercede withdraw statement
  13. THE ORIGINS OF ASSERTED 4 VERSIONING: IT BEST PRACTICES CONTENTS A Non-Temporal Table and a Basic Version Table 77 Basic Versioning 79 An Insert Transaction 79 An Update Transaction 80 A Second Update Transaction 81 A Delete Transaction 82 Logical Delete Versioning 83 Temporal Gap Versioning 86 Effective Time Versioning 87 Effective Time Versioning and Retroactive Updates 89 Effective Time Versioning and Retroactive Inserts and Deletions 90 The Scope and Limits of Best Practice Versioning 92 Glossary References 93 Lots of things are important to us. That’s why we keep data about them in our databases. In a non-temporal table, each one of them, i.e. each object, is represented by one and only one row. In a version table, however, each row represents a period of time in the life of an object, and is a description of what that object is like during that time. And so, in a version table, there can be any number of rows representing the same object, each describing what the object is like during a different period of time. In an assertion table, on the other hand, each row represents an assertion about an object, and represents what we said, dur- ing a specific period of time, that object is like. And so, in an assertion table, there also can be any number of rows representing the same object, each describing what we said, at a different period of time, the object is like. Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00004-2 Copyright # 2010 Elsevier Inc. All rights of reproduction in any form reserved. 75
  14. 76 Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES Bi-temporal tables contain both kinds of information. Their rows tell us what we once asserted as true, currently assert as true, or may at some future time assert as true, a statement about what an object used to be like, is like right now, or may be like at some time in the future. Because each object is represented by exactly one row in a non-temporal table, when updates are applied, those updates overwrite the data that was there before the update. Such updates are called “updates in place”. But one problem with updates in place, of course, is that they lose information. They lose the information about what the object used to be like, about what it was like before the update. They lose historical informa- tion because they overwrite data. Historical data can usually be found somewhere, of course, in archives and transaction logs if nowhere else. But if it is impor- tant to be able to quickly and easily access data about what objects used to be like, either by itself or together with data about what those objects are like now, then keeping that data in the same table that also contains data about the current state of those objects makes a lot of sense. If we don’t keep historical and current data in the same table, then query authors who need that data will need to be aware of the multiple table and column names, and the multi- ple different locations, where different subsets of historical data are kept; and, as we know, they often are not. Even if aware of all the places from which they will have to assemble the histor- ical data they are interested in, they will also have to know which of possibly redundant copies of that data is the most cur- rent, and which the most reliable and complete; and, as we also know, they often don’t. They may need to rewrite queries, changing table and col- umn names prior to pointing them to whichever copy of that data is chosen as the target for those queries; and in doing so, as we all know, they often make mistakes. And when tables of historical data are not kept column for column union-ably paral- lel with the corresponding tables containing current data, which is often the case, then the job of query authors becomes even more difficult and error-prone. They won’t be able to simply copy production queries and change names. They will have to write new queries, perhaps joining data that the production queries did not have to join, perhaps assembling intermediate results and then combining those intermediate results in various complicated ways. In short, they may very quickly be taken into the realm of SQL queries that all but the most experienced query authors have no business writing.
  15. Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES 77 Versioning is the IT community’s way of providing queryable access to historical, current and future data. However, there are far too many variations of versioning to make an exhaustive review of them possible. So in this chapter, we will distinguish four main types of versioning, and will discuss one representa- tive method of each type. Those four types are: (i) Basic versioning; (ii) Logical delete versioning; (iii) Temporal gap versioning; and (iv) Effective time versioning. In this chapter, we will use several variations of a version table, and a corresponding non-temporal table, to present the basic mental model which we believe is essential to understand- ing how to manage versioned data. That mental model is that conventional and version tables of the same persistent objects (customers, policies, etc.) are related as follows. An object is represented in a non-temporal table by a row which is put into that table at a certain point in time, may be updated during its tenure on the table, and may eventually be removed from that table. These three stages in the life history of an object as represented in a non-temporal table are inaugurated by, respectively, an insert transaction, zero or more update transactions, and zero or one delete transaction. That same object is represented in a version table by a series of one or more temporally tagged rows. The object that is identified by a primary key in the non-temporal table is, in the corresponding basic version table, represented by that primary key plus a clock tick. So the primary key for the first row for an object to appear in a basic version table contains the object’s unique identifier and the date the row was inserted. For each update to that object, a new row is inserted into the version table. That row has the same unique identifier for the object, but its date is different than the date of other rows for that object already on the table because its date is the date it is inserted into the table. As for a delete, some approaches to versioning carry it out as a physical delete, and others as a logical delete. A Non-Temporal Table and a Basic Version Table Figure 4.1 introduces the diagram we will use to compare a non-temporal table with various types of version tables. Two tables are shown. The one on top represents a non- temporal table of insurance policies. The one below it represents
  16. 78 Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES Jan 10 Jan Jan Jan Jan Jan 2010 2011 2012 2013 2014 INSERT INTO Policy (BK, client, type, copay, crt_dt, updt_dt) VALUES (‘P861’, ‘C882’, ‘HMO’, ‘$15’, CURRENT_DATE, NULL) BK client type copay crt-dt updt-dt P861 C882 HMO $15 Jan10 {null} INSERT INTO Policy (BK, ver_dt, client, type, copay) VALUES (‘P861’, CURRENT_DATE, ‘C882’, ‘HMO’, ‘$15’) BK ver-dt client type copay P861 Jan10 C882 HMO $15 Figure 4.1 Conventional and Basic Version Tables: An Insert Transaction. a basic version table of those same policies. Above each table is the SQL statement that inserted or most recently altered the data in that table. In each table, primary key columns are indicated by under- lining their column headings. In each table, the italicized col- umn is a foreign key to a table which will make its first appearance in Chapter 11. Type and copay are the two business data columns in both tables. Create date and update date in the non-temporal table are, respectively, the date the row was inserted into the table, and the date the row was last updated. In all sample tables, dates are shown in the format “mmmyy”. Thus, “Jan10” is short for January 1st, 2010, which is the start of the January 2010 clock tick. Since the clock used in most of these examples ticks once a month, the notation is unambiguous. The reason for the “mmmyy” representation is that it takes up mini- mal horizontal space on the page, which is important for the sample transactions, rows and tables used in illustrations throughout the book. In the version table, there is no update date because rows are not physically updated. Instead, each logical update is carried out by copying the most current version, applying the update to the copy and inserting the result as the new most current ver- sion. As for a physical create date, that is the same thing as the version date in basic versioning.
  17. Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES 79 The upper components of the diagram read as follows: (i) The box in the upper left-hand corner of the diagram indicates what time it is now, in the example. (ii) The row of vertical bars represents a timeline. Each vertical bar on that timeline represents one month. The month which is current, in the context of each example, is marked by filling in its vertical bar. This is a graphical representa- tion of the same information provided in the clock tick box. (iii) The open-ended rectangle located below the clock tick box and directly above the January 2010 bar indicates the effec- tive time period of the single version of the policy object, P861. Rectangles which are open-ended will be used to rep- resent versions with unknown end dates. As we will see in later diagrams, as soon as the end date for a version is known, the rectangle will be closed. Basic Versioning IT professionals have been using version tables for at least a quarter-century, when they want to keep track of changes that would otherwise be lost because of updating by overwriting data. The simplest versioning method is to add a date to the primary key of the table to be versioned, thus transforming it from a non-temporal into a uni-temporal table, i.e. into a table with one temporal dimension. This is the method we call basic versioning. An Insert Transaction Figure 4.1 shows the results of applying the same insert trans- action to each table. In both cases, that result is a single row, as shown. An open-ended rectangle is situated directly above Janu- ary 2010. This marks the start of the time period during which policy P861 is represented in the basic version table. After the insert, the rectangle extends through to the end of the current month even though, on the date of the insert, we are at the beginning of that month, not at its end. This is because we are using a clock that ticks once a month and so that transac- tion, once applied, will remain in the database for at least that one month. Thus, our concept of a clock tick is a logical concept, not a physical one. There is no upper limit to the length of a clock tick. The lower limit is the granularity of a full machine timestamp, as accessible by the DBMS. It is what we called, in the previous chapter, an
  18. 80 Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES atomic clock tick. Although it is possible to compare points in time and periods of time that use different size clock ticks, it is far simpler when all temporal tables use the same size clock ticks. In this book, as we said, we will manage temporal data with a clock that ticks once a month. Management of different size clock ticks can be left to the eventual implementation of temporal data management by vendors because it is a discrete technical issue whose absence from this discussion does not affect the discussion in any other way. An Update Transaction Figure 4.2 shows the results of applying the same update transaction to each table. In the case of the version table, the update is carried out as an insert, resulting in a new row which is a new version of the policy. In the case of the non- temporal table, the update is carried out as a simple update in place. That transaction changes the copay amount on the policy from $15 to $20. The update was applied in May, so the rectangle for this update extends through the end of that month. Note also that the transaction closes the first rectangle, and closes it as of the same clock tick that begins the new version. May10 Jan Jan Jan Jan Jan 2010 2011 2012 2013 2014 UPDATE Policy SET copay = ‘$20’, updt_dt = CURRENT_DATE WHERE BK = ‘P861’ BK client type copay crt-dt updt-dt P861 C882 HMO $20 Jan10 May10 INSERT INTO Policy (BK, ver_dt, client, type, copay) VALUES (‘P861’, CURRENT_DATE, ‘C882’, ‘HMO’, ‘$20’) BK ver-dt client type copay P861 Jan10 C882 HMO $15 P861 May10 C882 HMO $20 Figure 4.2 Conventional and Basic Version Tables: An Update Transaction.
  19. Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES 81 Now the information content of the two tables begins to diverge. The version table tells us that P861 had a copay of $15 from January to May, and a copay of $20 thereafter. The non-temporal table does tell us that an update was made in May, and that the copay for P861 is now $20, but it doesn’t tell us what data the update changed, or how many updates have already been applied to the policy. But the version table tells us all these things. We can deter- mine what was changed by the update by comparing each non-initial version of the object with its immediate predecessor. We can tell what the policy was like before the change by looking at the previous version. We can tell how many updates have been applied to the policy by counting the versions, and we can tell when each one took place. A Second Update Transaction Figure 4.3 shows a second update transaction. This transac- tion updates the policy type from HMO (Health Maintenance Organization) to PPO (Preferred Provider Organization). Aug10 Jan Jan Jan Jan Jan 2010 2011 2012 2013 2014 UPDA Policy TE SET type = ‘PPO’, updt_dt = CURRENT_DATE WHERE BK = ‘P861’ BK client type copay crt-dt updt-dt P861 C882 PPO $20 Jan10 Aug10 INSERT INTO Policy (BK, ver_dt, client, type, copay) VALUES (‘P861’, CURRENT_DATE, ‘C882’, ‘PPO’, ‘$20’) BK ver-dt client type copay P861 Jan10 C882 HMO $15 P861 May10 C882 HMO $20 P861 Aug10 C882 PPO $20 Figure 4.3 Conventional and Basic Version Tables: A Second Update Transaction.
  20. 82 Chapter 4 THE ORIGINS OF ASSERTED VERSIONING: IT BEST PRACTICES After the transaction is applied to the version table, a third version is created. So we now know when the second version ended. It ended when the third version began. This is shown graphically, in Figure 4.3, by starting a third rectangle to show that the transaction has resulted in a third version of the object. Of course we don’t know, at this point, if or when that third version will ever end. The non-temporal table records the fact that an August 2010 update was applied. But it cannot tell us whether or not any pre- vious updates were applied or, if any were, how many of them there were. Nor can it tell us what the August update changed, or what the prior state of the policy was at any time between Jan- uary and August. The version table, on the other hand, can tell us all of these things. A Delete Transaction As shown in Figure 4.4, the result of applying the indicated delete transaction is that the row representing the policy is removed from the non-temporal table on December 2010, and that all the rows representing that policy are removed from the basic version table on that date. There remains no evidence, in either table, that policy P861 ever existed. For a modest set of business requirements, basic versioning may be all that is needed. And by beginning with basic versioning, we have been able to present an uncluttered compar- ison of a non-temporal table and a version table. An object is represented by a single row in a non-temporal table. The equiv- alent representation of that object, in a version table, is a tempo- rally contiguous set of one or more rows, all of them Dec10 Jan Jan Jan Jan Jan 2010 2011 2012 2013 2014 DELETE FROM Policy WHERE BK = ‘P861’ BK client type copay crt-dt updt-dt DELETE FROM Policy WHERE BK = ‘P861’ BK ver-dt client type copay Figure 4.4 Conventional and Basic Version Tables: A Delete Transaction.
Đồng bộ tài khoản