
2
discovery can be adapted to discover other types of data
dependencies (such as AFD). The general model of FD discovery
problem includes steps: generating a search space of FDs, verifying
the satisfaction of each FD, pruning the search space, outputting the
set of satisfied FDs and reducing redundancies in this set of satisfied
FDs. In the FD discovery problem, the key discovery is the special
case and is also an important problem in normalizing relational
databases.
The time complexity of the FD discovery problem is polynomial
in the number of tuples in the relation but is exponential in the
number of attributes of that relation. Therefore, for reducing the
processing time, effective pruning rules should be developed. Among
the proposed pruning rules, it is important to prune keys, and if a key
is discovered then it is possible to prune (delete) all sets containing
the key in the search space. However, the disadvantage of existing
key pruning rules is to find keys on the entire set of attributes of
the database (this is really a very difficult problem because the time
complexity can be exponential in number of attributes of ). So is
there any way to find keys in a proper subset of ? This question is
one of basic motivations of this thesis.
After the set of data dependencies is discovered, this set can be
very large and difficult to use because it contains unnecessary
redundancies. The important problem is how to eliminate (as much
as possible) the redundancy in the set of discovered data
dependencies. This is also a problem interested in the thesis.
Another research direction in the thesis is to focus on discovering
two typical types of RFD, namely AFD and CFD. Both AFD and
CFD have many applications and occurences in relational databases,