Security Methods for Statistical Databases

Introduction

§ Statistical databases containing medical information are often used for research

§ Some of the data is protected by laws to help protect the privacy of the patient

§ Proper security precautions must be implemented to comply with laws and respect the sensitivity of the data

Accuracy vs. Confidentiality

Accuracy – Researchers want to extract accurate and meaningful data

Confidentiality – Patients, laws, and database administrators want to maintain the privacy of patients and the confidentiality of their information

Laws

§ Health Insurance Portability and Accountability Act

– HIPAA (Privacy Rule)

§ Covered organizations must comply by April 14, 2003

§ Designed to improve efficiency of healthcare system by using electronic exchange of data and maintaining security

§ Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required

§ Privacy Rule establishes a “minimum necessary standard” for the purpose of making covered entities evaluate their current regulations and security precautions

HIPAA Compliance

§ Companies offer 3rd Party Certification of covered entities

§ Such companies will check your company and associating companies for compliance with HIPAA

§ Can help with rapid implementation and compliance with HIPAA regulations

Types of Statistical Databases

§ Static – a static database is made once and never changes

§ Example: U.S. Census

§ Dynamic – changes continuously to reflect real-time data

§ Example: most online research databases

Security Methods

§ Access Restriction

§ Query Set Restriction

§ Microaggregation

§ Data Perturbation

§ Output Perturbation

§ Auditing

§ Random Sampling

Access Restriction

§ Databases normally have different access levels for different types of users

§ User IDs and passwords are the most common methods for restricting access

§ In a medical database:

  § Doctors/Healthcare Representatives – full access to information

  § Researchers – access to partial information only (e.g. aggregate information)
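The two access levels above can be sketched as a role check in front of each kind of query. This is only a minimal illustration: the records, the role names, the `PERMISSIONS` table, and the functions `get_record` and `average_age` are all hypothetical and not taken from any real system.

```python
from statistics import mean

# Hypothetical records and role table, invented for this sketch.
RECORDS = [
    {"id": 1, "name": "Patient A", "age": 54},
    {"id": 2, "name": "Patient B", "age": 57},
    {"id": 3, "name": "Patient C", "age": 59},
]

PERMISSIONS = {
    "doctor": {"record", "aggregate"},   # full access
    "researcher": {"aggregate"},         # aggregate information only
}

def require(role, permission):
    """Raise PermissionError unless the role holds the given permission."""
    if permission not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not run {permission!r} queries")

def get_record(role, patient_id):
    """Return one individual record (record-level access required)."""
    require(role, "record")
    return next(r for r in RECORDS if r["id"] == patient_id)

def average_age(role):
    """Return an aggregate statistic (aggregate-level access required)."""
    require(role, "aggregate")
    return mean(r["age"] for r in RECORDS)
```

With this table, `average_age("researcher")` succeeds while `get_record("researcher", 2)` raises `PermissionError`.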

Query Set Restriction

§ A query-set size control can limit the number of records that must be in the result set

§ Allows the query results to be displayed only if the size of the query set satisfies the condition

§ Setting a minimum query-set size can help protect against the disclosure of individual data

Query Set Restriction

§ Let K represent the minimum number of records that must be present in the query set

§ Let R represent the size of the query set

§ The query set can only be displayed if K ≤ R
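A minimal sketch of the K ≤ R check, assuming records are plain dictionaries and the statistic requested is a mean. The function name `restricted_mean`, the field names, and the sample data are made up for illustration.

```python
from statistics import mean

def restricted_mean(records, predicate, field, k):
    """Return the mean of `field` over the query set, or None if it is too small.

    The query set is every record matching `predicate`; its size is R.
    The result is released only when K <= R.
    """
    query_set = [r for r in records if predicate(r)]
    if len(query_set) < k:      # R < K: refuse to answer
        return None
    return mean(r[field] for r in query_set)

# Hypothetical data: three patients share one zip code, one patient is alone.
patients = [
    {"zip": "22801", "cost": 1200},
    {"zip": "22801", "cost": 800},
    {"zip": "22801", "cost": 1000},
    {"zip": "22802", "cost": 5000},
]

print(restricted_mean(patients, lambda r: r["zip"] == "22801", "cost", k=3))
print(restricted_mean(patients, lambda r: r["zip"] == "22802", "cost", k=3))
```

The second query matches a single patient, so answering it would disclose that patient's exact cost; the check returns `None` instead.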

Query Set Restriction

[Figure: Query 1 and Query 2 are run against the original database; each query's results are compared with K before Query 1 Results and Query 2 Results are returned.]

Microaggregation

§ Raw (individual) data is grouped into small aggregates before publication

§ The average value of the group replaces each value of the individual

§ Data with the most similarities are grouped together to maintain data accuracy

§ Helps to prevent disclosure of individual data

Microaggregation

§ National Agricultural Statistics Service (NASS) publishes data about farms

§ To protect against data disclosure, data is only released at the county level

§ Farms in each county are averaged together to maintain as much purity as possible while still protecting against disclosure

Microaggregation

Age    Microaggregated Age
10     11.67
12     11.67
13     11.67
57     56.67
54     56.67
59     56.67

Each group of three ages is replaced by the group average.
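The ages in this example can be reproduced with a short sketch. It assumes the simplest possible grouping rule (sort the values, then cut them into fixed-size groups); the function name `microaggregate` and the `group_size` parameter are illustrative, and real microaggregation schemes choose groups more carefully.

```python
def microaggregate(values, group_size):
    """Replace each value with the average of its group of similar values.

    Values are grouped by sorting and cutting into consecutive runs of
    `group_size`. A final run shorter than `group_size` is averaged on its
    own here; a production scheme would merge it into a neighbouring group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        average = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = average   # written back in the original record order
    return result

ages = [10, 12, 13, 57, 54, 59]
print([round(a, 2) for a in microaggregate(ages, group_size=3)])
# [11.67, 11.67, 11.67, 56.67, 56.67, 56.67]
```

Sorting first is what keeps similar values together, so each average stays close to the values it replaces.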

Microaggregation

[Figure: the original data is averaged to produce the microaggregated data, which is what the user sees.]

Data Perturbation

§ Perturbed data is raw data with noise added

§ Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed

§ Con: Data perturbation runs the risk of presenting biased data
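As a rough sketch of data perturbation, the stored values themselves can be replaced once with noisy copies. Zero-mean Gaussian noise is an assumption made here for illustration; the function name `perturb_database` and the `sigma` and `seed` parameters are hypothetical.

```python
import random

def perturb_database(values, sigma, seed=None):
    """Return a perturbed copy of the data: each value plus zero-mean noise.

    Only the perturbed copy would be stored and queried, so every user
    sees the same noisy values and the true values are never exposed.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

true_ages = [10.0, 12.0, 13.0, 57.0, 54.0, 59.0]
stored = perturb_database(true_ages, sigma=2.0, seed=42)

# Statistics computed on the perturbed copy differ from the true ones,
# which is the source of the bias risk noted above.
print(sum(true_ages) / len(true_ages))
print(sum(stored) / len(stored))
```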

Data Perturbation

[Figure: noise is added to the original database to produce the perturbed database, which User 1 and User 2 query.]

Output Perturbation

§ Instead of the raw data being transformed as in Data Perturbation, only the output or query results are perturbed

§ The bias problem is less severe than with data perturbation
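A comparable sketch for output perturbation: the stored data stays exact and noise is added only to the answer of each query. Gaussian noise, the function name `perturbed_mean`, and its parameters are assumptions for illustration.

```python
import random
from statistics import mean

def perturbed_mean(values, sigma, rng=None):
    """Compute the true mean on the raw data, then add noise to the result."""
    rng = rng or random.Random()
    true_answer = mean(values)               # raw data is read, never modified
    return true_answer + rng.gauss(0.0, sigma)

stored = [10.0, 20.0, 30.0]
print(perturbed_mean(stored, sigma=1.0))     # a value near 20.0
print(stored)                                # still [10.0, 20.0, 30.0]
```

Because each answer starts from the true statistic, the noise does not accumulate across records the way it can when every stored value is perturbed.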

Output Perturbation

[Figure: User 1 and User 2 send queries to the original database; noise is added to the results before they are returned.]

Auditing

§ Auditing is the process of keeping track of all queries made by each user

§ Usually done with up-to-date logs

§ Each time a user issues a query, the log is checked to see if the user is querying the database maliciously
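One way such a log check might look is sketched below. The rule used here is an assumption chosen for illustration: a query is refused when its query set differs from one of the same user's earlier query sets by only a few records, since subtracting the two answers could isolate an individual. The class `AuditLog` and the `min_difference` parameter are hypothetical.

```python
from collections import defaultdict

class AuditLog:
    """Per-user log of query sets, checked before each new query is answered."""

    def __init__(self, min_difference=2):
        self.min_difference = min_difference
        self.history = defaultdict(list)     # user -> list of frozensets of record IDs

    def check_and_record(self, user, query_set):
        """Return True and log the query if it looks safe, else return False."""
        query_set = frozenset(query_set)
        for earlier in self.history[user]:
            differing = len(query_set ^ earlier)   # records in exactly one of the two sets
            if 0 < differing < self.min_difference:
                return False                 # too close to an earlier query
        self.history[user].append(query_set)
        return True

log = AuditLog()
print(log.check_and_record("user1", {1, 2, 3, 4}))   # True
print(log.check_and_record("user1", {1, 2, 3}))      # False: isolates record 4
```

The history grows with every query and each check scans all of it, so a log of this kind gets more expensive to consult the longer a user keeps querying.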

Random Sampling

§ Only a sample of the records meeting the requirements of the query is shown

§ Must maintain consistency by giving the exact same results to the same query

§ Weakness – logically equivalent queries can result in a different query set
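The consistency requirement and its weakness can both be seen in a small sketch. It assumes the sample is chosen by seeding a random generator from a hash of the query text; the function name `sampled_query` and the `fraction` parameter are made up for illustration.

```python
import hashlib
import random

def sampled_query(records, predicate, query_text, fraction=0.5):
    """Return a random sample of the matching records.

    The generator is seeded from the query text, so repeating the same
    query text always yields the same sample.
    """
    matches = [r for r in records if predicate(r)]
    digest = hashlib.sha256(query_text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    size = max(1, round(len(matches) * fraction)) if matches else 0
    return rng.sample(matches, size)

ages = list(range(20, 40))

first = sampled_query(ages, lambda a: a >= 30, "age >= 30")
again = sampled_query(ages, lambda a: a >= 30, "age >= 30")
print(first == again)    # True: same query text, same sample

# Logically equivalent, but the text differs, so the seed differs and the
# sample may differ too. Combining the two answers reveals more records.
other = sampled_query(ages, lambda a: not a < 30, "not age < 30")
print(sorted(set(first) | set(other)))
```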

Comparison Methods

The following criteria are used to determine the most effective methods of statistical database security:

§ Security – possibility of exact disclosure, partial disclosure, robustness

§ Richness of Information – amount of non-confidential information eliminated, bias, precision, consistency

§ Costs – initial implementation cost, processing overhead per query, user education

A Comparison of Methods

Method                  Security        Richness of Information   Costs
Query-set Restriction   Low             Low (1)                   Low
Microaggregation        Moderate        Moderate                  Moderate
Data Perturbation       High            High-Moderate             Low
Output Perturbation     Moderate        Moderate-Low              Low
Auditing                Moderate-Low    Moderate                  High
Sampling                Moderate        Moderate-Low              Moderate

(1) Quality is low because a lot of information can be eliminated if the query does not meet the requirements

Sources

§ This presentation is posted on http://www.cs.jmu.edu/users/aboutams

§ Adam, Nabil R.; Wortmann, John C.; Security-Control Methods for Statistical Databases: A Comparative Study; ACM Computing Surveys, Vol. 21, No. 4, December 1989 (http://delivery.acm.org/10.1145/80000/76895/p515-adam.pdf?key1=76895&key2=1947043301&coll=portal&dl=ACM&CFID=4702747&CFTOKEN=83773110)

§ Official HIPAA (http://cms.hhs.gov/hipaa/)

§ Bernstein, Stephen W.; Impact of HIPAA on BioTech/Pharma Research: Rules of the Road (http://www.privacyassociation.org/docs/3-02bernstein.pdf)

§ Service Bureau; 3rd Party Testing (http://hipaatesting.com/service_bureau.html)