1 | <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> |
---|
2 | <html> |
---|
3 | <head> |
---|
4 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
5 | <meta name="GENERATOR" content="Mozilla/4.76 [en] (X11; U; FreeBSD 4.3-RELEASE i386) [Netscape]"> |
---|
6 | </head> |
---|
7 | <body> |
---|
8 | |
---|
9 | <center> |
---|
10 | <h1> |
---|
11 | Security Interface for Berkeley DB</h1></center> |
---|
12 | |
---|
13 | <center><i>Susan LoVerso</i> |
---|
14 | <br><i>sue@sleepycat.com</i> |
---|
15 | <br><i>Rev 1.6</i> |
---|
16 | <br><i>2002 Feb 26</i></center> |
---|
17 | |
---|
18 | <p>We provide an interface allowing secure access to Berkeley DB. |
---|
19 | Our goal is to allow users to have encrypted secure databases. In |
---|
20 | this document, the term <i>ciphering</i> means the act of encryption or |
---|
21 | decryption. The two are inverse operations, and the same issues |
---|
22 | apply to both, just in opposite directions. |
---|
23 | <h3> |
---|
24 | Requirements</h3> |
---|
25 | The overriding requirement is to provide a simple mechanism to allow users |
---|
26 | to have a secure database. A secure database means that all of the |
---|
27 | pages of a database will be encrypted, and all of the log files will be |
---|
28 | encrypted. |
---|
29 | <p>Falling out from this work will be a simple mechanism to allow users |
---|
30 | to request that we checksum their data for additional error detection (without |
---|
31 | encryption/decryption). |
---|
32 | <p>We expect that data in process memory or stored in shared memory, potentially |
---|
33 | backed by disk, is not encrypted or secure. |
---|
34 | <h2> |
---|
35 | <a NAME="DB Modifications"></a>DB Method Interface Modifications</h2> |
---|
36 | With a logging environment, all database changes are recorded in the log |
---|
37 | files. Therefore, users requiring secure databases in such environments |
---|
38 | also require secure log files. |
---|
39 | <p>A prior thought had been to allow different passwords on the environment |
---|
40 | and the databases within. However, such a scheme then requires that |
---|
41 | the password be logged in order for recovery to be able to restore the |
---|
42 | database. Therefore, any application having the password for the |
---|
43 | log could get the password for any databases by reading the log. |
---|
44 | So having a different password on a database does not gain any additional |
---|
45 | security and it makes certain things harder and more complex. Some |
---|
46 | of those more complex things include the need to handle database and env |
---|
47 | passwords differently since they'd need to be stored and accessed from |
---|
48 | different places. Also, the question of how <i>db_checkpoint</i> |
---|
49 | or <i>db_sync</i>, which flush database pages to disk, would find the passwords |
---|
50 | of various databases without any dbps remained unsolved. The feature didn't |
---|
51 | gain anything and caused significant pain. Therefore the decision |
---|
52 | is that there will be a single password protecting an environment and all |
---|
53 | the logs and some databases within that environment. We do allow |
---|
54 | users to have a secure environment and clear databases. Users that |
---|
55 | want secure databases within a secure environment must set a flag. |
---|
56 | <p>Users wishing to enable encryption on a database in a secure environment |
---|
57 | or enable just checksumming on their database pages will use new flags |
---|
58 | to <a href="../docs/api_c/db_set_flags.html">DB->set_flags()</a>. |
---|
59 | Providing ciphering over an entire environment is accomplished by adding |
---|
60 | a single environment method: <a href="../docs/api_c/env_set_encrypt.html">DBENV->set_encrypt()</a>. |
---|
61 | Providing encryption for a database (not part of an environment) is accomplished |
---|
62 | by adding a new database method: <a href="../docs/api_c/db_set_encrypt.html">DB->set_encrypt()</a>. |
---|
63 | <p>Both of the <i>set_encrypt</i> methods must be called before their respective |
---|
64 | <i>open</i> calls. The environment method must be before the environment |
---|
65 | open because we must know about security before there is any possibility |
---|
66 | of writing any log records out. The database method must be before |
---|
67 | the database open in order to read the root page. The planned interfaces |
---|
68 | for these methods are: |
---|
69 | <pre>DBENV->set_encrypt(DB_ENV *dbenv, /* DB_ENV structure */ |
---|
70 | char *passwd, /* Password */ |
---|
71 | u_int32_t flags); /* Flags */</pre> |
---|
72 | |
---|
73 | <pre>DB->set_encrypt(DB *dbp, /* DB structure */ |
---|
74 | char *passwd, /* Password */ |
---|
75 | u_int32_t flags); /* Flags */</pre> |
---|
76 | The flags accepted by these functions are: |
---|
77 | <pre>#define DB_ENCRYPT_AES 0x00000001 /* Use the AES encryption algorithm */</pre> |
---|
78 | Passwords are NUL-terminated strings. NULL or zero length strings |
---|
79 | are illegal. These flags enable the checksumming and encryption using |
---|
80 | the particular algorithms we have chosen for this implementation. |
---|
81 | The flags are named such that there is a logical naming pattern if additional |
---|
82 | checksum or encryption algorithms are used. If a user gives a flag of zero, |
---|
83 | it will behave in a manner similar to DB_UNKNOWN. It will be illegal if |
---|
84 | they are creating the environment or database, as an algorithm must be |
---|
85 | specified. If they are joining an existing environment or opening an existing |
---|
86 | database, they will use whatever algorithm is in force at the time. |
---|
87 | Using DB_ENCRYPT_AES automatically implies SHA1 checksumming. |
---|
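<p>The argument rules above can be sketched as a small validation routine. This is only an illustration under stated assumptions: <i>check_encrypt_args</i>, its <i>creating</i> parameter and the result codes are hypothetical names, not part of the planned interface.

```c
#include <stddef.h>
#include <stdint.h>

#define DB_ENCRYPT_AES 0x00000001u /* flag value from the text above */

/* Hypothetical result codes, for illustration only. */
enum { ARGS_OK = 0, ARGS_BAD_PASSWD = -1, ARGS_BAD_FLAGS = -2 };

/*
 * check_encrypt_args --
 *	Hypothetical helper, not a Berkeley DB function.  "creating" is
 *	nonzero when the caller creates the environment or database.
 */
int
check_encrypt_args(const char *passwd, uint32_t flags, int creating)
{
	/* NULL or zero-length passwords are illegal. */
	if (passwd == NULL || passwd[0] == '\0')
		return (ARGS_BAD_PASSWD);
	/* Only the AES flag (or zero) is defined in this design. */
	if ((flags & ~DB_ENCRYPT_AES) != 0)
		return (ARGS_BAD_FLAGS);
	/* A creator must name an algorithm; a joiner may pass zero. */
	if (flags == 0 && creating)
		return (ARGS_BAD_FLAGS);
	return (ARGS_OK);
}
```

<p>A joiner that passes zero would then pick up whatever algorithm is already in force, which this sketch does not model.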
88 | <p>These functions will perform several initialization steps. We |
---|
89 | will allocate a crypto_handle for our env handle and set up our function |
---|
90 | pointers. We will allocate space and copy the password into our env |
---|
91 | handle password area. Similar to <i>DB->set_cachesize</i>, calling |
---|
92 | <i>DB->set_encrypt</i> |
---|
93 | will actually reflect back into the local environment created by DB. |
---|
94 | <p>Lastly, we will add a new flag, DB_OVERWRITE, to the <a href="../docs/api_c/env_remove.html">DBENV->remove</a> |
---|
95 | method. The purpose of this flag is to force all of the memory used |
---|
96 | by the shared regions to be overwritten before removal. We will use |
---|
97 | <i>rm_overwrite</i>, |
---|
98 | a function that overwrites and syncs a file 3 times with varying bit patterns |
---|
99 | to really remove a file. Additionally, this flag will force a sync |
---|
100 | of the overwritten regions to disk, if the regions are backed by the file |
---|
101 | system. That way there is no residual information left in the clear |
---|
102 | in memory or freed disk blocks. Although we expect that this flag |
---|
103 | will be used primarily by customers using security, its action is not |
---|
104 | dependent on passwords or a secure setup, and so can be used by anyone. |
---|
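<p>A minimal sketch of the overwrite step, assuming a POSIX system. The name <i>overwrite_file</i> and the patterns 0xff, 0x00, 0xff are assumptions; the text only says that three passes with varying bit patterns are written and synced.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * overwrite_file --
 *	Hypothetical stand-in for rm_overwrite.  Overwrites the file
 *	three times and syncs after each pass.  The bit patterns are an
 *	assumption.  Returns 0 on success, -1 on error.  The caller
 *	removes the file afterwards.
 */
int
overwrite_file(const char *path)
{
	static const unsigned char patterns[3] = { 0xff, 0x00, 0xff };
	unsigned char buf[4096];
	struct stat sb;
	off_t left;
	size_t want;
	ssize_t nw;
	int fd, pass;

	if ((fd = open(path, O_WRONLY)) == -1)
		return (-1);
	if (fstat(fd, &sb) == -1)
		goto err;
	for (pass = 0; pass < 3; pass++) {
		memset(buf, patterns[pass], sizeof(buf));
		if (lseek(fd, (off_t)0, SEEK_SET) == (off_t)-1)
			goto err;
		for (left = sb.st_size; left > 0; left -= nw) {
			want = left < (off_t)sizeof(buf) ?
			    (size_t)left : sizeof(buf);
			if ((nw = write(fd, buf, want)) <= 0)
				goto err;
		}
		/* Force this pass to disk before starting the next one. */
		if (fsync(fd) == -1)
			goto err;
	}
	return (close(fd));

err:	(void)close(fd);
	return (-1);
}
```

<p>Syncing after every pass matters: without it the passes could be coalesced in the buffer cache and only the last pattern would reach the disk.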
105 | <h4> |
---|
106 | Initialization of the Environment</h4> |
---|
107 | The setup of the security subsystem will be similar to replication initialization |
---|
108 | since it is a sort of subsystem, but it does not have its own region. |
---|
109 | When the environment handle is created via <i>db_env_create</i>, we initialize |
---|
110 | our <i>set_encrypt</i> method to be the RPC or local version. Therefore |
---|
111 | the <i>__dbenv</i> structure needs a new pointer: |
---|
112 | <pre> void *crypto_handle; /* Security handle */</pre> |
---|
113 | The crypto handle will really point to a new <i>__db_cipher</i> structure |
---|
114 | that will contain a set of functions and a pointer to the in-memory information |
---|
115 | needed by the specific encryption algorithm. It will look like: |
---|
116 | <pre>typedef struct __db_cipher { |
---|
117 | int (*init)__P((...)); /* Alg-specific initialization function */ |
---|
118 | int (*encrypt)__P((...)); /* Alg-specific encryption function */ |
---|
119 | int (*decrypt)__P((...)); /* Alg-specific decryption function */ |
---|
120 | void *data; /* Pointer to alg-specific information (AES_CIPHER) */ |
---|
121 | u_int32_t flags; /* Cipher flags */ |
---|
122 | } DB_CIPHER;</pre> |
---|
123 | |
---|
124 | <pre>#define DB_MAC_KEY 20 /* Size of the MAC key */ |
---|
125 | typedef struct __aes_cipher { |
---|
126 | keyInstance encrypt_ki; /* Encrypt keyInstance temp. */ |
---|
127 | keyInstance decrypt_ki; /* Decrypt keyInstance temp. */ |
---|
128 | u_int8_t mac_key[DB_MAC_KEY]; /* MAC key */ |
---|
129 | u_int32_t flags; /* AES-specific flags */ |
---|
130 | } AES_CIPHER;</pre> |
---|
131 | It should be noted that none of these structures have their own mutex. |
---|
132 | We hold the environment region locked while we are creating this, but once |
---|
133 | this is set up, it is read-only forever. |
---|
134 | <p>During <a href="../docs/api_c/env_set_encrypt.html">dbenv->set_encrypt</a>, |
---|
135 | we set the encryption, decryption and checksumming methods to the appropriate |
---|
136 | functions based on the flags. This function will allocate us a crypto |
---|
137 | handle that we store in the <i>__dbenv</i> structure just like all the |
---|
138 | other subsystems. For now, only AES ciphering functions and SHA1 |
---|
139 | checksumming functions are supported. Also we will copy the password |
---|
140 | into the <i>__dbenv</i> structure. We ultimately need to keep the |
---|
141 | password in the environment's shared memory region or compare this one |
---|
142 | against the one that is there, if we are joining an existing environment, |
---|
143 | but we do not have it yet because open has not yet been called. We |
---|
144 | will allocate a structure that will be used in initialization and set up |
---|
145 | the function pointers to point to the algorithm-specific functions. |
---|
146 | <p>In the <i>__dbenv_open</i> path, in <i>__db_e_attach</i>, if we |
---|
147 | are creating the region and the <i>dbenv->passwd</i> field is set, we need |
---|
148 | to use the length of the password in the initial computation of the environment's |
---|
149 | size. This guarantees sufficient space for storing the password in |
---|
150 | shared memory. Then we will call a new function to initialize the |
---|
151 | security region, <i>__crypto_region_init</i> in <i>__dbenv_open</i>. |
---|
152 | If we are the creator, we will allocate space in the shared region to store |
---|
153 | the password and copy the password into that space. Or, if we are |
---|
154 | not the creator we will compare the password stored in the dbenv with the |
---|
155 | one in shared memory. Additionally, we will compare the ciphering |
---|
156 | algorithm to the one stored in the shared region. We'll smash the dbenv |
---|
157 | password and free it. If they do not match, we return an error. |
---|
158 | If we are the creator we store the offset into the REGENV structure. |
---|
159 | Then <i>__crypto_region_init </i> will call the initialization function |
---|
160 | set up earlier based on the ciphering algorithm specified. For now |
---|
161 | we will call <i>__aes_init</i>. Additionally this function will allocate |
---|
162 | and set up the per-process state vector for this encryption's IVs. |
---|
163 | See <a href="#Generating the Initialization Vector">Generating the Initialization |
---|
164 | Vector</a> for a detailed description of the IV and state vector. |
---|
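<p>The creator/joiner handling of the password described above might look roughly like the following. The structures and the name <i>region_passwd_init</i> are hypothetical stand-ins for the environment handle and the shared region; the fixed 64-byte buffer replaces the real region allocation only to keep the sketch self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the handle and the shared region. */
struct env_handle {
	char *passwd;		/* Heap copy made by set_encrypt */
	size_t passwd_len;
};
struct region {
	char passwd[64];	/* Stand-in for region-allocated space */
	size_t passwd_len;
};

/*
 * region_passwd_init --
 *	Returns 0 on success, -1 on mismatch or lack of space.  The
 *	handle's copy of the password is smashed and freed either way.
 */
int
region_passwd_init(struct env_handle *h, struct region *r, int creator)
{
	int ret;

	ret = 0;
	if (creator) {
		/* Creator: store the password in the shared region. */
		if (h->passwd_len > sizeof(r->passwd))
			ret = -1;
		else {
			memcpy(r->passwd, h->passwd, h->passwd_len);
			r->passwd_len = h->passwd_len;
		}
	} else if (h->passwd_len != r->passwd_len ||
	    memcmp(h->passwd, r->passwd, r->passwd_len) != 0)
		/* Joiner: the password must match the stored one. */
		ret = -1;

	/* Smash the local copy so clear text does not linger in the heap. */
	memset(h->passwd, 0, h->passwd_len);
	free(h->passwd);
	h->passwd = NULL;
	h->passwd_len = 0;
	return (ret);
}
```

<p>The comparison of the ciphering algorithm and the call to the algorithm-specific initialization function are left out here.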
165 | <p>In the AES-specific initialization function, <i>__aes_init</i>, |
---|
166 | we will initialize the AES_CIPHER structure by calling |
---|
167 | <i>__aes_derivekeys</i> in order to fill |
---|
168 | in the keyInstance and mac_key fields in that structure. The REGENV |
---|
169 | structure will have one additional item: |
---|
170 | <pre> roff_t passwd_off; /* Offset of passwd */</pre> |
---|
171 | |
---|
172 | <h4> |
---|
173 | Initializing a Database</h4> |
---|
174 | During <a href="../docs/api_c/db_set_encrypt.html">db->set_encrypt</a>, |
---|
175 | we set the encryption, decryption and checksumming methods to the appropriate |
---|
176 | functions based on the flags. Basically, we test that we are not |
---|
177 | in an existing environment and we haven't called open. Then we just |
---|
178 | call through the environment handle to set the password. |
---|
179 | <p>Also, we will need to add a flag in the database meta-data page that |
---|
180 | indicates that the database is encrypted and what its algorithm is. |
---|
181 | This will be used when the meta-page is read after reopening a file. We |
---|
182 | need this information on the meta-page in order to detect a user opening |
---|
183 | a secure database without a password. I propose using the first unused1 |
---|
184 | byte (renaming it too) in the meta page for this purpose. |
---|
185 | <p>The first 64 bytes of every page will not be encrypted. |
---|
186 | Only the first 512 bytes of database meta-pages will be encrypted. |
---|
187 | All meta-page types will have an IV and checksum added within the first |
---|
188 | 512 bytes as well as a crypto magic number. This will expand the |
---|
189 | size of the meta-page from 256 bytes to 512 bytes. The page in/out routines, |
---|
190 | <i>__db_pgin</i> and <i>__db_pgout</i> know the page type of the page and |
---|
191 | will apply the 512 bytes ciphering to meta pages. In <i>__db_pgout</i>, |
---|
192 | if we have a crypto handle in our (private) environment, we will apply |
---|
193 | ciphering to either the entire page, or the first 512 bytes if it is a |
---|
194 | meta-page. In <i>__db_pgin</i>, we will decrypt the page if we have |
---|
195 | a crypto handle. |
---|
196 | <p>When multiple processes share a database, all must use the same password |
---|
197 | as the database creator. Using an existing database requires several conditions |
---|
198 | to be true. First, if the creator of the database did not create |
---|
199 | with security, then opening later with security is an error. Second, |
---|
200 | if the creator did create it with security, then opening later without |
---|
201 | security is an error. Third, we need to be able to test and check |
---|
202 | that when another process opens a secure database that the password they |
---|
203 | provided is the same as the one in use by the creator. |
---|
204 | <p>When reading the meta-page, in <i>__db_file_setup</i>, we do not go |
---|
205 | through the paging functions, but directly read via <i>__os_read</i>. |
---|
206 | It is at this point that we will determine if the user is configured correctly. |
---|
207 | If the meta-page we read has an IV and checksum, the user must have a crypto |
---|
208 | handle. If they have a crypto handle, then the meta-page must have |
---|
209 | an IV and checksum. If both of those are true, we test the password. |
---|
210 | We compare the unencrypted magic number to the newly-decrypted crypto magic |
---|
211 | number and if they are not the same, then we report that the user gave |
---|
212 | us a bad password. |
---|
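<p>The configuration test made while reading the meta-page can be summarized in a few lines. This is a sketch of the decision only; <i>check_meta</i> and its result codes are hypothetical, and the decryption that produces the crypto magic number is assumed to have happened in the caller.

```c
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum {
	META_OK = 0,
	META_NEED_PASSWD = -1,		/* Encrypted file, no crypto handle */
	META_NOT_ENCRYPTED = -2,	/* Crypto handle, clear file */
	META_BAD_PASSWD = -3		/* Magic numbers disagree */
};

/*
 * check_meta --
 *	page_is_encrypted: the meta-page carries an IV and checksum.
 *	have_crypto_handle: the user configured a password.
 *	magic: the unencrypted magic number.
 *	crypto_magic: the crypto magic number after decryption.
 */
int
check_meta(int page_is_encrypted, int have_crypto_handle,
    uint32_t magic, uint32_t crypto_magic)
{
	if (page_is_encrypted && !have_crypto_handle)
		return (META_NEED_PASSWD);
	if (!page_is_encrypted && have_crypto_handle)
		return (META_NOT_ENCRYPTED);
	/* Both present: a wrong password decrypts to a different magic. */
	if (page_is_encrypted && magic != crypto_magic)
		return (META_BAD_PASSWD);
	return (META_OK);
}
```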
213 | <p>On a mostly unrelated topic, even when we go to very large pagesizes, |
---|
214 | the meta information will still be within a disk sector. So, after |
---|
215 | talking it over with Keith and Margo, we determined that unencrypted meta-pages |
---|
216 | still will not need a checksum. |
---|
217 | <h3> |
---|
218 | Encryption and Checksum Routines</h3> |
---|
219 | These routines are provided to us by Adam Stubblefield at Rice University |
---|
220 | (astubble@rice.edu). The functional interfaces are: |
---|
221 | <pre>__aes_derivekeys(DB_ENV *dbenv, /* dbenv */ |
---|
222 | u_int8_t *passwd, /* Password */ |
---|
223 | size_t passwd_len, /* Length of passwd */ |
---|
224 | u_int8_t *mac_key, /* 20 byte array to store MAC key */ |
---|
225 | keyInstance *encrypt_key, /* Encryption key of passwd */ |
---|
226 | keyInstance *decrypt_key); /* Decryption key of passwd */</pre> |
---|
227 | This is the only function requiring the textual user password. From |
---|
228 | the password, this function generates a key used in the checksum function, |
---|
229 | <i>__db_chksum</i>. |
---|
230 | It also fills in <i>keyInstance</i> structures which are then used in the |
---|
231 | encryption and decryption routines. The keyInstance structures must |
---|
232 | already be allocated. These will be stored in the AES_CIPHER structure. |
---|
233 | <pre> __db_chksum(u_int8_t *data, /* Data to checksum */ |
---|
234 | size_t data_len, /* Length of data */ |
---|
235 | u_int8_t *mac_key, /* 20 byte array from __aes_derivekeys */ |
---|
236 | u_int8_t *checksum); /* 20 byte array to store checksum */</pre> |
---|
237 | This function generates a checksum on the data given. This function |
---|
238 | will do double-duty for users that simply want error detection on their |
---|
239 | pages. When users are using encryption, the <i>mac_key </i>will contain |
---|
240 | the 20-byte key set up in <i>__aes_derivekeys</i>. If they just want |
---|
241 | checksumming, then <i>mac_key</i> will be NULL. According to Adam, |
---|
242 | we can safely use the first N-bytes of the checksum. So for seeding |
---|
243 | the generator for initialization vectors, we'll hash the time and then |
---|
244 | send in the first 4 bytes for the seed. I believe we can probably |
---|
245 | do the same thing for checksumming log records. We can only use 4 |
---|
246 | bytes for the checksum in the non-secure case. So when we want to |
---|
247 | verify the log checksum we can compute the mac but just compare the first |
---|
248 | 4 bytes to the one we read. All locations where we generate or check |
---|
249 | log record checksums that currently call <i>__ham_func4</i> will now call |
---|
250 | <i>__db_chksum</i>. |
---|
251 | I believe there are 5 such locations, |
---|
252 | <i>__log_put, __log_putr, __log_newfile, |
---|
253 | __log_rep_put |
---|
254 | </i>and<i> __txn_force_abort.</i> |
---|
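<p>Verifying a log checksum against only the first 4 bytes of the 20-byte MAC could be as simple as the following sketch. <i>log_cksum_matches</i> and LOG_CKSUM_LEN are hypothetical names; the MAC itself is assumed to have been computed by the checksum routine.

```c
#include <stdint.h>
#include <string.h>

#define DB_MAC_KEY	20	/* Size of the full checksum */
#define LOG_CKSUM_LEN	4	/* Bytes kept in the non-secure log header */

/*
 * log_cksum_matches --
 *	Compare the first LOG_CKSUM_LEN bytes of a freshly computed MAC
 *	with the checksum read from the log.  Returns nonzero on a match.
 */
int
log_cksum_matches(const uint8_t mac[DB_MAC_KEY],
    const uint8_t stored[LOG_CKSUM_LEN])
{
	return (memcmp(mac, stored, LOG_CKSUM_LEN) == 0);
}
```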
255 | <pre>__aes_encrypt(DB_ENV *dbenv, /* dbenv */ |
---|
256 | keyInstance *key, /* Password key instance from __aes_derivekeys */ |
---|
257 | u_int8_t *iv, /* Initialization vector */ |
---|
258 | u_int8_t *data, /* Data to encrypt */ |
---|
259 | size_t data_len); /* Length of data to encrypt - 16 byte multiple */</pre> |
---|
260 | This is the function to encrypt data. It will be called to encrypt |
---|
261 | pages and log records. The <i>key</i> instance is initialized in |
---|
262 | <i>__aes_derivekeys</i>. |
---|
263 | The initialization vector, <i>iv</i>, is the 16 byte random value set up |
---|
264 | by the Mersenne Twister pseudo-random generator. Lastly, we pass |
---|
265 | in a pointer to the <i>data</i> to encrypt and its length in <i>data_len</i>. |
---|
266 | The <i>data_len</i> must be a multiple of 16 bytes. The encryption is done |
---|
267 | in-place so that when the encryption code returns our encrypted data is |
---|
268 | in the same location as the original data. |
---|
269 | <pre>__aes_decrypt(DB_ENV *dbenv, /* dbenv */ |
---|
270 | keyInstance *key, /* Password key instance from __aes_derivekeys */ |
---|
271 | u_int8_t *iv, /* Initialization vector */ |
---|
272 | u_int8_t *data, /* Data to decrypt */ |
---|
273 | size_t data_len); /* Length of data to decrypt - 16 byte multiple */</pre> |
---|
274 | This is the function to decrypt the data. It is exactly the same |
---|
275 | as the encryption function except for the action it performs. All |
---|
276 | of the args and issues are the same. It also decrypts in place. |
---|
277 | <h3> |
---|
278 | <a NAME="Generating the Initialization Vector"></a>Generating the Initialization |
---|
279 | Vector</h3> |
---|
280 | Internally, we need to provide a unique initialization vector (IV) of 16 |
---|
281 | bytes every time we encrypt any data with the same password. For |
---|
282 | the IV we are planning on using mt19937, the Mersenne Twister, a random |
---|
283 | number generator that has a period of 2**19937-1. This package can be found |
---|
284 | at <a href="http://www.math.keio.ac.jp/~matumoto/emt.html">http://www.math.keio.ac.jp/~matumoto/emt.html</a>. |
---|
285 | Tests show that although it repeats a single integer every once in a while, |
---|
286 | after several million iterations it doesn't repeat any 4 integers |
---|
287 | that we'd be stuffing into our 16-byte IV. We plan on seeding this |
---|
288 | generator with the time (tv_sec) hashed through SHA1 when we create the |
---|
289 | environment. This package uses a global state vector that contains |
---|
290 | 624 unsigned long integers. We do not allow a 16-byte IV of zero. |
---|
291 | It is simpler just to reject any 4-byte value of 0 and if we get one, just |
---|
292 | call the generator again and get a different number. We need to detect |
---|
293 | holes in files and if we read an IV of zero that is a simple indication |
---|
294 | that we need to check for an entire page of zero. The IVs are stored |
---|
295 | on the page after encryption and are not encrypted themselves so it is |
---|
296 | not possible for an entire encrypted page to be read as all zeroes, unless |
---|
297 | it was a hole in a file. See <a href="#Holes in Files">Holes in Files</a> |
---|
298 | for more details. |
---|
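<p>The rule of rejecting any zero 4-byte value can be sketched independently of the generator. In the code below <i>generate_iv</i> and <i>rand32_fn</i> are hypothetical names, and <i>toy_next</i> is a deliberately trivial generator used only to exercise the retry loop; it is not the Mersenne Twister and must not be used for real IVs.

```c
#include <stdint.h>

/* Any generator of 32-bit values; the design uses the Mersenne Twister. */
typedef uint32_t (*rand32_fn)(void *state);

/*
 * generate_iv --
 *	Fill a 16-byte IV with four nonzero 32-bit words.  A zero word is
 *	rejected and the generator is simply called again.
 */
void
generate_iv(uint32_t iv[4], rand32_fn next, void *state)
{
	int i;

	for (i = 0; i < 4; i++)
		do {
			iv[i] = next(state);
		} while (iv[i] == 0);
}

/*
 * toy_next --
 *	Demonstration generator only: emits 0, 1, 0, 2, 0, 3, ...
 */
uint32_t
toy_next(void *state)
{
	uint32_t *n;

	n = (uint32_t *)state;
	++*n;
	return ((*n % 2) != 0 ? 0 : *n / 2);
}
```

<p>Because no word may be zero, a stored IV whose first word is zero can only come from a page that was never written.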
299 | <p>We will not be holding any locks when we need to generate our IV but |
---|
300 | we need to protect access to the state vector and the index. Calls |
---|
301 | to the MT code will come while encrypting some data in <i>__aes_encrypt.</i> |
---|
302 | The MT code will assume that all necessary locks are held in the caller. |
---|
303 | We will have per-process state vectors that are set up when a process begins. |
---|
304 | That way we minimize the contention and only multi-threaded processes need |
---|
305 | acquire locks for the IV. We will have the state vector in the environment |
---|
306 | handle in heap memory, as well as the index and there will be a mutex protecting |
---|
307 | it for threaded access. This will be added to the <i>__dbenv</i> |
---|
308 | structure: |
---|
309 | <pre> DB_MUTEX *mt_mutexp; /* Mersenne Twister mutex */ |
---|
310 | int *mti; /* MT index */ |
---|
311 | u_long *mt; /* MT state vector */</pre> |
---|
312 | This portion of the environment will be initialized at the end of <i>__dbenv_open</i>, |
---|
313 | right after we initialize the other mutex for the <i>dblist</i>. When we |
---|
314 | allocate the space, we will generate our initial state vector. If we are |
---|
315 | multi-threaded we'll allocate and initialize our mutex also. |
---|
316 | <p>We need to make changes to the MT code to make it work in our namespace |
---|
317 | and to take a pointer to the location of the state vector and |
---|
318 | the index. There will be a wrapper function <i>__db_generate_iv</i> |
---|
319 | that DB will call and it will call the appropriate MT function. I |
---|
320 | am also going to change the default seed to use a hashed time instead of |
---|
321 | a hard coded value. I have looked at other implementations of the |
---|
322 | MT code available on the web site. The C++ version does a hash on |
---|
323 | the current time. I will modify our MT code to seed with the hashed |
---|
324 | time as well. That way the code to seed is contained within the MT |
---|
325 | code and we can just write the wrapper to get an IV. We will not |
---|
326 | be changing the core computational code of MT. |
---|
327 | <h2> |
---|
328 | DB Internal Issues</h2> |
---|
329 | |
---|
330 | <h4> |
---|
331 | When do we Cipher?</h4> |
---|
332 | All of the page ciphering is done in the <i>__db_pgin/__db_pgout</i> functions. |
---|
333 | We will encrypt after the method-specific function on page-out and decrypt |
---|
334 | before the method-specific function on page-in. We do not hold any |
---|
335 | locks when entering these functions. We determine that we need to |
---|
336 | cipher based on the existence of the encryption flag in the dbp. |
---|
337 | <p>For ciphering log records, the encryption will be done as the first |
---|
338 | thing (or a new wrapper) in <i>__log_put. </i>See <a href="#Log Record Encryption">Log |
---|
339 | Record Encryption</a> for those details. |
---|
340 | <br> |
---|
341 | <h4> |
---|
342 | Page Changes</h4> |
---|
343 | The checksum and IV values will be stored prior to the first index of the |
---|
344 | page. We have a new P_INP macro that replaces use of inp[X] in the |
---|
345 | code. This macro takes a dbp as an argument and determines where |
---|
346 | our first index is based on whether we have DB_AM_CHKSUM and DB_AM_ENCRYPT |
---|
347 | set. If neither is set, then our first index is where it always was. |
---|
348 | If just checksumming is set, then we reserve a 4-byte checksum. |
---|
349 | If encryption is set, then we reserve 36 bytes for our checksum/IV as well |
---|
350 | as some space to get proper alignment to encrypt on a 16-byte boundary. |
---|
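<p>One way to picture what the P_INP macro has to compute is the function below. It is a sketch under assumptions: the flag values are stand-ins for DB_AM_CHKSUM and DB_AM_ENCRYPT, the existing header length is passed in as a parameter because the text does not give it, and rounding the offset up to a multiple of 16 is only one reading of the alignment requirement.

```c
#include <stddef.h>
#include <stdint.h>

#define AM_CHKSUM	0x1u	/* Stand-in for DB_AM_CHKSUM */
#define AM_ENCRYPT	0x2u	/* Stand-in for DB_AM_ENCRYPT */

/*
 * first_index_offset --
 *	Hypothetical helper: byte offset of the first index on a page,
 *	given the length of the existing page header and the database
 *	flags.
 */
size_t
first_index_offset(size_t hdr_len, uint32_t flags)
{
	if (flags & AM_ENCRYPT)
		/* 36 bytes of checksum/IV, padded to a 16-byte boundary. */
		return ((hdr_len + 36 + 15) & ~(size_t)15);
	if (flags & AM_CHKSUM)
		/* 4-byte checksum only. */
		return (hdr_len + 4);
	/* Neither set: the first index is where it always was. */
	return (hdr_len);
}
```

<p>If the existing header were 26 bytes long (an assumed value), the encrypted case would yield 64, which would agree with the 64 unencrypted bytes at the start of each page mentioned earlier.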
351 | <p>Since several paging macros use inp[X] in them, those macros must now |
---|
352 | take a dbp. There are a lot of changes to make all the necessary |
---|
353 | paging macros take a dbp, although these changes are trivial in nature. |
---|
354 | <p>Also, there is a new function <i>__db_chk_meta</i> to perform checksumming |
---|
355 | and decryption checking on meta pages specifically. This function |
---|
356 | is where we check that the database algorithm matches what the user gave |
---|
357 | (or if they set DB_CIPHER_ANY then we set it), and other encryption related |
---|
358 | testing for bad combinations of what is in the file versus what is in the |
---|
359 | user structures. |
---|
360 | <h4> |
---|
361 | Verification</h4> |
---|
362 | The verification code will also need to be updated to deal with secure |
---|
363 | pages. Basically when the verification code reads in the meta page |
---|
364 | it will call <i>__db_chk_meta</i> to perform any checksumming and decryption. |
---|
365 | <h4> |
---|
366 | <a NAME="Holes in Files"></a>Holes in Files</h4> |
---|
367 | Holes in files will be dealt with rather simply. We need to be able |
---|
368 | to distinguish reading a hole in a file from an encrypted page that happened |
---|
369 | to encrypt to all zeros. If we read a hole in a file, we do not |
---|
370 | want to send that empty page through the decryption routine. This |
---|
371 | can be determined simply without incurring the performance penalty of comparing |
---|
372 | every byte on a page on every read until we get a non-zero byte. |
---|
373 | <br>The __db_pgin function is given a page of type P_INVALID only in this |
---|
374 | case. So, if the page type, which is always unencrypted, is |
---|
375 | P_INVALID, then we do not perform any checksum verification or decryption. |
---|
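<p>The hole test then reduces to a single comparison on the page type. In this sketch <i>page_needs_cipher</i> is a hypothetical name and the value 0 for P_INVALID is an assumption, chosen because a hole reads back as zeros.

```c
#include <stdint.h>

#define P_INVALID	0	/* Assumed value: a hole reads back as zeros */

/*
 * page_needs_cipher --
 *	Returns nonzero if a page read from disk should go through
 *	checksum verification and decryption.
 */
int
page_needs_cipher(uint8_t page_type, int have_crypto_handle)
{
	/* The page type is never encrypted, so it can be tested directly. */
	if (page_type == P_INVALID)
		return (0);
	return (have_crypto_handle != 0);
}
```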
376 | <h4> |
---|
377 | Errors and Recovery</h4> |
---|
378 | Dealing with a checksum error is tricky. Ultimately, if a checksum |
---|
379 | error occurs it is extremely likely that the user must do catastrophic |
---|
380 | recovery. There is no failure return other than DB_RUNRECOVERY |
---|
381 | for indicating that the user should run catastrophic recovery. We |
---|
382 | do not want to add a new error return for applications to check because |
---|
383 | a lot of applications already look for and deal with DB_RUNRECOVERY as |
---|
384 | an error condition and we want to fit ourselves into that application model. |
---|
385 | We already indicate to the user that when they get that error, then they |
---|
386 | need to run recovery. If recovery fails, then they need to run catastrophic |
---|
387 | recovery. We need to get ourselves to the point where users will |
---|
388 | run catastrophic recovery. |
---|
389 | <p>If we get a checksum error, then we need to log a message stating a |
---|
390 | checksum error occurred on page N. In <i>__db_pgin</i>, we can check |
---|
391 | if logging is on in the environment. If so, we want to log the message. |
---|
392 | <p>When the application gets the DB_RUNRECOVERY error, they'll have to |
---|
393 | shut down their application and run recovery. When the recovery encounters |
---|
394 | the record indicating checksum failure, then normal recovery will fail |
---|
395 | and the user will have to perform catastrophic recovery. When catastrophic |
---|
396 | recovery encounters that record, it will simply ignore it. |
---|
397 | <h4> |
---|
398 | <a NAME="Log Record Encryption"></a>Log Record Encryption</h4> |
---|
399 | Log records will be ciphered. It might make sense to wrap <i>__log_put</i> |
---|
400 | to encrypt the DBT we send down. The <i>__log_put </i>function is |
---|
401 | where the checksum is computed before acquiring the region lock. |
---|
402 | But also this function is where we call <i>__rep_send_message</i> to send |
---|
403 | the DBT to the replication clients. Therefore, we need the DBT to |
---|
404 | be encrypted prior to there. We also need it encrypted before checksumming. |
---|
405 | I think <i>__log_put </i>will become <i>__log_put_internal</i>, and the |
---|
406 | new <i>__log_put</i> will encrypt if needed and then call <i>__log_put_internal |
---|
407 | </i>(the |
---|
408 | function formerly known as <i>__log_put</i>). Log records are kept |
---|
409 | in a shared memory region buffer prior to going out to disk. Records |
---|
410 | in the buffer will be encrypted. No locks are held at the time we |
---|
411 | will need to encrypt. |
---|
412 | <p>On reading the log, via log cursors, the log code stores log records |
---|
413 | in the log buffer. Records in that buffer will be encrypted, so decryption |
---|
414 | will occur no matter whether we are returning records from the buffer or |
---|
415 | if we are returning log records directly from the disk. Current checksum |
---|
416 | checking is done in |
---|
417 | <i>__log_get_c_int.</i> Decryption will be done |
---|
418 | after the checksum is checked. |
---|
419 | <p>There are currently two nasty issues with encrypted log records. |
---|
420 | The first is that <i>__txn_force_abort</i> overwrites a commit record in |
---|
421 | the log buffer with an abort record. Well, our log buffer will be |
---|
422 | encrypted. Therefore, <i>__txn_force_abort</i> is going to need to |
---|
423 | do encryption of its new record. This can be accomplished by sending |
---|
424 | in the dbenv handle to the function. It is available to us in <i>__log_flush_commit</i> |
---|
425 | and we can just pass it in. I don't like putting log encryption in |
---|
426 | the txn code, but the layering violation is already there. |
---|
427 | <p>The second issue is that the encryption code requires data that is a |
---|
428 | multiple of 16 bytes and log record lengths are variable. We will |
---|
429 | need to pad log records to meet the requirement. Since the callers |
---|
430 | of <i>__log_put</i> set up the given DBT it is a logical place to pad if |
---|
431 | necessary. We will modify the gen_rec.awk script to have all of the generated |
---|
432 | logging functions pad for us if we have a crypto handle. This padding will |
---|
433 | also expand the size of log files. Anyone calling <i>log_put</i> and using |
---|
434 | security from the application will have to pad on their own or it will |
---|
435 | return an error. |
---|
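<p>The padding computation itself is a one-line round-up. The names <i>padded_len</i> and CIPHER_BLOCK are hypothetical; the generated logging functions would apply something equivalent before calling <i>__log_put</i>.

```c
#include <stddef.h>

#define CIPHER_BLOCK	16	/* Encryption works on 16-byte multiples */

/*
 * padded_len --
 *	Round a log record length up to the next multiple of 16.
 */
size_t
padded_len(size_t len)
{
	return ((len + CIPHER_BLOCK - 1) & ~(size_t)(CIPHER_BLOCK - 1));
}
```

<p>A 17-byte record, for example, would occupy 32 bytes, so at most 15 bytes of padding are added per record.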
436 | <p>When ciphering the log file, we will need a different header than the |
---|
437 | current one. The current header only has space for a 4 byte checksum. |
---|
438 | Our secure header will need space for the 16 byte IV and 20 byte checksum. |
---|
439 | This will blow up our log files when running securely since every single |
---|
440 | log record header will now consume 32 additional bytes. I believe |
---|
441 | that the log header does not need to be encrypted. It contains an |
---|
442 | offset, a length and our IV and checksum. Our IV and checksum are |
---|
443 | never encrypted. I don't believe there to be any risk in having the |
---|
444 | offset and length in the clear. |
---|
445 | <p>I would prefer not to have two types of log headers that are incompatible |
---|
446 | with each other. It is not acceptable to increase the log headers |
---|
447 | of all users from 12 bytes to 44 bytes. Such a change would also |
---|
448 | make log files incompatible with earlier releases. Even worse, |
---|
449 | the <i>cksum</i> field of the header sits between the offset and |
---|
450 | len. It would be really convenient if we could have just made a bigger |
---|
451 | cksum portion without affecting the location of the other fields. |
---|
452 | Oh well. Most customers will not be using encryption and we won't |
---|
453 | make them pay the price of the expanded header. Keith indicates that |
---|
454 | the log file format is changing with the next release so I will move the |
---|
455 | cksum field so it can at least be overlaid. |
---|
456 | <p>One method around this would be to have a single internal header that |
---|
457 | contains all the information both mechanisms need, but when we write out |
---|
458 | the header we choose which pieces to write. By appending the security |
---|
459 | information to the end of the existing structure, and adding a size field, |
---|
460 | we can modify a few places to use the size field to write out only the |
---|
461 | current first 12 bytes, or the entire security header needed. |
---|
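<p>The single-internal-header idea might look roughly like the C sketch below. The struct name, field names and field order are all assumptions: the sketch places the checksum after the offset and length (the relocated layout mentioned earlier) so that the plain and secure forms share a prefix, and it uses the sizes given in the text (4 byte checksum, 20 byte checksum, 16 byte IV, 12 and 44 byte headers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IV_BYTES	16	/* IV size from the text */
#define MAC_BYTES	20	/* secure checksum size from the text */
#define HDR_PLAIN_SZ	12	/* offset + length + 4 byte checksum */
#define HDR_CRYPTO_SZ	(8 + MAC_BYTES + IV_BYTES)	/* 44 bytes */

/* Hypothetical in-memory header holding everything either format needs. */
struct log_hdr {
	uint32_t prev;			/* offset of previous record */
	uint32_t len;			/* record length */
	uint8_t  cksum[MAC_BYTES];	/* first 4 bytes used when not secure */
	uint8_t  iv[IV_BYTES];		/* used only when secure */
	size_t   size;			/* how many header bytes to write */
};

/* Set the size field once, based on whether the environment is secure. */
void
log_hdr_init(struct log_hdr *h, int secure)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));
	h->size = secure ? HDR_CRYPTO_SZ : HDR_PLAIN_SZ;
}

/*
 * Serialize only the pieces the size field selects.
 * Returns the number of bytes written, or 0 on error.
 */
size_t
log_hdr_write(const struct log_hdr *h, uint8_t *out, size_t outsize)
{
	size_t n = 0;

	if (h->size != HDR_PLAIN_SZ && h->size != HDR_CRYPTO_SZ)
		return (0);
	if (outsize < h->size)
		return (0);
	memcpy(out + n, &h->prev, sizeof(h->prev)); n += sizeof(h->prev);
	memcpy(out + n, &h->len, sizeof(h->len));   n += sizeof(h->len);
	if (h->size == HDR_PLAIN_SZ) {
		memcpy(out + n, h->cksum, 4);          n += 4;
	} else {
		memcpy(out + n, h->cksum, MAC_BYTES);  n += MAC_BYTES;
		memcpy(out + n, h->iv, IV_BYTES);      n += IV_BYTES;
	}
	return (n);
}
```

With this shape, only the code that writes or reads the header needs to consult the size field; users who do not enable security keep the 12 byte header on disk.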
462 | <h4> |
---|
463 | Replication</h4> |
---|
464 | Replication clients are going to need to start all of their individual |
---|
465 | environment handles with the same password. The log records are going |
---|
466 | to be sent to the clients decrypted and the clients will have to encrypt |
---|
467 | them on their way to the client log files. We cannot send encrypted |
---|
468 | log records to clients. The reason is that the checksum and IV are |
---|
469 | stored in the log header and the master only sends the log record itself |
---|
470 | to the client. Therefore, the client has no way to decrypt a log |
---|
471 | record from the master. Therefore, anyone wanting to use truly secure |
---|
472 | replication is going to have to have a secure transport mechanism. |
---|
473 | By not encrypting records, clients can theoretically have different passwords |
---|
474 | and DB won't care. |
---|
475 | <p>On the master side we must copy the DBT sent in. We encrypt the |
---|
476 | original and send to clients the clear record. On the client side, |
---|
477 | support for encryption is added into <i>__log_rep_put</i>. |
---|
478 | <h4> |
---|
479 | Sharing the Environment</h4> |
---|
480 | When multiple processes join the environment, all must use the same password |
---|
481 | as the creator. |
---|
482 | <p>Joining an existing environment requires several conditions to be true. |
---|
483 | First, if the creator of the environment did not create with security, |
---|
484 | then joining later with security is an error. Second, if the creator |
---|
485 | did create it with security, then joining later without security is an |
---|
486 | error. Third, we need to be able to test and check that when another |
---|
487 | process joins a secure environment that the password they provided is the |
---|
488 | same as the one in use by the creator. |
---|
489 | <p>The first two scenarios should be fairly trivial to determine: if we |
---|
490 | aren't creating the environment, we can compare what is there with what |
---|
491 | we have. In the third case, the <i>__crypto_region_init</i> function |
---|
492 | will see that the environment region has a valid passwd_off and we'll then |
---|
493 | compare that password to the one we have in our dbenv handle. In |
---|
494 | any case we'll smash the dbenv handle's passwd and free that memory before |
---|
495 | returning whether we have a password match or not. |
---|
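<p>The three join conditions can be sketched in C as follows. This is an illustration only, not the <i>__crypto_region_init</i> code: the function name, the result codes and the use of a NULL pointer to mean "no security" are assumptions. The sketch zeroes the joiner's password buffer before returning in every case and leaves freeing it to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for a join attempt. */
enum join_result {
	JOIN_OK = 0,
	JOIN_ERR_ENV_NOT_SECURE,	/* creator had no security, joiner does */
	JOIN_ERR_ENV_SECURE,		/* creator had security, joiner does not */
	JOIN_ERR_BAD_PASSWD		/* both secure, passwords differ */
};

/*
 * region_passwd: password stored in the region, or NULL if the
 *   environment was created without security.
 * handle_passwd: the joiner's password buffer, or NULL if none was set.
 * handle_len:    number of bytes in handle_passwd.
 */
enum join_result
check_join(const char *region_passwd, char *handle_passwd, size_t handle_len)
{
	enum join_result ret;

	if (region_passwd == NULL)
		ret = handle_passwd == NULL ?
		    JOIN_OK : JOIN_ERR_ENV_NOT_SECURE;
	else if (handle_passwd == NULL)
		ret = JOIN_ERR_ENV_SECURE;
	else if (strlen(region_passwd) == handle_len &&
	    memcmp(region_passwd, handle_passwd, handle_len) == 0)
		ret = JOIN_OK;
	else
		ret = JOIN_ERR_BAD_PASSWD;

	/*
	 * Smash the joiner's copy whether or not it matched.  A plain
	 * memset may be optimized away when the buffer is freed right
	 * after; production code would need a guaranteed wipe.
	 */
	if (handle_passwd != NULL)
		memset(handle_passwd, 0, handle_len);
	return (ret);
}
```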
496 | <p>We need to store the passwords themselves in the region because multiple |
---|
497 | calls to the <i>__aes_derivekeys</i> function with the same password yield |
---|
498 | different keyInstance contents. Therefore we don't have any way to |
---|
499 | check passwords other than retaining and comparing the actual passwords. |
---|
500 | <h4> |
---|
501 | Other APIs</h4> |
---|
502 | All of the other APIs will need interface enhancements to support the new |
---|
503 | security methods. The Java and C++ interfaces will likely be done |
---|
504 | by Michael Cahill and Sue will implement the Tcl and RPC changes. |
---|
505 | Tcl will need the changes for testing purposes but the interface should |
---|
506 | be public, not test-only. RPC should fully support security. |
---|
507 | The biggest risk that I can see is that the client will send the password |
---|
508 | to the server in the clear. Anyone sniffing the wires or running |
---|
509 | tcpdump or other packet grabbing code could grab that. Someone really |
---|
510 | interested in using security over RPC probably ought to add authentication |
---|
511 | and other measures to the RPC server as well. |
---|
512 | <h4> |
---|
513 | <a NAME="Utilities"></a>Utilities</h4> |
---|
514 | All should take a -P flag to specify a password for the environment or |
---|
515 | database. Those that take an env and a database might need something |
---|
516 | more to distinguish between env passwds and db passwds. Here is what we |
---|
517 | do for each utility: |
---|
518 | <ul> |
---|
519 | <li> |
---|
520 | berkeley_db_svc - Needs -P after each -h specified.</li> |
---|
521 | |
---|
522 | <li> |
---|
523 | db_archive - Needs -P if the env is encrypted.</li> |
---|
524 | |
---|
525 | <li> |
---|
526 | db_checkpoint - Needs -P if the env is encrypted.</li> |
---|
527 | |
---|
528 | <li> |
---|
529 | db_deadlock - No changes</li> |
---|
530 | |
---|
531 | <li> |
---|
532 | db_dump - Needs -P if the env or database is encrypted.</li> |
---|
533 | |
---|
534 | <li> |
---|
535 | db_load - Needs -P if the env or database is encrypted.</li> |
---|
536 | |
---|
537 | <li> |
---|
538 | db_printlog - Needs -P if the env is encrypted.</li> |
---|
539 | |
---|
540 | <li> |
---|
541 | db_recover - Needs -P if the env is encrypted.</li> |
---|
542 | |
---|
543 | <li> |
---|
544 | db_stat - Needs -P if the env or database is encrypted.</li> |
---|
545 | |
---|
546 | <li> |
---|
547 | db_upgrade - Needs -P if the env or database is encrypted.</li> |
---|
548 | |
---|
549 | <li> |
---|
550 | db_verify - Needs -P if the env or database is encrypted.</li> |
---|
551 | </ul> |
---|
552 | |
---|
553 | <h2> |
---|
554 | Testing</h2> |
---|
555 | All testing should be able to be accomplished via Tcl. The following |
---|
556 | tests (and probably others I haven't thought of yet) should be performed: |
---|
557 | <ul> |
---|
558 | <li> |
---|
559 | Basic functionality - basically a test001 but encrypted without an env</li> |
---|
560 | |
---|
561 | <li> |
---|
562 | Basic functionality, w/ env - like the previous test but with an env.</li> |
---|
563 | |
---|
564 | <li> |
---|
565 | Basic functionality, multiple processes - like first test, but make sure |
---|
566 | others can correctly join.</li> |
---|
567 | |
---|
568 | <li> |
---|
569 | Basic functionality, mult. processes - like above test, but initialize/close |
---|
570 | environment/database first so that the next test processes are all joiners |
---|
571 | of an existing env, but creator no longer exists and the shared region |
---|
572 | must be opened.</li> |
---|
573 | |
---|
574 | <li> |
---|
575 | Recovery test - Run recovery over an encrypted environment.</li> |
---|
576 | |
---|
577 | <li> |
---|
578 | Subdb test - Run with subdbs that are encrypted.</li> |
---|
579 | |
---|
580 | <li> |
---|
581 | Utility test - Verify the new options to all the utilities.</li> |
---|
582 | |
---|
583 | <li> |
---|
584 | Error handling - Test the basic setup errors for both envs and databases |
---|
585 | with multiple processes. They are:</li> |
---|
586 | |
---|
587 | <ol> |
---|
588 | <li> |
---|
589 | Attempt to set a NULL or zero-length passwd.</li> |
---|
590 | |
---|
591 | <li> |
---|
592 | Create Env w/ security and attempt to create database w/ its own password.</li> |
---|
593 | |
---|
594 | <li> |
---|
595 | Env/DB creates with security. Proc2 joins without - should get an |
---|
596 | error.</li> |
---|
597 | |
---|
598 | <li> |
---|
599 | Env/DB creates without security. Proc2 joins with - should get an |
---|
600 | error.</li> |
---|
601 | |
---|
602 | <li> |
---|
603 | Env/DB creates with security. Proc2 joins with different password |
---|
604 | - should get an error.</li> |
---|
605 | |
---|
606 | <li> |
---|
607 | Env/DB creates with security. Closes. Proc2 reopens with different |
---|
608 | password - should get an error.</li> |
---|
609 | |
---|
610 | <li> |
---|
611 | Env/DB creates with security. Closes. Tcl overwrites a page |
---|
612 | of the database with garbage. Proc2 reopens with the correct password. |
---|
613 | Code should detect checksum error.</li> |
---|
614 | |
---|
615 | <li> |
---|
616 | Env/DB creates with security. Open a 2nd identical DB with a different |
---|
617 | password. Put the exact same data into both databases. Close. |
---|
618 | Overwrite the identical page of DB1 with the one from DB2. Reopen |
---|
619 | the database with correct DB1 password. Code should detect an encryption |
---|
620 | error on that page.</li> |
---|
621 | </ol> |
---|
622 | </ul> |
---|
623 | |
---|
624 | <h2> |
---|
625 | Risks</h2> |
---|
626 | There are several holes in this design. It is important to document |
---|
627 | them clearly. |
---|
628 | <p>The first is that all of the pages are stored in memory and possibly |
---|
629 | the file system in the clear. The password is stored in the shared |
---|
630 | data regions in the clear. Therefore if an attacker can read the |
---|
631 | process memory, they can do whatever they want. If the attacker can |
---|
632 | read system memory or swap they can access the data as well. Since |
---|
633 | everything in the shared data regions (with the exception of the buffered |
---|
634 | log) will be in the clear, it is important to realize that file backed |
---|
635 | regions will be written in the clear, including the portion of the regions |
---|
636 | containing passwords. We recommend to users that they use system |
---|
637 | memory instead of file backed shared memory. |
---|
638 | </body> |
---|
639 | </html> |
---|