Managing Directory Data

This chapter covers management of LDAP Data Interchange Format (LDIF) data. In this chapter you will learn to:

Generate test LDIF data
Import and export LDIF data
Perform searches and modifications on LDIF files with command-line tools
Create and manage database backends to house directory data imported from LDIF
Delete database backends

LDIF provides a mechanism for representing directory data in text format. LDIF data is typically used to initialize directory databases, but also may be used to move data between different directories that cannot replicate directly, or even as an alternative backup format.
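As a small illustration of the text format, the following sketch writes a two-entry LDIF file from the shell. The function name write_sample_ldif and the entries themselves are made up for this example and are not part of OpenDJ.

```shell
# Write a tiny, hypothetical LDIF file to the path given as $1.
# Each entry starts with a "dn:" line; a blank line separates entries.
write_sample_ldif() {
  cat > "$1" <<'EOF'
version: 1

dn: dc=example,dc=org
objectClass: top
objectClass: domain
dc: example

dn: ou=People,dc=example,dc=org
objectClass: top
objectClass: organizationalUnit
ou: People
EOF
}
```

Calling write_sample_ldif sample.ldif produces a file that the LDIF tools described later in this chapter should be able to read.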

Generating Test Data

When you install OpenDJ, you have the option of importing sample data that is generated during the installation. This procedure demonstrates how to generate LDIF by using the make-ldif command, described in makeldif(1) in the Reference.

To Generate Test LDIF Data

The make-ldif command uses templates to provide sample data. Default templates are located in the /path/to/opendj/config/MakeLDIF/ directory. The example.template file can be used to create a suffix with entries of the type inetOrgPerson. You can do the equivalent in OpenDJ control panel (Directory Data > New Base DN… > Import Automatically Generated Example Data).

Write a file to act as the template for your generated LDIF.

The resulting test data template depends on what data you expect to encounter in production. Base your work on your knowledge of the production data, and on the sample template, /path/to/opendj/config/MakeLDIF/example.template, and associated data.

See make-ldif.template(5) in the Reference for reference information about template files.
Create additional data files for the content in your template to be selected randomly from a file, rather than generated by an expression.

Additional data files are located in the same directory as your template file.
Decide whether you want to generate the same test data each time you run the make-ldif command with your template.

If so, provide the same randomSeed integer each time you run the command.
Before generating a very large LDIF file, make sure you have enough space on disk.

Run the make-ldif command to generate your LDIF file:

$ make-ldif \
 --randomSeed 0 \
 --templateFile /path/to/my.template \
 --ldifFile /path/to/generated.ldif
Processed 1000 entries
Processed 2000 entries
...
Processed 10000 entries
LDIF processing complete.  10003 entries written
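
The disk space check and the final entry count in the steps above can be scripted around the make-ldif run. This is a minimal sketch using only standard tools. The helper names free_kb and count_entries are assumptions for illustration, and the entry count assumes that every entry has exactly one line starting with dn:.

```shell
# Print the free space, in kilobytes, on the file system holding directory $1.
# df -P requests the portable POSIX layout; column 4 of line 2 is "Available".
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Count the entries in LDIF file $1 by counting lines that start with "dn:".
# This also counts base64-encoded DNs, which are written as "dn::".
count_entries() {
  grep -c '^dn:' "$1"
}
```

For example, compare the output of free_kb for the target directory with the size you expect before generating, and afterwards check that count_entries /path/to/generated.ldif matches the number of entries reported by make-ldif.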

Importing and Exporting Data

You can use OpenDJ control panel to import data (Directory Data > Import LDIF) and to export data (Directory Data > Export LDIF). The following procedures demonstrate how to use the import-ldif and export-ldif commands, described in import-ldif(1) in the Reference and export-ldif(1) in the Reference.

To Import LDIF Data

The most efficient method of importing LDIF data is to take the OpenDJ server offline. Alternatively, you can schedule a task to import the data while the server is online.

Importing from LDIF overwrites all data in the target backend with entries from the LDIF data.

(Optional) If you do not want to use the default userRoot backend, create a new backend for your data.

See "Creating a New Database Backend" for details.
The following example imports dc=example,dc=org data into the userRoot backend, overwriting existing data:
- If you want to speed up the process—for example because you have millions of directory entries to import—first shut down the server, and then run the import-ldif command:
  $ stop-ds
  $ import-ldif \
   --offline \
   --includeBranch dc=example,dc=org \
   --backendID userRoot \
   --ldifFile /path/to/generated.ldif
- If not, schedule a task to import the data while online:
  $ import-ldif \
   --port 4444 \
   --hostname opendj.example.com \
   --bindDN "cn=Directory Manager" \
   --bindPassword password \
   --includeBranch dc=example,dc=org \
   --backendID userRoot \
   --ldifFile /path/to/generated.ldif \
   --trustAll
  Notice that the task is scheduled through communication over SSL on the administration port, by default 4444. You can schedule the import task to start at a particular time using the --start option.
  
  The --trustAll option trusts all SSL certificates, such as a default self-signed certificate used for testing.
If the server is replicated with other servers, initialize replication again after the successful import.

For details see "Initializing Replicas".

Initializing replication overwrites data in the remote servers in the same way that import overwrites existing data with LDIF data.
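
Because import overwrites the target backend, it can be worth checking the LDIF file before running import-ldif. The following sketch counts entries whose DN falls outside the branch you intend to pass to --includeBranch. The helper name count_outside_branch is hypothetical, and the check is deliberately naive: it assumes DNs are written on a single line, are not base64-encoded, contain no backslash escapes, and use the same case and spacing as the branch DN.

```shell
# Count entries in LDIF file $1 whose DN is not at or below branch DN $2.
count_outside_branch() {
  awk -v branch="$2" '
    /^dn: / {
      dn = substr($0, 5)                  # text after "dn: "
      n = length(dn) - length(branch)
      # Inside the branch: the DN is the branch itself, or ends with ",branch".
      inside = (dn == branch) || (n > 0 && substr(dn, n) == "," branch)
      if (!inside) outside++
    }
    END { print outside + 0 }
  ' "$1"
}
```

A result of 0 for count_outside_branch /path/to/generated.ldif dc=example,dc=org suggests that every entry belongs to the branch being imported.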

To Export LDIF Data

The following examples export dc=example,dc=org data from the userRoot backend:

To expedite export, shut down the server and then use the export-ldif command:

$ stop-ds
$ export-ldif \
 --offline \
 --includeBranch dc=example,dc=org \
 --backendID userRoot \
 --ldifFile /path/to/backup.ldif

To export the data while online, leave the server running and schedule a task:
```
$ export-ldif \
 --port 4444 \
 --hostname opendj.example.com \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --includeBranch dc=example,dc=org \
 --backendID userRoot \
 --ldifFile /path/to/backup.ldif \
 --start 20111221230000 \
 --trustAll
```
The --start 20111221230000 option tells OpenDJ to start the export at 11 PM on December 21, 2011.

If OpenDJ is stopped at this time, then when you start OpenDJ again, the server attempts to perform the task after starting up.
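
The argument to --start in the example above packs a date and time into 14 digits, read here as YYYYMMDDhhmmss. As a quick sanity check before scheduling a task, a small helper can expand such a value into a readable form. The function name describe_start is an assumption for illustration, and the sketch checks only the shape of the value, not whether the date exists.

```shell
# Expand a 14-digit YYYYMMDDhhmmss value into "YYYY-MM-DD hh:mm:ss".
# Returns non-zero if $1 is not exactly 14 digits.
describe_start() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -eq 14 ] || return 1
  printf '%s\n' "$1" |
    sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/'
}
```

With the value from the example, describe_start 20111221230000 prints 2011-12-21 23:00:00.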

Other Tools For Working With LDIF Data

This section demonstrates the ldifsearch, ldifmodify and ldif-diff commands, described in ldifsearch(1) in the Reference, ldifmodify(1) in the Reference, and ldif-diff(1) in the Reference.

Searching in LDIF With ldifsearch

The ldifsearch command is to LDIF files what the ldapsearch command is to directory servers:

$ ldifsearch \
 --baseDN dc=example,dc=org \
 --ldifFile generated.ldif \
 "(sn=Grenier)" \
 mobile
dn: uid=user.4630,ou=People,dc=example,dc=org
mobile: +1 728 983 6669

The --ldifFile ldif-file option replaces the --hostname and --port options used to connect to an LDAP directory. Otherwise, the command syntax and LDIF output are familiar to ldapsearch users.

Updating LDIF With ldifmodify

The ldifmodify command lets you apply changes to LDIF files, generating a new, changed version of the original file:

$ cat changes.ldif
dn: uid=user.0,ou=People,dc=example,dc=org
changetype: modify
replace: description
description: This is the new description for Aaccf Amar.
-
replace: initials
initials: AAA

$ ldifmodify \
 --sourceLDIF generated.ldif \
 --changesLDIF changes.ldif \
 --targetLDIF new.ldif

Notice that the resulting new LDIF file is likely to be about the same size as the source LDIF file.

Comparing LDIF With ldif-diff

The ldif-diff command reports differences between two LDIF files in LDIF format:

$ ldif-diff --sourceLDIF old.ldif --targetLDIF new.ldif
dn: uid=user.0,ou=People,dc=example,dc=org
changetype: modify
add: initials
initials: AAA
-
delete: initials
initials: ASA
-
add: description
description: This is the new description for Aaccf Amar.
-
delete: description
description: This is the description for Aaccf Amar.

The ldif-diff command reads both files into memory, and constructs tree maps to perform the comparison. The command is designed to work with small files and fragments, and can quickly run out of memory when calculating differences between large files.
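
When the files are too large for ldif-diff, a much coarser comparison is still possible with standard tools. The sketch below is not a replacement for ldif-diff: it only reports which DNs appear in one file and not the other, and it ignores attribute changes entirely. The helper names dn_list and dn_diff are hypothetical, and the sketch assumes DNs are on a single line and not base64-encoded.

```shell
# List the "dn: " lines of LDIF file $1 in byte order.
dn_list() {
  grep '^dn: ' "$1" | LC_ALL=C sort
}

# Print DNs found only in $1 (prefixed "- ") or only in $2 (prefixed "+ ").
dn_diff() {
  old_dns=$(mktemp) || return 1
  new_dns=$(mktemp) || return 1
  dn_list "$1" > "$old_dns"
  dn_list "$2" > "$new_dns"
  # comm needs sorted input; -23 keeps lines unique to the first file,
  # -13 keeps lines unique to the second file.
  LC_ALL=C comm -23 "$old_dns" "$new_dns" | sed 's/^/- /'
  LC_ALL=C comm -13 "$old_dns" "$new_dns" | sed 's/^/+ /'
  rm -f "$old_dns" "$new_dns"
}
```

Because sort works with temporary files rather than holding everything in memory, this approach tends to cope with inputs that ldif-diff cannot, at the price of far less detail.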

About Database Backends

OpenDJ directory server stores data in a backend. A backend is a private server repository that can be implemented in memory, as a file, or as an embedded database.

Database backends are designed to hold large amounts of user data. OpenDJ directory server has tools for backing up and restoring database backends, as described in "Backing Up and Restoring Data". By default, OpenDJ directory server stores user data in a database backend named userRoot. When installing the server and importing user data, and when creating a database backend, you choose the backend type. OpenDJ directory server offers a choice of JE and PDB types.

These backend types are implemented using B-tree data structures. They store data as key-value pairs, which is different from the relational model exposed to clients of relational databases. JE and PDB backends differ in how they manage data on disk:

A JE backend stores data on disk using append-only log files with names like number.jdb. The JE backend writes updates to the highest-numbered log file. The log files grow until they reach a specified size (default: 100 MB). When the current log file reaches the specified size, the JE backend creates a new log file.

To avoid an endless increase in database size on disk, JE backends clean their log files in the background. A cleaner thread copies active records to new log files. Log files that no longer contain active records are deleted.

By default, JE backends let the operating system potentially cache data for a period of time before flushing the data to disk. This setting trades full durability with higher disk I/O for good performance with lower disk I/O. With this setting, it is possible to lose the most recent updates that were not yet written to disk in the event of an underlying OS or hardware failure. You can modify this behavior by changing the advanced configuration settings for the JE backend.

When a JE backend is opened, it recovers by recreating its B-tree structure from its log files. This is a normal process, one that allows the backend to recover after an orderly shutdown or after a crash.
A PDB backend stores data on disk using volume and journal files.

Volume files hold the data in identically sized sections called pages. A page either holds actual data or serves as an index to other pages. If a volume file runs out of space on existing pages, the PDB backend expands the volume to add more pages. The PDB backend does not, however, shrink the volume if pages become vacant, though it can reuse free pages. Volume files stay the same size or continue to grow once you have imported the data from LDIF. Only another import operation can shrink the volume size.

Journal files are append-only logs that record transactions and updated pages. Journal files have names like dj_journal.number. The PDB backend writes updates to the highest-numbered journal file. A journal file grows until it reaches 1 GB in size. The PDB backend then opens a new journal file.

To avoid an endless increase in disk space used by journal files, PDB backends clean their journal files when idle. When the backend is idle and not in the process of being backed up, a JOURNAL_COPIER thread copies pages from journal files to the appropriate volume. Old journal files are deleted. If the backend is idle long enough, the PDB backend copies all updates to the volume, leaving only one small journal file.

A PDB backend uses buffer pools in Java heap memory to cache data for fast access. Buffers are allocated to the PDB backend as long as it is in use, and are not subject to Java garbage collection. The PDB backend caches copies of data pages in the buffers, and lazily writes pages to the current journal file. At a configurable interval, the PDB backend ensures that all pages are written to disk and writes a checkpoint marker. It also writes a checkpoint marker during an orderly shutdown.

By default, a PDB backend is configured to trade full durability with higher disk I/O for good performance with lower disk I/O. With this setting, it is possible to lose the most recent updates that were not yet written to disk before a crash. You can modify this behavior by changing the advanced configuration settings for the PDB backend.

When a PDB backend is opened, it recovers by using its volume and journal files to recreate its B-tree structure starting with the last checkpoint marker, and then replaying more recent updates from the journal. (Recovery from an orderly shutdown is therefore optimally fast.) Recovery is a normal process, one that allows the backend to recover after an orderly shutdown or after a crash.

Due to the cleanup processes, JE and PDB backends can be actively writing to disk even when there are no pending client or replication operations. To back up a server using a file system snapshot, you must stop the server before taking the snapshot.

Creating a New Database Backend

OpenDJ stores your directory data in a backend. A backend is a repository that a directory server can access to store data. OpenDJ directory server offers different implementations, such as memory backends, LDIF file backends, and database backends. Database backends can be backed up and restored. By default, OpenDJ stores your data in a database backend named userRoot.

You can create new backends using the dsconfig create-backend command, described in dsconfig create-backend(1) in the Reference. OpenDJ directory server supports a variety of backend types, including in-memory backends, backends that store data in LDIF files, and backends that store data in key-value databases with indexes to improve performance with large data sets. When you create a backend, choose the type of backend that fits your purpose.

The following example creates a backend named myData. The backend is of type pdb, which relies on a PDB database for data storage and indexing. Alternatively, you can choose a different backend type with a different argument to the --type option, as in --type je:

$ dsconfig \
 create-backend \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --type pdb \
 --backend-name myData \
 --set base-dn:dc=example,dc=com \
 --set enabled:true \
 --set db-cache-percent:25 \
 --trustAll \
 --no-prompt

Notice the setting db-cache-percent:25. This says to allocate 25% of memory available to the JVM to the new backend’s database cache. The default setting for db-cache-percent allocates 50%. When creating a new database backend, take care to keep the total memory allocated to all database caches lower than the total memory available to the JVM. As an alternative to db-cache-percent, you can use db-cache-size. The db-cache-size value is a specific amount of memory, such as 2 GB.
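
The arithmetic behind these settings is simple enough to check in the shell. The sketch below assumes you already know the JVM maximum heap size in megabytes. The function names cache_mb and caches_fit are made up for illustration, and nothing here queries the server.

```shell
# Megabytes of cache that a db-cache-percent value represents for a given heap.
# $1 = JVM heap in MB, $2 = db-cache-percent (integer arithmetic, rounds down).
cache_mb() {
  echo $(( $1 * $2 / 100 ))
}

# Succeed only if the db-cache-percent values given as arguments total < 100.
caches_fit() {
  total=0
  for pct in "$@"; do
    total=$(( total + pct ))
  done
  [ "$total" -lt 100 ]
}
```

With a 2048 MB heap, cache_mb 2048 25 prints 512. A default backend at 50% plus a new backend at 25% passes caches_fit 50 25, whereas caches_fit 50 25 25 fails because the caches would claim the entire heap.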

After creating the backend, you can view the settings as in the following example:

$ dsconfig \
 get-backend-prop \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name myData \
 --trustAll \
 --no-prompt
Property          : Value(s)
------------------:--------------------
backend-id        : myData
base-dn           : "dc=example,dc=com"
compact-encoding  : true
db-cache-percent  : 25
db-cache-size     : 0 b
db-directory      : db
enabled           : true
index-entry-limit : 4000
writability-mode  : enabled

Alternatively, you can create a new backend in OpenDJ control panel (Directory Data > New Base DN > Backend > New Backend: backend-name). When you create a new backend using the dsconfig command, OpenDJ directory server creates the following indexes automatically:

aci presence
cn equality, substring
ds-sync-conflict equality
ds-sync-hist ordering
entryUUID equality
objectClass equality
givenName equality, substring
mail equality, substring
member equality
sn equality, substring
telephoneNumber equality, substring
uid equality
uniqueMember equality

You can create additional indexes as described in "Configuring and Rebuilding Indexes".

Encrypting Directory Data

OpenDJ directory server can encrypt directory data before storing it in a database backend on disk, keeping the data confidential until it is accessed by a directory client.

This feature is new in OpenDJ directory server 3.5.

Data encryption is useful for at least the following cases:

Ensuring Confidentiality and Integrity: Encrypted directory data is confidential, remaining private until decrypted with a proper key.

Encryption ensures data integrity at the moment it is accessed. OpenDJ directory server cannot decrypt corrupted data.
Protection on a Shared Infrastructure: When you deploy directory services on a shared infrastructure you relinquish full and sole control of directory data.

For example, if OpenDJ directory server runs in the cloud, or in a data center with shared disks, the file system and disk management are not under your control.

Data confidentiality and encryption come with the following trade-offs:

Equality Indexes Limited to Equality Matching: When an equality index is configured without confidentiality, the values can be maintained in sorted order. A non-confidential, cleartext equality index can therefore be used for searches that require ordering and searches that match an initial substring.

An example of a search that requires ordering is a search with a filter "(cn<=App)". The filter matches entries with commonName up to those starting with App (case-insensitive) in alphabetical order.

An example of a search that matches an initial substring is a search with a filter "(cn=A*)". The filter matches entries having a commonName that starts with a (case-insensitive).

In an equality index with confidentiality enabled, OpenDJ directory server no longer sorts cleartext values. As a result, you must accept that ordering and initial substring searches are unindexed.
Performance Impact: Encryption and decryption require more processing than handling cleartext values.

Encrypted values also take up more space than cleartext values.
Replication Configuration Before Encryption: A directory server provides data confidentiality without requiring you to supply a key for encryption and decryption. It encrypts the data using a symmetric key stored under cn=admin data in the admin-backend. The symmetric key is encrypted in turn with the server’s public key also stored there. When multiple servers are configured to replicate data as described in "Managing Data Replication", the servers replicate the keys as well, allowing any server replica to decrypt any other replica’s encrypted data.

The directory server generates a secret key the first time it must encrypt data. That key is then shared across the replication topology as described above, and remains in use until it is marked as compromised. (For details regarding compromised keys, see "Handling Compromised Keys".)

When you configure replication, the source server overwrites cn=admin data in the destination server. This data includes any secret keys stored there by the destination server.

Therefore, if you configure data confidentiality before replication, the destination server’s keys disappear when you configure replication. The destination server can no longer decrypt any of its data.

To prevent this problem, always configure replication before configuring data confidentiality.

As explained in "Protect OpenDJ Directory Server Files", OpenDJ directory server does not encrypt directory data by default. This means that any user with system access to read directory files can potentially access directory data in cleartext:

$ strings /path/to/opendj/db/userRoot/dj* | grep bjensen | sort | uniq
'uid=bjensen,ou=People,dc=example,dc=com
/home/bjensen
bjensen
bjensen@example.com

To maintain data confidentiality on disk, you must configure it explicitly. In addition to preventing read access by other users as described in "Set Up a System Account for OpenDJ Directory Server", you can configure confidentiality for database backends. When confidentiality is enabled for a backend, OpenDJ directory server encrypts entries before storing them in the backend.

Encrypting stored directory data does not prevent it from being sent over the network in the clear.

Apply the suggestions in "Protect Directory Server Network Connections" to protect data sent over the network.

Enable backend confidentiality with the default encryption settings as shown in the following example that applies to the userRoot backend:

$ dsconfig \
 set-backend-prop \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name userRoot \
 --set confidentiality-enabled:true \
 --no-prompt \
 --trustAll

After confidentiality is enabled, entries are encrypted when next written. That is, OpenDJ directory server does not automatically rewrite all entries in encrypted form. Instead, it encrypts each entry on update, for example, when a user updates their entry or when you import data. The settings for data confidentiality depend on the encryption capabilities of the JVM. For example, for details about the Sun/Oracle Java implementation, see the explanations in javax.crypto.Cipher. You can accept the default settings, or choose to specify the following:

The cipher algorithm defining how the cleartext is encrypted and decrypted.
The cipher mode of operation defining how a block cipher algorithm should transform data larger than a single block.
The cipher padding defining how to pad the cleartext to reach appropriate size for the algorithm.
The cipher key length, where longer key lengths strengthen encryption at the cost of more performance impact.

The default settings for confidentiality are cipher-transformation: AES/CBC/PKCS5Padding and cipher-key-length: 128. This means the algorithm is the Advanced Encryption Standard (AES), the cipher mode is Cipher Block Chaining (CBC), and the padding is PKCS#5 padding as described in RFC 2898: PKCS #5: Password-Based Cryptography Specification. The syntax for the cipher-transformation is algorithm/mode/padding, and all three must be specified. When the algorithm does not require a mode, use NONE. When the algorithm does not require padding, use NoPadding. Use of larger cipher-key-length values can require that you install JCE policy files such as those for unlimited strength.
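
Since all three parts of cipher-transformation must be present, a quick syntax check before calling dsconfig can catch typos. The following sketch only checks the algorithm/mode/padding shape; it does not verify that the JVM actually supports the combination. The function name split_transformation is an assumption for illustration.

```shell
# Split an algorithm/mode/padding string and print its three parts.
# Returns non-zero unless $1 has exactly three non-empty parts.
split_transformation() {
  case "$1" in
    */*/*/*) return 1 ;;   # more than three parts
    ?*/?*/?*) ;;           # three non-empty parts
    *) return 1 ;;         # fewer than three parts, or an empty part
  esac
  IFS=/ read -r algorithm mode padding <<EOF
$1
EOF
  printf 'algorithm=%s mode=%s padding=%s\n' "$algorithm" "$mode" "$padding"
}
```

For the default value, split_transformation AES/CBC/PKCS5Padding prints algorithm=AES mode=CBC padding=PKCS5Padding, while an incomplete value such as AES/CBC is rejected.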

OpenDJ directory server encrypts data using a symmetric key that is stored with the server configuration. The symmetric key is encrypted in turn with the server’s public key that is also stored with the server configuration. When multiple servers are configured to replicate data as described in "Configuring Replication", the servers replicate the keys as well, allowing any server replica to decrypt the data.

In addition to entry encryption, you can enable confidentiality by backend index, as long as confidentiality is enabled for the backend itself. Confidentiality hashes keys for equality type indexes using SHA-1, and encrypts the list of entries matching a substring key for substring indexes. The following example shows how to enable confidentiality for the mail index:

$ dsconfig \
 set-backend-index-prop \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name userRoot \
 --index-name mail \
 --set confidentiality-enabled:true \
 --no-prompt \
 --trustAll

After changing the index configuration, you can rebuild the index to enforce confidentiality immediately. For details, see "Configuring and Rebuilding Indexes".
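
To see why a hashed equality index cannot serve ordering or initial-substring searches, it helps to hash a few similar values. The sketch below uses sha1sum from GNU coreutils, which is an assumption about your system; shasum or openssl may be substitutes elsewhere. It only illustrates a property of SHA-1. How the server normalizes values and builds index keys is not shown here.

```shell
# Print the SHA-1 digest of the string $1 as 40 hexadecimal characters.
sha1_hex() {
  printf '%s' "$1" | sha1sum | cut -d ' ' -f 1
}

# Print each argument next to its digest, one per line.
show_hashes() {
  for value in "$@"; do
    printf '%s  %s\n' "$(sha1_hex "$value")" "$value"
  done
}
```

Running show_hashes App Apple Apply shows that values sharing a prefix produce unrelated digests, so a lookup by exact value still works, but a range or prefix scan over the hashed keys does not.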

Avoid using sensitive attributes in VLV indexes. Confidentiality cannot be enabled for VLV indexes.

Encrypting and decrypting data comes with costs in terms of cryptographic processing that reduces throughput and of extra space for larger encrypted values. In general, tests with default settings show that the cost of enabling confidentiality can be quite modest, but your results can vary based on your systems and on the settings used for cipher-transformation and cipher-key-length. Make sure you test your deployment to qualify the impact of confidentiality before enabling it in production.

Setting Disk Space Thresholds For Database Backends

Directory data growth depends on applications that use the directory. As a result, when directory applications add more data than they delete, the database backend grows until it fills the available disk space. The system can end up in an unrecoverable state if no disk space is available.

Database backends therefore have advanced properties, disk-low-threshold and disk-full-threshold. When available disk space falls below disk-low-threshold, OpenDJ server only allows updates from users and applications that have the bypass-lockdown privilege, as described in "About Privileges". When available space falls below disk-full-threshold, OpenDJ server stops allowing updates, instead returning an UNWILLING_TO_PERFORM error to each update request.

OpenDJ server continues to apply replication updates without regard to the thresholds. OpenDJ server can therefore fill available disk space despite the thresholds, by accepting replication updates made on other servers. You can give yourself more time to react to the situation both by monitoring directory data growth and also by increasing the thresholds.
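
The threshold behavior described above amounts to a simple classification of free space, which can be mirrored in a monitoring script. In the following sketch the function name disk_state and the state labels are assumptions, and the thresholds must be supplied in the same unit as the free space value.

```shell
# Classify free space against the two thresholds, all in the same unit.
# $1 = free space, $2 = disk-low-threshold, $3 = disk-full-threshold
disk_state() {
  if [ "$1" -lt "$3" ]; then
    echo full      # below disk-full-threshold: client updates are refused
  elif [ "$1" -lt "$2" ]; then
    echo low       # below disk-low-threshold: only bypass-lockdown updates
  else
    echo ok
  fi
}
```

With thresholds of 2048 and 1024 expressed in megabytes, disk_state 1500 2048 1024 prints low. The free space value itself could come from a command such as df, and replication updates remain outside this classification, as noted above.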

If growth across the directory service tends to happen quickly, set the thresholds higher than the defaults to allow more time to react when growth threatens to fill the disk. The following example sets disk-low-threshold to 2 GB and disk-full-threshold to 1 GB for the userRoot backend:

$ dsconfig \
 set-backend-prop \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name userRoot \
 --set "disk-low-threshold:2 GB" \
 --set "disk-full-threshold:1 GB" \
 --trustAll \
 --no-prompt

The properties disk-low-threshold and disk-full-threshold are listed as advanced properties. To examine their values with the dsconfig command, use the --advanced option as shown in the following example:

$ dsconfig \
 get-backend-prop \
 --advanced \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name userRoot \
 --property disk-low-threshold \
 --property disk-full-threshold \
 --trustAll \
 --no-prompt
Property            : Value(s)
--------------------:---------
disk-full-threshold : 1 gb
disk-low-threshold  : 2 gb

Updating an Existing Backend to Add a New Base DN

In addition to letting you create new backends as described in "Creating a New Database Backend", OpenDJ lets you add a new base DN to an existing backend.

The following example adds the suffix o=example to the existing backend userRoot:

$ dsconfig \
 set-backend-prop \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name userRoot \
 --add base-dn:o=example \
 --trustAll \
 --no-prompt

$ dsconfig \
 get-backend-prop \
 --hostname opendj.example.com \
 --port 4444 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --backend-name userRoot \
 --property base-dn \
 --trustAll \
 --no-prompt
Property : Value(s)
---------:-------------------------------
base-dn  : "dc=example,dc=com", o=example

Alternatively, you can update an existing backend in OpenDJ control panel (Directory Data > New Base DN, then select the existing backend from the dropdown Backend list, and enter the new Base DN name).

Deleting a Database Backend

You delete a database backend by using the dsconfig delete-backend command, described in dsconfig delete-backend(1) in the Reference.

When you delete a database backend by using the dsconfig delete-backend command, OpenDJ does not actually remove the database files for two reasons. First, a mistake could potentially cause lots of data to be lost. Second, deleting a large database backend could cause severe service degradation due to a sudden increase in I/O load.

Instead, after you run the dsconfig delete-backend command you must also manually remove the database backend files.

If you do run the dsconfig delete-backend command by mistake and have not yet deleted the actual files, then you can recover from the mistake by creating the backend again, reconfiguring the indexes that were removed, and rebuilding the indexes as described in "Configuring and Rebuilding Indexes".