Data Backups with Bacula: Bacula Internals

August 28, 2014 by Dejan Lukan

Introduction

This article presents the core concepts of Bacula operation and management, which underlie every Bacula backup solution and must be understood in detail. When using Bacula, we must first be familiar with the following Bacula terminology:

  • Director: defines the director's name and the password that the console program must supply to authenticate.
  • Job: defines a backup/restore job and ties together the Client, FileSet and Schedule resources for a particular client.
  • JobDefs: an optional resource that defines the default values for Job resources.
  • Schedule: defines when Jobs will be run automatically.
  • FileSet: defines a set of files to be backed up for each Client.
  • Client: defines a Client to be backed up.
  • Volume: a single physical tape (or a single file) on which Bacula will store our backup data.
  • Storage: defines the physical device where the Volumes are mounted.
  • Pool: defines a group of Volumes, so that a backup is not restricted to the size of a single Volume but can span multiple Volumes. In a configuration file, we specify a Pool rather than a Volume and let Bacula figure out automatically which Volume will be used next for data storage. We can also use Pools to separate daily, weekly and monthly backup jobs, so that daily jobs write only to specific Volumes, weekly jobs to other Volumes and monthly jobs to the rest.
  • Catalog: defines the database (usually PostgreSQL, MySQL or SQLite) used for keeping metadata about the backed-up files and other important information that is part of the backup procedure.
  • Messages: defines how error/information messages are handled; they can be sent by email or logged.
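To make the relationships between these terms more concrete, here is a minimal Python sketch that models them as plain data classes. It is only an illustration of how a Job ties the other resources together, not how Bacula represents them internally; the resource names are taken from the configuration used later in this article, while the volume name Vol-0001 and the describe helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str       # name of the Client resource
    address: str    # hostname or IP of the client machine


@dataclass(frozen=True)
class FileSet:
    name: str
    include: tuple  # directories/files to back up


@dataclass(frozen=True)
class Schedule:
    name: str
    runs: tuple     # "Run = ..." lines, kept as plain strings


@dataclass(frozen=True)
class Pool:
    name: str
    volumes: tuple  # a backup may span several Volumes of one Pool


@dataclass(frozen=True)
class Job:
    name: str
    client: Client
    fileset: FileSet
    schedule: Schedule
    pool: Pool


def describe(job: Job) -> str:
    """Summarise what a Job ties together in one line."""
    return (f"{job.name}: back up {', '.join(job.fileset.include)} on "
            f"{job.client.name} ({job.schedule.name}) into pool {job.pool.name}")


def example_job() -> Job:
    """Build a Job from the resource names used later in this article."""
    return Job(
        name="BackupClient",
        client=Client("computer-fd", "machine"),
        fileset=FileSet("Full Set", ("/backup",)),
        schedule=Schedule("WeeklyCycle", ("Full 1st sun at 23:05",)),
        pool=Pool("Default", ("Vol-0001",)),  # hypothetical volume name
    )
```

Calling describe(example_job()) returns a one-line summary of the job, which shows that the Job itself carries no data of its own; it only points at a Client, a FileSet, a Schedule and a Pool.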

Below we can see a concise graphical representation of a Bacula system, which we can keep around for when we start wondering about the various Bacula backup components. On the left side we can see a client machine from which we can run bat/bconsole to administer the Bacula server; the client machine must also have the file daemon listening on port 9102, so that the Bacula server can connect to it and start the backup process. On the right side of the picture is the storage system, consisting of multiple volumes grouped into pools. The middle part of the image represents the Bacula server, which is responsible for the whole backup process. The Director is the heart of the Bacula server and oversees every action taken; it has multiple jobs defined that tie together FileSets, Schedules and Clients to back up certain data at a certain time on a certain client machine.

When configuring a Bacula server, we need to install the bacula-* packages with the system package manager, such as apt-get. If you're installing Bacula on Synology, it should be installed with the ipkg command as presented at [8].

After installing the Bacula server, we have to edit the configuration files for:

  • The Director (bacula-dir.conf): the heart of the backup process and a daemon that controls all the other daemons. We have to change the FileSet resource to define the files to be backed up, the Schedule resource to specify when the backup job will be run, and the Client resource to specify on which client the backup job should be run. Each client machine requires separate FileSet/Schedule/Client resources, which uniquely define each backup job.
  • File Daemon (bacula-fd.conf): a program running on each client machine, which listens on port 9102, serves files requested by the Director, and sends them to the Storage daemon.
  • Storage Daemon (bacula-sd.conf): the storage daemon accepts the data from the File daemon and stores it on the storage media; during a restore, it finds the data it has previously stored and sends it back to the File daemon.
  • The Console (bconsole.conf): used to connect to the Director and observe the status of clients and jobs, as well as to run jobs manually and perform other tasks.
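Since each daemon listens on its own TCP port, a quick reachability test is often the first troubleshooting step. The sketch below is a small Python helper for that purpose; it only checks that something accepts TCP connections on the port and does not speak the Bacula protocol. The port numbers are the defaults used in this article, and the DAEMON_PORTS mapping and function name are hypothetical.

```python
import socket

# Default ports used in this article (assumed unchanged in the configuration).
DAEMON_PORTS = {"director": 9101, "file": 9102, "storage": 9103}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_is_open("machine", DAEMON_PORTS["file"]) run on the Bacula server tells us whether the file daemon on the client named machine is reachable at all.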

The picture below, taken from [7], presents the Bacula resource types that must be defined in the configuration file of each daemon or service. The Director (bacula-dir.conf) must implement almost all of the resources, namely Catalog, Client, Console, Director, FileSet, JobDefs, Messages, Pool, Schedule and Storage. On the other hand, the Console (bconsole.conf) must implement only the Console and Director resources.

In the chapters below, I'll present the default configuration settings of each daemon after it has been installed, so you can skim through the list and adjust the configuration options to your needs.

Depending on whether we're using MySQL, PostgreSQL or SQLite, we must also run the appropriate script in the /opt/etc/bacula/scripts/ directory; an example of running make_sqlite3_tables can be seen below.

[plain]
# /opt/etc/bacula/scripts/make_sqlite3_tables
[/plain]

If we connect to the SQLite database afterwards, the database should exist and contain the tables, as can be seen below.

[plain]
# sqlite3 /opt/var/bacula/working/bacula.db
sqlite> .tables
BaseFiles  File      JobMedia     MediaType  Storage
CDImages   FileSet   Location     NextId     UnsavedFiles
Client     Filename  LocationLog  Path       Version
Counters   Job       Log          Pool
Device     JobHisto  Media        Status
[/plain]
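The same check can be scripted. The following Python sketch uses the standard sqlite3 module to list the tables in a catalog file and report which of a few expected ones are missing. The EXPECTED set is just a hand-picked subset of the tables shown above, and both function names are hypothetical. Note that sqlite3.connect creates an empty database if the file does not exist, so a wrong path shows up as all tables missing rather than as an error.

```python
import sqlite3

# A hand-picked subset of the tables listed above (an assumption, not a full schema).
EXPECTED = {"Job", "File", "Client", "FileSet", "Pool", "Media"}


def list_tables(db_path: str) -> list:
    """Return the names of all tables in the SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def missing_tables(db_path: str, expected=EXPECTED) -> list:
    """Return the expected tables that are not present in the database."""
    return sorted(set(expected) - set(list_tables(db_path)))
```

An empty list returned by missing_tables("/opt/var/bacula/working/bacula.db") would indicate that the make_sqlite3_tables script created at least the tables we checked for.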

Configuring the Director on the Server

The document at [7] presents simplified Bacula object definitions, which can be seen in the picture below. Note that each configuration option that references another resource in the configuration file is shown in red.

In bacula-dir.conf, the default resources and their configuration options are the following:

  • Director: the director name and password used to authenticate with the Console program.
    • Name: the name of the Director resource.
    • DIRAddress: the IP address to which the Director will bind, which lets us listen on specific interfaces only. To listen on the localhost interface only, we set it to 127.0.0.1; to listen on all interfaces, we set it to 0.0.0.0 (the default).
    • DIRPort: the port on which the Director will listen for Console connections.
    • QueryFile: the file where the Director can find the SQL statements for the query command of the Console.
    • WorkingDirectory: the directory where the Director will store its status files, including log files, traceback/backtrace files, etc.
    • PidDirectory: the directory where the Director puts its process ID file, which is used when shutting down Bacula and to prevent multiple copies from running simultaneously.
    • Maximum Concurrent Jobs: the maximum number of jobs that may run simultaneously, which is set to 1 by default to keep things simple.
    • Password: the password that the bconsole application must use in order to connect to the Director.
    • Messages: the Messages resource that specifies where to send messages that are not part of any job; the messages of each job are sent to the Messages resource specified by that job.
  • Job: defines a backup/restore job, which ties together Client, FileSet and Schedule resources; each client has a separate job, which is used to back up the data on that client.
    • Name: the job name, which can also be given to the run command in the console to start the job manually; it is a good idea to give the job the same name as the Client it will be used with.
    • JobDefs: references the JobDefs resource that supplies the default settings for this Job.
    • Write Bootstrap: the name of a file to which a bootstrap is written on each job run; it is used only when backing up data. For a Full backup the file is overwritten on each job run, whereas for Incremental/Differential backups the data is appended to the file. We need to back up this file after each job finishes to be able to recover the current state of our system; we can keep it safe by specifying a file on a drive mounted from a separate machine. A bootstrap file contains ASCII information that specifies which files should be restored, which volume they are on, and where they are on that volume [5].
  • JobDefs: a resource where we can specify defaults for Jobs; a Job definition can then inherit its options from JobDefs, which makes the configuration cleaner.
    • Name: the name of the JobDefs resource.
    • Type: the job type, which can be one of the following: Backup, Restore, Verify, Admin. A Backup job performs the actual backup of the files, whereas a Restore job restores the file contents. A Verify job compares the contents of the catalog to the file system to verify the backup status, and an Admin job is used for catalog pruning.
    • Level: specifies the default job level to be run, which differs based on the type of the job. A Backup job can have the following levels: Full, Incremental, Differential. The Full level backs up all files in the FileSet even if they haven't changed since the last backup. The Incremental level backs up only the files that have changed since the last successful backup, and the Differential level backs up all files that have changed since the last successful Full backup.
    • Client: specifies the Client resource that will be used in the current job.
    • FileSet: specifies the FileSet that will be used in the current job.
    • Schedule: references the Schedule that will be used with the current job to indicate when the job will be run automatically by the scheduler.
    • Storage: references the Storage resource where the data will be backed up.
    • Messages: the Messages resource that will be used with the current job; it specifies where the messages will be sent: log file, email message, etc.
    • Pool: references the Pool where the data will be backed up.
    • Priority: the priority of the job, which defines the order in which jobs will be run.
  • Schedule: defines when the job will be run by the Bacula scheduler.
    • Name: the name of the Schedule resource.
    • Run: defines when a job will be run and, optionally, at which level.
  • FileSet: defines the files that will be backed up on a client.
    • Name: the name of the FileSet resource.
    • Include: defines a list of directories and files to be backed up. We can also apply compression, recursion, signature and other options to a backed-up directory.
  • Client: defines the client to be backed up.
    • Name: the name of the Client resource.
    • Address: the hostname, FQDN or IP address of the client machine.
    • FDPort: the port number of the File daemon.
    • Catalog: references the Catalog to be used with this Client.
    • Password: the password of the File daemon, which is used when establishing a connection with the File daemon.
    • File Retention: the amount of time Bacula will keep File records in the Catalog database; they are removed after that time if AutoPrune is set. Note that only the entries in the Catalog database are removed, while the archived backups are left intact.
    • Job Retention: the amount of time Bacula will keep Job records in the Catalog database; they are removed after that time if AutoPrune is set.
    • AutoPrune: if set, the File/Job Retention periods are applied to the Catalog and expired entries are removed.
  • Storage: defines the physical device used for backing up the data.
    • Name: the name of the Storage resource.
    • Address: the hostname, FQDN or IP address of the Storage daemon.
    • SDPort: the port number of the Storage daemon.
    • Password: the password required when connecting to the Storage daemon.
    • Device: the name of the device to be used for storage, which can be an arbitrary string.
    • Media Type: an arbitrary string that describes the storage media used.
  • Pool: defines a pool of Volumes that can be used when backing up the data; it can be used to restrict a Job/Client to a particular set of Volumes.
    • Name: the name of the Pool resource.
    • Pool Type: defines the pool type, which corresponds to the type of Job being run and can be one of the following: Backup, Archive, Cloned, Migration, Copy, Save. According to the Bacula manual, only Backup is currently implemented.
    • Recycle: if set, purged Volumes will be recycled, which means that once the expired records have been removed from the Catalog, the data on the Volume can be overwritten.
    • AutoPrune: if set, Bacula automatically applies the Volume Retention period and prunes the Catalog records of expired Volumes.
    • Volume Retention: the amount of time Bacula will keep the Catalog records for the jobs written to a Volume; once it has passed, the Volume can be pruned and its data recycled.
  • Catalog: defines the database where the list of files and Volume names is stored.
    • Name: the name of the Catalog resource.
    • Dbname: the name of the database.
    • Dbuser: the username used to connect to the database.
    • Dbpassword: the password used to connect to the database.
  • Messages: defines where the information and error messages will be sent.
    • Name: the name of the Messages resource.
    • Mailcommand: the mail command used to send emails.
    • Mail: the email address to which the email messages will be sent.
    • Console: sends the messages to the Bacula console.
    • Append: specifies a file to which the messages are appended.
    • Catalog: sends the messages to the Catalog database, where they are written to the table named Log.
  • Console: defines a console that is allowed to connect to the Director.
    • Name: the name of the Console resource.
    • Password: the password the console must supply to be authorized.
    • CommandACL: a list of console commands that can be executed by the console.
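The difference between the three backup levels described under Level is easy to get wrong, so here is a deliberately simplified Python sketch of the selection rule. It assumes we know each file's modification time and the times of the last Full backup and of the last backup of any level; the function name and its parameters are hypothetical, and the real File daemon's change detection is more involved than a single timestamp comparison.

```python
from datetime import datetime


def files_to_back_up(level: str, mtimes: dict,
                     last_full: datetime, last_backup: datetime) -> list:
    """Select files for a backup run.

    mtimes maps a file path to its last modification time.
    last_full is the time of the last successful Full backup,
    last_backup the time of the last successful backup of any level.
    """
    if level == "Full":
        return sorted(mtimes)            # everything, changed or not
    if level == "Differential":
        cutoff = last_full               # changes since the last Full
    elif level == "Incremental":
        cutoff = last_backup             # changes since the last backup
    else:
        raise ValueError(f"unknown level: {level}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)
```

With a Full backup on Sunday and an Incremental on Monday, a file changed on Monday morning is picked up by every Differential until the next Full, but by only one Incremental.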

Let's now configure the Bacula Director, where we have to name the director itself and specify the binding address as well as the port number (the default 0.0.0.0:9101 is used). Maximum Concurrent Jobs is left at 1 in the listing below; it has to be raised (to 10, for example) if backup jobs should execute concurrently.

[plain]
Director {
  Name = baculaserver-dir
  DIRAddress = 0.0.0.0
  DIRport = 9101
  QueryFile = "/opt/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/opt/var/bacula/working"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 1
  Password = "Ha2JIPxkAYRQTzE7AN9mPVguF5gfp7evP4BQsqWIthLZd7X4OB"
  Messages = Daemon
}
[/plain]

Next, there's a JobDefs resource used to specify the default configuration options for each job. I didn't change any of the default options presented below. The name of the JobDefs resource is DefaultJob, which we need to reference from Job resources to apply the defaults to each job. The type of the job is set to Backup, and the level is set to Incremental, which backs up all files that have changed since the last successful backup. The FileSet, Schedule, Storage, Messages and Pool directives reference other resources, which we'll describe later on; because we're referencing other configuration blocks, we need to ensure that the names of those blocks match for the references to resolve properly.

[plain]
JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Level = Incremental
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = File
  Messages = Standard
  Pool = Default
  Priority = 10
}
[/plain]

Next, the Job section is used to create the job named BackupClient used for backing up files on a remote client machine. The JobDefs option references the DefaultJob job definition to apply its configuration options to the current job, while the Client parameter specifies the name of the remote client's file daemon.

[plain]
Job {
  Name = "BackupClient"
  JobDefs = "DefaultJob"
  Client = computer-fd
}
[/plain]
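The way a Job inherits from its JobDefs can be pictured as a dictionary merge in which the Job's own directives win. The Python sketch below illustrates that idea with the values from the two listings above; it is a mental model rather than Bacula's actual implementation, and the effective_job function is hypothetical.

```python
# Defaults copied from the JobDefs listing above.
JOBDEFS = {
    "DefaultJob": {
        "Type": "Backup",
        "Level": "Incremental",
        "FileSet": "Full Set",
        "Schedule": "WeeklyCycle",
        "Storage": "File",
        "Messages": "Standard",
        "Pool": "Default",
        "Priority": "10",
    }
}


def effective_job(job: dict, jobdefs: dict) -> dict:
    """Merge a Job with the JobDefs it references; the Job's own values win."""
    defaults_name = job.get("JobDefs")
    if defaults_name is None:
        defaults = {}
    elif defaults_name in jobdefs:
        defaults = jobdefs[defaults_name]
    else:
        raise KeyError(f"Job references unknown JobDefs: {defaults_name}")
    merged = dict(defaults)
    merged.update({k: v for k, v in job.items() if k != "JobDefs"})
    return merged
```

For the BackupClient job, the merged result contains Client = computer-fd from the Job itself and everything else (Level, FileSet, Schedule, Storage, Messages, Pool, Priority) from DefaultJob. The KeyError branch mirrors the point made earlier: a reference only works if the referenced name exists.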

We also need to configure the computer-fd Client definition, which must point to our client machine by IP address or hostname; in our case the machine's hostname is used. Note that the password must be the same as the one set in the Director section of bacula-fd.conf on the client side.

[plain]
Client {
  Name = computer-fd
  Address = machine
  FDPort = 9102
  Catalog = MyCatalog
  Password = "1ecUD1K75FjuyZaxm0T7LJjnzzX3H6gZKxtAbqhBls4jXHryJl"
  File Retention = 30 days
  Job Retention = 6 months
  AutoPrune = yes
}
[/plain]
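To get a feeling for what the retention periods above mean in practice, the following Python sketch converts a value such as "30 days" or "6 months" into a timedelta and checks whether a record has outlived it. It assumes Bacula's fixed-length units (a month counts as 30 days, a quarter as 91 days and a year as 365 days) and handles only the simple "number unit" form; both function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Fixed unit lengths in seconds (a month is treated as 30 days).
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "quarter": 91 * 86400,
    "year": 365 * 86400,
}


def parse_period(text: str) -> timedelta:
    """Parse a simple '<number> <unit>' retention period."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unsupported period: {text!r}")
    count, unit = int(match.group(1)), match.group(2).lower()
    if unit.endswith("s"):
        unit = unit[:-1]                 # "days" -> "day"
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit in period: {text!r}")
    return timedelta(seconds=count * UNIT_SECONDS[unit])


def record_expired(job_end: datetime, retention: str, now: datetime) -> bool:
    """Return True if a record from a job that ended at job_end is past retention."""
    return now - job_end > parse_period(retention)
```

With File Retention = 30 days, the File records of a job that finished on 1 August become candidates for pruning in early September, while the Job record itself is kept for 180 days under Job Retention = 6 months.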

So far we've configured the Bacula server-client communication, but some sections still need to be configured: Storage, Pool, Schedule and FileSet. Each Job references a Storage resource; in our case the default job definition contains "Storage = File", which means the Storage resource must be named File. The storage address should be set to the FQDN (fully qualified domain name) of the storage daemon, which must be resolvable through DNS. This holds even if the storage daemon is running on the same machine as the Director, which is normally the case, because the Director passes this address to the File daemon, and the File daemon on the client then uses it to connect to the storage daemon. In most cases the password is already correct, but we can double-check by looking into the /opt/etc/bacula/bacula-sd.conf configuration file.

[plain]
Storage {
  Name = File
  # Do not use "localhost" here
  Address = baculaserverbox
  SDPort = 9103
  Password = "cthnUFNpZ3FwMzDtNQlL0xyodxU3UBoxqJLQsZfzJ5htaJJNH9"
  Device = FileStorage
  Media Type = File
}
[/plain]

The job definition also specifies a pool with "Pool = Default", which means we need to configure a Pool resource named Default.

[plain]
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 365 days
}
[/plain]

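Each Job/JobDefs also references a schedule, in our case with Schedule = "WeeklyCycle". The WeeklyCycle schedule below runs a Full backup on the first Sunday of the month, a Differential backup on the remaining Sundays, and an Incremental backup from Monday to Saturday, always at 23:05.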

[plain]
Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st sun at 23:05
  Run = Differential 2nd-5th sun at 23:05
  Run = Incremental mon-sat at 23:05
}
[/plain]
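To double-check our reading of the three Run lines, the schedule can be restated as a small Python function that returns the backup level for a given calendar day. This is only a restatement of this particular WeeklyCycle schedule, not a general parser for Run directives, and the function name is hypothetical.

```python
from datetime import date


def weekly_cycle_level(day: date) -> str:
    """Return the level the WeeklyCycle schedule above runs on the given day."""
    if day.weekday() == 6:               # Sunday (Monday is 0)
        # The first Sunday of a month always falls on day 1..7.
        return "Full" if day.day <= 7 else "Differential"
    return "Incremental"                 # Monday to Saturday
```

For August 2014, this gives a Full backup on the 3rd, Differential backups on the 10th, 17th, 24th and 31st, and Incremental backups on all other days.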

The last referenced element in Job/JobDefs is the FileSet, given as FileSet = "Full Set", which specifies the files to back up. The FileSet below backs up every file in the /backup directory and computes a SHA1 signature for each backed-up file.

[plain]
FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = SHA1
    }
    File = /backup
  }
}
[/plain]
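The idea behind the signature option can be illustrated outside of Bacula: a SHA1 digest of a file's contents changes whenever the contents change, which is what makes it useful for later verification. The Python sketch below computes such a digest with the standard hashlib module; how Bacula itself encodes and stores the signature is not shown here, and the function name is hypothetical.

```python
import hashlib


def sha1_signature(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA1 digest of a file as a hexadecimal string."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing the digest of a file before backup and again after a restore is a simple way to confirm that the restored copy is identical to the original.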

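Next we must save the configuration file and check whether it contains any errors, which we can do by running bacula-dir in test mode: the -t option only parses the configuration file and exits, while -c specifies the configuration file to read.

[plain]
# bacula-dir -t -c /opt/etc/bacula/bacula-dir.conf
#
[/plain]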

Since no error messages were printed, the configuration file is okay, and we can restart the Bacula Director.

[plain]
# /etc/init.d/bacula-director restart
Stopping Bacula Director...: bacula-dir.
Starting Bacula Director...: bacula-dir.
[/plain]

After restarting the Bacula Director, we have to verify that it's listening on port 9101, which we can do by invoking bconsole. Note that the passwords in the Console configuration must match in order for bconsole to be able to connect to the Director.

[plain]
# bconsole
Connecting to Director localhost:9101
1000 OK: baculaserverbox-dir
Enter a period to cancel a command.
*status
[/plain]

Configuring File Daemon on the Client

The configuration on the client is fairly simple, since we only need to configure the file daemon. The components involved are presented below.

In bacula-fd.conf, the default resources and their configuration options are the following:

  • Director
    • Name: the name of the director that is allowed to connect to this client, which must match the name specified in the director configuration file on the server. Note that each file daemon can have multiple Director sections, where each section defines a director that is allowed to connect to this client.
    • Password: the password that the director must supply in order to be authorized by the client.
    • Monitor: if set to no, the director has full access to the client; if set to yes, the director is only allowed to see the current status of the client.
  • Client/FileDaemon
    • Name: the name of the client, which must match the name specified in the director configuration file. Note that there should be only one FileDaemon resource in the bacula-fd.conf configuration file, which specifies the settings of the current client.
    • Working Directory: specifies the directory in which the file daemon will put its status files; it should be used only by Bacula.
    • Pid Directory: specifies the directory in which the file daemon will put its process ID file, which is used to keep only one instance of the file daemon running.
    • FDAddress: the IP address to which the file daemon will bind, which lets us listen on specific interfaces only. To listen on the localhost interface only, we set it to 127.0.0.1; to listen on all interfaces, we set it to 0.0.0.0.
    • FDPort: the port number on which the file daemon will listen for director connections (default 9102).
    • Maximum Concurrent Jobs: the maximum number of allowed concurrent jobs. It is normally set to 2, so we're allowed to run a backup job as well as request actions in the console simultaneously.
  • Messages
    • Name: the name of the Messages resource, which is referenced by Job resources to specify how their messages are handled.
    • MailCommand: the command used to send messages to an SMTP server; the Bacula SMTP client bsmtp is used for this.
    • OperatorCommand: similar to MailCommand, except that this command is used to send Operator messages.
    • Destination: specifies where the messages are passed. If the director destination is used, the messages are passed to the specified director, whose name must match the name given in the Director resource in the same bacula-fd.conf configuration file. Each destination also takes a comma-separated list of message types to be processed; the following message types are supported: all, info, warning, error, fatal, terminate, notsaved, skipped, mount, restored, security, alert, and volmgmt. The destination may be one of the following: director, file, append, syslog, mail, mail on error, mail on success, operator, console, stdout, stderr and catalog.

Let's now take a look at each of the configuration directives in a configuration file. Below we can see the Director configuration where the name and the password are given, which must be used by the director in order to successfully connect and authenticate to the file daemon. After that there's another director, where the monitor mode is enabled, which means that the director will only be able to monitor the status of the file daemon.

[plain]
Director {
  Name = baculaserver-dir
  Password = "1ecUD1K75FjuyZaxm0T7LJjnzzX3H6gZKxtAbqhBls4jXHryJl"
}

Director {
  Name = baculaserver-mon
  Password = "RydDUErHPyfOa4CcQNwrxsoWJxSPoRFVka9KdGAnJOiCW2U2Tq"
  Monitor = yes
}
[/plain]
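Because the director can only authenticate if the password in its Client resource equals the password in the file daemon's Director resource, mismatches between the two files are a common source of connection errors. The Python sketch below is a rough consistency check built on a deliberately naive parser: it understands only flat "Resource { directive = value }" blocks with one directive per line, skips nested blocks, and treats directive names as case- and space-insensitive. It is not a complete parser for the Bacula configuration syntax, and all function names are hypothetical.

```python
def parse_resources(text: str) -> list:
    """Return [(resource_type, {directive: value})] for top-level resources."""
    resources, current, depth = [], None, 0
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # naive comment stripping
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            if depth == 1:
                current = (line[:-1].strip(), {})
            continue
        if line == "}":
            if depth == 1 and current is not None:
                resources.append(current)
                current = None
            depth = max(depth - 1, 0)
            continue
        if depth == 1 and current is not None and "=" in line:
            key, value = line.split("=", 1)
            key = key.replace(" ", "").lower()  # "Pid Directory" == "piddirectory"
            current[1][key] = value.strip().rstrip(";").strip().strip('"')
    return resources


def find_resource(resources: list, rtype: str, name: str):
    """Return the directives of the first resource with this type and Name."""
    for kind, directives in resources:
        if kind.lower() == rtype.lower() and directives.get("name") == name:
            return directives
    return None


def fd_password_matches(dir_conf: str, fd_conf: str,
                        director_name: str, client_name: str) -> bool:
    """Compare the Client password (director side) with the Director password (client side)."""
    client = find_resource(parse_resources(dir_conf), "Client", client_name)
    director = find_resource(parse_resources(fd_conf), "Director", director_name)
    if client is None or director is None:
        return False
    return client.get("password") == director.get("password")
```

Fed with the contents of bacula-dir.conf and bacula-fd.conf, a call such as fd_password_matches(dir_text, fd_text, "baculaserver-dir", "computer-fd") should return True for the configuration shown in this article.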

Next, we have the file daemon configuration, which specifies the name of the file daemon as well as its address and port number; mostly the defaults are used.

[plain]
FileDaemon {
  Name = computer-fd
  FDAddress = 0.0.0.0
  FDport = 9102
  WorkingDirectory = /var/lib/bacula
  Pid Directory = /var/run/bacula
  Maximum Concurrent Jobs = 10
}
[/plain]

At the end there's also the Messages configuration, where the name Standard is used, and the director option tells the file daemon to pass all messages except those about skipped and restored files back to the baculaserver-dir director. The director name must match the name given in the Director resource in the same file.

[plain]
Messages {
  Name = Standard
  director = baculaserver-dir = all, !skipped, !restored
}
[/plain]
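The message type list reads naturally as "everything, minus the types prefixed with an exclamation mark". The following Python sketch spells out that reading so we can test which message types a given list lets through; it reflects our interpretation of the list syntax rather than Bacula's own code, and both function names are hypothetical.

```python
def parse_message_types(spec: str):
    """Split 'all, !skipped, !restored' into included and excluded type sets."""
    include, exclude = set(), set()
    for item in spec.split(","):
        item = item.strip().lower()
        if not item:
            continue
        if item.startswith("!"):
            exclude.add(item[1:].strip())
        else:
            include.add(item)
    return include, exclude


def is_delivered(message_type: str, spec: str) -> bool:
    """Return True if a message of this type passes the type list."""
    include, exclude = parse_message_types(spec)
    message_type = message_type.lower()
    if message_type in exclude:
        return False                      # negated types are always dropped
    return "all" in include or message_type in include
```

With the list used above, is_delivered("error", "all, !skipped, !restored") is True, while the skipped and restored types are filtered out.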

Once we've configured the file daemon the way we want, we can check whether we've made any mistakes in the configuration file itself. We can do that by running bacula-fd in test mode with the -t option, passing the configuration file with -c, as follows.

[plain]
# bacula-fd -t -c /etc/bacula/bacula-fd.conf
#
[/plain]

Since there were no errors printed to the stdout, the configuration file is okay, and we can safely restart the file daemon for changes to take effect:

[plain]
# /etc/init.d/bacula-fd restart
Stopping Bacula File daemon...: bacula-fd.
Starting Bacula File daemon...: bacula-fd.
[/plain]

Configuring Storage Daemon on the Server

In bacula-sd.conf, the default resources are the following: Storage, Director, Messages and Device, as presented in the picture below. We won't describe them in detail, since the options are basically the same as those already presented; instead, we'll present just the configuration options used in the bacula-sd.conf configuration file.

First we need to configure the storage server itself by providing its name, IP address and port number:

[plain]
Storage {
  Name = baculaserver-sd
  SDAddress = 0.0.0.0
  SDPort = 9103
  WorkingDirectory = "/opt/var/bacula/working"
  Pid Directory = "/var/run"
  Maximum Concurrent Jobs = 20
}
[/plain]

Next, we need to ensure the Director has access to the storage daemon by providing the name and password of the director. The password must be the same as the one set in the Storage resource in bacula-dir.conf.

[plain]
Director {
  Name = baculaserver-dir
  Password = "cthnUFNpZ3FwMzDtNQlL0xyodxU3UBoxqJLQsZfzJ5htaJJNH9"
}
[/plain]

In the bacula-dir.conf configuration file, there was also a "Device = FileStorage" configuration option inside the Storage section. Now in the bacula-sd.conf, we must configure that storage, which is why we need to know which configuration directives to use in Device resource:

  • Name: specifies the name of the device where the data will be stored.
  • Media Type: the type of media supported by this device.
  • Archive Device: the name of the storage device, which depends on the storage type: for a removable device it's a device file such as /dev/hdc. It can also be a directory if the data is archived to disk storage; in our case we're storing the backup data on a separate volume created on a specific VM.
  • LabelMedia: when set to yes, Bacula is allowed to label blank media on this device automatically, without an explicit operator command.
  • Random Access: when set to yes, the device is treated as a random access medium, such as a disk.
  • AutomaticMount: when set to yes, Bacula mounts the device automatically.
  • RemovableMedia: when set to yes, the media can be removed from the device (tapes, CDs, etc.); when set to no, it cannot be removed.
  • AlwaysOpen: when set to yes, Bacula will ensure the device is always available when it is needed. We can still unmount the drive with the unmount command in the Console when we need it for something else, but we must mount it manually afterwards, because otherwise the next Bacula job will block.
[plain]
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /backup/
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
[/plain]

Configuring the Console

The Bacula Console allows the user to interact with the Bacula Director daemon to receive messages, check the status of backups, etc. The document at [7] presents simplified Bacula object definitions, which can be seen in the picture below.

On the client side we have to edit /etc/bacula/bconsole.conf in order to tell the bconsole program the director's network address and password; the password must be the same as the one set in the Director resource in bacula-dir.conf. The configuration file should look something like the one below.

[plain]
Director {
  Name = baculaserver-dir
  DIRport = 9101
  address = baculaserverbox
  Password = "Ha2JIPxkAYRQTzE7AN9mPVguF5gfp7evP4BQsqWIthLZd7X4OB"
}
[/plain]

In order to connect to the server's Director, we can simply execute the bconsole command as presented below.

[plain]
# bconsole
Connecting to Director baculaserverbox:9101
Enter a period to cancel a command.
[/plain]

After that we can test whether the Bacula director is able to connect to the Bacula file daemon running on the remote client machine. Below, we used the status command in bconsole and selected option 3 for client status; the output shows that the director was able to log in to the file daemon running on the remote client, that no jobs are currently running, and that there are no terminated jobs.

[plain]
*status
Status available for:
1: Director
2: Storage
3: Client
4: All
Select daemon type for status (1-4): 3
Connecting to Client computer-fd at machine:9102
computer-fd Version: 5.2.6 (21 February 2012) x86_64-pc-linux-gnu debian 7.0
Daemon started 09-Aug-14 11:19. Jobs: run=0 running=0.
Heap: heap=270,336 smbytes=15,425 max_bytes=15,572 bufs=45 max_bufs=46
Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
Running Jobs:
Director connected at: 09-Aug-14 13:17
No Jobs running.
====
Terminated Jobs:
====
[/plain]

Conclusion

In this article we've presented the basic concepts of Bacula internals that need to be understood when deciding to use Bacula as our backup solution. Understanding those concepts takes time and energy, but it's worth it, considering that Bacula can do almost anything we need.

References

[1] Solid-state drive, https://en.wikipedia.org/wiki/Solid-state_drive.

[2] Tape drive, https://en.wikipedia.org/wiki/Tape_drive.

[3] List of backup software, https://en.wikipedia.org/wiki/List_of_backup_software.

[4] Bacula-Web, http://www.bacula-web.org/.

[5] The Bootstrap File, http://www.bacula.org/5.2.x-manuals/en/main/main/Bootstrap_File.html.

[6] ESXi 5.1: Using Raw Device Mappings (RDM) on an HP Microserver, http://forza-it.co.uk/esxi-5-1-using-raw-device-mappings-rdm-on-an-hp-microserver/.

[7] Bacula Installation and Configuration Guide, https://access.redhat.com/site/sites/default/files/attachments/install_1.pdf.

[8] Overview on modifying the Synology Server, bootstrap, ipkg etc, http://forum.synology.com/wiki/index.php/Overview_on_modifying_the_Synology_Server,_bootstrap,_ipkg_etc.

[9] Data Encryption, http://www.bacula.org/5.2.x-manuals/en/main/main/Data_Encryption.html.

[10] Messages Resource, http://www.bacula.org/5.2.x-manuals/en/main/main/Messages_Resource.html.