This documentation refers to the "version 12" cyrus index format and associated mailbox files.
No external tools should make use of this information. The only supported method of access to the mail store is through the standard interfaces: IMAP, POP, NNTP, LMTP, etc.
A cyrus mailbox is a directory in the filesystem. It contains the following files:
With a "split metadata" configuration, the mailbox may be split across multiple disks, with the metadata files stored in the same relative directory on the meta disk. See the imapd.conf option metapartition_files for more information.
The message files are named by their UID, followed by a ".", so UID 423 would be named "423.". They are stored in wire-format: lines are terminated by CRLF and binary data is not allowed.
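As a minimal sketch of the naming and line-ending rules above (the helper names are hypothetical and not part of Cyrus), the following maps a UID to its file name and checks that a byte string uses only CRLF line endings. It does not check for binary data.

```python
def message_filename(uid: int) -> str:
    """Return the file name for a message: the decimal UID plus a trailing dot."""
    if uid <= 0:
        raise ValueError("UIDs are positive integers")
    return f"{uid}."


def has_crlf_line_endings(data: bytes) -> bool:
    """Return True if the data contains no bare CR or bare LF."""
    # After removing every CRLF pair, no stray CR or LF may remain.
    rest = data.replace(b"\r\n", b"")
    return b"\r" not in rest and b"\n" not in rest
```

For example, message_filename(423) gives "423.", matching the example in the text.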
The cyrus.header file contains mailbox-wide information that does not change often. Its format:

    <Mailbox Header Magic String>
    <Quota Root>\t<Mailbox Unique ID String>\n
    <Space-separated list of user flags>\n
    <Mailbox ACL>\n
The Mailbox Unique ID String is used for non-owner per-user \Seen flags so they remain with the mailbox during renames, and also by the replication subsystem to detect mailbox renames.
The ACL is a copy of the value stored in mailboxes.db, and isn't actually used.
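The layout of cyrus.header described above can be read with a few lines of code. This is only a sketch: the function name is hypothetical, and the magic string is passed in by the caller because its actual value is not given in this text.

```python
def parse_header(data: bytes, magic: bytes) -> dict:
    """Split the content of a cyrus.header file into its fields.

    `magic` is an assumption supplied by the caller; this sketch only
    checks that the data starts with it and parses what follows.
    """
    if not data.startswith(magic):
        raise ValueError("magic string does not match")
    lines = data[len(magic):].split(b"\n")
    if len(lines) < 4:
        raise ValueError("header is truncated")
    # First line after the magic: quota root and unique ID, tab-separated.
    quota_root, tab, unique_id = lines[0].partition(b"\t")
    if not tab:
        raise ValueError("missing tab between quota root and unique ID")
    return {
        "quota_root": quota_root,
        "unique_id": unique_id,
        "user_flags": lines[1].split(),  # space-separated, may be empty
        "acl": lines[2],                 # kept as an opaque byte string
    }
```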
The cyrus.index file must be locked in exclusive mode while making changes to the cyrus.header file to ensure consistency. All changes are made by rewriting the entire file and renaming the new version into place.
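The "rewrite the entire file and rename into place" pattern can be sketched as follows. This is a generic illustration rather than the Cyrus implementation: the function name and temporary file prefix are hypothetical, and the exclusive lock on cyrus.index is assumed to be held by the caller.

```python
import os
import tempfile


def rewrite_file(path: str, new_content: bytes) -> None:
    """Replace `path` by writing a new file next to it and renaming it."""
    directory = os.path.dirname(path) or "."
    # The temporary file must be on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers that open the file at any point see either the complete old version or the complete new version, never a partial write.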
The cyrus.cache file is a pure cache of information that's also present in the message files. It exists to make ENVELOPE and specific header fetches more efficient, as well as to assist with searches and sorts.
If a cyrus.cache file is missing or corrupted, it can be re-generated by running a reconstruct on the mailbox.
Each message's cache record consists of 10 individual fields, each prefixed with a 32 bit length value in network byte order. The offset of each message's cache record is stored in the cyrus.index file (documented below). The records in a cyrus.cache file are of variable length, depending on the contents of the associated message.
The first 4 bytes of the cyrus.cache file are a "generation number" which must match the first 4 bytes of the associated cyrus.index file. In the past this was used to track consistency between the files, but the name locking scheme and per-record CRC check in cyrus 2.4 and above mean this is just a backup consistency check rather than an essential format feature.
    +------------------------------------------------------------------------+
    |Gen # (32bits)|Size 1 (32bits)|Data 1                                   |
    +------------------------------------------------------------------------+
    |              |Size 2 (32bits)|Data 2         |Size 3 (32bits)| Data 3  |
    +------------------------------------------------------------------------+
    |                                 .....                                  |
    +------------------------------------------------------------------------+
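A rough sketch of reading this layout is shown below. The function names are hypothetical. It assumes each field's data is followed directly by the next size value; if the real format pads fields for alignment, the offset arithmetic would need to account for that.

```python
import struct


def read_generation(data: bytes) -> int:
    """Return the generation number stored in the first 4 bytes."""
    return struct.unpack_from("!I", data, 0)[0]


def read_fields(data: bytes, offset: int, count: int = 10):
    """Read `count` size-prefixed fields starting at `offset`.

    Returns the list of field payloads and the offset after the last one.
    Assumes no padding between fields (an assumption of this sketch).
    """
    fields = []
    for _ in range(count):
        (size,) = struct.unpack_from("!I", data, offset)  # network byte order
        offset += 4
        if offset + size > len(data):
            raise ValueError("field runs past the end of the cache data")
        fields.append(data[offset:offset + size])
        offset += size
    return fields, offset
```

The generation number of the cache data can then be compared with the first 4 bytes of the index data using read_generation on both.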
While there are occasional changes to the cache format, this information is NOT stored in the cyrus.cache file. Instead, there is a "cache_version" field in the cyrus.index record, so multiple different versions of cache data may exist in the same cache file.
The order of fields per record in the cache file is as follows: (keep in mind that they are all preceded by a 4 byte network byte order size).
Offsets into the message file to pull out various body parts. Because of the nature of MIME parts, this is somewhat recursive.
This looks like the following (starting the octet following the cache field size). All of the fields are bit32s.
    [
      [Number of message parts+1 for the rfc822 header if present]
      [
        [Offset in the message file of the header of this part]
        [Size (octets) of the header of this part]
        [Offset in the message file of the content of this part]
        [Size (octets) of the content of this part]
        [Encoding Type of this part]
      ]
      (repeat for each part as well as once for the headers)
      [zero *or* number of sub-parts in the case of a multipart.
       if nonzero, this is a recursion into the top structure]
      (repeat for each part)
    ]

Note that if this is not a message/rfc822, then the size values for part 0 are -1 (to indicate that it doesn't exist). Sub-parts are not possible for a part 0, so they aren't included when finding recursive entries.
The offset and size info for both the mime header and content part are useful in order to do fast indexing on the appropriate parts of the message file when a client does a FETCH request for BODY[HEADER], or BODY[2.MIME].
Note that the top level RFC822 headers are treated as a separate part from their body text ("0" or "HEADER").
In the case of a multipart/alternative, the content size and offset refer to the entire mime part.
A very simple message (with a single text/plain part) would therefore look like:
[[2][rfc822 header][text/plain body part info][0]]
A simple multipart/alternative message might look like:
[[3][rfc822 header][text/plain message part info] [second message part info][0][0]]
A message with an attachment that has two subparts:
[[3][rfc822 header info][rfc822 first body part info][attachment info][0][ [3][NIL header info][sub part 1 info][sub part 2 info][0][0]]]
A message with an attached message/rfc822 message with the following total structure:
    message/rfc822
      0 headers; content-type: multipart/mixed
      1 text/plain
      2 message/rfc822
        0 headers; content-type: multipart/alternative
        1 text/plain
        2 text/html
[[3][rfc822 header part 0][text/plain part 1][overall attachment info][0][ [3][rfc822 header part 2.0][text/plain part 2.1][text/html part 2.2] [0][0]]]
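The recursive structure in the examples above can be walked with a short recursive function. This is a sketch based only on the description and examples in this section, not the Cyrus parser; the names parse_section, words_from_bytes and PART_WORDS are hypothetical. Values are read as unsigned, so a size of -1 appears as 4294967295.

```python
import struct

PART_WORDS = 5  # header offset, header size, content offset, content size, encoding


def words_from_bytes(raw: bytes) -> list:
    """Decode a field's payload into a list of 32 bit network-order integers."""
    if len(raw) % 4:
        raise ValueError("field length is not a multiple of 4 octets")
    return list(struct.unpack(f"!{len(raw) // 4}I", raw))


def parse_section(words, pos=0):
    """Parse one section starting at `pos`; return (tree, next position).

    A leading count of zero means a leaf with no sub-parts (returned as None).
    Otherwise the count covers part 0 plus the real parts, followed by one
    nested section for each part except part 0.
    """
    count = words[pos]
    pos += 1
    if count == 0:
        return None, pos
    parts = []
    for _ in range(count):
        part = tuple(words[pos:pos + PART_WORDS])
        if len(part) != PART_WORDS:
            raise ValueError("truncated part info")
        parts.append(part)
        pos += PART_WORDS
    children = []
    for _ in range(count - 1):  # part 0 never has sub-parts
        child, pos = parse_section(words, pos)
        children.append(child)
    return {"parts": parts, "children": children}, pos
```

For the very simple single-part message, the result has two part entries and a single None child.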
Any cached header fields. The exact set of fields here depends on the cache record version - there is a function in imap/mailbox.c to determine if a named header would be cached based on the version. These are in the same format they would appear in the message file:
HeaderName: headerdata\r\n
Examples include: References, In-Reply-To, etc.
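Since the cached headers keep their message-file form, they can be split into name and value pairs as sketched below. The function name is hypothetical, and the handling of folded continuation lines (lines starting with a space or tab) is an assumption based on ordinary mail header syntax rather than anything stated here.

```python
def parse_cached_headers(block: bytes) -> list:
    """Split a block of 'HeaderName: headerdata\\r\\n' lines into pairs."""
    headers = []
    for line in block.split(b"\r\n"):
        if not line:
            continue
        if line[:1] in (b" ", b"\t") and headers:
            # Folded continuation line: join it onto the previous value.
            name, value = headers[-1]
            headers[-1] = (name, value + b" " + line.strip())
            continue
        name, colon, value = line.partition(b":")
        if not colon:
            raise ValueError("header line without a colon")
        headers.append((name.decode("ascii"), value.strip()))
    return headers
```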
The cyrus.index file must be locked in exclusive mode while making changes to the cyrus.cache file to ensure consistency. All new cache records are created by reading the current end-of-file offset, appending the new cache record, and storing that start offset into the associated cyrus.index record.
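The append step described above might look like the following sketch. The function name is hypothetical; locking is assumed to be handled by the caller, and storing the offset into the index record is represented simply by returning it.

```python
import os


def append_cache_record(cache_path: str, record: bytes) -> int:
    """Append a cache record and return the offset at which it starts."""
    with open(cache_path, "ab") as f:
        f.seek(0, os.SEEK_END)   # current end-of-file offset
        offset = f.tell()
        f.write(record)          # new record goes at the end
    return offset                # caller stores this in the index record
```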
The cyrus.index file is NOT just a cache - it stores information not present in the message file!
The cyrus.index file consists of a fixed width header, followed by fixed width records. In the past, it would be rewritten on every expunge, but since Cyrus 2.4 the expunged records remain in the cyrus.index file for a configurable time to support QRESYNC and more efficient delayed expunge.
The cyrus.index file is the "heart" of the mailbox format - containing checksums (CRC32) of everything else, and the most frequently updated fields. All fields are stored in network byte order and aligned on 4 byte boundaries. Due to some 64 bit values being stored, the header and individual records are aligned on 8 byte boundaries.
The overall format looks sort of like this:
    cyrus.index:
    +----------------+
    | Mailbox Header |
    +----------------+
    | Msg: Num 1     |
    +----------------+
    | Msg: Num 2     |
    +----------------+
    |      ...       |
    +----------------+

The basic idea is that there is one header, and then all the message records are evenly spaced throughout the file. All of the message records are at well-known offsets, making any part of the file accessible at roughly equal speed.
cyrus.index files cannot be repacked (i.e. records cannot change UID for a particular offset, and the file can't be rewritten or deleted) unless there's an exclusive namelock held for the mailbox name. This is to avoid race conditions and simplify the use of mailboxes. Whenever a mailbox is opened, the caller holds a shared namelock on the mailbox name for the duration of the "mailbox object"'s existence.
All reads of a cyrus.index file must be done with a lock held, and all writes must be done with an exclusive lock held. This ensures CRC32 checksums of individual headers and records are always consistent. There are no direct "offset" reads done any more. Instead, the mailbox API provides a way to read an entire cyrus.index header or cyrus.index record into a struct, performing consistency checks. Writes are also done with a complete record struct.
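Because the header and records are fixed width, the position of any record follows from simple arithmetic. In this sketch the function names are hypothetical, and the header and record sizes are parameters because their actual values are not given here.

```python
def record_offset(recno: int, header_size: int, record_size: int) -> int:
    """Return the byte offset of record number `recno` (counted from 1)."""
    if recno < 1:
        raise ValueError("record numbers start at 1")
    return header_size + (recno - 1) * record_size


def record_count(file_size: int, header_size: int, record_size: int) -> int:
    """Return how many whole records fit in a file of the given size."""
    if file_size < header_size:
        raise ValueError("file is smaller than the header")
    return (file_size - header_size) // record_size
```

With a hypothetical 128 byte header and 96 byte records, record 1 starts at offset 128 and record 3 at offset 320.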
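The idea of checking a record's CRC32 when reading it into a struct can be illustrated as below. The layout is an assumption made for this sketch only: the checksum is taken to be the last 4 bytes of the record, in network byte order, covering all preceding bytes. The function names are hypothetical.

```python
import struct
import zlib


def seal_record(payload: bytes) -> bytes:
    """Append a CRC32 of `payload` (assumed layout: trailing 4 bytes)."""
    return payload + struct.pack("!I", zlib.crc32(payload) & 0xFFFFFFFF)


def check_record(record: bytes) -> bytes:
    """Verify the trailing CRC32 and return the payload, or raise ValueError."""
    if len(record) < 4:
        raise ValueError("record too short to hold a checksum")
    payload = record[:-4]
    (stored,) = struct.unpack("!I", record[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("record checksum mismatch")
    return payload
```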
The index header contains the following information, in order:
There are also spare fields in the index header, to allow for future expansion without forcing an upgrade of the file, and to round up to be divisible by 8 bytes.
These records start immediately following the cyrus.index header, and are all fixed size. They are in-order by uid of the message.
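Because the records are ordered by UID, a record can be located with a binary search. The sketch below works on a plain list of UIDs in record order; the function name is hypothetical and it is not how Cyrus itself performs the lookup.

```python
from bisect import bisect_left


def find_recno(uids: list, uid: int):
    """Return the 1-based record number holding `uid`, or None if absent.

    `uids` must be sorted ascending, as the index records are.
    """
    i = bisect_left(uids, uid)
    if i < len(uids) and uids[i] == uid:
        return i + 1
    return None
```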
The message isn't delivered until the new index header is written. In case of a crash before the new index header is written, any previous writes will be overwritten on the next delivery (and will not be noticed by the readers).
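The ordering described above can be illustrated with a toy in-memory model. Everything here is hypothetical: the class, its attributes, and the assumption that the header carries a count of committed records which readers use to decide how many records to look at.

```python
class ToyIndex:
    """Toy model: record slots plus a header holding the committed count."""

    def __init__(self):
        self.slots = []      # record slots as they exist on disk
        self.committed = 0   # count stored in the header (assumed field)

    def deliver(self, record, crash_before_commit=False):
        slot = self.committed
        # Writing at the first uncommitted slot overwrites any leftovers
        # from an earlier delivery that crashed before its header update.
        self.slots[slot:] = [record]
        if crash_before_commit:
            return
        self.committed = slot + 1  # the header write makes it visible

    def visible(self):
        """Records a reader would see: only those covered by the header."""
        return self.slots[:self.committed]
```

A delivery that stops before the header update leaves a record in the slots that no reader sees, and the next delivery replaces it.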
Note that certain power failure situations (power failure in the middle of a disk sector write) could cause a mailbox to need reconstruction (possibly even losing some flag state). These failure modes are not possible in the "Hardware RAID disk model" (which we will describe somewhere else when we get around to it).