journal: Prevent total log loss on unclean shutdown at high write rates (#42639)

In Meta production we have been considering using journald more widely for some time. One of the blockers to doing that which I have noticed is that often journald seems to have vastly less data after lockups/power failures compared to plain files, which is not great when debugging outages. On small write rates this tends to be hard to reproduce, but when writing thousands of messages a second, an unclean shutdown can result in the end result being an active journal file with a header that records an arena larger than the data that actually reached disk. What happens is then that journalctl then discards the entire file(!), completely ignoring that there is a huge amount of data which is actually perfectly readable. The reason for that is that the journal header is updated on every append, while the file size and newly written arena contents are only made durable on the filesystem's own schedule. After a crash, the header can therefore describe writes which were logically completed by journald but whose backing data or file metadata never reached disk. Take the following example of how this can happen at high log rates: 1. journald appends objects into an mmap()ed arena, periodically growing the file with fallocate() in FILE_SIZE_INCREASE (8M) steps and advancing the header's arena_size tail pointers as it goes along. 2. The header is dirtied on every append, and its arena_size is advanced at each fallocate(). It is, from the kernel's perspective, an ordinary data page and is only made durable by the kernel's periodic page cache writeback on its own schedule. The file's length, by contrast, is metadata, made durable only when the filesystem commits a transaction (or on an fsync(), which journald does not issue between sync intervals). 3. journald marks journals NOCOW, so the header's data block is overwritten in place and is decoupled from the size metadata. Nothing orders the two with respect to each other. Writeback therefore can routinely persist a header whose arena_size has run ahead of the file length recorded on disk. 4. Power is lost. On the next boot the persisted header reflects an arena_size and tail pointers which have been advanced for appends. However their payload and the file metadata were never committed, so header_size + arena_size now points well past the end of the file as it exists on disk. 5. journal_file_verify_header() then rejects this with -ENODATA: if (... || header_size + arena_size > (uint64_t) f->last_stat.st_size) return -ENODATA; That is correct when opening for writing, because we must not append to a file whose recorded state we cannot trust, and the caller must rotate it away. But the same check also runs on read only opens, where it is actively harmful. In the case of journalctl, the entire file is skipped, even though the data hash table, the field hash table, and the head of the array all are present and fully intact, and the great majority of entries are physically present. In fact, only a very small part of the most recently written tail is missing, but everything before is readable. This results in mistakenly rejecting the entire file as corrupt. This happens extremely frequently on machines with high write rates during power cuts or lockups. In testing writing ~7500 msg/s through journald and then cutting power, I reproduced it in ten out of ten attempts across different machines. In each case, the header was left claiming ~296M of arena while only ~192-208M had reached disk. In this case, journalctl reports that it has recovered 0 of ~335000 messages. Whether a given crash trips the condition depends on where it falls relative to the header's writeback, but when it does, the loss today is total. After this patch the vast majority of messages can be retrieved. Let's fix this by keeping the rejection for writing, but for read-only opens, let's just clamp the arena to the real file size and skip the consistency checks on the now unreliable tail pointers. The reader will walk the entry array chain from its intact head and stop at the truncation point by the bounds check that already exists, so there's no need to do any more than that there.
2026-06-30 19:57:29 +00:00 · 2026-06-26 02:49:59 +09:00
parent a38b4c39ff f6a6fc014c
commit 27556c03c2
2 changed files with 381 additions and 6 deletions
--- a/src/libsystemd/sd-journal/journal-file.c
+++ b/src/libsystemd/sd-journal/journal-file.c
@@ -590,10 +590,39 @@ static int journal_file_verify_header(JournalFile *f) {

        arena_size = le64toh(READ_NOW(f->header->arena_size));

-        if (UINT64_MAX - header_size < arena_size || header_size + arena_size > (uint64_t) f->last_stat.st_size)
+        if (UINT64_MAX - header_size < arena_size)
                return -ENODATA;

+        uint64_t file_size = (uint64_t) f->last_stat.st_size;
+
+        /* Probably an unclean shutdown where the header was written, but the arena data was not. On write we
+         * should ask the caller to rotate, but on read, we can still work it out with bounds checks. */
+        bool truncated = false;
+        if (header_size + arena_size > file_size) {
+                if (journal_file_writable(f))
+                        return -ENODATA;
+
+                /* This shouldn't happen given file_size is page aligned via fallocate(), but just in case
+                 * things are _really_ messed up... */
+                uint64_t available = ALIGN_DOWN_U64(file_size, sizeof(uint64_t));
+                if (header_size > available || available - header_size < offsetof(ObjectHeader, payload))
+                        return -ENODATA;
+
+                log_debug("Journal file %s claims a %" PRIu64 " byte arena but is only %" PRIu64
+                          " bytes on disk, clamping for recovery.",
+                          f->path,
+                          arena_size,
+                          file_size);
+                arena_size = available - header_size;
+                truncated = true;
+        }
+
        uint64_t tail_object_offset = le64toh(f->header->tail_object_offset);
+        if (truncated)
+                /* The tail may be in the lost region, so cap it at the last possible object header start. */
+                tail_object_offset = MIN(
+                                tail_object_offset,
+                                header_size + arena_size - offsetof(ObjectHeader, payload));
        if (!offset_is_valid(tail_object_offset, header_size, UINT64_MAX))
                return -ENODATA;
        if (header_size + arena_size < tail_object_offset)
@@ -615,7 +644,7 @@ static int journal_file_verify_header(JournalFile *f) {
        if (!offset_is_valid(entry_array_offset, header_size, tail_object_offset))
                return -ENODATA;

-        if (JOURNAL_HEADER_CONTAINS(f->header, tail_entry_array_offset)) {
+        if (!truncated && JOURNAL_HEADER_CONTAINS(f->header, tail_entry_array_offset)) {
                uint32_t offset = le32toh(f->header->tail_entry_array_offset);
                uint32_t n = le32toh(f->header->tail_entry_array_n_entries);

@@ -632,7 +661,7 @@ static int journal_file_verify_header(JournalFile *f) {
                        return -ENODATA;
        }

-        if (JOURNAL_HEADER_CONTAINS(f->header, tail_entry_offset)) {
+        if (!truncated && JOURNAL_HEADER_CONTAINS(f->header, tail_entry_offset)) {
                uint64_t offset = le64toh(f->header->tail_entry_offset);

                if (!offset_is_valid(offset, header_size, tail_object_offset))
@@ -664,7 +693,7 @@ static int journal_file_verify_header(JournalFile *f) {

        /* Verify number of objects */
        uint64_t n_objects = le64toh(f->header->n_objects);
-        if (n_objects > arena_size / offsetof(ObjectHeader, payload))
+        if (!truncated && n_objects > arena_size / offsetof(ObjectHeader, payload))
                return -ENODATA;

        uint64_t n_entries = le64toh(f->header->n_entries);
@@ -1517,6 +1546,18 @@ static int get_next_hash_offset(
        return 0;
 }

+static bool chain_tail_lost(JournalFile *f, int r, uint64_t offset, ObjectType type) {
+        assert(f);
+
+        /* Only accept truncation when reading. On writing it's not possible to know how to safely proceed. */
+        if (journal_file_writable(f) || !IN_SET(r, -EBADMSG, -EADDRNOTAVAIL))
+                return false;
+
+        log_debug_errno(r, "Failed to read %s at offset %" PRIu64 " of %s, treating it as the end of the chain: %m",
+                        journal_object_type_to_string(type), offset, f->path);
+        return true;
+}
+
 int journal_file_find_field_object_with_hash(
                JournalFile *f,
                const void *field,
@@ -1554,6 +1595,8 @@ int journal_file_find_field_object_with_hash(
                Object *o;

                r = journal_file_move_to_object(f, OBJECT_FIELD, p, &o);
+                if (chain_tail_lost(f, r, p, OBJECT_FIELD))
+                        break;
                if (r < 0)
                        return r;

@@ -1655,6 +1698,8 @@ int journal_file_find_data_object_with_hash(
                size_t rsize;

                r = journal_file_move_to_object(f, OBJECT_DATA, p, &o);
+                if (chain_tail_lost(f, r, p, OBJECT_DATA))
+                        break;
                if (r < 0)
                        return r;

@@ -3070,6 +3115,10 @@ static int generic_array_bisect(
                uint64_t left, right, k, m, m_original;

                r = journal_file_move_to_object(f, OBJECT_ENTRY_ARRAY, a, &array);
+                if (chain_tail_lost(f, r, a, OBJECT_ENTRY_ARRAY)) {
+                        r = TEST_GOTO_PREVIOUS;
+                        goto previous;
+                }
                if (r < 0)
                        return r;

@@ -4576,10 +4625,17 @@ int journal_file_get_cutoff_realtime_usec(JournalFile *f, usec_t *ret_from, usec
        }

        if (ret_to) {
-                if (f->header->tail_entry_realtime == 0)
+                Object *o;
+                int r;
+
+                /* The header may be stale on unclean shutdown, so don't trust it. */
+                r = journal_file_next_entry(f, 0, DIRECTION_UP, &o, NULL);
+                if (r < 0)
+                        return r;
+                if (r == 0)
                        return -ENOENT;

-                *ret_to = le64toh(f->header->tail_entry_realtime);
+                *ret_to = le64toh(o->entry.realtime);
        }

        return 1;
--- a/src/libsystemd/sd-journal/test-journal.c
+++ b/src/libsystemd/sd-journal/test-journal.c
@@ -11,6 +11,7 @@
 #include "journal-vacuum.h"
 #include "log.h"
 #include "rm-rf.h"
+#include "stdio-util.h"
 #include "tests.h"
 #include "time-util.h"

@@ -264,6 +265,324 @@ TEST(min_compress_size) {
 }
 #endif

+typedef struct EntryArrayCut {
+        uint64_t offset;
+        uint64_t last_surviving_seqnum;
+} EntryArrayCut;
+
+static EntryArrayCut find_entry_array_cut(JournalFile *f, uint64_t first) {
+        uint64_t a, n = 0, last_surviving_seqnum = 0;
+        Object *o, *entry;
+
+        for (a = first; a > 0;) {
+                ASSERT_OK_ZERO(journal_file_move_to_object(f, OBJECT_ENTRY_ARRAY, a, &o));
+                n++;
+
+                uint64_t next = le64toh(o->entry_array.next_entry_array_offset);
+                if (next == 0)
+                        break;
+
+                uint64_t k = journal_file_entry_array_n_items(f, o);
+                ASSERT_GT(k, UINT64_C(0));
+
+                ASSERT_OK_ZERO(journal_file_move_to_object(
+                                f,
+                                OBJECT_ENTRY,
+                                journal_file_entry_array_item(f, o, k - 1),
+                                &entry));
+                last_surviving_seqnum = le64toh(entry->entry.seqnum);
+                a = next;
+        }
+
+        /* We need at least two arrays, so that lopping off the final one still leaves a readable prefix. */
+        ASSERT_GE(n, UINT64_C(2));
+        ASSERT_GT(last_surviving_seqnum, UINT64_C(0));
+
+        return (EntryArrayCut) {
+                .offset = a,
+                .last_surviving_seqnum = last_surviving_seqnum,
+        };
+}
+
+static void test_recover_truncated_linear_one(bool zeroed_tail) {
+        _cleanup_(mmap_cache_unrefp) MMapCache *m = NULL;
+        dual_timestamp ts;
+        JournalFile *f;
+        Object *o;
+        EntryArrayCut cut;
+        uint64_t p, file_size, c;
+        usec_t from, to, last_realtime = 0;
+        char t[] = "/var/tmp/journal-XXXXXX";
+
+        /* When a journal's header records more arena than reached disk, make sure reads recover the on disk
+         * prefix. */
+
+        ASSERT_NOT_NULL(m = mmap_cache_new());
+        mkdtemp_chdir_chattr(t);
+
+        ASSERT_OK_ZERO(journal_file_open(
+                        -EBADF, "test.journal", O_RDWR|O_CREAT, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                        /* metrics= */ NULL, m, /* template= */ NULL, &f));
+        dual_timestamp_now(&ts);
+
+        for (unsigned i = 0; i < 200; i++) {
+                struct iovec iovec = IOVEC_MAKE_STRING("LINE=x");
+                ts.realtime = i + 1;
+                ASSERT_OK_ZERO(journal_file_append_entry(
+                                f, &ts, /* boot_id= */ NULL, &iovec, 1,
+                                /* seqnum= */ NULL, /* seqnum_id= */ NULL,
+                                /* ret_object= */ NULL, /* ret_offset= */ NULL));
+        }
+
+        cut = find_entry_array_cut(f, le64toh(f->header->entry_array_offset));
+        file_size = (uint64_t) f->last_stat.st_size;
+        ASSERT_GT(file_size, cut.offset);
+        (void) journal_file_offline_close(f);
+
+        /* Turn the last entry to crowfood. */
+        ASSERT_OK_ERRNO(truncate("test.journal", (int64_t) cut.offset));
+        if (zeroed_tail)
+                ASSERT_OK_ERRNO(truncate("test.journal", (int64_t) file_size));
+
+        ASSERT_OK_ZERO(journal_file_open(
+                        -EBADF, "test.journal", O_RDONLY, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                        /* metrics= */ NULL, m, /* template= */ NULL, &f));
+
+        c = 0;
+        for (p = 0; journal_file_next_entry(f, p, DIRECTION_DOWN, &o, &p) > 0;) {
+                c++;
+                ASSERT_EQ(le64toh(o->entry.seqnum), c);
+                last_realtime = le64toh(o->entry.realtime);
+        }
+        ASSERT_EQ(c, cut.last_surviving_seqnum);
+        ASSERT_LT(c, UINT64_C(200)); /* We did not recover the tail that was cut. */
+
+        ASSERT_EQ(journal_file_get_cutoff_realtime_usec(f, &from, &to), 1);
+        ASSERT_EQ(from, UINT64_C(1));
+        ASSERT_EQ(to, last_realtime);
+
+        /* Direct access to the zeroed region must fail. */
+        if (zeroed_tail)
+                ASSERT_ERROR(journal_file_move_to_object(f, OBJECT_UNUSED, cut.offset, &o), EBADMSG);
+
+        (void) journal_file_close(f);
+
+        /* A file with arena past EOF can't be written to safely. However, if the tail was allocated but
+         * zeroed, the header is self consistent, so writable opens are accepted (and are handled by
+         * STATE_ONLINE instead). */
+        if (zeroed_tail) {
+                struct iovec iovec = IOVEC_MAKE_STRING("LINE=new");
+
+                ASSERT_OK_ZERO(journal_file_open(
+                                -EBADF, "test.journal", O_RDWR, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                                /* metrics= */ NULL, m, /* template= */ NULL, &f));
+
+                /* The header is consistent, but when we write we should find the real tail and fail. */
+                ASSERT_ERROR(journal_file_append_entry(
+                                f, &ts, /* boot_id= */ NULL, &iovec, 1,
+                                /* seqnum= */ NULL, /* seqnum_id= */ NULL,
+                                /* ret_object= */ NULL, /* ret_offset= */ NULL), EBADMSG);
+
+                (void) journal_file_offline_close(f);
+        } else
+                ASSERT_ERROR(journal_file_open(
+                                -EBADF, "test.journal", O_RDWR, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                                /* metrics= */ NULL, m, /* template= */ NULL, &f), ENODATA);
+
+        if (arg_keep)
+                log_info("Not removing %s", t);
+        else
+                ASSERT_OK(rm_rf(t, REMOVE_ROOT | REMOVE_PHYSICAL));
+}
+
+TEST(recover_truncated_linear) {
+        ASSERT_OK_ERRNO(setenv("SYSTEMD_JOURNAL_COMPACT", "0", 1));
+        test_recover_truncated_linear_one(/* zeroed_tail= */ false);
+        test_recover_truncated_linear_one(/* zeroed_tail= */ true);
+
+        ASSERT_OK_ERRNO(setenv("SYSTEMD_JOURNAL_COMPACT", "1", 1));
+        test_recover_truncated_linear_one(/* zeroed_tail= */ false);
+        test_recover_truncated_linear_one(/* zeroed_tail= */ true);
+}
+
+static void test_recover_truncated_indexed_one(bool zeroed_tail) {
+        _cleanup_(mmap_cache_unrefp) MMapCache *m = NULL;
+        dual_timestamp ts;
+        JournalFile *f;
+        Object *o, *d;
+        EntryArrayCut cut;
+        uint64_t file_size;
+        static const char field[] = "FOO=bar";
+        char t[] = "/var/tmp/journal-XXXXXX";
+
+        /* The same vague idea as above with the truncation, but this time it's the bisection case since the
+         * per-data entry array is missing. */
+
+        ASSERT_NOT_NULL(m = mmap_cache_new());
+        mkdtemp_chdir_chattr(t);
+
+        ASSERT_OK_ZERO(journal_file_open(
+                        -EBADF, "test.journal", O_RDWR|O_CREAT, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                        /* metrics= */ NULL, m, /* template= */ NULL, &f));
+        dual_timestamp_now(&ts);
+
+        for (unsigned i = 0; i < 200; i++) {
+                struct iovec iovec = IOVEC_MAKE_STRING(field);
+                ASSERT_OK_ZERO(journal_file_append_entry(
+                                f, &ts, /* boot_id= */ NULL, &iovec, 1,
+                                /* seqnum= */ NULL, /* seqnum_id= */ NULL,
+                                /* ret_object= */ NULL, /* ret_offset= */ NULL));
+        }
+
+        ASSERT_EQ(journal_file_find_data_object(f, field, strlen(field), &d, /* ret_offset= */ NULL), 1);
+        cut = find_entry_array_cut(f, le64toh(d->data.entry_array_offset));
+        file_size = (uint64_t) f->last_stat.st_size;
+        ASSERT_GT(file_size, cut.offset);
+        (void) journal_file_offline_close(f);
+
+        /* Remove the final per-data array object. The data object's n_entries now overcounts the per-data
+         * arrays on disk */
+        ASSERT_OK_ERRNO(truncate("test.journal", (int64_t) cut.offset));
+        if (zeroed_tail)
+                ASSERT_OK_ERRNO(truncate("test.journal", (int64_t) file_size));
+
+        ASSERT_OK_ZERO(journal_file_open(
+                        -EBADF, "test.journal", O_RDONLY, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                        /* metrics= */ NULL, m, /* template= */ NULL, &f));
+        ASSERT_EQ(journal_file_find_data_object(f, field, strlen(field), &d, /* ret_offset= */ NULL), 1);
+
+        /* Seeking up for the largest possible seqnum walks into the missing tail array, and must fall back
+         * to the last entry that actually survived rather than failing the query. */
+        ASSERT_EQ(journal_file_move_to_entry_by_seqnum_for_data(
+                        f, d, UINT64_MAX, DIRECTION_UP, &o, /* ret_offset= */ NULL), 1);
+        ASSERT_EQ(le64toh(o->entry.seqnum), cut.last_surviving_seqnum);
+        ASSERT_LT(le64toh(o->entry.seqnum), UINT64_C(200));
+
+        /* Seeking down past everything that survived also walks into the missing array, and must report no
+         * match rather than propagating the read error to the caller. */
+        ASSERT_OK_ZERO(journal_file_move_to_entry_by_seqnum_for_data(
+                        f, d, UINT64_MAX, DIRECTION_DOWN, &o, /* ret_offset= */ NULL));
+
+        /* A seqnum that only existed in the lost region must degrade to no match, not a false hit (on an
+         * intact file this seqnum would have matched). */
+        ASSERT_OK_ZERO(journal_file_move_to_entry_by_seqnum_for_data(
+                        f, d, cut.last_surviving_seqnum + 1, DIRECTION_DOWN, &o, /* ret_offset= */ NULL));
+
+        /* The head of the chain is intact, so a downward seek from the start still finds the first entry. */
+        ASSERT_EQ(journal_file_move_to_entry_by_seqnum_for_data(
+                        f, d, 0, DIRECTION_DOWN, &o, /* ret_offset= */ NULL), 1);
+        ASSERT_EQ(le64toh(o->entry.seqnum), UINT64_C(1));
+
+        (void) journal_file_close(f);
+
+        if (arg_keep)
+                log_info("Not removing %s", t);
+        else
+                ASSERT_OK(rm_rf(t, REMOVE_ROOT | REMOVE_PHYSICAL));
+}
+
+TEST(recover_truncated_indexed) {
+        ASSERT_OK_ERRNO(setenv("SYSTEMD_JOURNAL_COMPACT", "0", 1));
+        test_recover_truncated_indexed_one(/* zeroed_tail= */ false);
+        test_recover_truncated_indexed_one(/* zeroed_tail= */ true);
+
+        ASSERT_OK_ERRNO(setenv("SYSTEMD_JOURNAL_COMPACT", "1", 1));
+        test_recover_truncated_indexed_one(/* zeroed_tail= */ false);
+        test_recover_truncated_indexed_one(/* zeroed_tail= */ true);
+}
+
+static void test_recover_truncated_hash_chain_one(bool field, bool zeroed_tail) {
+        _cleanup_(mmap_cache_unrefp) MMapCache *m = NULL;
+        dual_timestamp ts;
+        JournalFile *f;
+        uint64_t lost_offset, file_size, buckets, bucket;
+        char lost_key[64], lost_value[64], t[] = "/var/tmp/journal-XXXXXX";
+
+        /* A lookup must tolerate a hash bucket whose tail node was lost to truncation, returning the
+         * surviving prefix instead of failing. The lost value is chosen to share a bucket with the surviving
+         * one, so it is chained behind the surviving head rather than heading its own bucket. */
+
+        const char *survives_value = field ? "FIELD0=x" : "FOO=survives";
+        const char *survives_key   = field ? "FIELD0"   : "FOO=survives";
+
+        ASSERT_NOT_NULL(m = mmap_cache_new());
+        mkdtemp_chdir_chattr(t);
+
+        ASSERT_OK_ZERO(journal_file_open(
+                        -EBADF, "test.journal", O_RDWR|O_CREAT, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                        /* metrics= */ NULL, m, /* template= */ NULL, &f));
+        dual_timestamp_now(&ts);
+
+        /* journal_file_hash_data() is keyed per-file, so compute the colliding value against the open file. */
+        buckets = le64toh(field ? f->header->field_hash_table_size : f->header->data_hash_table_size) / sizeof(HashItem);
+        bucket = journal_file_hash_data(f, survives_key, strlen(survives_key)) % buckets;
+        for (uint64_t i = 1;; i++) {
+                ASSERT_LT(i, UINT64_C(1000000));
+                if (field)
+                        xsprintf(lost_key, "FIELD%" PRIu64, i);
+                else
+                        xsprintf(lost_key, "FOO=%" PRIu64, i);
+                if (journal_file_hash_data(f, lost_key, strlen(lost_key)) % buckets == bucket)
+                        break;
+        }
+        if (field)
+                xsprintf(lost_value, "%s=x", lost_key);
+        else
+                strcpy(lost_value, lost_key);
+
+        const char *v;
+        FOREACH_ARGUMENT(v, survives_value, lost_value) {
+                struct iovec iovec = IOVEC_MAKE_STRING(v);
+                ASSERT_OK_ZERO(journal_file_append_entry(
+                                f, &ts, /* boot_id= */ NULL, &iovec, 1,
+                                /* seqnum= */ NULL, /* seqnum_id= */ NULL,
+                                /* ret_object= */ NULL, /* ret_offset= */ NULL));
+        }
+
+        ASSERT_EQ(field ?
+                  journal_file_find_field_object(f, lost_key, strlen(lost_key), NULL, &lost_offset) :
+                  journal_file_find_data_object(f, lost_key, strlen(lost_key), NULL, &lost_offset), 1);
+        file_size = (uint64_t) f->last_stat.st_size;
+        ASSERT_GT(file_size, lost_offset);
+        (void) journal_file_offline_close(f);
+
+        /* Lose the second object's body, but keep the bucket pointer that still references it. */
+        ASSERT_OK_ERRNO(truncate("test.journal", (int64_t) lost_offset));
+        if (zeroed_tail)
+                ASSERT_OK_ERRNO(truncate("test.journal", (int64_t) file_size));
+
+        ASSERT_OK_ZERO(journal_file_open(
+                        -EBADF, "test.journal", O_RDONLY, JOURNAL_COMPRESS, 0666, UINT64_MAX,
+                        /* metrics= */ NULL, m, /* template= */ NULL, &f));
+
+        ASSERT_EQ(field ?
+                  journal_file_find_field_object(f, survives_key, strlen(survives_key), NULL, NULL) :
+                  journal_file_find_data_object(f, survives_key, strlen(survives_key), NULL, NULL), 1);
+        ASSERT_OK_ZERO(field ?
+                       journal_file_find_field_object(f, lost_key, strlen(lost_key), NULL, NULL) :
+                       journal_file_find_data_object(f, lost_key, strlen(lost_key), NULL, NULL));
+
+        (void) journal_file_close(f);
+
+        if (arg_keep)
+                log_info("Not removing %s", t);
+        else
+                ASSERT_OK(rm_rf(t, REMOVE_ROOT | REMOVE_PHYSICAL));
+}
+
+TEST(recover_truncated_hash_chain) {
+        const char *compact;
+
+        FOREACH_ARGUMENT(compact, "0", "1") {
+                ASSERT_OK_ERRNO(setenv("SYSTEMD_JOURNAL_COMPACT", compact, 1));
+
+                test_recover_truncated_hash_chain_one(/* field= */ false, /* zeroed_tail= */ false);
+                test_recover_truncated_hash_chain_one(/* field= */ false, /* zeroed_tail= */ true);
+                test_recover_truncated_hash_chain_one(/* field= */ true, /* zeroed_tail= */ false);
+                test_recover_truncated_hash_chain_one(/* field= */ true, /* zeroed_tail= */ true);
+        }
+}
+
 static int intro(void) {
        arg_keep = saved_argc > 1;