systemd

mirror of https://github.com/systemd/systemd.git synced 2026-06-30 19:57:29 +00:00

Yu Watanabe 27556c03c2 journal: Prevent total log loss on unclean shutdown at high write rates (#42639)

In Meta production we have been considering using journald more widely
for some time. One of the blockers I have noticed is that journald often
seems to have vastly less data after lockups or power failures than
plain files do, which is not great when debugging outages.

At low write rates this tends to be hard to reproduce, but when writing
thousands of messages a second, an unclean shutdown can leave behind an
active journal file whose header records an arena larger than the data
that actually reached disk. journalctl then discards the entire
file(!), completely ignoring that there is a huge amount of data which
is actually perfectly readable.

The reason for that is that the journal header is updated on every
append, while the file size and newly written arena contents are only
made durable on the filesystem's own schedule. After a crash, the header
can therefore describe writes which were logically completed by journald
but whose backing data or file metadata never reached disk.

Take the following example of how this can happen at high log rates:

1. journald appends objects into an mmap()ed arena, periodically growing
the file with fallocate() in FILE_SIZE_INCREASE (8M) steps and advancing
the header's arena_size and tail pointers as it goes along.
2. The header is dirtied on every append, and its arena_size is advanced
at each fallocate(). It is, from the kernel's perspective, an ordinary
data page and is only made durable by the kernel's periodic page cache
writeback on its own schedule. The file's length, by contrast, is
metadata, made durable only when the filesystem commits a transaction
(or on an fsync(), which journald does not issue between sync
intervals).
3. journald marks journals NOCOW, so the header's data block is
overwritten in place and is decoupled from the size metadata. Nothing
orders the two with respect to each other. Writeback therefore can
routinely persist a header whose arena_size has run ahead of the file
length recorded on disk.
4. Power is lost. On the next boot the persisted header reflects an
arena_size and tail pointers which have been advanced for appends.
However, the payload of those appends and the file metadata were never
committed, so header_size + arena_size now points well past the end of
the file as it exists on disk.
5. journal_file_verify_header() then rejects this with -ENODATA:

        if (... || header_size + arena_size > (uint64_t) f->last_stat.st_size)
                return -ENODATA;

That is correct when opening for writing, because we must not append to
a file whose recorded state we cannot trust, and the caller must rotate
it away. But the same check also runs on read-only opens, where it is
actively harmful. In the case of journalctl, the entire file is skipped,
even though the data hash table, the field hash table, and the head of
the entry array are all present and fully intact, and the great majority
of entries are physically present. Only a very small part of the most
recently written tail is missing, and everything before it is readable.
The result is that the entire file is mistakenly rejected as corrupt.
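
To make the failing condition concrete, here is a minimal sketch of the
size check in isolation. It is not the systemd code: HeaderSketch and
header_overruns_file() are hypothetical names, and the overflow guard is
an addition made for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two header fields the check looks at. */
typedef struct HeaderSketch {
        uint64_t header_size;
        uint64_t arena_size;
} HeaderSketch;

/* Returns true if the header claims more bytes than the file has on disk. */
static bool header_overruns_file(const HeaderSketch *h, uint64_t file_size) {
        assert(h);

        /* Treat a wrapping sum as an overrun rather than letting it pass. */
        if (h->arena_size > UINT64_MAX - h->header_size)
                return true;

        return h->header_size + h->arena_size > file_size;
}
```

A header claiming ~296M of arena on a file of only ~200M makes this
helper return true, which is the situation in which
journal_file_verify_header() returns -ENODATA.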

This happens extremely frequently on machines with high write rates
during power cuts or lockups. In testing, writing ~7500 msg/s through
journald and then cutting power reproduced it in ten out of ten attempts
across different machines.

In each case, the header was left claiming ~296M of arena while only
~192-208M had reached disk, and journalctl reported that it had
recovered 0 of ~335000 messages. Whether a given crash trips the
condition depends on where it falls relative to the header's writeback,
but when it does, the loss today is total. After this patch the vast
majority of messages can be retrieved.
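
As a rough illustration of the figures above, the following sketch
computes how much of the claimed arena is missing. It assumes that "M"
means MiB and ignores the header size. missing_arena_bytes() and
percent_on_disk() are hypothetical helpers. The share of bytes on disk
is not the same as the share of messages that can be retrieved, which
also depends on the file layout.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (UINT64_C(1024) * 1024)

/* Hypothetical helper: bytes the header claims which never reached disk. */
static uint64_t missing_arena_bytes(uint64_t claimed, uint64_t on_disk) {
        return claimed > on_disk ? claimed - on_disk : 0;
}

/* Hypothetical helper: whole percent of the claimed arena present on disk.
 * Fine for sizes in this range; on_disk * 100 would wrap for huge inputs. */
static unsigned percent_on_disk(uint64_t claimed, uint64_t on_disk) {
        assert(claimed > 0);

        if (on_disk >= claimed)
                return 100;

        return (unsigned) (on_disk * 100 / claimed);
}
```

For the numbers quoted above, between 88M and 104M of the claimed arena
is missing, so roughly two thirds of it did reach disk.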

Let's fix this by keeping the rejection for writing, but for read-only
opens, let's just clamp the arena to the real file size and skip the
consistency checks on the now unreliable tail pointers. The reader will
walk the entry array chain from its intact head and stop at the
truncation point thanks to the bounds check that already exists, so
nothing more is needed on the read side.
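
The read-only behaviour described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the
actual patch: effective_arena_size() and object_in_bounds() are
hypothetical names, and the return convention (0 for unchanged, 1 for
clamped) is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide which arena size to trust. Writable opens keep the strict
 * rejection; read-only opens fall back to what is actually on disk. */
static int effective_arena_size(
                uint64_t header_size,
                uint64_t arena_size,
                uint64_t file_size,
                bool writable,
                uint64_t *ret) {

        assert(ret);

        /* A file shorter than its own header has nothing to salvage. */
        if (file_size < header_size)
                return -ENODATA;

        uint64_t available = file_size - header_size;

        if (arena_size <= available) {
                *ret = arena_size;      /* header and file agree */
                return 0;
        }

        if (writable)
                return -ENODATA;        /* must not append, caller rotates */

        *ret = available;               /* clamp to the real file size */
        return 1;
}

/* Bounds check in the spirit of the one the reader relies on: an object
 * is only usable if it lies entirely inside the (clamped) arena. */
static bool object_in_bounds(
                uint64_t offset,
                uint64_t size,
                uint64_t header_size,
                uint64_t arena_size) {

        if (arena_size > UINT64_MAX - header_size)
                return false;

        uint64_t end = header_size + arena_size;

        if (offset < header_size || offset > end)
                return false;

        return size <= end - offset;
}
```

With the clamped size in place, an object that starts before the
truncation point but extends past it fails object_in_bounds(), which is
where a reader walking the entry array chain would stop.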

2026-06-26 02:49:59 +09:00

.clusterfuzzlite

…

.github

mkosi: update mkosi ref to f7762b71437227922a367bb89597843c77494ef9

2026-06-25 15:25:36 +02:00

.obs

obs: prepare ParticleOS images in workflow

2026-06-16 11:08:45 +01:00

.semaphore

semaphore: stop deleting all apt sources

2026-05-04 16:50:05 +01:00

catalog

journal: add catalog message for missing dlopen dep

2026-06-22 10:54:49 +01:00

coccinelle

tree-wide: standardize header names across src/fundamental, src/basic and src/shared

2026-05-21 10:33:03 +09:00

docs

sysupdate: notify hook subscribers after a successful update

2026-06-24 13:05:33 +02:00

factory

factory: do not install nsswitch.conf when nss is disabled

2025-11-25 10:48:31 +01:00

hostname-wordlist

hostname: improve the algorithm in hostname_pick_word()

2026-06-20 14:26:41 +02:00

hwdb.d

hwdb: map Brazilian ThinkPad T14 Gen 1 slash key to KEY_RO

2026-06-24 15:37:27 +01:00

LICENSES

tree-wide: standardize header names across src/fundamental, src/basic and src/shared

2026-05-21 10:33:03 +09:00

man

bootctl: add link-auto/LinkAuto and auto-link on completed system update

2026-06-24 13:05:34 +02:00

mime

mime: add mimetype for luks home dir

2025-03-07 17:27:20 +01:00

mkosi

mkosi: update mkosi ref to f7762b71437227922a367bb89597843c77494ef9

2026-06-25 15:25:36 +02:00

mkosi.extra/usr/lib/repart.d

mkosi: Grow the root partition on boot

2026-02-10 16:32:05 +01:00

modprobe.d

modprobe: set 'ifb numifbs=0' to avoid autocreating ifb0

2024-01-12 23:24:54 +00:00

network

network: enable LLDP for links that use only link-local addressing

2026-02-24 17:09:33 +01:00

po: Translated using Weblate (Spanish)

2026-06-25 12:17:55 +02:00

presets

sysext: refresh sysexts and confexts on completed system update

2026-06-24 13:05:34 +02:00

profile.d

profile.d: add instructions how to deactivate 80-systemd-osc-context.sh

2026-05-29 01:08:07 +09:00

rules.d

hostnamed: allow setting machine tags via udev rules (#42390)

2026-06-19 20:38:42 +02:00

shell-completion

bootctl: add link-auto/LinkAuto and auto-link on completed system update

2026-06-24 13:05:34 +02:00

src

journal: Prevent total log loss on unclean shutdown at high write rates (#42639)

2026-06-26 02:49:59 +09:00

sysctl.d

coredump: capture crashing thread ID and name

2026-03-18 10:27:27 +00:00

sysusers.d

imds: add new systemd-imdsd.service that makes IMDS data accessible locally

2026-03-26 10:54:15 +01:00

test

tpm2: add SWTPM fallback test, fixes and hardening (#42722)

2026-06-25 17:13:03 +02:00

tmpfiles.d

report: add support for optionally signing reports

2026-06-19 05:23:18 +02:00

tools

tools: add script to print blurb for SPI yearly report

2026-06-15 20:45:36 +01:00

units

tpm2: add SWTPM fallback test, fixes and hardening (#42722)

2026-06-25 17:13:03 +02:00

xorg

xorg/50-systemd-user: import XAUTHORITY only if set

2026-02-19 08:15:33 +01:00

.clang-format

clang-format: Add include sorting directives

2025-04-30 09:30:33 +02:00

.clang-tidy

treewide: get rid of remaing getopt/getopt_long stuff

2026-05-16 18:37:12 +02:00

.clangd

tree-wide: standardize header names across src/fundamental, src/basic and src/shared

2026-05-21 10:33:03 +09:00

.ctags

…

.dir-locals.el

emacs: add settings for Python modes

2026-05-18 02:51:41 +09:00

.editorconfig

chore: fix editorconfig pattern and add setting for zsh

2025-05-30 14:53:45 +09:00

.gitattributes

libc: Add kexec_file_load() syscall wrapper

2026-04-13 11:13:04 +02:00

.gitignore

gitignore: also ignore mkosi.local.conf in subdirectories

2026-06-04 17:50:22 +01:00

.mailmap

mailmap: name change

2026-04-18 14:24:47 +01:00

.packit.yml

mkosi: update fedora commit reference to 9cb09470c9c5a437f8e9c1e0e449b87de83733eb

2026-05-26 16:55:17 +02:00

.pylintrc

…

.vimrc

vimrc: explicitly set shiftwidth for the C file type

2023-09-18 13:11:45 +02:00

.ycm_extra_conf.py

ycm: apply "ruff format" and "ruff check --fix"

2026-05-18 02:32:42 +09:00

AGENTS.md

docs: Update AI usage policy

2026-06-17 09:59:34 +00:00

CITATION.cff

add CITATION.cff file

2025-06-05 14:39:20 +02:00

CLAUDE.md

Move AI instructions to AGENTS.md

2026-03-06 08:55:55 +01:00

LICENSE.GPL2

licensing: update address of FSF

2025-10-07 13:00:12 +01:00

LICENSE.LGPL2.1

licensing: update address of FSF

2025-10-07 13:00:12 +01:00

meson_options.txt

manager: make systemd+executor a multicall binary

2026-06-22 17:19:54 +02:00

meson.build

manager: make systemd+executor a multicall binary

2026-06-22 17:19:54 +02:00

meson.version

meson: switch version to 262~devel

2026-06-19 19:08:35 +01:00

mypy.ini

Move mypy.ini and ruff.toml to top level

2024-11-24 16:47:20 +01:00

NEWS

Add NEWS entry

2026-06-22 10:39:36 +02:00

README

README: bump required minimum version of musl to 1.2.6

2026-05-21 05:06:35 +09:00

README.md

README: note that we now have packages built from stable branch too

2026-02-09 09:36:53 +01:00

ruff.toml

ci/linter: stop ignoring lambda assignment lint

2026-05-18 09:43:36 +09:00

TODO.md

TODO: drop bootctl link + sysupdate integration item

2026-06-24 13:06:20 +02:00

README.md

System and Service Manager

Most documentation is available on systemd's web site.

Assorted, older, general information about systemd can be found in the systemd Wiki.

Information about build requirements is provided in the README file.

Consult our NEWS file for information about what's new in the most recent systemd versions.

Please see the Code Map for information about this repository's layout and content.

Please see the Hacking guide for information on how to hack on systemd and test your modifications.

Please see our Contribution Guidelines for more information about filing GitHub Issues and posting GitHub Pull Requests.

When preparing patches for systemd, please follow our Coding Style Guidelines.

If you are looking for support, please contact our mailing list, or join our IRC channel #systemd on libera.chat or our Matrix channel.

Stable branches with backported patches are available in the stable repo.

We have a security bug bounty program, sponsored by the Sovereign Tech Fund and hosted on YesWeHack.

Repositories with distribution packages built from git main are available on OBS, as are repositories with packages built from the latest stable release.