Bug 40873 - Make make-initrd build reproducible initrd images
Summary: Make make-initrd build reproducible initrd images
Status: CLOSED FIXED
Alias: None
Product: Sisyphus
Classification: Development
Component: make-initrd
Version: unstable
Hardware: all Linux
Importance: P5 enhancement
Assignee: Alexey Gladkov
QA Contact: qa-sisyphus
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-07 18:46 MSK by Vladimir D. Seleznev
Modified: 2021-09-11 19:07 MSK
CC List: 4 users

See Also:


Description Vladimir D. Seleznev 2021-09-07 18:46:06 MSK
Currently, make-initrd generates non-reproducible initrd images even when it runs in the same environment (same kernel, same installed packages, same configuration, etc.). Initrd image builds within one environment should be reproducible.
Comment 1 Alexey Gladkov 2021-09-07 18:54:39 MSK
Please explain what you mean when you say "reproducible"?
Comment 2 Vladimir D. Seleznev 2021-09-07 19:03:49 MSK
Currently I get:

# make-initrd
...
# b2sum /boot/initrd-std-def.img 
fe1a5a2fad9a6943982a5a88931c58aa1a753ea3b3287b13f275667d05972c2d75e74d107b133a81d415dd08d1e93ebd83d775fd4ae91ead362c82c62a4fd782  /boot/initrd-std-def.img

Let's regenerate it:

# make-initrd
...
# b2sum /boot/initrd-std-def.img 
03fa3052bab13ae8b566bca21e67f9bc318a3a9924990172041eafc2fa3d565832046af018f06e34574d1ae4ccfda17b24c13eda81353740b19804e3c32bd5ea  /boot/initrd-std-def.img

The images differ. Their content may be the same, but the files may be ordered differently, or the difference may be caused by timestamps or something else.
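Both guesses above (member order and embedded timestamps) can be made concrete with a minimal sketch. It assumes the image is an uncompressed cpio archive in the "newc" format, which the Linux kernel uses for initramfs; the helpers `newc_entry`, `build_image` and `b2sum` are hypothetical and are not taken from make-initrd. Identical file contents still produce different BLAKE2b digests when either the member order or the stored mtime changes.

```python
import hashlib


def newc_entry(name: str, data: bytes, ino: int, mtime: int, mode: int = 0o100644) -> bytes:
    """Encode one member as a cpio "newc" (magic 070701) record."""
    name_bytes = name.encode() + b"\0"  # c_namesize counts the trailing NUL
    # c_ino, c_mode, c_uid, c_gid, c_nlink, c_mtime, c_filesize,
    # c_devmajor, c_devminor, c_rdevmajor, c_rdevminor, c_namesize, c_check
    fields = [ino, mode, 0, 0, 1, mtime, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    record = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    record += b"\0" * (-len(record) % 4)  # pad header + name to a 4-byte boundary
    record += data + b"\0" * (-len(data) % 4)  # pad file data to a 4-byte boundary
    return record


def build_image(files: dict, mtime: int, sort: bool) -> bytes:
    """Pack {path: content} into an uncompressed newc archive (hypothetical helper)."""
    names = sorted(files) if sort else list(files)
    members = b"".join(
        newc_entry(name, files[name], ino=i + 1, mtime=mtime)
        for i, name in enumerate(names)
    )
    return members + newc_entry("TRAILER!!!", b"", ino=0, mtime=0, mode=0)


def b2sum(blob: bytes) -> str:
    """BLAKE2b-512 hex digest, the algorithm b2sum uses by default."""
    return hashlib.blake2b(blob).hexdigest()


if __name__ == "__main__":
    files = {"init": b"#!/bin/sh\n", "bin/sh": b"\x7fELF", "etc/fstab": b""}
    shuffled = dict(reversed(list(files.items())))
    # Same content, different member order: different digest.
    print(b2sum(build_image(files, 0, sort=False)) == b2sum(build_image(shuffled, 0, sort=False)))  # False
    # Same order, different embedded mtime: different digest.
    print(b2sum(build_image(files, 1, sort=True)) == b2sum(build_image(files, 2, sort=True)))  # False
    # Sorted members and a fixed mtime: identical digest.
    print(b2sum(build_image(files, 0, sort=True)) == b2sum(build_image(shuffled, 0, sort=True)))  # True
```

Sorting members and fixing every mtime is enough for this toy archive. A real image is usually compressed as well, and gzip, for example, writes a timestamp into its header unless run with `-n`.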
Comment 3 Alexey Gladkov 2021-09-07 19:13:25 MSK
About the order of the files, I probably agree. This might be a good thing.
But I disagree about keeping the same timestamps. I see no reason to keep them.
Comment 4 Alexey Gladkov 2021-09-07 20:19:20 MSK
I looked at the code for a bit. Some features generate new files for the initramfs image rather than just copying existing ones from the system. In this case, preserving the timestamps is not possible without having a previous version of the initramfs.

With that said, I don't think it is possible to implement reproducible initramfs images.
Comment 5 Vladimir D. Seleznev 2021-09-08 08:32:33 MSK
> I looked at the code for a bit. Some features generate new files for initramfs image and not just copy existing ones from the system.

Sure, but the timestamps could be faked. Many projects fake timestamps to achieve reproducibility. While many of them use SOURCE_DATE_EPOCH [1] (make-initrd could support it too), I propose using the mtime of the kernel image file for which the initrd image is generated.

[1] https://reproducible-builds.org/docs/source-date-epoch/
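The timestamp selection proposed above could look like the following sketch: honour SOURCE_DATE_EPOCH when it is set, otherwise fall back to the kernel image's mtime. The function names are hypothetical, and clamping newer file mtimes to the build timestamp (as GNU tar's `--clamp-mtime` does) is an added assumption, not something make-initrd is known to do.

```python
import os


def build_timestamp(kernel_image: str) -> int:
    """Timestamp to embed in the archive: SOURCE_DATE_EPOCH, else the kernel's mtime."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        # The specification defines the value as an integer number of seconds
        # since 1970-01-01 00:00:00 UTC; int() raises ValueError for values like "1.5".
        return int(value)
    # Fallback proposed in this bug discussion (an assumption, not make-initrd code).
    return int(os.stat(kernel_image).st_mtime)


def member_mtime(file_mtime: int, build_ts: int) -> int:
    """Keep older file mtimes, but never store one newer than the build timestamp."""
    return min(file_mtime, build_ts)


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile() as kernel:  # stand-in for a kernel image
        os.utime(kernel.name, (1_600_000_000, 1_600_000_000))
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        print(build_timestamp(kernel.name))  # 1600000000
        os.environ["SOURCE_DATE_EPOCH"] = "0"
        print(build_timestamp(kernel.name))  # 0
```

With several kernels, the same function could be applied to each kernel image and the newest result taken; that variant is likewise only an illustration.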
Comment 6 Alexey Gladkov 2021-09-08 11:45:56 MSK
Of course I could fake the timestamp wherever files are created or copied. But the code is not ready for this, and I see no point in doing it without a good reason. You have not provided a good rationale for this.

Moreover, it seems wrong to me to use the kernel's timestamp for the initramfs content. They are not related in any way. Why would they be the same? We could use 1-1-1970 or any other arbitrary date for the content with the same effect.
Comment 7 Vladimir D. Seleznev 2021-09-08 14:14:45 MSK
(In reply to Alexey Gladkov from comment #6)
> Of course I could fake timestamp wherever files are created or copied. But
> the fact is that the code is not ready for this and I see no point in doing
> it for no good reason. You have not provided a good rationale for this.

Reproducibility is valuable by itself, but here's a more practical reason: making installation and bootable images that contain an initrd reproducible, so that anyone can easily verify they were not infected by some third-party software during the build. Many steps are needed to achieve this, but it is not possible without reproducible initrd images.

> Moreover it seems to me wrong to use timestamp of kernel for initramfs
> content. They are not related in any way. Why would they be the same? We can
> use 1-1-1970 or any other random date for content with the same effect.

Sure *they are* related in some way: you build an initrd image for a particular kernel, and it usually contains modules for that kernel. You can try to boot a kernel with an initrd built for another one, but it makes little sense: the modules packed in that initrd most likely cannot be loaded by this kernel. Anyway, we need to choose some timestamp, and at least there is some reasoning behind the proposed one.
Comment 8 Alexey Gladkov 2021-09-08 14:46:23 MSK
(In reply to Vladimir D. Seleznev from comment #7)
> Reproducibility is valuable by itself, but here's more practical reason: to
> make installation and bootable images that contain initrd reproducible that
> anyone can easily verify it was not infected by some side software during
> the build. There are many steps to achieve this but it is not possible
> without reproducible initrd images.

To verify that the initramfs was not infected, you need to compute a checksum after _each_ initramfs creation and check it on every boot. This checksum will be the same between rebuilds. Reproducibility has nothing to do with this problem.

I still haven't heard any arguments for implementing this. I'm still not convinced of the need to implement it.

Please give me a real-life use case that cannot be solved without a reproducible initramfs.

> Sure *they are* related in some way: you build initrd image for the
> particular kernel, and usually it contains some modules for that kernel. You
> can try to load some kernel with an initrd built for another one but it has
> a little sense: the modules packed in that initrd most likely cannot be
> loaded with this kernel. Anyway we need to chose some timestamp, and at last
> there is some reasoning for the proposed one.

No, they are not related. Or perhaps you don't understand the essence of ctime/mtime. We create a new image with _new_ files (not only ones copied from the system). ctime should reflect the actual creation time, not falsely suggest that the initramfs was created when the kernel was compiled. That's just wrong. The more I think about it, the less I like this idea in general.

BTW, I have a feature request to generate an initramfs with modules for multiple kernels. In that case, it is not at all clear which timestamp to use. I definitely won't use the kernel as the timestamp source.

P.S. Please stop reopening this bug until you convince me, or I'll just stop responding.
Comment 9 Vladimir D. Seleznev 2021-09-08 15:08:52 MSK
(In reply to Alexey Gladkov from comment #8)
> (In reply to Vladimir D. Seleznev from comment #7)
> > Reproducibility is valuable by itself, but here's more practical reason: to
> > make installation and bootable images that contain initrd reproducible that
> > anyone can easily verify it was not infected by some side software during
> > the build. There are many steps to achieve this but it is not possible
> > without reproducible initrd images.
> 
> To verify that initramfs was not infected you need to get checksum after
> _each_ initramfs creation and check it every boot. This checksum will be the
> same between rebuilds. Reproducibility has nothing to do with this problem.

No, you have misunderstood: I was talking about the reproducibility of installation and bootable images, which cannot be achieved without initrd image reproducibility. In fact, if someone could infect my initrd, she could probably also modify my checksum program so that I would not notice the infection.

The reproducibility of installation images, on the other hand, makes it very convenient to test different build environments for suspicious side effects, for example.

> I still don't hear any arguments why implement this. I'm still not convinced
> of the need to implement it.

I gave it above.

> Please give me a real life usecase that cannot be solved without
> reproducible initramfs.

Currently you cannot build a reproducible bootable/installation image, because you also need to build an initramfs for it.

> > Sure *they are* related in some way: you build initrd image for the
> > particular kernel, and usually it contains some modules for that kernel. You
> > can try to load some kernel with an initrd built for another one but it has
> > a little sense: the modules packed in that initrd most likely cannot be
> > loaded with this kernel. Anyway we need to chose some timestamp, and at last
> > there is some reasoning for the proposed one.
> 
> No, they are not related. Or you don't understand the essence of
> ctime/mtime. We create another new image with _new_ files (not only copied
> from the system). ctime should reflect the actual creation time and not
> mislead that the initramfs was created when the kernel was compiled. That's
> just wrong. The more I think about it, the less I like this idea in general.

To be clear: I'm not talking about the ctime/mtime of the initramfs file itself; it's all about the timestamps packed inside it. The [cm]time of the initramfs file is irrelevant to reproducibility.

> BTW I have feature request to generate initramfs with modules for multiple
> kernels. In this case, it is not at all clear which timestamp to use. I
> definitely won't use the kernel as a source of timestamp.

For that you can pick the newest one.

> P.S. Please stop reopen this bug until you convince me or I'll just stop
> responding.

Ok, but how will I know that I've convinced you?
Comment 10 Alexey Gladkov 2021-09-08 17:56:42 MSK
(In reply to Vladimir D. Seleznev from comment #9)
> > To verify that initramfs was not infected you need to get checksum after
> > _each_ initramfs creation and check it every boot. This checksum will be the
> > same between rebuilds. Reproducibility has nothing to do with this problem.
> 
> No, you have been misleaded: I've talked about reproducibility of
> installation and bootable images which cannot be achieved without initrd
> image reproducibility. In fact, if someone could infect my initrd, she
> probably could modify my checksum prog that I could not notice that
> infection.
> 
> The reproducibility of installation images, on the other hand, is very
> convenient to test different build environments for suspicious side effects,
> for example.  

You are reinventing the wheel. We already have a mechanism to prevent initramfs and kernel spoofing: Secure Boot. The entire chain, starting from the BIOS, will be certified. GRUB 2 can check signatures [1] if you set this up yourself. And this solution does not require any changes to make-initrd. I'm not an expert in Secure Boot, but kernel and initramfs verification should be done at a higher level: by the bootloader.

And again, this has nothing to do with reproducibility. To check whether the generated image has been modified, you do not need to recreate it at all. You need a checksum of the content, and you need to keep this checksum separately in a safe place.

[1] https://www.gnu.org/software/grub/manual/grub/html_node/Using-digital-signatures.html#Using-digital-signatures

> > Please give me a real life usecase that cannot be solved without
> > reproducible initramfs.
> 
> For now you cannot build reproducible bootable/installation image because
> you also need to build initramfs for it.

You need to generate the initramfs for the particular hardware configuration anyway.
Regenerating the initramfs for validation doesn't make sense to me.

> > > Sure *they are* related in some way: you build initrd image for the
> > > particular kernel, and usually it contains some modules for that kernel. You
> > > can try to load some kernel with an initrd built for another one but it has
> > > a little sense: the modules packed in that initrd most likely cannot be
> > > loaded with this kernel. Anyway we need to chose some timestamp, and at last
> > > there is some reasoning for the proposed one.
> > 
> > No, they are not related. Or you don't understand the essence of
> > ctime/mtime. We create another new image with _new_ files (not only copied
> > from the system). ctime should reflect the actual creation time and not
> > mislead that the initramfs was created when the kernel was compiled. That's
> > just wrong. The more I think about it, the less I like this idea in general.
> 
> To be clear: I'm not talking about ctime/mtime of the initramfs itself, it's
> all about timestamps that are packed inside of it. The [cm]time of initramfs
> do not relate to the reproducibility.

I know how to implement SOURCE_DATE_EPOCH support even for out-of-tree features, but I don't see any use cases for it.

> Ok, but how will I know that I've convinced you?

It will be when I agree, and I'll say so.
Comment 11 Vladimir D. Seleznev 2021-09-08 21:37:31 MSK
(In reply to Alexey Gladkov from comment #10)
> (In reply to Vladimir D. Seleznev from comment #9)
> > > To verify that initramfs was not infected you need to get checksum after
> > > _each_ initramfs creation and check it every boot. This checksum will be the
> > > same between rebuilds. Reproducibility has nothing to do with this problem.
> > 
> > No, you have been misleaded: I've talked about reproducibility of
> > installation and bootable images which cannot be achieved without initrd
> > image reproducibility. In fact, if someone could infect my initrd, she
> > probably could modify my checksum prog that I could not notice that
> > infection.
> > 
> > The reproducibility of installation images, on the other hand, is very
> > convenient to test different build environments for suspicious side effects,
> > for example.  
> 
> You are reinventing the wheel. We already have a mechanism to prevent
> initramfs and kernel spoofing - secure boot. The entire chain, starting from
> bios, will be certified. Grub2 can check signatures [1] if you do it for
> your self. And this solution does not require any additional make-initrd
> changes. I'm not an expert in secure boot, but kernel and initramfs
> verification should be done at a higher level - bootloader.

No, I'm not reinventing the wheel: the boot process is not my concern here, as I already wrote. I'm trying to solve a different task.
 
> And again this has nothing to do with reproducibility. To check whether the
> generated image has been modified, you do not need to recreate it at all.
> You need a checksum from the content and you need to keep this checksum
> separately in a safe place.

No, my concern is indeed reproducibility, but of the generated *installation* images, such as ISO/USB flash images, if that makes it clearer. They contain generated initramfs images, which SHOULD be reproducible in order to make the installation ISO/USB images reproducible too.

> [1] https://www.gnu.org/software/grub/manual/grub/html_node/Using-digital-signatures.html#Using-digital-signatures
> 
> > > Please give me a real life usecase that cannot be solved without
> > > reproducible initramfs.
> > 
> > For now you cannot build reproducible bootable/installation image because
> > you also need to build initramfs for it.
> 
> You need to generate initramfs for this particular hardware configuration
> anyway.
> Regenerating initramfs for validation doesn't make sense to me.
> 
> > > > Sure *they are* related in some way: you build initrd image for the
> > > > particular kernel, and usually it contains some modules for that kernel. You
> > > > can try to load some kernel with an initrd built for another one but it has
> > > > a little sense: the modules packed in that initrd most likely cannot be
> > > > loaded with this kernel. Anyway we need to chose some timestamp, and at last
> > > > there is some reasoning for the proposed one.
> > > 
> > > No, they are not related. Or you don't understand the essence of
> > > ctime/mtime. We create another new image with _new_ files (not only copied
> > > from the system). ctime should reflect the actual creation time and not
> > > mislead that the initramfs was created when the kernel was compiled. That's
> > > just wrong. The more I think about it, the less I like this idea in general.
> > 
> > To be clear: I'm not talking about ctime/mtime of the initramfs itself, it's
> > all about timestamps that are packed inside of it. The [cm]time of initramfs
> > do not relate to the reproducibility.
> 
> I know how to implement a SOURCE_DATE_EPOCH even for out-of-tree features,
> but I don't see any usecases for this.

Yes, SOURCE_DATE_EPOCH support would be nice, and there is a use case, which I already described: making installation/bootable ISO/USB images reproducible, because they contain initramfs images too, and those are currently not reproducible, sadly.

> > Ok, but how will I know that I've convinced you?
> 
> It will be then when I agree, and I'll write about it.

And again: the boot process *is not* my concern here.
Comment 12 Alexey Gladkov 2021-09-08 22:43:56 MSK
(In reply to Vladimir D. Seleznev from comment #11)
> No, my concern is about reproducibility, but in the generated *installation*
> images, like ISO/USB flash images, if it make it clear, that contain
> generated initramfs images, that SHOULD be reproducible to make installation
> ISO/USB images reproducible too.

To check the ISO/USB image itself, you _don't_ need to rebuild it from scratch. You need to verify the image checksum.

I don't understand where the "SHOULD" came from; it doesn't follow from anything. I can imagine that someone from the government insists on the reproducibility of the images you provide, but that's another story. Such a narrow task is easy to solve when creating such a _special_ image. It's special because without knowing the timestamp (SOURCE_DATE_EPOCH), you will never reproduce exactly the same image.

> Yes, the SOURCE_DATE_EPOCH would be nice, and there is an use-case, and I
> alredy wrote it: to make installation/bootable ISO/USB images reproducible.
> Because they contain initramfs images too. Which are not reproducible now.
> Sadly.

This is not true. You can always repack the image with any timestamp you need. This can be part of the code that will reproduce the image. You will need such a general script anyway.
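The repacking step described in the comment above can be sketched as post-processing that rewrites the mtime field of every archive member. The sketch assumes a single uncompressed newc cpio archive; a real initrd is usually compressed and may consist of several concatenated archives (for example, an early microcode segment), which this hypothetical `set_mtimes` helper does not handle. It also rewrites only timestamps, not member order.

```python
HEADER_LEN = 110  # "070701" magic + 13 eight-digit hex fields


def _field(buf: bytes, off: int, index: int) -> int:
    """Read the index-th 8-digit hex field of the newc header at off."""
    start = off + 6 + 8 * index
    return int(buf[start:start + 8], 16)


def set_mtimes(archive: bytes, mtime: int = 0) -> bytes:
    """Copy of an uncompressed newc cpio archive with every c_mtime replaced."""
    if not 0 <= mtime <= 0xFFFFFFFF:
        raise ValueError("c_mtime is a 32-bit field")
    out = bytearray(archive)
    off = 0
    while True:
        if out[off:off + 6] not in (b"070701", b"070702"):
            raise ValueError(f"no newc header at offset {off}")
        filesize, namesize = _field(out, off, 6), _field(out, off, 11)
        name = bytes(out[off + HEADER_LEN:off + HEADER_LEN + namesize - 1])
        out[off + 46:off + 54] = b"%08X" % mtime  # field 5 is c_mtime
        if name == b"TRAILER!!!":
            return bytes(out)  # keep any block padding after the trailer as-is
        data = (off + HEADER_LEN + namesize + 3) & ~3  # header + name, 4-aligned
        off = (data + filesize + 3) & ~3  # file data, 4-aligned
```

Calling `set_mtimes(raw, 0)` on the uncompressed archive gives the same kind of result as the behaviour that eventually shipped, where all mtimes are set to 01-01-1970.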
Comment 13 Alexey Gladkov 2021-09-09 13:43:00 MSK
After an off-list discussion with Dmitry V. Levin and Gleb Fotengauer-Malinovskiy, I agreed that dates are not important in the initramfs image and don't need to be stored; they can be 0. This makes sense to me: it requires no heuristics or extra variables and is easy to implement.
Comment 14 Repository Robot 2021-09-11 19:07:33 MSK
make-initrd-2.23.0-alt1 -> sisyphus:

 Sat Sep 11 2021 Alexey Gladkov <legion@altlinux.ru> 2.23.0-alt1
 - New version (2.23.0).
 - Feature ucode: The absence of the firmware file is not an error (ALT#40790).
 - Set mtime of all initramfs files and directories to 01-01-1970 (ALT#40873).