Bug 45757 - gpg2 is stuck (spinning) under fakeroot-1.29-alt1 on e2k
Status: CLOSED FIXED
Alias: None
Product: Sisyphus
Classification: Development
Component: fakeroot
Version: unstable
Hardware: e2k Linux
Importance: P5 normal
Assignee: placeholder@altlinux.org
QA Contact: qa-sisyphus
Reported: 2023-04-05 16:31 MSK by Ivan Zakharyaschev
Modified: 2023-04-06 20:57 MSK

Description Ivan Zakharyaschev 2023-04-05 16:31:40 MSK
fakeroot-1.29-alt1 on all e2k* platforms

# How to reproduce:

$ hsh --apt-conf /home/imz/.hasher/sisyphus_e2k/apt.conf --ini ~/hasher
$ hsh-install ~/hasher/ gpg-keygen
$ hsh-run ~/hasher/ -- fakeroot -- gpg-keygen --passphrase '' --name-real 'Some One' --name-email someone@example.com /dev/null /usr/src/example-pubkey.asc
gpg: Generating a basic GPG key


And it gets stuck, consuming all CPU.

$ hsh-run ~/hasher/ -- rpm -q fakeroot gpg-keygen
fakeroot-1.29-alt1.e2kv4
gpg-keygen-20190611-alt1.noarch
$

# Expected behavior

It works without fakeroot, under the older fakeroot-1.25.3-alt1, and on x86_64 and other platforms.

## Expected behavior (demonstrated without fakeroot)

$ hsh-run ~/hasher/ -- gpg-keygen --passphrase '' --name-real 'Some One' --name-email someone@example.com /dev/null /usr/src/example-pubkey.asc
gpg: Generating a basic GPG key
7CBB7A0FA1A496A7
$

## Expected behavior (demonstrated under fakeroot-1.25.3-alt1)

$ hsh --apt-conf /home/imz/.hasher/p10_e2k/apt.conf --ini ~/hasher1
$ hsh-install ~/hasher1/ gpg-keygen
$ hsh-run ~/hasher1/ -- rpm -q fakeroot gpg-keygen
fakeroot-1.25.3-alt1.e2kv4
gpg-keygen-20190611-alt1.noarch
$ hsh-run ~/hasher1/ -- fakeroot -- gpg-keygen --passphrase '' --name-real 'Some One' --name-email someone@example.com /dev/null /usr/src/example-pubkey.asc
gpg: Generating a basic GPG key
8B2F8F26042FAF82
$

# The gpg2 command that gets stuck

It's actually just the first gpg2 invocation:

$ hsh-run ~/hasher/ -- fakeroot -- sh -efux gpg-keygen --passphrase '' --name-real 'Some One' --name-email someone@example.com /dev/null /usr/src/example-pubkey.asc
...
++ mktemp -d
+ temp_dir=/usr/src/tmp/tmp.dHWob6q7al
+ trap 'rm -Rf '\''/usr/src/tmp/tmp.dHWob6q7al'\''' EXIT HUP INT TERM
+ export GNUPGHOME=/usr/src/tmp/tmp.dHWob6q7al
+ GNUPGHOME=/usr/src/tmp/tmp.dHWob6q7al
+ cat
+ gpg2 --quiet --batch --no-tty --gen-key /usr/src/tmp/tmp.dHWob6q7al/.input
gpg: Generating a basic GPG key
^C
Comment 1 Ivan Zakharyaschev 2023-04-05 16:41:58 MSK
strace reports the following at the point where it gets stuck. (Here I used --root instead of fakeroot; /cmd contained the same command.)

$ hsh-run --root ~/hasher/ -- strace -f -y /bin/sh /cmd &>strace-apt-gpg.0

...
[pid 2005524] read(9,  <unfinished ...>
[pid 2005520] write(4, "D (genkey(rsa(nbits 4:3072)))\n", 30 <unfinished ...>
[pid 2005524] <... read resumed> "D (genkey(rsa(nbits 4:3072)))\n", 1002) = 30
[pid 2005520] <... write resumed> )     = 30
[pid 2005524] read(9,  <unfinished ...>
[pid 2005520] write(4, "END", 3 <unfinished ...>
[pid 2005524] <... read resumed> "END", 1002) = 3
[pid 2005520] <... write resumed> )     = 3
[pid 2005524] read(9,  <unfinished ...>
[pid 2005520] write(4, "\n", 1 <unfinished ...>
[pid 2005524] <... read resumed> "\n", 999) = 1
[pid 2005520] <... write resumed> )     = 1
[pid 2005520] read(4,  <unfinished ...>
[pid 2005524] access("/dev/random", R_OK) = 0
[pid 2005524] access("/dev/urandom", R_OK) = 0
[pid 2005524] getpid()                  = 2005523
[pid 2005524] getpid()                  = 2005523
[pid 2005524] open("/etc/gcrypt/random.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] getpid()                  = 2005523
[pid 2005524] open("/etc/gcrypt/random.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index0/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index1/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index2/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index3/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index0/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index1/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index2/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index3/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index0/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index1/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index2/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] openat(AT_FDCWD, "/sys/devices/system/cpu/cpu2/cache/index3/level", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2005524] mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4588f88ea000
[pid 2005524] munmap(0x4588f88ea000, 135168) = 0
[pid 2005524] mprotect(0x4588fc021000, 8192, PROT_READ|PROT_WRITE) = 0
[pid 2005524] open("/dev/random", O_RDONLY) = 10
[pid 2005524] fcntl64(10, F_GETFD)      = 0
[pid 2005524] fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
[pid 2005524] restart_syscall(<... resuming interrupted fcntl64 ...>) = -1 EINTR (Interrupted system call)
[pid 2005524] restart_syscall(<... resuming interrupted restart_syscall ...>) = -1 EINTR (Interrupted system call)
[pid 2005524] restart_syscall(<... resuming interrupted restart_syscall ...>) = -1 EINTR (Interrupted system call)
...

strace at the same point without fakeroot. (Here, for some reason, strace didn't decode the string arguments.)

$ hsh-run ~/hasher/ -- strace -f -y /bin/sh /cmd2 &>strace-apt-gpg.2

...
[pid 2006040] read(9,  <unfinished ...>
[pid 2006036] write(4, 0x554e18, 30 <unfinished ...>
[pid 2006040] <... read resumed> 0x458f0c000ce0, 1002) = 30
[pid 2006036] <... write resumed> )     = 30
[pid 2006040] read(9,  <unfinished ...>
[pid 2006036] write(4, 0x4578b408e184, 3 <unfinished ...>
[pid 2006040] <... read resumed> 0x458f0c000ce0, 1002) = 3
[pid 2006036] <... write resumed> )     = 3
[pid 2006040] read(9,  <unfinished ...>
[pid 2006036] write(4, 0x4578b408e144, 1 <unfinished ...>
[pid 2006040] <... read resumed> 0x458f0c000ce3, 999) = 1
[pid 2006036] <... write resumed> )     = 1
[pid 2006036] read(4,  <unfinished ...>
[pid 2006040] access(0x458f09cd0190, R_OK) = 0
[pid 2006040] access(0x458f09cd01a0, R_OK) = 0
[pid 2006040] getpid()                  = 2006039
[pid 2006040] getpid()                  = 2006039
[pid 2006040] open(0x458f09ccfd50, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] getpid()                  = 2006039
[pid 2006040] open(0x458f09ccfd50, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] openat(AT_FDCWD, 0x458f0b2d0f70, O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 2006040] mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x458f0b2d3000
[pid 2006040] munmap(0x458f0b2d3000, 135168) = 0
[pid 2006040] mprotect(0x458f0c021000, 8192, PROT_READ|PROT_WRITE) = 0
[pid 2006040] open(0x458f09cd12b0, O_RDONLY) = 10
[pid 2006040] fcntl64(10, F_GETFD)      = 0
[pid 2006040] fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
[pid 2006040] getrandom(0x458f0b2d1140, 96, 0) = 96
...
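
For reference, the getrandom(..., 96, 0) call that ends the second trace is the step the first trace never reaches. Below is a minimal C sketch of such a request. It assumes (this is not confirmed anywhere in this report) that the caller issues it through the generic variadic syscall() function and retries on EINTR; fill_random and max_retries are hypothetical names introduced only for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill buf with len random bytes by invoking getrandom through the
 * generic variadic syscall() entry point, retrying on EINTR.
 * Returns 0 on success, -1 on any other error (errno is set).
 * max_retries bounds the EINTR loop so that a misbehaving syscall()
 * wrapper cannot make the caller spin forever; this bound is an
 * assumption of the sketch, not taken from any real library. */
static int fill_random(unsigned char *buf, size_t len, int max_retries)
{
    size_t done = 0;
    int retries = 0;

    while (done < len) {
        long ret = syscall(SYS_getrandom, buf + done, len - done,
                           (unsigned int)0);
        if (ret < 0) {
            if (errno == EINTR && retries++ < max_retries)
                continue;   /* interrupted: try again, up to the bound */
            return -1;      /* real error, or too many interruptions */
        }
        done += (size_t)ret;
    }
    return 0;
}
```

An unbounded version of this loop would keep one core busy if every call came back with EINTR, which is at least consistent with the restart_syscall/EINTR sequence in the first trace.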
Comment 2 Ivan Zakharyaschev 2023-04-05 16:50:20 MSK
Might be related to https://bugzilla.altlinux.org/45737?
Comment 3 Michael Shigorin 2023-04-05 17:17:24 MSK
(In reply to Ivan Zakharyaschev from comment #2)
> May be related to https://bugzilla.altlinux.org/45737 ?..
JFYI, we've observed another regression with fakeroot 1.29 on e2k: "warning: Unable to reset I/O priority" spam from rpm's lib/rpmscript.c, which didn't occur with fakeroot 1.28 under the same kernel. glebfm@ noted that it might be related to the syscall function wrapper introduced in this version:
http://git.altlinux.org/gears/f/fakeroot.git?p=fakeroot.git;a=commitdiff;h=f091ef785ee9f3484b91c8e918d8241e8d244d83

(In reply to Ivan Zakharyaschev from comment #0)
> And it gets stuck, consuming all CPU.
All 100% of a single CPU core, strictly speaking. :-)
Comment 4 Ivan Zakharyaschev 2023-04-05 17:37:08 MSK
An unconfirmed thought: this new syscall wrapper always handles 6 universal arguments, so it could end up accessing prohibited memory among them, such as the result of a speculative computation. Previously, I saw something like SIGBUS in such cases, but here strace shows no signals at this point, only some bizarre syscall being resumed after endless interruptions, and the cause of the interruptions is not shown. This might not be the real reason; it's just an idea.
Comment 5 Ivan Zakharyaschev 2023-04-05 17:38:20 MSK
(In reply to Ivan Zakharyaschev from comment #4)
> I've had a thought (without any confirmation) that this new syscall wrapper
> can lead to access to prohibited memory among these universal 6 arguments,
> like to a result of speculative computation. Previously, I saw something
> like SIGBUS in such cases, but here I don't see any signals in strace at
> this moment, but just resuming some bizarre syscall after endless
> interrupts; the reason of the interrupts is not shown. This might not be the
> real reason, just an idea.

"prohibited use of value" would be a more correct description for that hypothetical situation
Comment 6 Dmitry V. Levin 2023-04-05 22:31:11 MSK
(In reply to Ivan Zakharyaschev from comment #1)
> strace of the same moment without fakeroot. (Here, for some reason, strace
> didn't decode string args!..)

That's because strace was denied permission to access the tracee's memory.
Comment 7 Ivan Zakharyaschev 2023-04-06 00:41:57 MSK
Meanwhile, I've figured out the problem: a special calling convention for variadic functions on e2k. I'm working on a fix.
Comment 8 Ivan Zakharyaschev 2023-04-06 01:42:27 MSK
http://ftp.altlinux.org/pub/people/mike/elbrus/docs/elbrus_prog/html/chapter9.html#id34 :

9.5.1. Parameter passing

...

Dependence of parameter passing on the procedure interface

When generating code, the compiler may use and trust the information about a procedure's interface at the point of its call. This information is obtained from the given prototype of the procedure. If there is no prototype, analysis based on the call itself is also possible. All prototypes fall into three groups:

    a prototype that specifies all parameters;

    a prototype that specifies a variable number of parameters;

    a prototype that specifies no parameters.

Parameters for a call with all parameters specified are passed according to the general scheme given above.

The interface for processing a variable-length parameter list assumes that the parameters are in memory. Therefore, when parameters are passed to a call of a procedure with a variable number of parameters, the parameters that belong to the variable-length list (starting with the parameter before the ellipsis) are placed directly into the corresponding locations of the local stack, even if they could fit into the first eight registers.

If a call has no prototype that specifies the parameters, all possible cases have to be provided for. Therefore, when the list of actual parameters is built, the first eight parameters are placed both in registers (as for procedures with a fixed number of parameters) and in memory (as for procedures with a variable number of parameters).

Thus, a procedure with a variable number of parameters can always assume that the variable part of its parameters is in memory, while for a procedure with a fixed number of parameters the first parameters are in the first eight registers.

* * *
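
In C terms, the first two prototype groups from the quoted passage look roughly like this. It is only a minimal sketch: sum3_fixed, sum_variadic and call_both are hypothetical names, and the comments about registers and memory restate the quoted e2k convention rather than anything the C code itself can observe.

```c
#include <stdarg.h>

/* Group 1: a prototype that specifies all parameters.  Under the quoted
 * scheme, the first arguments of such a call travel in registers. */
long sum3_fixed(long a, long b, long c)
{
    return a + b + c;
}

/* Group 2: a prototype with a variable number of parameters.  The callee
 * walks the list with va_arg; per the quoted text, on e2k it expects to
 * find that list in memory (on the local stack). */
long sum_variadic(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, long);
    va_end(ap);
    return total;
}

/* A function-pointer type carries the same information as a prototype,
 * so the pointer type decides how the compiler lays out the arguments
 * at the call site.  Each pointer here matches its callee's definition. */
typedef long (*fixed_fn)(long, long, long);
typedef long (*variadic_fn)(int, ...);

long call_both(void)
{
    fixed_fn f = sum3_fixed;
    variadic_fn v = sum_variadic;
    return f(1L, 2L, 3L) + v(3, 4L, 5L, 6L);
}
```

Calling sum_variadic through a pointer of type fixed_fn would be undefined behavior in ISO C; on a platform with a convention like the one quoted, the callee would then look for its arguments in memory while the caller had put them in registers.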

The syscall.S assembler implementation on e2k indeed simply loads
all arguments from the stack.

I'm not sure whether this difference in the prototype matters
on any other platform, but who knows...

Judging by glibc's syscall.S implementations for other platforms, it doesn't seem to.
However, a web search shows that AArch64/MacOS has the same "problem".

(Note that on aarch64 the xN registers are 64-bit, and the wN registers are their lower 32-bit halves.)

https://cpufun.substack.com/i/32634393/why-bother-us-if-this-all-works :

Why bother us if this all works?

The reason this is an issue at all is that it doesn’t work this way on AArch64/MacOS. You may have checked the AArch64 compilers in Compiler Explorer and seen code like this,

        add     x0, sp, #4                      // =4
        mov     w1, #1
        mov     w2, #2
        mov     w3, #3
        bl      foo(int*, int, int, int)
...       
        add     x0, sp, #4                      // =4
        mov     w1, #4
        mov     w2, #5
        mov     w3, #6
        bl      foo_ellipsis(int*, ...)

which shows the same properties as that on x86_64: the arguments are being passed in the same places whether or not this is a variadic function, so that’s all good, right?

But… and it’s a big BUT, the calling convention on AArch64/MacOS is not like this. Here the compiler doesn’t load arguments which are matching the ellipsis into registers, but rather puts them onto the stack. Then the va_list code extracts them from there. As a result the test code fails when run natively on the MacOS M1 machines.

* * *

I'm polishing my fix and will submit a task with it.
Comment 9 Ivan Zakharyaschev 2023-04-06 02:54:19 MSK
I've proposed the fix in task 318100.
Comment 10 Ivan Zakharyaschev 2023-04-06 03:27:42 MSK
(In reply to Ivan Zakharyaschev from comment #4)
> I've had a thought (without any confirmation) that this new syscall wrapper
> can lead to access to prohibited memory among these universal 6 arguments,
> like to a result of speculative computation. Previously, I saw something
> like SIGBUS in such cases, but here I don't see any signals in strace at
> this moment, but just resuming some bizarre syscall after endless
> interrupts; the reason of the interrupts is not shown. This might not be the
> real reason, just an idea.

No: the way the syscall wrapper was compiled (and optimized) on e2k, as seen with disassemble in gdb, was almost the same as the syscall.S implementation. Since the glibc port itself is written that way, nothing bad should have been expected from it (apart from the wrong calling convention):


# tail -n28 glibc-2.29-alt2.E2K.26.012.1/sysdeps/unix/sysv/linux/e2k/syscall.S

#include <sysdep.h>

        .ignore ld_st_style
        .text

ENTRY (syscall)

        setwd   wsz = 0x9
        setbn   rsz = 0x3, rbs = 0x5, rcur = 0x0
        getsp   0x0, %r7

        __SYSCALL_ARG_MEM (%r7, 0x0, %b[0])
        __SYSCALL_ARG_MEM (%r7, 0x8, %b[1])
        __SYSCALL_ARG_MEM (%r7, 0x10, %b[2])
        __SYSCALL_ARG_MEM (%r7, 0x18, %b[3])
        __SYSCALL_ARG_MEM (%r7, 0x20, %b[4])
        __SYSCALL_ARG_MEM (%r7, 0x28, %b[5])
        __SYSCALL_ARG_MEM (%r7, 0x30, %b[6])

        sdisp   %ctpr1, __SYSCALL_TRAPNUM
        call    %ctpr1, wbs = 0x5

        __SYSCALL_OUTPUT

        ret

PSEUDO_END (syscall)
#

__SYSCALL_ARG_MEM is just a load.

glibc-2.29-alt2.E2K.26.012.1/sysdeps/unix/sysv/linux/e2k/e2k64/sysdep.h:
#define __SYSCALL_ARG_MEM(src1, src2, dst)      ldd src1, src2, dst
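
To illustrate the point about the calling convention: since the real syscall() loads its arguments from memory, a wrapper that forwards to it has to make the call through a variadic prototype. The following is a hedged sketch of that idea, not the code from task 318100; wrapped_syscall, next_syscall and syscall_fn are hypothetical names, and taking the address of libc's syscall() stands in for whatever symbol lookup a real interposer performs.

```c
#define _GNU_SOURCE
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pointer to the "next" syscall implementation.  The type is variadic on
 * purpose, so that calls through it use the variadic convention that the
 * real syscall() expects.  A real interposer would probably resolve the
 * symbol dynamically; this sketch just takes the address of libc's
 * syscall() to stay self-contained (an assumption for illustration). */
typedef long (*syscall_fn)(long number, ...);
static syscall_fn next_syscall = syscall;

/* Hypothetical wrapper: collect six generic arguments and forward them.
 * Callers of this sketch must supply all six, because reading a variadic
 * argument that was not passed is undefined behavior in ISO C. */
long wrapped_syscall(long number, ...)
{
    va_list ap;
    long a[6];

    va_start(ap, number);
    for (int i = 0; i < 6; i++)
        a[i] = va_arg(ap, long);
    va_end(ap);

    /* The call goes through a variadic function-pointer type. */
    return next_syscall(number, a[0], a[1], a[2], a[3], a[4], a[5]);
}
```

With a pointer type like long (*)(long, long, long, long, long, long, long) instead, the compiler would be entitled to pass the arguments the fixed-parameter way, which is the mismatch described in comment 8.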
Comment 11 Ivan Zakharyaschev 2023-04-06 15:34:09 MSK
(In reply to Ivan Zakharyaschev from comment #9)
> I suggest the fix in task 318100

Fixed there.

From: bugzilla-admin@altlinux.org
Subject: [Bug 45757] Unable to close via changelog: no such bug

You have tried to close bug 45757 via changelog
(see below for the changelog excerpt).

Unfortunately, it is not possible: this bug does not exist
https://bugzilla.altlinux.org/45757



Sincerely, your Bugzilla.

:-/ ??
Comment 12 Ivan Zakharyaschev 2023-04-06 20:57:49 MSK
Amended a bit more in:

task #318154: added #100: build tag "1.29-alt3" from /people/imz/packages/fakeroot.git

* Thu Apr  6 2023 Ivan Zakharyaschev <imz@altlinux.org> 1.29-alt3
- Fixed a compiler error of older GCCs (for p10) in the wrapper for
  syscall function.
- Warn the maintainer if a function definition is missing (when it is not
  generated for special cases like calling a variadic function like syscall).