Gnuconfig

· slope000's blog

Start day: 09/07/2022

  1. First level hook: app-admin/keepassxc-2.7.1-r1
    Password manager.
    Dependencies:
    Lots and lots of dependencies in the tree. Let's choose the first one: gnuconfig-20220508
  2. Second level hook: sys-devel/gnuconfig-20220508
    Updated config.sub and config.guess files from GNU.
    Dependencies:
    None.

# ebuild

Let's start by reviewing the ebuild.

```sh
SRC_URI="https://dev.gentoo.org/~sam/distfiles/${CATEGORY}/${PN}/${P}.tar.xz"
HOMEPAGE="https://savannah.gnu.org/projects/config"
LICENSE="GPL-3+-with-autoconf-exception"
```

Nothing else is remarkable here; the rest is just packaging, source preparation, testing and installation instructions. The ebuild applies no patches to the stable version (the one I use).
Lines of code: ~3655

# Source code preparation

Nothing unusual here either:

```sh
wget https://dev.gentoo.org/~sam/distfiles/sys-devel/gnuconfig/gnuconfig-20220508.tar.xz
tar xf gnuconfig-20220508.tar.xz --one-top-level
rm gnuconfig-20220508.tar.xz
```

# ChangeLog

gitlog-to-changelog is a Perl script distributed under the GPL3+ license. As the name suggests, it converts `git log` output into the GNU ChangeLog format. ChangeLog and ChangeLog-old files are also provided in the source tree.

[Break]: Read GPL3 License.

# config.sub

The rest of the package consists of two scripts: config.sub and config.guess. Let's start with config.sub, a script for validating and canonicalizing a configuration triplet. In its own words:

```sh
# The goal of this file is to map all the various variations of a given
# machine specification into a single specification in the form:
#	CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
# or in some cases, the newer four-part form:
#	CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
# It is wrong to echo any other type of specification.
```

It is distributed under the GPL 3.0 licence, or any later version.

# Validation

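The string is first split on hyphens into at most four fields (five or more components are rejected as invalid), and pattern matching then decides which fields are the CPU, vendor, kernel and OS.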

Remark: It's fascinating to see so many machine types after having used x86-64 my whole life. One day, I'll try GNU/Gentoo/Linux on another platform as well. :D
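
A minimal POSIX sh sketch of the splitting step, assuming plain hyphen-separated input. Reading the name with `IFS` set to `-` is, to my reading, the same idea config.sub uses, but the function name `split_config` and its `count:fields` output are invented for illustration; the real script goes on to decide which fields are the vendor, kernel and OS.

```sh
#!/bin/sh
# Hypothetical helper (not from config.sub): split a configuration name
# on "-" into at most four fields and print "count:f1:f2:f3:f4".
split_config() {
    # Five or more components can never be valid.
    case $1 in
        *-*-*-*-*)
            echo "invalid configuration '$1': more than four components" >&2
            return 1
            ;;
    esac
    # IFS is set to "-" only for this read command.
    IFS=- read -r f1 f2 f3 f4 <<EOF
$1
EOF
    count=1
    [ -n "$f2" ] && count=2
    [ -n "$f3" ] && count=3
    [ -n "$f4" ] && count=4
    echo "$count:$f1:$f2:$f3:$f4"
}

split_config x86_64-pc-linux-gnu   # prints 4:x86_64:pc:linux:gnu
split_config amd64-unknown         # prints 2:amd64:unknown::
```

Missing trailing fields simply come back empty, which is what leaves room for the later pattern matching to fill in defaults.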

Then some substitutions are made for the CPU and vendor, for example:

```sh
	pdp11-unknown)
		vendor=dec
		;;
	i370-ibm*)
		vendor=ibm
		;;
	xps-unknown | xps100-unknown)
		cpu=xps100
		vendor=honeywell
		;;
	x64 | amd64)
		cpu=x86_64
		vendor=pc
		;;
```

Similar validation rules are then applied to the operating system (including its ABI and libc), and finally the kernel/OS combination is checked. For example, a bare `-dietlibc*` is rejected: dietlibc is just a libc implementation, so it requires a kernel.
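
A simplified sketch of both kinds of rules: substitution by pattern, and rejecting a libc that comes without a kernel. The function names are hypothetical, only the cases quoted above are covered, and the real config.sub updates `cpu`, `vendor` and `os` variables step by step instead of printing strings.

```sh
#!/bin/sh
# Hypothetical sketch: normalize a machine name using only the
# CPU/vendor rules quoted above; anything else passes through unchanged.
normalize_machine() {
    case $1 in
        pdp11-unknown)                echo pdp11-dec ;;
        i370-ibm*)                    echo i370-ibm ;;
        xps-unknown | xps100-unknown) echo xps100-honeywell ;;
        x64 | amd64)                  echo x86_64-pc ;;
        *)                            echo "$1" ;;
    esac
}

# Hypothetical kernel/OS check: a libc alone is not an operating system.
# Returns 0 (the status of a case with no matching pattern) when valid.
check_kernel_os() {
    kernel=$1
    os=$2
    case $kernel-$os in
        -dietlibc*)
            echo "invalid: libc '$os' needs an explicit kernel" >&2
            return 1
            ;;
    esac
}

normalize_machine amd64                                    # prints x86_64-pc
check_kernel_os linux dietlibc && echo valid               # prints valid
check_kernel_os "" dietlibc 2>/dev/null || echo rejected   # prints rejected
```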

Remark: Wow, I had never actually heard of diet libc. Some meta-information about this minimal libc implementation, aimed mostly at embedded devices: stable release date: September 24, 2018; supported platforms: Alpha, ARM, PA-RISC, ia64, i386, MIPS, s390, sparc, PowerPC. There is also DietLinux, a boot floppy based on the diet libc; two alternatives to it are dnetc linux and Pauls Boot CD. I think I'll test it one day, it sounds pretty interesting.

If the CPU and OS are known but the manufacturer is not, the logical manufacturer is picked, for example:

```sh
	*-beos*)
		vendor=be
		;;
	*-genix*)
		vendor=ns
		;;
	s390-* | s390x-*)
		vendor=ibm
		;;
```
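
The vendor deduction only kicks in while the vendor is still `unknown`. A sketch of that, assuming (as the patterns suggest) that they are matched against a `cpu-os` string; `deduce_vendor` is a hypothetical name and only the three quoted cases are included.

```sh
#!/bin/sh
# Hypothetical sketch: fill in a vendor only if it is still "unknown",
# using the CPU/OS patterns quoted above.
deduce_vendor() {
    cpu=$1
    vendor=$2
    os=$3
    if [ "$vendor" = unknown ]; then
        case $cpu-$os in
            *-beos*)          vendor=be ;;
            *-genix*)         vendor=ns ;;
            s390-* | s390x-*) vendor=ibm ;;
        esac
    fi
    echo "$cpu-$vendor-$os"
}

deduce_vendor s390x unknown linux-gnu   # prints s390x-ibm-linux-gnu
deduce_vendor x86_64 pc linux-gnu       # vendor already known: unchanged
```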

In the end, the canonicalized configuration is echoed and the program terminates:

```sh
echo "$cpu-$vendor-${kernel:+$kernel-}$os"
exit
```
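
The `${kernel:+$kernel-}` expansion is what lets one `echo` produce both the three-part and the four-part form: it yields `kernel-` only when `kernel` is set and non-empty, and nothing otherwise. A small sketch with a hypothetical `print_config` wrapper:

```sh
#!/bin/sh
# Hypothetical wrapper around config.sub's final echo.
print_config() {
    cpu=$1
    vendor=$2
    kernel=$3
    os=$4
    # ${kernel:+$kernel-} expands to "$kernel-" only if kernel is non-empty.
    echo "$cpu-$vendor-${kernel:+$kernel-}$os"
}

print_config x86_64 pc linux gnu         # prints x86_64-pc-linux-gnu
print_config sparc sun "" solaris2.11    # prints sparc-sun-solaris2.11
```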

If the configuration fails validation at any step, the script prints an error message to stderr and exits with status 1 right there, so the remaining steps never run.

# config.guess

Another part of gnuconfig is config.guess. Originally written by Per Bothner; maintained since 2000 by Ben Elliston. Similarly, the license of config.guess is GPL 3+. It is used for system detection.

config.guess deliberately sticks to classic Bourne-shell constructs, since it has to run even on systems whose /bin/sh predates POSIX. To keep ShellCheck from complaining about that, the script carries the directive `# shellcheck disable=SC2006,SC2268`. SC2006 warns when legacy backtick command substitution is used instead of the `$(...)` notation; SC2268 warns about the x-prefix hack (x-hack) in comparisons, e.g. `[ "x$var" = "xval" ]`. Throughout the code, a few other ShellCheck warnings are disabled locally, such as SC2003 for the use of `expr`.

# System Detection

One thing that aids system detection is a C compiler, which the script takes from CC_FOR_BUILD (HOST_CC is its deprecated older name, still honored for backwards compatibility).
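
As far as I can tell from config.guess's `set_cc_for_build` function, the compiler is chosen in a fixed order: `CC_FOR_BUILD`, then `HOST_CC`, then `CC`, and only if none is set does it try `cc`, `gcc`, `c89` and `c99` in turn, falling back to the marker `no_compiler_found`. The sketch below mirrors only that order; the name `pick_cc_for_build` is hypothetical, and the real function also creates a private temporary directory with several portable fallbacks (here plain `mktemp -d` is assumed).

```sh
#!/bin/sh
# Hypothetical sketch of the compiler selection order (not the real
# set_cc_for_build).
pick_cc_for_build() {
    if [ -n "${CC_FOR_BUILD-}" ]; then
        echo "$CC_FOR_BUILD"
    elif [ -n "${HOST_CC-}" ]; then
        echo "$HOST_CC"          # deprecated name, kept for compatibility
    elif [ -n "${CC-}" ]; then
        echo "$CC"
    else
        # Nothing configured: try a few common drivers on a tiny file.
        tmp=$(mktemp -d) || return 1
        echo "int x;" > "$tmp/dummy.c"
        found=no_compiler_found
        for driver in cc gcc c89 c99; do
            if ($driver -c -o "$tmp/dummy.o" "$tmp/dummy.c") >/dev/null 2>&1; then
                found=$driver
                break
            fi
        done
        rm -rf "$tmp"
        echo "$found"
    fi
}

echo "compiler: $(pick_cc_for_build)"
```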

The detection starts with executing uname:

```sh
UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown
UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown
UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown
UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown
```
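
Each line runs `uname` in a subshell with stderr thrown away and falls back to `unknown`, so a missing or failing `uname` cannot abort the detection. The same idiom, wrapped in a hypothetical `probe` helper (the last call uses a deliberately nonexistent command):

```sh
#!/bin/sh
# Hypothetical helper using the same idiom as the uname lines above:
# run the command in a subshell, discard its error output, and print
# "unknown" if it fails (including when the command does not exist).
probe() {
    ("$@") 2>/dev/null || echo unknown
}

UNAME_SYSTEM=$(probe uname -s)
echo "system: $UNAME_SYSTEM"
probe no-such-command-for-this-demo    # prints unknown
```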

If the system name is Linux, GNU or starts with GNU/, the libc is detected next:

```sh
case $UNAME_SYSTEM in
Linux|GNU|GNU/*)
	LIBC=unknown

	set_cc_for_build
	cat <<-EOF > "$dummy.c"
	#include <features.h>
	#if defined(__UCLIBC__)
	LIBC=uclibc
	#elif defined(__dietlibc__)
	LIBC=dietlibc
	#elif defined(__GLIBC__)
	LIBC=gnu
	#else
	#include <stdarg.h>
	/* First heuristic to detect musl libc.  */
	#ifdef __DEFINED_va_list
	LIBC=musl
	#endif
	#endif
	EOF
	cc_set_libc=`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^LIBC' | sed 's, ,,g'`
	eval "$cc_set_libc"

	# Second heuristic to detect musl libc.
	if [ "$LIBC" = unknown ] &&
	   command -v ldd >/dev/null &&
	   ldd --version 2>&1 | grep -q ^musl; then
		LIBC=musl
	fi

	# If the system lacks a compiler, then just pick glibc.
	# We could probably try harder.
	if [ "$LIBC" = unknown ]; then
		LIBC=gnu
	fi
	;;
esac
```
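
The trick in the first heuristic is that the C file is only preprocessed, never compiled: `cc -E` leaves at most one `LIBC=...` line standing, `grep` picks it out, `sed` strips any spaces, and `eval` turns it into a shell assignment. If no line survives, `eval` does nothing and `LIBC` keeps its `unknown` default. A sketch of just that extraction step, with the preprocessor output faked by `printf` (an assumption, so it runs without a compiler; real output also carries line markers like `# 1 "dummy.c"`, which `grep` skips). The function names are hypothetical.

```sh
#!/bin/sh
# Keep only lines starting with the given variable name and strip all
# spaces, turning e.g. "LIBC = musl" into "LIBC=musl".
extract_assignment() {
    grep "^$1" | sed 's, ,,g'
}

# Made-up stand-in for `$CC_FOR_BUILD -E "$dummy.c"` on a glibc system.
fake_preprocessor_output() {
    printf '%s\n' '# 1 "dummy.c"' '' 'LIBC=gnu'
}

LIBC=unknown
cc_set_libc=$(fake_preprocessor_output | extract_assignment LIBC)
eval "$cc_set_libc"
echo "$LIBC"   # prints gnu
```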

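Remark: And once again, the musl preprocessor-macro debate shows up. musl deliberately does not define an identifying macro such as `__MUSL__`; a single macro like that would make both heuristics above unnecessary.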

# Case Branches

Depending on the values reported by uname, a different branch of one large case statement is executed; as with any shell case, only the first matching pattern runs. Here are a few examples of the patterns:

```sh
case $UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION in
    *:NetBSD:*:*)
	# ...
	;;
    *:Redox:*:*)
	;;
    alpha:OSF1:*:*)
	;;
    Tek43[0-9][0-9]:UTek:*:*) # Tektronix 4300 system running UTek (BSD)
	;;
    # ...
esac
```
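
The dispatch key is simply the four uname values glued together with colons, and the first matching pattern wins. A minimal sketch with a hypothetical `classify` function and only a few of the patterns above (the real branches compute a full `GUESS` rather than a label; the sample uname values in the second call are made up):

```sh
#!/bin/sh
# Hypothetical classifier keyed like config.guess:
# machine:system:release:version
classify() {
    case $1:$2:$3:$4 in
        *:NetBSD:*:*)             echo netbsd ;;
        alpha:OSF1:*:*)           echo osf1-alpha ;;
        Tek43[0-9][0-9]:UTek:*:*) echo utek ;;
        x86_64:Linux:*:*)         echo linux-x86_64 ;;
        *)                        echo unrecognized ;;
    esac
}

classify "$(uname -m)" "$(uname -s)" "$(uname -r)" "$(uname -v)"
classify Tek4301 UTek 2.1 x   # prints utek
```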
# `*:NetBSD:*:*`

For the resulting guess, the CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM form is used: GUESS=$machine-${os}${release}${abi-}. Depending on whether the system supports ELF object format, os might be set to netbsd or netbsdelf.

The vendor can't be deduced, so it is always set to unknown, e.g.:

```sh
	case $UNAME_MACHINE_ARCH in
	    aarch64eb) machine=aarch64_be-unknown ;;
	    armeb) machine=armeb-unknown ;;
	    sh3el) machine=shl-unknown ;;
	    # ...
	esac
```

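For Debian GNU/NetBSD machines, release is set to `-gnu`. Otherwise, it is set to the output of `echo "$UNAME_RELEASE" | sed -e 's/[-_].*//' | cut -d. -f1,2`, i.e. the major.minor part of the kernel release with any `-` or `_` suffix removed.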
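
A sketch combining the release computation with the `GUESS` template quoted above. The helper name `netbsd_release` and the sample release and version strings are my own; `abi` is left unset, so `${abi-}` contributes nothing.

```sh
#!/bin/sh
# Hypothetical helper: compute the NetBSD release part of the guess.
# $1 is UNAME_RELEASE, $2 is UNAME_VERSION.
netbsd_release() {
    case $2 in
        Debian*)
            printf '%s\n' -gnu ;;
        *)
            # Drop everything from the first "-" or "_", keep major.minor.
            echo "$1" | sed -e 's/[-_].*//' | cut -d. -f1,2 ;;
    esac
}

unset abi                     # no ABI suffix in this example
machine=x86_64-unknown
os=netbsd
release=$(netbsd_release 9.3_STABLE "NetBSD 9.3_STABLE (GENERIC) #0")
GUESS=$machine-${os}${release}${abi-}
echo "$GUESS"                 # prints x86_64-unknown-netbsd9.3
```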

# `*:SecBSD:*:*` and other uncommon systems...

For them, GUESS is set to the following:

	UNAME_MACHINE_ARCH=`arch | sed 's/SecBSD.//'`
	GUESS=$UNAME_MACHINE_ARCH-unknown-secbsd$UNAME_RELEASE

secbsd is replaced by libertybsd for LibertyBSD, sortix for Sortix, twizzler for Twizzler and so on.

# Special cases

Some special cases are treated separately, for example, for mips:OSF1:*.*, GUESS is set to mips-dec-osf1.

# `x86_64:Linux:*:*`

This is the branch taken on my machine, as on most other machines that use gnuconfig. The full block of code is the following:

```sh
    x86_64:Linux:*:*)
	set_cc_for_build
	CPU=$UNAME_MACHINE
	LIBCABI=$LIBC
	if test "$CC_FOR_BUILD" != no_compiler_found; then
	    ABI=64
	    sed 's/^	    //' << EOF > "$dummy.c"
	    #ifdef __i386__
	    ABI=x86
	    #else
	    #ifdef __ILP32__
	    ABI=x32
	    #endif
	    #endif
EOF
	    cc_set_abi=`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^ABI' | sed 's, ,,g'`
	    eval "$cc_set_abi"
	    case $ABI in
		x86) CPU=i686 ;;
		x32) LIBCABI=${LIBC}x32 ;;
	    esac
	fi
	GUESS=$CPU-pc-linux-$LIBCABI
	;;
```

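CPU starts out as $UNAME_MACHINE and LIBCABI as $LIBC. If a compiler is available, the ABI it targets is then determined by preprocessing a small test file: for a 32-bit x86 compiler, CPU becomes i686, and for an x32 compiler, LIBCABI becomes ${LIBC}x32. Finally, GUESS is set to $CPU-pc-linux-$LIBCABI.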
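
The ABI check boils down to a small mapping from what the compiler reports to the final triplet. Sketched with a hypothetical `x86_64_linux_guess` function that takes the already-detected ABI and libc as arguments instead of running the compiler:

```sh
#!/bin/sh
# Hypothetical: map (ABI, LIBC) to the guess of the x86_64:Linux branch.
# ABI is 64 by default, x86 for an i386 compiler, x32 for an ILP32 one.
x86_64_linux_guess() {
    CPU=x86_64
    LIBCABI=$2
    case $1 in
        x86) CPU=i686 ;;
        x32) LIBCABI=${2}x32 ;;
    esac
    echo "$CPU-pc-linux-$LIBCABI"
}

x86_64_linux_guess 64 gnu    # prints x86_64-pc-linux-gnu
x86_64_linux_guess x86 musl  # prints i686-pc-linux-musl
x86_64_linux_guess x32 gnu   # prints x86_64-pc-linux-gnux32
```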

# Output

After the case statement, if GUESS is not an empty string, it is printed and the script terminates:

```sh
if test "x$GUESS" != x; then
    echo "$GUESS"
    exit
fi
```

Otherwise, the script falls back to the compiler: it builds and runs a small C program whose preprocessor conditionals detect a few more systems. A few examples:

```c
main ()
{
#if defined (MULTIMAX) || defined (n16)
#if defined (UMAXV)
  printf ("ns32k-encore-sysv\n"); exit (0);
#else
#if defined (CMU)
  printf ("ns32k-encore-mach\n"); exit (0);
#else
  printf ("ns32k-encore-bsd\n"); exit (0);
#endif
#endif
#endif

#if defined (__386BSD__)
  printf ("i386-pc-bsd\n"); exit (0);
#endif

#if defined (sequent)
#if defined (i386)
  printf ("i386-sequent-dynix\n"); exit (0);
#endif
#if defined (ns32000)
  printf ("ns32k-sequent-dynix\n"); exit (0);
#endif
#endif

  /*...*/

  exit (1);
}
```

# Failure to Recognize the System

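If config.guess fails to recognize the system, it prints the following message to standard error; the request for extra diagnostic data is only added if the script itself is less than three years old: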

```sh
cat >&2 <<EOF

This script (version $timestamp), has failed to recognize the
operating system you are using. If your script is old, overwrite *all*
copies of config.guess and config.sub with the latest versions from:

  https://git.savannah.gnu.org/cgit/config.git/plain/config.guess
and
  https://git.savannah.gnu.org/cgit/config.git/plain/config.sub
EOF

our_year=`echo $timestamp | sed 's,-.*,,'`
thisyear=`date +%Y`
# shellcheck disable=SC2003
script_age=`expr "$thisyear" - "$our_year"`
if test "$script_age" -lt 3 ; then
   cat >&2 <<EOF

If $0 has already been updated, send the following data and any
information you think might be pertinent to config-patches@gnu.org to
provide the necessary information to handle your system.

config.guess timestamp = $timestamp

uname -m = `(uname -m) 2>/dev/null || echo unknown`
uname -r = `(uname -r) 2>/dev/null || echo unknown`
uname -s = `(uname -s) 2>/dev/null || echo unknown`
uname -v = `(uname -v) 2>/dev/null || echo unknown`

/usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null`
/bin/uname -X     = `(/bin/uname -X) 2>/dev/null`

hostinfo               = `(hostinfo) 2>/dev/null`
/bin/universe          = `(/bin/universe) 2>/dev/null`
/usr/bin/arch -k       = `(/usr/bin/arch -k) 2>/dev/null`
/bin/arch              = `(/bin/arch) 2>/dev/null`
/usr/bin/oslevel       = `(/usr/bin/oslevel) 2>/dev/null`
/usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null`

UNAME_MACHINE = "$UNAME_MACHINE"
UNAME_RELEASE = "$UNAME_RELEASE"
UNAME_SYSTEM  = "$UNAME_SYSTEM"
UNAME_VERSION = "$UNAME_VERSION"
EOF
fi

exit 1
```
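
The age check only asks for a bug report when the script is recent: the year is cut from the front of the `YYYY-MM-DD` timestamp and subtracted from the current year. A sketch with a hypothetical `should_ask_for_report` function; the current year is passed in as an argument (an assumption, to make it testable), and POSIX `$(( ))` arithmetic stands in for the `expr` call the original keeps for pre-POSIX shells.

```sh
#!/bin/sh
# Hypothetical: succeed if a config.guess with this timestamp is
# less than three years old in the given year.
should_ask_for_report() {
    timestamp=$1
    thisyear=$2
    our_year=$(echo "$timestamp" | sed 's,-.*,,')   # "2022-05-08" -> "2022"
    script_age=$((thisyear - our_year))
    [ "$script_age" -lt 3 ]
}

if should_ask_for_report 2022-05-08 "$(date +%Y)"; then
    echo "recent script: please send system data"
else
    echo "old script: please update config.guess first"
fi
```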

2022-09-13

7 days passed since start.