
debcrafters-packages team mailing list archive

[Bug 2089789] Re: malloc performance degradation with CPU affinity masks

 

I can reproduce the bug on a local system with the attached reproducer:

wesley@glibc-jammy:~$ apt-cache policy libc6
libc6:
  Installed: 2.35-0ubuntu3.11
  Candidate: 2.35-0ubuntu3.11
  Version table:
 *** 2.35-0ubuntu3.11 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.35-0ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 65.088057
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 69.766305
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 62.821834
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 65.807690
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 1949.944540
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 1792.266083
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 1785.785353
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 2373.913114

### Verification Done Jammy ###

The build-time test suite passed.

The autopkgtest excuses report is clear.

wesley@glibc-jammy:~$ apt-cache policy libc6
libc6:
  Installed: 2.35-0ubuntu3.12
  Candidate: 2.35-0ubuntu3.12
  Version table:
 *** 2.35-0ubuntu3.12 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     2.35-0ubuntu3.11 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     2.35-0ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 61.100255
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 66.433792
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 65.547364
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 false false
nr_cpu: 24 pin: no fix: no
thread average (ms): 61.842890
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 62.259289
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 65.616804
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 64.859172
wesley@glibc-jammy:~$ ./test-glibc-malloc 24 true false
nr_cpu: 24 pin: yes fix: no
thread average (ms): 61.196404

### Verification Done Jammy ###

** Tags removed: verification-needed verification-needed-jammy
** Tags added: verification-done verification-done-jammy

-- 
You received this bug notification because you are a member of
Debcrafters packages, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/2089789

Title:
  malloc performance degradation with CPU affinity masks

Status in glibc package in Ubuntu:
  Fix Released
Status in glibc source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

   * Jammy has a malloc() performance degradation
     if CPU affinity masks are used (not default).

   * The maximum number of arenas for malloc() is
     calculated based on the number of processors.

     However, glibc 2.34 changed that to be based
     on sched_getaffinity(), which is the number
     of processors available _to the process_
     (i.e., based on CPU affinity masks). [0]

     Previously, glibc 2.33 instead used the number
     of processors available _in the system_
     (i.e., based on sysfs and procfs files).

   * This is not an issue by default, as without
     CPU affinity masks, the returned number of
     processors is the same as sysfs and procfs.

     But it _is_ an issue if CPU affinity masks
     are set, as the lower processor count means
     fewer arenas, which can increase lock
     contention and thus degrade performance.

   * CPU affinity can be set at the process-level
     (e.g., taskset, numactl, sched_setaffinity())
     or at the system-level (kernel boot options).

     The latter is common in hypervisor and/or DPDK
     deployments, where CPU partitioning is applied
     with isolcpus, cpusets, or systemd's CPUAffinity.
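
The two processor counts described above can be compared directly. The following is a minimal sketch in Python (not part of the reproducer): os.sched_getaffinity() wraps the same system call that glibc 2.34 switched to, and os.cpu_count() reports the system-wide count. The arena cap of 8 per counted CPU on 64-bit is an assumption about glibc's default and should be checked against malloc/arena.c for the version in question; all function names are hypothetical.

```python
import os

# Assumption: glibc's default arena cap on 64-bit is 8 arenas per counted
# CPU. Check malloc/arena.c in your glibc version before relying on this.
ARENAS_PER_CPU_64BIT = 8


def cpus_in_affinity_mask() -> int:
    """CPUs this process may run on (what sched_getaffinity() reports)."""
    return len(os.sched_getaffinity(0))


def cpus_in_system() -> int:
    """Logical CPUs in the system, regardless of this process's mask."""
    return os.cpu_count() or 1


def assumed_arena_cap(nr_cpus: int) -> int:
    """Arena limit under the assumed 8-per-CPU default."""
    return ARENAS_PER_CPU_64BIT * nr_cpus


def count_while_pinned_to_one_cpu() -> int:
    """Temporarily pin to one allowed CPU and return the affinity count."""
    original = os.sched_getaffinity(0)
    try:
        os.sched_setaffinity(0, {min(original)})
        return cpus_in_affinity_mask()
    finally:
        # Always restore the original mask, even if the count fails.
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    print("affinity:", cpus_in_affinity_mask(), "system:", cpus_in_system())
    pinned = count_while_pinned_to_one_cpu()
    print("pinned:", pinned, "assumed arena cap:", assumed_arena_cap(pinned))
```

Under that assumption, a process pinned to a single CPU on a 24-CPU machine would be capped at 8 arenas instead of 192, which is consistent with the contention the reproducer exposes.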

  [Test Plan]

   * The upstream bug report [1] has a reproducer,
     used in comment #5 to reproduce the problem,
     and in comment #6 to validate the fix patch.

     It is copied/attached to this bug as backup
     (test-glibc-malloc.c).

     The expected behavior is that these 2 steps
     (measuring the average time taken by 50,000
     malloc+free calls, with one thread per CPU)
     take similar amounts of time with and without
     CPU affinity masks (parameter 2: true/false),
     in a system with a large number of CPUs.

     $ ./test-glibc-malloc $(nproc) false false
     $ ./test-glibc-malloc $(nproc) true false

   * glibc has a build-time test suite.

   * glibc has autopkgtests (a rebuild, i.e., the
     test suite above) and triggers autopkgtests
     in a large number of reverse test dependencies.
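
To judge whether the two reproducer steps "take similar amounts of time", the result lines can be compared numerically. The sketch below is one possible way to do that in Python; it is not part of the attached reproducer, the helper names are hypothetical, and the sample values are the Jammy runs quoted in this bug.

```python
import re
from statistics import mean

# Matches the reproducer's result line, e.g. "thread average (ms): 65.088057".
AVERAGE_LINE = re.compile(r"^thread average \(ms\): ([0-9.]+)$", re.MULTILINE)


def thread_averages(output: str) -> list[float]:
    """Extract every per-run thread average (in ms) from captured output."""
    return [float(value) for value in AVERAGE_LINE.findall(output)]


def slowdown(unpinned_output: str, pinned_output: str) -> float:
    """Ratio of the mean pinned time to the mean unpinned time."""
    pinned = mean(thread_averages(pinned_output))
    unpinned = mean(thread_averages(unpinned_output))
    return pinned / unpinned


def as_output(averages: list[float]) -> str:
    """Hypothetical helper: rebuild result lines from a list of averages."""
    return "\n".join(f"thread average (ms): {v:.6f}" for v in averages)


# Averages copied from the runs in this bug, with 2.35-0ubuntu3.11 ...
BEFORE_UNPINNED = as_output([65.088057, 69.766305, 62.821834, 65.807690])
BEFORE_PINNED = as_output([1949.944540, 1792.266083, 1785.785353, 2373.913114])
# ... and with 2.35-0ubuntu3.12 from jammy-proposed.
AFTER_UNPINNED = as_output([61.100255, 66.433792, 65.547364, 61.842890])
AFTER_PINNED = as_output([62.259289, 65.616804, 64.859172, 61.196404])

if __name__ == "__main__":
    print(f"before fix: {slowdown(BEFORE_UNPINNED, BEFORE_PINNED):.1f}x")
    print(f"after fix:  {slowdown(AFTER_UNPINNED, AFTER_PINNED):.1f}x")
```

With these numbers the pinned runs are roughly 30 times slower before the fix and about as fast as the unpinned runs after it.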

  [Regression Potential]

   * Theoretically, any fallout should be contained
     in malloc() and be related only to performance,
     not to functional errors.

   * This is because this malloc() patch [2] changes only
     which method is used to get the number of processors.

   * The method it changes to is the one already used
     by earlier versions of glibc (up to 2.33). Upstream
     adopted it back in 2.39 and backported it to all
     glibc releases in between (2.34-2.38), which
     includes the version in Jammy (2.35 [3]).

   * The method it changes to is also exercised in other
     code paths (not just malloc()), thus it is already
     used and tested in Jammy -- it is not something new.

  [Other Info]

   * For details and analysis of (no) required
     dependencies, see comments #1, #2, and #3.

   * Upstream bug report [1]

   * Build-tested in PPA with supported archs
     and only -security (ppa:mfo/lp2089789) [4],
     with successful build & test-suite results.

  [0]

  glibc 2.33:
  $ git log --oneline origin/release/2.33/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
  $

  glibc 2.34:
  $ git log --oneline origin/release/2.34/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
  e870aac8974c misc: Add __get_nprocs_sched

  glibc 2.35:
  $ git log --oneline origin/release/2.35/master -- sysdeps/unix/sysv/linux/getsysstats.c | grep 'misc: Add __get_nprocs_sched'
  11a02b035b46 misc: Add __get_nprocs_sched

  [1] https://sourceware.org/bugzilla/show_bug.cgi?id=30945

  [2]
  https://sourceware.org/git/?p=glibc.git;a=commit;h=472894d2cfee5751b44c0aaa71ed87df81c8e62e

  [3]
  https://sourceware.org/git/?p=glibc.git;a=commit;h=d47c5e4db7924bb10efe14b787c4bd868b604e48

  [4] https://launchpad.net/~mfo/+archive/ubuntu/lp2089789
