Thursday, September 19, 2013

Schedule Migration Cost - (sched_migration_cost)

A few months ago I ran into this situation in a production environment: a lot of VMs running on the KVM hypervisor on an HP DL980 with more than 100 physical CPUs. There are known issues involving 'sched_migration_cost' on large NUMA servers. The issue manifests itself as a hung node with many processes stuck on a 'spinlock', after which the server stops responding.

The scheduler tries to keep all CPUs busy by migrating tasks from overloaded CPUs to idle CPUs.

Parameter:
/proc/sys/kernel/sched_migration_cost

The amount of time after its last execution during which a task is considered "cache hot" in migration decisions. A "hot" task is less likely to be migrated, so increasing this value reduces task migrations.
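The "cache hot" idea can be sketched as a simple time comparison. This is a simplified model, loosely based on the idea behind the kernel's task_hot() check in kernel/sched/fair.c; the real check has additional conditions, and the function name below is hypothetical.

```python
def is_cache_hot(now_ns: int, last_exec_ns: int, migration_cost_ns: int = 500_000) -> bool:
    """Simplified model: a task is 'cache hot' if it last ran less than
    migration_cost_ns nanoseconds ago. Hot tasks are less likely to be migrated."""
    return (now_ns - last_exec_ns) < migration_cost_ns


# Ran 300 us ago with the 500 us default: still hot.
print(is_cache_hot(1_000_000, 700_000))             # True
# Ran 600 us ago: cold under the default, hot if the value is raised to 5 ms.
print(is_cache_hot(1_000_000, 400_000))             # False
print(is_cache_hot(1_000_000, 400_000, 5_000_000))  # True
```

Raising migration_cost_ns widens the window in which a task counts as hot, which is why a larger value means fewer migrations.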

The default value is 500000 ns (0.5 ms).
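To compare the running value with this default, the file can simply be read; writing a new value requires root. Below is a minimal Python sketch. It assumes the knob may appear as either `sched_migration_cost` (the name used in this post) or `sched_migration_cost_ns`, which newer kernels may use instead; on some recent kernels it may not be under /proc/sys at all. The helper names are hypothetical.

```python
import os
from typing import Optional

# Candidate file names under /proc/sys/kernel (assumption: which one exists
# depends on the kernel version).
CANDIDATE_NAMES = ("sched_migration_cost", "sched_migration_cost_ns")


def find_knob(base: str = "/proc/sys/kernel") -> Optional[str]:
    """Return the path of the first candidate file that exists, or None."""
    for name in CANDIDATE_NAMES:
        path = os.path.join(base, name)
        if os.path.exists(path):
            return path
    return None


def read_migration_cost(base: str = "/proc/sys/kernel") -> Optional[int]:
    """Read the current value in nanoseconds, or None if the knob is absent."""
    path = find_knob(base)
    if path is None:
        return None
    with open(path) as f:
        return int(f.read().strip())


def write_migration_cost(value_ns: int, base: str = "/proc/sys/kernel") -> None:
    """Write a new value in nanoseconds (needs root on a real system)."""
    path = find_knob(base)
    if path is None:
        raise FileNotFoundError("sched_migration_cost knob not found under " + base)
    with open(path, "w") as f:
        f.write(f"{value_ns}\n")
```

For example, `read_migration_cost()` on a system using the defaults described here should return 500000. The `base` parameter exists only so the helpers can be pointed at a scratch directory for testing.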

If the CPU idle time is higher than expected when there are runnable processes, try reducing this value. If tasks bounce between CPUs or nodes too often, try increasing it.

Tip: to reduce load balancing, increase the value by a factor of 2 to 10 (from the 500000 ns default, that is somewhere between 1000000 and 5000000 ns).
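The arithmetic behind the tip, as a small Python sketch. Walking the range by doubling, capped at 10x, is just one hypothetical way to try values step by step; it is not part of the original tip, and the function names are illustrative.

```python
from typing import List, Tuple

DEFAULT_NS = 500_000  # default value given above


def tip_range(default_ns: int = DEFAULT_NS, low: int = 2, high: int = 10) -> Tuple[int, int]:
    """Lower and upper bounds when scaling the default by low..high times."""
    return default_ns * low, default_ns * high


def doubling_steps(default_ns: int = DEFAULT_NS, max_factor: int = 10) -> List[int]:
    """Values obtained by repeatedly doubling the default, capped at max_factor x."""
    cap = default_ns * max_factor
    steps = []
    value = default_ns * 2
    while value < cap:
        steps.append(value)
        value *= 2
    steps.append(cap)  # finish exactly at the cap
    return steps


print(tip_range())       # (1000000, 5000000)
print(doubling_steps())  # [1000000, 2000000, 4000000, 5000000]
```

Each candidate can then be applied with, for example, `sysctl -w kernel.sched_migration_cost=1000000` (or `kernel.sched_migration_cost_ns` on kernels that use that name), measuring the workload after each change.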

Advantages: Increasing sched_migration_cost makes the scheduler wait longer before migrating processes from overloaded CPUs to idle CPUs, which results in fewer migrations.

Disadvantages: Increasing sched_migration_cost can result in waiting too long to migrate processes from overloaded CPUs to idle CPUs. This means only a fraction of the CPUs will be running user processes, and those CPUs will have load averages greater than 1 while the other CPUs sit idle.
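A toy illustration of this trade-off (not the kernel's actual load balancer): given how long ago each task queued on an overloaded CPU last ran, count how many an idle CPU would be allowed to pull under different thresholds. The sample times are made up. Larger thresholds mark more tasks as hot, so fewer are migrated (the advantage), but more stay queued on the busy CPU while the idle one waits (the disadvantage).

```python
from typing import Dict, List


def pullable(since_last_run_ns: List[int], migration_cost_ns: int) -> int:
    """Count tasks that are cold enough to migrate in this toy model."""
    return sum(1 for delta in since_last_run_ns if delta >= migration_cost_ns)


def sweep(since_last_run_ns: List[int], costs_ns: List[int]) -> Dict[int, int]:
    """Map each candidate threshold to the number of migratable tasks."""
    return {cost: pullable(since_last_run_ns, cost) for cost in costs_ns}


sample = [200_000, 800_000, 1_500_000, 3_000_000, 6_000_000]  # hypothetical
print(sweep(sample, [500_000, 1_000_000, 5_000_000]))
# {500000: 4, 1000000: 3, 5000000: 1}
```

At the default 0.5 ms four of the five tasks could be pulled to the idle CPU; at 10x the default only one could, leaving four queued behind the busy CPU.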


References:

http://rhsummit.files.wordpress.com/2012/03/shak_larry_perf_analysis_and_tuning.pdf
http://opensource.wolfsonmicro.com/cgi-bin/gitweb.cgi?p=linux-2.6-asoc.git;a=commitdiff;h=da84d96176729fb48a8458561e5d8647103168b8
http://www.fizyka.umk.pl/~jkob/prace-mag/cfs-tuning.pdf
