Driving the power of AIX: Performance Tuning on IBM Power Systems - Milberg K.

Author: Milberg K.

Tags: software computer science

ISBN: 978-158347-098-5

Year: 2009

Driving the Power of AIX
Performance Tuning on IBM Power Systems

Ken Milberg

MC Press Online, LP
Lewisville, TX 75077

Driving the Power of AIX : Performance Tuning on IBM Power Systems
Ken Milberg
Photography by Michele Huttler Silver, Michele Silver Photography
First Printing—October 2009
© 2009 Ken Milberg. All rights reserved.
Portions © MC Press Online, LP

Every attempt has been made to provide correct information. However, the publisher and the author do not
guarantee the accuracy of the book and do not assume responsibility for information included in or omitted
from it.
IBM is a registered trademark of International Business Machines Corporation in the United States, other
countries, or both. AIX, POWER and POWER6 are registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. All other product names are trademarked or copyrighted by their respective manufacturers.
Printed in Canada. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission
in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.
MC Press offers excellent discounts on this book when ordered in quantity for bulk purchases or special
sales, which may include custom covers and content particular to your business, training goals, marketing
focus, and branding interest.
For information regarding permissions or special orders, please contact:
MC Press
Corporate Offices
125 N. Woodland Trail
Lewisville, TX 75077 USA
For information regarding sales and/or customer service, please contact:
MC Press
P.O. Box 4300
Big Sandy, TX 75755-4300 USA
ISBN: 978-158347-098-5

Acknowledgements
First and foremost, this book is dedicated to my children—Hadara, Ori, Rani
and Elana, whom I love and adore with all my heart and who have been a constant source of joy to me throughout their lives. Thank you Vera, for providing
me with these incredible children. Thank you Mom and Dad, for all the love
you have given me through the years. This book is dedicated to my parent’s
family, all of whom perished during the Holocaust, except for my dear Aunt
Molly, who passed away several years ago and whom I still miss dearly.
The publication of this book could not have been possible without the support
and encouragement of many individuals throughout my career. I want to thank
David Brodt for giving me my first job in systems and keeping me around even
after I mistakenly destroyed his entire B90 Burroughs system (even though it
was a Burroughs VMS bug) along with all his backups during a failed operations activity. I stayed on and led their project, my first, to convert their legacy
system to Unix over 20 years ago—SCO Unix 3.2.2. I want to thank Terry
Every for giving me my first opportunity in NYC in the early 1990s as a Unix
Systems Manager, working on HP9000s and HP-UX. I learned so much from
him, less about systems (though he is technical), and more about people and
class.
I want to thank Mark Mulconry for giving me my first opportunity to manage a
large production IBM AIX environment and my homeboys at Empire BC/BS
(Greg Pastuzyn, Steven Goldman, Steven Gerasimovich, Amit Goel, Arkady Getselis)
as well as my homegal, Marilyn Walter. To Winston, an AIX system
administrator who worked for me at the World Trade Center. We’ll always remember you. You will never be forgotten!
I want to thank the folks at IBM, who at the turn of the century thought
enough of me to put me on their AIX performance team in Washington DC,
working for the US Census Bureau (which is perhaps where this whole train
started).
I want to thank Nicolete McFadden and Bharvi Parikh for their work helping
me through many IBM initiatives, including founding and leading the NY
Metro PowerAIX/Linux Users Group. And thanks go to Randy Default, the former President of COMMON, who made me a permanent Guest on their Board
of Directors representing AIX interests. I want to thank Bess Protacio and her
AIX team of Bradd Baldwin, Abid Khwaja, and Jonathan Mencher for the times
we had at Adecco migrating to AIX from that nameless Sun Unix operating system. I want to thank Dan Raju and Wahid Ullah for the great AIX fun we had
in Ann Arbor and Ed Braunstein for providing my first exposure to AIX in
1996, when I was a CIO (before my career started going downhill) and for the
great times we had at LAS.
I want to thank Brian Shorter, Mitch Diodato, Bruce Slaven, Jennifer Weems and
Tim Paramore at Arrow for giving me the confidence and tools to start my own
company, PowerTCO, an IBM Business Partner, and Raffi Princian for believing in me and leading our first assessment. Thanks also to the fine folks at Future
Tech (Bob Venero, Phil Preston, Karen Sinda, Mike Rosatto, Steven Vames, Bill
Daub, and Lynn Keegan) who showed me the ropes of working for a BP.
It must be said that I would not even have considered writing if not for the
folks at TechTarget who took a chance years ago on a neophyte writer. Thank
you TechTarget (in the early days it was Amy Kucharik and Jan Stafford) for
sticking by me and helping me launch my Ask The Expert Linux site as well as
my writing career. I still do quite a bit of work for searchdatacenter.techtarget.com
and searchenterpriselinux.com and love the assignments (thank you Matt
Stansberry and Leah Rosin). You can see my blog also at itknowledgeexchange,
another TechTarget offering.
I want to thank James Proescholdt, formerly of IBM Systems Magazine for giving me the opportunity to write for them and Rob McNelly, who runs their
AIXchange blog, who provided me with contact information that enabled me to
further my writing career with IBM. Thank you to Natalie Boike, my present
editor at IBM Systems Magazine for all the fun work. I am also very thankful to
Troy Mott at Backstop Media for being my editor/publisher on content through
IBM developerWorks and for helping advise me during the early conceptual
stages of my book.
I want to thank Susan Schreitmueller, IBM’s most renowned and well-known
performance expert, who reviewed my book and from whom I learned so much.
And Jaqui Lynch, among other performance gurus, from whom I also learned
so much through the years.
Finally the publication of this book could not have been possible but for the ungrudging efforts put in by the writer of the foreword of my book, IBM Distinguished Engineer Joefon Jann, and for Chris Gibson, IBM AIX guru and writer
who took the time out of his busy schedule to proofread the myriad mistakes in
my first drafts.
I want to thank Michele Huttler Silver, with Michele Silver Photography
(msilverphotograpy.com) for the incredible job she did with the breathtaking
photographs you will see interspersed throughout the book.
And thanks again to my publisher Merrikay Lee—for giving me the opportunity to write this book, for believing in me, for sponsoring our book signing,
book fair, and presentation seminar during the summer of 2009 in NYC and for
taking a chance on an IBM Power AIX book. Thanks also go to my copy editor,
Katie, for the stellar job. You are amazing!
I’ll add a special mention to my dear friends, Steven and Shelly, Mitch and
Candy, David and Laurie, who’ve always been there for me and my children,
through thick and thin.
Last, but definitely not least, thank you M—the love of my life, the one who
makes my heart sing and race, and the one person in my life who has never wavered in her belief in me. You’re my muse and inspiration to keep going (with
this book and through all life’s trials and tribulations), and one of the few folks
who think that I am more than an idiot savant. You are the one who has helped
keep things together for me, through good times and bad.
—Ken Milberg
September 2009

Contents
Foreword
Preface

SECTION I: INTRODUCTION
Chapter 1: Performance Tuning Methodology
Step 1. Establishing a Baseline
Step 2. Stress Testing and Monitoring
Step 3. Identifying the Bottleneck
Step 4. Tuning
Step 5. Repeat

Chapter 2: Introduction to AIX
Unix
AIX
AIX Market Share

Chapter 3: Introduction to POWER Architecture
POWER5
POWER6

Section I: Summary, Tips, and Quiz
Summary
Tips

QUIZ
Multiple Choice
True or False
Fill In the Blank(s)

SECTION II: CPU
Chapter 4: CPU: Introduction

Chapter 5: CPU: Monitoring
vmstat (Unix-generic)
sar (Unix-generic)
iostat (Unix-generic)
w (Unix-generic)
lparstat (AIX-specific)
mpstat (AIX-specific)
topas (AIX-specific)
nmon
Using nmon for Historical Analysis
ps (Unix-generic)
Tracing Tools
tprof
Timing Tools
time
timex

Chapter 6: CPU: Tuning
Process and Thread Management
nice
renice
ps
schedo
sched_R and sched_D
fixed_pri_global
timeslice
bindprocessor
smtctl
gprof

Section II: Summary, Tips, and Quiz
Summary
Tips

QUIZ
Multiple Choice
True or False
Fill in the Blank(s)

SECTION III: MEMORY
Chapter 7: Memory: Introduction
Virtual Memory Manager
Computational Memory
File Memory
Paging and Swapping
VMM Tuning Evolution

Chapter 8: Memory: Monitoring
vmstat (Unix-generic)
Virtual Memory Summary
sar (Unix-generic)
lsps (AIX-specific)
ps (Unix-generic)
svmon (AIX-specific)
Memory Leak

Chapter 9: Memory: Tuning
vmo
minperm, maxperm, maxclient, and lru_file_repage
minfree and maxfree
Page Space Allocation
How Much Paging Space?
Paging Space Tuning
Thrashing and Load Control
Memory Scanning and lrubucket
rmss

Section III: Summary, Tips, and Quiz
Summary
Tips

QUIZ
Multiple Choice
True or False
Fill in the Blank(s)

SECTION IV: DISK I/O
Chapter 10: Disk I/O: Introduction
Direct I/O
Concurrent I/O
Asynchronous I/O
Logical Volumes and Disk Placement: Intra- and Inter-Policy
Inter-Disk Policy
File Systems

Chapter 11: Disk I/O: Monitoring
sar
topas
Logical Volume Monitoring
AIX LVM Commands
filemon and fileplace
filemon
fileplace

Chapter 12: Disk I/O: Tuning
lvmo
ioo
JFS2 Tuning Options

Section IV: Summary, Tips, and Quiz
Summary
Tips

QUIZ
Multiple Choice
True or False
Fill in the Blank

SECTION V: NETWORK I/O
Chapter 13: Network I/O: Introduction
Network I/O Overview
NFS
Media Speed
Network Subsystem Memory Management
Virtual and Shared Ethernet

Chapter 14: Network I/O: Monitoring
netpmon
Monitoring NFS
nfsstat
nfs4cl
netpmon and NFS
Monitoring Network Packets
iptrace, ipreport, and ipfilter
tcpdump

Chapter 15: Network I/O: Tuning
Name Resolution
Maximum Transfer Unit
Tuning: Client
Tuning: Server

Section V: Summary, Tips, and Quiz
Summary
Tips

QUIZ
Multiple Choice
True or False
Fill in the Blank

SECTION VI: BONUS TOPICS
Chapter 16: AIX 6.1
Introduction
Memory
CPU
Disk I/O
JFS2
iSCSI
I/O Pacing
Asynchronous I/O
Network
NFS

Section VI: Chapter 16 Quiz
Multiple Choice
True or False
Fill in the Blank

Chapter 17: Tuning AIX for Oracle
Memory
CPU
Asynchronous I/O Servers
Concurrent I/O
Oracle Tools
Statspack
Oracle Enterprise Manager

Section VI: Chapter 17 Quiz
Multiple Choice
True or False
Fill in the Blank

Chapter 18: Linux on Power
Monitoring
Handy Linux Commands
Virtualization
Tuning

Section VI: Chapter 18 Quiz
Multiple Choice
True or False
Fill in the Blank(s)

Quiz Answers
Section I: Introduction
Section II: CPU
Section III: Memory
Section IV: Disk I/O
Section V: Network I/O
Section VI / Chapter 16: AIX 6.1
Section VI / Chapter 17: Tuning AIX for Oracle
Section VI / Chapter 18: Linux on Power

Foreword

As computers have become increasingly sophisticated, the task of tuning
the operating system to yield high performance for its applications while
providing optimal total cost of ownership (TCO) for the IT owners has
become increasingly complex. In the early days of computers, the OS typically ran only one application at a time, and most performance tuning was
targeted at minimizing the number of instructions required to run the application within the limited resources (CPU, memory, disk/tape, networking)
of a uniprocessor system. With advances in virtual memory, multitasking,
multicore, caches, faster networks, huge storage devices and databases,
and, in the past decade, the flourishing of virtualization technologies (e.g.,
LPARs, DLPARs, simultaneous multithreading, WPARs, virtual Ethernet,
virtual SCSI), the task of performance optimization has become far more
complex and has shifted to tuning the OS and balancing the hardware
resources across LPARs within a hardware box. Nonetheless, the tuning
goals remain the same: to yield high performance for applications while
providing optimal TCO for IT owners.
Ken Milberg, with his rich background in managing, operating, and writing
about Unix and Linux systems, has abstracted the essence of the complex
tuning process, which he clearly describes in Chapter 1. In fact, the tuning
methodology described therein is applicable to most OS types: establish a
baseline, stress test and monitor, identify the bottleneck, tune, and repeat.
The rest of the book highlights the important monitoring and tuning tools
for each major subcomponent of the AIX/POWER system. The progression of the topics is great, from the core to progressively further-away
components, from CPU to memory to disk to network, paralleling the
AIX tools schedo, vmo, ioo, no, and nfso.
The tips and quiz at the end of each section are a treat. Not only do they
give a summary review of the key items covered, but they also provide a
lot of fun and satisfaction, especially when you can verify whether you’ve
understood everything correctly by checking against the provided answers.
To sum up, this is a book that every AIX system administrator and systems
manager should read.
—Joefon Jann
Distinguished Engineer,
Research Lead in AIX and POWER Systems Software
IBM Thomas J. Watson Research Center, Yorktown Heights, New York

Preface
Why this book? Although a Google search may show a fair number of
books about AIX, including a couple about performance tuning, just about
all of them are at least a decade old. IBM provides a tremendous amount
of information through its portals and Redbooks, but it is not unusual for
administrators seeking to tune their boxes to examine dozens of Web sites
and Redbooks before finding the information they need. This book brings
it all together for you, and more. Further, I review best practices and provide tips and tricks that are not usually covered in the IBM literature. Last,
the book provides an impartial view (I don’t work for IBM) of systems
performance tuning based on the real-world experiences of a battle-scarred
systems administration veteran.
This book is intended for systems professionals who need to understand,
monitor, and control the factors that affect AIX performance on their IBM
POWER servers. It also includes bonus chapters on the recent innovations
of AIX 6.1, Linux on Power (LoP) performance, and running Oracle on
AIX.
This is an intermediate book about AIX performance analysis and systems
tuning. The material comes both from IBM sources and from real life,
based on my experiences as a Unix professional supporting production systems for more than 20 years (almost half of them on AIX), in many capacities and for a broad range of industries.
Because this book is not an introduction to Unix, prior knowledge of Unix
(and AIX in particular) is recommended, although I would not say it is a
prerequisite. The book covers tuning methodology, systems monitoring,
and performance tuning on all subsystems, including CPU, RAM, and
I/O (network and disk). As an introduction, I review time-tested tuning
and analysis methodology, steps that will assist you throughout the tuning
lifecycle.
The monitoring sections describe tools that will let you immediately gain a
foothold on your system by taking quick-and-dirty snapshots of its health.
They also discuss tools that will help you collect historical data
for the purpose of analyzing trends and results. All the tools used in this
book either are part of the standard IBM AIX systems build or are open-source products written by folks who work for IBM (e.g., nmon) and used
widely in the field of battle.
—Ken Milberg
August 2009

Section I
Introduction
This section introduces the concept of performance tuning methodology
and discusses the AIX operating system and how it has evolved through
the years. We also explore the development of IBM’s POWER architecture
and how it has changed from its early stages to the POWER6.

Chapter 1: Performance Tuning Methodology

Performance tuning is a never-ending process, and an important concept
to understand is that it is not unusual to fix one bottleneck only to create another. That’s part of what makes our lives as AIX administrators so
indispensable! The following time-tested tuning and analysis methodology
will aid you throughout the tuning lifecycle:
1. Establish a baseline
2. Stress test and monitor
3. Identify the bottleneck
4. Tune
5. Repeat (starting with step 2)

Step 1. Establishing a Baseline
Well before you ever tune a system, it is imperative to establish a baseline.
The baseline is a snapshot of what the system looks like when you first put
it into production, while it is performing at acceptable enough levels to the
business for it to be deployed. The baseline should not only capture performance statistics but also document the actual configuration of the system (amount of memory, CPU, and disk). It’s important to document the
system configuration because otherwise you won’t be comparing apples
with apples when the time comes to compare the baseline with your current
configuration. This step is particularly relevant in our new partitioned
world, when you can dynamically add or subtract CPU resources at a
moment’s notice.
To come up with a proper baseline, you must first identify the appropriate tools to use for monitoring. Some tools are more suited to immediate
gratification, while others are geared more toward historical trending and
analysis. Tools such as nmon and topas, which we’ll discuss in detail in
Chapter 5, can serve both purposes.
Once you’ve identified your monitoring tools, you need to gather your
statistics and performance measurements. This information helps you to
define what an acceptable level of performance is for a given system. You
need to know what a well-performing system looks like before you start
receiving calls complaining about performance. You should also work with
the appropriate application and functional teams to define exactly what a
well-behaved system is. At that time, you would translate that definition
into an acceptable service level agreement (SLA), on which the customer
would sign off.
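
One way to make the baseline concrete is to store the configuration next to the measurements, so that a later comparison can refuse to compare unlike systems. The following Python sketch is only an illustration of that idea: the Baseline class, its field names, and the sample numbers are hypothetical and do not come from any AIX tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Baseline:
    """A snapshot of configuration plus measurements (hypothetical layout)."""
    cpus: float          # processing capacity assigned to the system
    memory_mb: int       # real memory
    cpu_busy_pct: tuple  # CPU-busy samples (user + system) taken over time

    def config(self):
        return (self.cpus, self.memory_mb)

    def avg_cpu_busy(self):
        return mean(self.cpu_busy_pct)


def compare(baseline, current):
    """Return the change in average CPU busy, in percentage points.

    Raises ValueError when the configurations differ, because the two
    snapshots would then not be an apples-with-apples comparison.
    """
    if baseline.config() != current.config():
        raise ValueError("configuration changed since the baseline was taken")
    return current.avg_cpu_busy() - baseline.avg_cpu_busy()


if __name__ == "__main__":
    base = Baseline(cpus=2.0, memory_mb=8192, cpu_busy_pct=(35.0, 40.0, 45.0))
    now = Baseline(cpus=2.0, memory_mb=8192, cpu_busy_pct=(60.0, 70.0, 80.0))
    print(f"CPU busy changed by {compare(base, now):+.1f} points")
```

In practice the samples would come from whichever monitoring tool you settled on; the point of the sketch is only that configuration and statistics travel together.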

Step 2. Stress Testing and Monitoring
This step is where you monitor the system at peak workloads and during
problem periods. Stressing your system, preferably in a controlled environment, can help you make the right diagnosis — an essential part of performance tuning. Is your bottleneck really a CPU bottleneck, or is it related
more to memory or I/O?
It’s also important not to fall too much in love with any one utility. I like to
use several monitoring tools here to help validate my findings. For example, I might use an interactive tool (e.g., vmstat) and then a data capturing
tool (nmon) to help me track data historically.
The monitoring step is critical because you cannot effectively tune anything without having an accurate historical record of what has been going
on in your system, particularly during periods of stress. Larger organizations that recognize the importance of this process even have their own
stress-testing teams, which work together with application and infrastructure teams to test new deployments before putting them into production.

It’s also essential here to establish performance policies for the system.
You can determine the measures that are relevant during monitoring, analyze them historically, and then examine them further during stress testing.
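
To keep the historical record that this step depends on, readings can be written out with a timestamp as they are taken. The Python sketch below assumes a caller-supplied sample() function that returns one reading; that function, the column names, and the CSV layout are all hypothetical stand-ins for whatever your monitoring tool produces.

```python
import csv
import io
import time


def record_samples(sample, count, interval_s, out):
    """Write `count` timestamped readings from sample() to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(["epoch_seconds", "value"])
    for i in range(count):
        writer.writerow([int(time.time()), sample()])
        if i < count - 1:
            time.sleep(interval_s)  # wait before taking the next reading


if __name__ == "__main__":
    readings = iter([10, 20, 30])   # canned values in place of a real tool
    buffer = io.StringIO()
    record_samples(lambda: next(readings), count=3, interval_s=0, out=buffer)
    print(buffer.getvalue())
```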

Step 3. Identifying the Bottleneck
The objective of stressing and monitoring the system is to determine the
bottleneck. Ask any doctor: you cannot provide the correct medicine (the
tuning) without the proper diagnosis. If the system is in fact CPU-bound,
you can run additional tools, such as curt, ps, splat, tprof, and trace (we’ll
discuss these utilities later), to further identify the actual processes that are
causing the bottleneck.
It’s possible that your system might in fact be memory- or I/O-bound and
not CPU-bound. Fixing one bottleneck, such as a memory problem, can
actually cause another, such as a CPU bottleneck, because your
system is now letting the CPU perform to its optimum capacity. The CPU
might not have the capacity to handle the additional work that reaches it
once memory is no longer the constraint. I’ve seen this situation quite often, and it
isn’t necessarily a bad thing. Quite the opposite: it ultimately helps you
isolate all your bottlenecks and tune the system to its max.
You’ll find that monitoring and tuning systems is quite a dynamic process
and not always predictable. That’s what makes performance tuning as challenging as it is.

Step 4. Tuning
Once you’ve identified the bottleneck, it’s time to tune it. For a CPU
bottleneck, that usually means one of four solutions:
● Balancing system workload: This solution involves running
processes at different intervals to more efficiently use the 24-hour
day. More often than not, this is what we do to resolve CPU
bottlenecks.
● Tuning the scheduler: Tuning the scheduler using nice or renice
helps you assign different priorities to running processes to prevent
CPU hogs.
● Tuning scheduler parameters: Adjust scheduler parameters to fine-tune priority formulas. For example, you can use the schedo command to change the amount of time the operating system lets a given
process run before calling the dispatcher to choose another.
● Increasing resources: Add CPUs or, in a virtualized environment,
reconfigure logical partitions (LPARs) to boost available resources.
This solution might include uncapping partitions or adding more
virtual processors to existing partitions. Virtualizing the partitioned
environment appropriately can help increase physical resource utilization, decrease CPU bottlenecks on specific LPARs, and reduce the
expense of idle capacity in LPARs that are not “breathing heavy.”
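
As a small illustration of the nice approach from the list above, the Python sketch below starts a command at a lower priority. It relies on os.nice, which is available on Unix systems only. The helper name run_with_nice and the default increment of 10 are assumptions made for the example.

```python
import os
import subprocess
import sys


def run_with_nice(command, increment=10):
    """Run command at a lower scheduling priority and return its stdout.

    increment is added to the child's nice value; a larger nice value
    means a lower priority. The default of 10 is an arbitrary choice.
    """
    def lower_priority():
        # Executed in the child between fork and exec (Unix only).
        os.nice(increment)

    result = subprocess.run(command, preexec_fn=lower_priority,
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # The child simply reports its own nice value.
    report = [sys.executable, "-c", "import os; print(os.nice(0))"]
    print("parent nice value:", os.nice(0))
    print("child nice value: ", run_with_nice(report).strip())
```

Only the child's priority changes; the calling process keeps its own nice value.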

Step 5. Repeat
After tuning, you need to go through the process again, starting with step
2, stress testing and monitoring. Only by repeating your tests and consistently monitoring your systems can you determine whether your tuning
has made an impact. I know some administrators who simply tune certain
parameters based on best practices for a specific application and then move
on. That is the worst thing you can do. For one thing, what works in some
environments might not work in yours. More important, how do you really
know whether what you’ve tuned has helped the bottleneck unless you
look at the data?
To reiterate, AIX performance tuning is a dynamic and iterative process,
and to achieve real success, you need to consistently monitor your systems,
which can only happen once you’ve established a baseline and SLA. The
bottom line is, if you can’t define the behavior of a system that runs well,
how will you define the behavior of a system that doesn’t?
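
The cycle as a whole can be sketched as a loop that keeps measuring after every change. In the Python sketch below, measure, identify, and tune are caller-supplied functions and the response-time target stands in for the agreed SLA. All of these names are hypothetical, and the demo numbers are invented.

```python
def tuning_cycle(measure, identify, tune, target, max_rounds=5):
    """Repeat measure -> identify -> tune until the target is met.

    measure() returns a response time (lower is better), identify()
    names the bottleneck, and tune(name) applies one change. This
    function only drives the loop and returns every measurement taken.
    """
    history = []
    for _ in range(max_rounds):
        result = measure()          # step 2: stress test and monitor
        history.append(result)
        if result <= target:        # the agreed service level is met
            break
        tune(identify())            # steps 3 and 4
    return history                  # step 5 is the loop itself


if __name__ == "__main__":
    state = {"response_s": 8.0}     # invented starting point

    def fake_tune(bottleneck):
        state["response_s"] /= 2    # pretend each change halves the time

    print(tuning_cycle(lambda: state["response_s"], lambda: "cpu",
                       fake_tune, target=2.5))
```

Because every round starts with a measurement, the history shows whether each change actually helped.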

Chapter 2: Introduction to AIX

AIX, which stands for Advanced Interactive eXecutive, is a POSIX-compliant and X/Open-certified Unix operating system introduced by
IBM in 1986. While AIX is based on UNIX System V, it has roots in the
Berkeley Software Distribution (BSD) version of Unix as well. Today, AIX
has an abundance of both flavors (you can go with chocolate one day and
vanilla the next), providing another reason for its popularity.

Unix
From its introduction in 1969 and development in the mid-1970s, Unix
has evolved into one of the most successful operating systems to date.
The roots of this operating system go as far back as the mid-1960s, when
AT&T’s Bell Labs partnered with General Electric and the Massachusetts
Institute of Technology (MIT) to develop a multi-user operating system
called Multics (which stood for Multiplexed Information and Computer
Service). Dennis Ritchie and Ken Thompson worked on this project until
AT&T withdrew from it. The two eventually created another operating
system in an effort to port a computer game that simulated space travel.
They did so on a Digital Equipment Corporation (DEC) PDP-7 computer,
and they named the new operating system Unics (for Uniplexed Information and Computing Service). Somewhere along the way, “Unics” evolved
into “Unix.”

AIX
AIX was the first operating system to introduce the idea of a journaling
file system, an advance that enabled fast boot times by avoiding the need
to perform file system checking (fsck) for disks on reboot. AIX also has
a strong, built-in Logical Volume Manager (LVM), introduced as early as
1990, which helps to partition and administer groups of disks.
Another important innovation was the introduction of shared libraries,
which avoided the need for an application to statically link to the libraries
it used. The resulting smaller binaries used less of the hardware RAM to
run and required less disk space for installation.
IBM ported AIX to its RS/6000 platform of products in 1989. The release
of AIX Version 3 coincided with the announcement of the first RS/6000
models. At the time, these systems were considered unique in that they not
only outperformed all other machines in integer compute performance but
also beat the competition by a factor of 10 in floating-point performance.
Version 4, introduced in 1994, added support for symmetric multiprocessing (SMP) with the first RS/6000 SMP servers. The operating system
evolved until 1999, when AIX 4.3.3 introduced workload management
(WLM). In May 2001, IBM unveiled AIX 5L (the L stands for “Linux affinity”), coinciding with the release of its POWER4 servers, which provided for the logical partitioning of servers. In October of the following year,
IBM announced dynamic logical partitioning (DLPAR) with AIX 5.2.
The latest update to AIX 5L, AIX 5.3 (introduced in August 2004), provided innovative new features for virtualization, security, reliability, systems
management, and administration. Most important, AIX 5.3 fully supported
the Advanced Power Virtualization (APV) capabilities of the POWER5
architecture, including micropartitioning, virtual I/O servers, and simultaneous multithreading (SMT). Arguably, this was the most important release
of AIX in more than a decade, and it remains the most popular (as of this
writing). That is why we’ll primarily focus on AIX 5.3 for the purposes of
this book.
IBM announced AIX 6-Beta in May 2007 and formally introduced AIX
6.1 in November 2007. Major innovations of AIX 6.1 include workload

AIX Market Share

partitions (WPARs), which are similar to Solaris containers, and Live
Application Mobility (not available with Solaris), which lets you move
the partitions without application down time. Chapter 16 discusses performance monitoring and tuning on AIX 6.1.

AIX Market Share
AIX celebrated its 20th anniversary in January 2006, and it appears to have
an extremely bright future in the Unix space. IBM’s AIX has been the only
Unix that increased its market share through the years, and IBM continues
to own the market space for Unix servers. Most of the Unix growth at this
time stems from IBM.
AIX has benefited from the many hardware innovations that the POWER
platform has introduced through the years, and it continues to do so. IBM
has also made good decisions around its Linux strategy. Linux, supported
natively on the POWER5, more or less complements, rather than competes
with, AIX on the POWER architecture.

Chapter 3: Introduction to POWER Architecture

The “POWER” in POWER architecture stands for Performance Optimization
with Enhanced RISC, and it is the processor used by IBM’s midrange
Unix offering, AIX. POWER is a descendant of IBM’s 801 CPU and is a
second-generation Reduced Instruction Set Computer (RISC) processor. It
was introduced in 1990 to support Unix RS/6000 systems.
The POWER architecture incorporated many characteristics that were
already common in most RISC architectures. The instructions were fixed
in length (four bytes) and had consistent formats. What made the architecture unique among existing RISC architectures was that it was functionally
partitioned, separating the functions of program flow control, fixed-point
computation, and floating-point computation.
The objective of most RISC architectures was to be extremely simple
so that implementations would have an extremely short cycle time. This
approach would result in processors that could execute instructions at
the fastest possible clock rate. The designers of the POWER architecture
chose instead to minimize the total time spent to complete a task. This time is the
product of three components: path length (the number of instructions executed), the number of cycles
needed to complete an instruction, and cycle time.
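
Read as a formula, the three components multiply together, so a design with a longer cycle time can still finish a task sooner if it needs fewer instructions or fewer cycles per instruction. The Python sketch below works through that arithmetic. The two designs and all their numbers are invented for illustration and describe no real processor.

```python
def task_time_seconds(path_length, cycles_per_instruction, cycle_time_ns):
    """Total time = instructions x cycles per instruction x cycle time."""
    return path_length * cycles_per_instruction * cycle_time_ns * 1e-9


if __name__ == "__main__":
    # Design A: very simple instructions and the shortest cycle time.
    simple = task_time_seconds(1_500_000, 1.0, 8.0)
    # Design B: richer instructions shorten the path at a slower clock.
    richer = task_time_seconds(1_000_000, 1.1, 10.0)
    print(f"simple design: {simple * 1000:.2f} ms")
    print(f"richer design: {richer * 1000:.2f} ms")
```

With these made-up figures the richer design wins despite its slower clock, which is the trade-off the paragraph describes.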
During the early 1990s, five different RISC architectures actively competed with one another. IBM partnered with Apple and Motorola to come up
with a common architecture that would meet the standards of an alliance
they would form. The first design was very simple, and all its instructions
were completed in one cycle. It lacked floating-point and parallel processing capability. The POWER architecture was a real attempt to correct this
flaw. It consisted of more than 100 instructions and was known as a complex RISC system.
The POWER1 chip consisted of 800,000 transistors per chip and was
functionally partitioned. It had separate floating-point registers and could
scale from low-end to the highest-end workstations. The first chip actually consisted of several chips on a single motherboard but was refined to
one RISC chip with more than a million transistors. Some of you may be
surprised to learn that this chip was actually used as the CPU for the Mars
Pathfinder mission.
The POWER2 chip was released in 1993 and was the standard-bearer for
nearly five years. It contained 15 million transistors per chip. It also added
a second floating-point unit (FPU) and extra cache. This chip was known
for powering the IBM Deep Blue supercomputer that would beat Garry
Kasparov at chess in 1997. (Joefon Jann, whose team developed this system, wrote the Foreword to this book.)
The POWER3 architecture was the first 64-bit symmetric multiprocessor.
Designed to work on both scientific and technical computer applications,
it included a data prefetch engine, dual floating-point execution units, and
a nonblocked interleaved data cache. It used copper interconnect, which
delivered double the performance for the same price.
The POWER4 (code-named Regatta) architecture, released in 2001,
featured 174 million transistors per processor. It incorporated micron
copper and silicon-based technology. Each processor had 64-bit, 1 GHz
PowerPC cores and could execute as many as 200 instructions simultaneously. POWER4 became the driving force behind the IBM Regatta Servers, which supported logical partitioning. The POWER4 processor supported logical partitioning with a new privileged processor state called the
POWER Hypervisor mode.

As wonderful as the Regattas were, if you purchased one shortly before the
POWER5 systems were released, you were not a happy camper.

POWER5
IBM’s POWER5 architecture, introduced in 2003, contained 276 million
transistors per processor. It was based on the 130 nm copper/silicon-on-insulator (SOI) process and featured chip multiprocessing, a larger cache,
a memory controller on the chip, simultaneous multithreading (SMT),
advanced power management, and improved Hypervisor technology. The
POWER5 was built to allow up to 256 logical partitions and was available
on IBM’s System i and System p servers. Each POWER5 core is designed
to support SMT and single-threaded modes. The software (the Hypervisor)
switches the processor from SMT to single-threaded mode.
Some of the objectives of the POWER5 were
● To maintain binary compatibility with older POWER4 systems
● To enhance and extend symmetric multiprocessing (SMP) scalability
● To improve performance and reliability
● To provide additional server flexibility
● To improve power efficiency
● To provide virtualization capabilities

As a result of its dual-core design and support for SMT, one POWER5 chip
appears as a four-way microprocessor to the operating system. Processors
using SMT can issue multiple instructions from different code paths during
a single cycle; that is, instructions from both hardware threads can be
issued in the same cycle.

Figure 3.1 depicts the Hypervisor, without which there is no virtualization.

[Figure 3.1: Hypervisor architecture. Programs run on AIX 5L, Linux, and IBM i. AIX 5L and Linux sit on Open Firmware and RTAS; IBM i sits on TIMI and SLIC. All of these layers rest on the POWER Hypervisor, which runs on the POWER 64-bit processor.]

As you examine this architecture, you can see that the layers above the
POWER Hypervisor are similar, but the contents are characterized by the
operating system. The layers of code supporting AIX and Linux consist of
system firmware and Run-Time Abstraction Services (RTAS). Open Firmware and RTAS are both platform-specific firmware, and both are tailored
by the platform developer to manipulate the specific platform hardware.
In the POWER5 processor, IBM introduced further design enhancements
that enabled the sharing of processors by multiple partitions. The POWER
Hypervisor Decrementer (HDEC) is a new hardware facility in the POWER5 design that is programmed to provide the POWER Hypervisor with
a timed interrupt independent of partition activity. It was the POWER5
architecture, along with the extraordinary virtualization capabilities of
Advanced Power Virtualization (APV) that really paved the way for server
consolidation around IBM POWER systems. (IBM has since rebranded the
term Advanced Power Virtualization to PowerVM.)

POWER6
The POWER6, with approximately 790 million transistors, debuted in
June 2007. Its dual-core design enabled it to reach 4.7 GHz. Innovations
in energy and cooling let it retain the same power consumption as the
POWER5 while almost doubling performance.
The POWER6 has hardware support for decimal arithmetic. It also has the
first decimal floating-point unit integrated in silicon. Several important
APV enhancements were also released with the POWER6, including Live
Partition Mobility, Decimal Floating Point, and Dynamic Energy Management. It was around this time that IBM rebranded APV to PowerVM.

Section I
Summary, Tips, and Quiz

Summary
● The five-step performance tuning methodology is:
1. Establish a baseline
2. Stress test and monitor
3. Identify bottleneck
4. Tune
5. Repeat (starting with step 2)
● Unix was “invented” in 1969, the result of an effort by Dennis Ritchie
and Ken Thompson to port a computer game to a DEC PDP-7 following
their work with AT&T’s Bell Labs.
● AIX, which stands for Advanced Interactive eXecutive, was introduced
by IBM in 1986. It is the first version of Unix to provide a journaling
file system and to incorporate a Logical Volume Manager (LVM) in the
base operating system.
● IBM’s Power Optimization with Enhanced RISC (POWER) architecture
was introduced in 1990 to support RS/6000 systems.
● AIX 5L, introduced in May 2001, provided for the logical partitioning
of servers with the POWER4 architecture.
● AIX 5.3, released in 2004, would become the most important release
of AIX in more than a decade. It boasted support for Advanced Power
Virtualization (APV) and the new POWER5 architecture. IBM has since
rebranded the term Advanced Power Virtualization to PowerVM.
● AIX 6 and the POWER6 architecture were released in 2007 (POWER6
in June and AIX 6 in the fall). AIX 6 enhancements include
workload partitioning and Live Application Mobility. POWER6 innovations
include Live Partition Mobility, Decimal Floating Point, and
Dynamic Energy Management.

Tips
● Do not, under any circumstances, introduce an application into
production without first implementing a proactive performance
monitoring strategy. Otherwise, you will never really know what
your subsystems (CPU, I/O, memory) should look like when the
system is performing well and its performance has been deemed acceptable
to the business and/or application folks. The time to start monitoring
your system is before you’ve been told that the system is slow,
not after.
● Use more than one monitoring tool so that you can use each to validate
the findings of the others.
● Create multiple environments for your application architecture, including
development, test, and/or quality assurance.
● Establish a deployment and stress-testing strategy for how applications
are tested and deployed into production. These measures will help you
ensure the reliability and performance of your applications.
● Spend time analyzing your performance data. Remember, you can’t
prescribe the right medicine (tune) without a proper diagnosis (analysis
of historic data).
● Introduce one change at a time when tuning your systems. Otherwise,
how will you really know what the true effect of each change is?
● Use the virtualization capabilities of AIX 5.3 and APV (now
PowerVM). These innovations can help you save big money on total
cost of ownership and help drive a large return on investment for server
and data center consolidation projects.
● Don’t upgrade to AIX 6.1 simply because you’ve fallen in love with
the new technology. Remember that your production application might
not share that love. Create a 6.1 partition on your POWER server so
that you can start playing nicely in the sandbox. Note that POWER6
innovations such as Live Partition Mobility are fully supported on AIX
5.3 (Technology Level 7, or TL_7).

Quiz
Multiple Choice
1. AIX stands for
a. Advanced Interactive Unix
b. Advanced Interactive eXecutive
c. Advanced Unix
d. It’s just an acronym.
2. AIX was introduced in
a. 1969
b. 1986
c. 1990
d. 1994
3. Which is the first Unix that introduced journaling file systems?
a. Solaris
b. HP-UX
c. AIX
d. Linux

4. Advanced Power Virtualization was introduced with which combination?
a. AIX 5.3 and POWER5
b. AIX 5.2 and POWER5
c. AIX5L and POWER4
d. AIX 6.1 and POWER5
5. DLPAR stands for
a. Logical partitioning
b. Advanced power virtualization
c. Dynamic logical partitioning
d. Nothing

True or False
6. Linux cannot run natively on the POWER architecture.
7. Performance monitoring and tuning is a never-ending process.
8. Fixing a bottleneck should not cause another bottleneck to occur.
9. Never make more than one tuning change at the same time.

Fill In the Blank(s)
10. Fill in the missing steps of the five-step tuning methodology described
in this book:
1. __________________
2. Stress test and monitor
3. __________________
4. __________________
5. __________________

Section II
CPU
This section provides an overview of CPU monitoring and tuning and
discusses best practices for CPU performance tuning, given the various
considerations that can impact performance.

Chapter 4

CPU: Introduction

Unlike other subsystems (e.g., memory, I/O), when it comes to CPU, there
is less to actually tune and more you can do on the back end (e.g., balancing systems workload) to ensure your systems are running smoothly. As a
Unix administrator, you need to understand which tools are best used for
which purpose. As far as monitoring is concerned, some tools are better
suited to quick-and-dirty system snapshots, while others are clearly more
effective for long-term trending and analysis. Choose the tool that best fits
the situation you’re faced with.
For example, if you’re experiencing a serious production problem, you
don’t have five days to perform long-term analysis — you may not even
have more than five minutes to come up with something. Nevertheless, you
still need to arrive at the right diagnosis to help determine the bottleneck.
Often, you’ll find that the bottleneck isn’t actually CPU but relates to
memory or I/O. Most users assume CPU is the problem and figure the
box needs more horsepower. However, CPU usually isn’t the culprit, and
throwing more iron at a problem is neither the quickest nor the most cost-effective way to solve the issue. Furthermore, trying to tune the CPU subsystem when virtual memory is the problem could be a real disaster. Before
you look for a way to tune, take the time to analyze the system properly.
I don’t mean to be condescending here. It’s just that sometimes we don’t
take the time to monitor and analyze. We rush to judgment because of the
pressure we’re under to solve problems and move on to the next issue or
production concern. This is one reason that, when first investigating any
performance bottleneck, I prefer to use tools that focus less on a specific
area but provide a better understanding of the big picture. The bottom line
is that you really want to make sure you have a CPU problem if that’s what
you’re trying to tune. More on this point later.
As an AIX administrator, you should already know some of the basic
tools of performance monitoring — commands such as vmstat and topas
— and you should be familiar with ways to identify processes that are
CPU hogs. What some people have a hard time understanding is that CPU
performance tuning isn’t about running some tuning commands but about
proactively monitoring systems, particularly when you’re not experiencing
performance problems. Without historical data to analyze, there can be no
effective performance tuning.
Performance in a virtualized environment provides challenges to even the
most senior of administrators, so I’ll also go over specific concepts for a
virtualized environment, including simultaneous multithreading (SMT),
virtual processors, and the POWER Hypervisor.
As far as the methodology, when investigating a perceived performance
problem, start by monitoring the statistics of CPU utilization. It’s important
to continuously observe system performance because you need to compare
the loaded system data with normal usage data, which is the baseline. Because the CPU is one of the fastest components of the system, if CPU utilization keeps the CPU 100 percent busy (which happens to every system at
some time), you’ll need to investigate the process that causes this situation.
AIX provides many trace and profiling tools to follow the most complex of
processes. Don’t be afraid to also use any application or database tools at
your disposal to help you further. In a CPU-bound system, all the processors are 100 percent busy, and some jobs may be waiting for CPU time in
the run queue. Generally speaking, a system has an excellent chance of becoming CPU-bound if the CPU is 100 percent busy, has a large run queue
compared with the number of CPUs, and requires more context switches
than usual.
That’s the quick and dirty. We’ll get into much more detail in the next
couple of chapters.

Chapter 5

CPU: Monitoring

AIX systems administrators have much more at their disposal than the average Unix administrator. Not only can you use the standard Unix generic
monitoring tools that have been around nearly as long as Unix itself, but
a potpourri of AIX-specific commands is also available. Some of these
commands come standard with an AIX build, while others are tools that,
although not officially supported by IBM, are widely distributed and are
used by most administrators. We’ll discuss all these types of monitoring
tools in this chapter, including those we don’t use very often.
As we go through the tools, note that four commands — mpstat, sar,
topas, and vmstat — have been enhanced in AIX 5.3 to enable the tools
to report back accurate statistics about shared partitions using Advanced
Power Virtualization (PowerVM). The trace-based tools curt, filemon,
netpmon, pprof, and splat have also been updated. One command not
covered here, lparmon, is the most comprehensive tool you can use in a
partitioned environment.

vmstat (Unix-generic)
vmstat [-fsviItlw] [[-p|-P] pagesize|ALL] [Drives] [Interval [Count]]

While the vmstat command is more commonly associated with viewing information about virtual memory (hence the “vm”), it is the first
tool most administrators invoke when trying to get a quick assessment of
their systems. That’s because vmstat reports back all kinds of pertinent
performance-related information, including data about memory, paging,
blocked I/O, and overall CPU activity. Because it reports virtually all
subsystem information line by line in a quick and painless way, running
vmstat is probably the simplest and most efficient way to gauge exactly
what is going on in your system.
A common way to run vmstat is for five iterations every two seconds:
vmstat 2 5

Running the command in this way produces the following results:
# vmstat 2 5
System configuration: lcpu=4 mem=3072MB ent=0.40

     memory          faults             cpu
---------------- -------------- --------------------
   avm     fre    in   sy   cs  us  sy  id    pc   ec
128826  641397   448   87  138   0   1  98  0.01  2.8
128826  641397   385   10  136   0   1  99  0.01  2.2
128826  641397   381   13  138   0   1  99  0.01  2.2
128826  641397   364   40  138   0   1  99  0.01  2.4
128826  641397   610   13  138   0   2  98  0.01  3.3

In addition to specific monitoring information, vmstat provides a very
high-level snapshot of the system, which can be useful. Just by running
vmstat in the preceding snapshot, we know that we have a system with
four logical CPUs and 3 GB of RAM and are using shared processors. (In
actuality, this partition is using two physical CPUs; simultaneous multithreading is enabled, yielding the four logical CPUs. More about SMT later.)
Some of the more important fields in the vmstat output include the
following:
● r — The average number of runnable kernel threads over the sampling interval you have chosen.
● b — The average number of kernel threads in the virtual memory
waiting queue over the sampling interval. The r value should always
be higher than b; if it is not, you probably have a CPU bottleneck.
● fre — The size of the memory free list. Don’t worry too much if this
number is really small. More important, determine whether any paging is going on if this size is small.
● pi — Pages paged in from paging space.
● po — Pages paged out to paging space.

Our focus in this chapter is on the last section of output, CPU:
● us — User time
● sy — System time
● id — Idle time
● wa — Time spent waiting on I/O
● pc — Number of physical processors consumed (displayed only if
the partition is configured with shared processors)
● ec — Percentage of entitled capacity (displayed only if the partition
is configured with shared processors)

Clearly, the system in our example has no bottleneck to speak of. How can
we tell this? Let’s look at us and sy. If these entries combined consistently
averaged more than 80 percent, you more than likely would have a CPU
bottleneck. If you are in a state where the CPU is running at 100 percent
(which happens on occasion to everyone), your system is really breathing hot and heavy. If the numbers are small but the wait time (wa) is on
the high side (usually greater than 30), this usually signals that there may
be I/O problems, which in turn can cause the CPU not to work as hard as
it can. Alternatively, if more time is spent in sy time than us time, your
system is probably spending less time crunching numbers and more time
processing kernel data. When this happens, it is usually a sign either of
badly written code or that something has run amok.
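
The rules of thumb above can be applied mechanically to one vmstat sample. What follows is only a minimal sketch: the function name classify_cpu and the busy_floor value are hypothetical (they are not part of AIX or of this book), while the 80 and 30 thresholds are the ones quoted in the text. Keep in mind that the text talks about values that hold consistently, so a single sample proves little on its own.

```shell
# classify_cpu US SY WA
# Hypothetical helper: applies this chapter's rules of thumb to the
# us, sy, and wa percentages of a single vmstat sample.
classify_cpu() {
    us=$1 sy=$2 wa=$3
    busy_floor=20   # assumption, not from the text: ignore nearly idle samples
    if [ $((us + sy)) -gt 80 ]; then
        echo "cpu-bound"        # us + sy above 80 percent
    elif [ "$wa" -gt 30 ]; then
        echo "io-wait"          # high wait time points at I/O
    elif [ "$sy" -gt "$us" ] && [ $((us + sy)) -gt "$busy_floor" ]; then
        echo "kernel-heavy"     # more system time than user time
    else
        echo "ok"
    fi
}
```

On a shared-processor partition whose vmstat layout matches the samples in this chapter, us, sy, and wa are fields 14, 15, and 17 of each data row, so a row can be passed in as classify_cpu 64 32 0.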

Let’s look at another system:
# vmstat 2 5
System configuration: lcpu=4 mem=3072MB ent=0.40

kthr      memory              page              faults              cpu
----- --------------- ----------------- ----------------- ----------------------
 r  b     avm     fre re pi po fr sr cy   in    sy   cs  us sy id wa   pc   ec
 2  1  169829  600290  0  0  0  0  0  0  553 36538  175  64 32  4  0 0.79 84.9
 3  2  169829  600290  0  0  0  0  0  0  778 33033  175  60 29 11  0 0.84 73.2
 4  1  169828  600291  0  0  0  0  0  0  403 11904  179  76 10  4 10 0.69 87.8
 2  1  169828  600291  0  0  0  0  0  0  368 30745  175  82 14  2  2 0.91 85.5
 6  2  169830  600289  0  0  0  0  0  0  395 27898  173  57 34  4  5 0.89 91.5

What kind of determination can we make here? When we add us and sy,
our numbers come out much differently this time — fairly close to 100
percent. This system is clearly CPU-bound. If paging were going on, we
would see numbers in the paging (page) columns. In this case, no paging
is occurring, nor are there any I/O problems to speak of. Because vmstat is
an all-purpose utility, it can help you perform this quick-and-dirty analysis
on the fly. If the blocked processes represented three times the number of
runnable processes and everything else stayed the same, I/O would likely
be causing the CPU bottleneck. In that case, you should be prepared to
have even more of a CPU bottleneck once you fix the I/O problem. As I
explained previously, this is all part of systems tuning; fixing one bottleneck often causes another.
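
Chapter 4 noted that a large run queue compared with the number of CPUs is one sign of a CPU-bound system. As a rough sketch of that comparison, the hypothetical function below (runq_per_lcpu is my name, not an AIX command) reads vmstat output on standard input, picks lcpu out of the System configuration line, and divides the average of the r column by it. It assumes the layout shown in the samples above, where data rows begin with a number.

```shell
# runq_per_lcpu: reads vmstat output on stdin and prints the average
# run queue (r column) divided by the number of logical CPUs.
runq_per_lcpu() {
    awk '
        /^System configuration:/ {
            # pull N out of "lcpu=N"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^lcpu=/) { split($i, kv, "="); lcpu = kv[2] }
            next
        }
        # data rows start with a number; header and rule lines do not
        $1 ~ /^[0-9]+$/ { sum += $1; n++ }
        END {
            if (n > 0 && lcpu > 0) printf "%.2f\n", sum / n / lcpu
        }
    '
}
```

A typical invocation would be vmstat 2 5 | runq_per_lcpu. The text gives no fixed cutoff for this ratio, so treat the result as something to compare against your own baseline.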

sar (Unix-generic)
sar {-A [-M]|[-a][-b][-c][-d][-k][-m][-q][-r][-u][-v][-w][-y][-M]}
[-s hh[:mm[:ss]]] [-e hh[:mm[:ss]]]
[-P processor_id[,...] | ALL]
[-f file] [-i seconds] [-o file] [interval [number]]
[-X file] [-i seconds] [-o file] [interval [number]]

The sar command is the Unix System Activity Reporting tool (part of the
bos.acct fileset). It is most commonly used to analyze CPU activity. The
command writes to standard output the contents of the cumulative activity,
similar to vmstat. The default version of sar produces a CPU utilization
report:

# sar 2 5
AIX lpar30p682e_pub 3 5 00CED82E4C00    12/24/07
System configuration: lcpu=4 ent=0.40 mode=Uncapped

10:13:40    %usr    %sys    %wio   %idle   physc   %entc
10:13:42      13      31       0      57    0.18    44.5
10:13:44      12      30       0      58    0.17    43.5
10:13:46      14      35       0      51    0.20    50.8
10:13:48       6      11       0      83    0.07    18.0
10:13:50       9      24       0      67    0.14    34.5

Average                                     0.15    38.3

Used this way, the sar command provides the same type of high-level
information that vmstat does, although it also lets you know the mode
in which the system is running, which is helpful. In the example, we can
see that our partition is an uncapped partition, which, when configured
as such, lets the partition use more resources than its entitled capacity. In
this default view, the fields themselves are the same as the vmstat fields,
but us becomes usr, sy becomes sys, id becomes idle, wa becomes wio, pc
becomes physc, and ec becomes entc.
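
To see how physc and entc relate, here is a small sketch that assumes the entitled-capacity percentage is simply the physical processors consumed divided by the ent value from the System configuration line. That relationship is my working assumption rather than something this chapter states, and entc_pct is a hypothetical helper name.

```shell
# entc_pct PHYSC ENT
# Prints physc as a percentage of the partition's entitlement. A result
# above 100 would mean an uncapped partition is using more than its
# entitled capacity.
entc_pct() {
    awk -v physc="$1" -v ent="$2" \
        'BEGIN { if (ent > 0) printf "%.1f\n", physc / ent * 100 }'
}
```

For the first sample above, entc_pct 0.18 0.40 prints 45.0, close to the 44.5 that sar reports; the small gap is consistent with physc being displayed rounded to two decimals.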
A more effective way to run sar is to view all processors by using the ALL
flag:
# sar -u -P ALL 2 5
AIX lpar30p682e_pub 3 5 00CED82E4C00    12/24/07
System configuration: lcpu=4 ent=0.40 mode=Uncapped

10:24:18 cpu    %usr    %sys    %wio   %idle   physc   %entc
10:24:20  0       27      71       0       2    0.15    37.5
          1        0      35       0      65    0.00     0.5
          2        0      36       0      64    0.00     0.0
          3        0      29       0      71    0.00     0.0
          U        -       -       0      62    0.25    61.8
          -       10      27       0      63    0.15    38.2
10:24:22  0       32      66       0       2    0.15    37.2
          1        0      37       0      63    0.00     0.6
          2        0      35       0      65    0.00     0.0
          3        0      30       0      70    0.00     0.0
          U        -       -       0      62    0.25    62.1
          -       12      25       0      63    0.15    37.9
10:24:24  0       29      69       0       2    0.15    37.7

I prefer using vmstat to sar because vmstat provides a quick snapshot of
all subsystems, not just CPU. Although you can use other flags to obtain
additional subsystem information using sar, it just is not as efficient or
simple.
One advantage sar provides that vmstat does not is the ability to capture
information and analyze data. This is done through the System Activity Data Collector (sadc), which is essentially a back end to sar. When
enabled through cron (it is commented out on a typical default AIX partition), sadc collects data periodically in binary format. In the following
example, we run it from the command line:
# /usr/lib/sa/sadc 2 5 /tmp/sarinfo

To view the results (remember, the file is in binary format), we need to use the -f
flag:
# sar -f /tmp/sarinfo
AIX lpar30p682e_pub 3 5 00CED82E4C00    12/24/07
System configuration: lcpu=4 ent=0.40 mode=Uncapped

10:41:42    %usr    %sys    %wio   %idle   physc   %entc
10:41:44       0       1       0      99    0.01     2.4
10:41:46       0       1       0      98    0.01     2.6
10:41:48       0       1       0      99    0.01     2.1
10:41:50       0       1       0      99    0.01     1.9
Average        0       1       0      99    0.01     2.3

iostat (Unix-generic)
iostat [-a][-l][-s][-t][-T][-z] [{-A [-P] [-q|Q]} | {-d|-D [-R]} ]
[-m] [Drives] [Interval [Count]]

The iostat command is another good first-impression type of tool, which
is more commonly used for I/O information. When run with the -t flag, it
provides only tty/cpu information. I also like to use the -T flag to obtain the
timestamp:
# iostat -tT 1
System configuration: lcpu=4 ent=0.40

tty:    tin     tout   avg-cpu: % user  % sys  % idle  % iowait  physc  % entc  time
        0.0     41.0               0.0    1.1    98.8       0.0    0.0     2.2  10:51:13
        0.0    182.0               0.0    0.9    99.0       0.0    0.0     1.8  10:51:14
        0.0     92.0               0.0    0.9    99.1       0.0    0.0     1.7  10:51:15
        0.0     92.0               0.1    1.1    98.8       0.0    0.0     2.1  10:51:16
        0.0     92.0               0.0    1.4    98.6       0.0    0.0     2.7  10:51:17

w (Unix-generic)
/usr/bin/w64 [ -hlsuwX ] [ user ]

The w command prints a summary of all current activity on the system.
I like this command — always have and always will. Sometimes I run it
even before vmstat. I appreciate the clear, concise way in which w provides important information, such as load average. You can tell a lot about
your system from the load average. If my load average commonly varies
between 2 and 5 but is 37 when I run this command, I’m about ready to
say, “Houston we have a problem.” In the following case, we’re okay.
# w
08:29AM   up 1 day, 23:44,  2 users,  load average: 1.00, 1.00, 1.01
User      tty     login@    idle   JCPU   PCPU  what
u0004773  pts/0   06:40AM      0      0      0  -ks
u0004773  pts/1   08:28AM      0      0      0  -ksh
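
The load-average check described above is easy to script. The sketch below is illustrative only: load_1min and load_is_high are hypothetical helper names, the parsing assumes the "load average:" wording shown in the sample output, and the limit you compare against should come from your own baseline (the 5 used in the usage line is just the upper end of the range mentioned in the text).

```shell
# load_1min: reads `w` or `uptime` output on stdin and prints the
# 1-minute load average taken from the "load average:" line.
load_1min() {
    awk -F'load average: ' \
        '/load average: / { split($2, a, ", "); print a[1]; exit }'
}

# load_is_high LOAD LIMIT: succeeds (exit status 0) when LOAD exceeds LIMIT.
load_is_high() {
    awk -v l="$1" -v max="$2" 'BEGIN { exit !(l > max) }'
}
```

For example: load_is_high "$(w | load_1min)" 5 && echo "Houston, we have a problem."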

lparstat (AIX-specific)
lparstat { -i | [-H|-h] [Interval [Count]] }

The purpose of the lparstat command is to report logical partition (LPAR)
information statistics. This command also displays hypervisor statistical data about many POWER Hypervisor calls. Introduced in AIX 5.2,
lparstat is commonly used to assist in shared-processor partitioned
environments.
In the following command output, you should recognize the entries up
until entitled capacity (entc).
# lparstat 2 5
System configuration:
type=Shared mode=Uncapped smt=On lcpu=4 mem=3072 psize=16 ent=0.40

%user  %sys  %wait  %idle  physc  %entc  lbusy  vcsw  phint
-----  ----  -----  -----  -----  -----  -----  ----  -----
  0.1   1.4    0.0   98.5   0.01    2.6    0.0   582      0
  0.0   1.4    0.0   98.6   0.01    2.6    0.0   635      0
  0.0   1.3    0.0   98.7   0.01    2.4    0.0   593      0
  0.0   1.5    0.0   98.5   0.01    2.8    1.2   685      0
  0.1   1.1    0.0   98.8   0.01    2.1    0.0   458      1

On shared partitions, lparstat provides the following information:
● lbusy — The percentage of logical processor utilization (executing at
the user and system level)
● vcsw — The number of virtual context switches that are virtual processor hardware preemptions
● phint — The number of phantom interrupts (redirected to other partitions in the shared pool)
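
When you want a single number per column rather than a screenful of samples, a small filter can average any named column. This is a sketch under the assumption that the output has one header row naming the columns, as in the lparstat sample above; col_avg is a hypothetical helper, not an AIX command.

```shell
# col_avg NAME: reads tabular output on stdin, finds the header row that
# contains NAME as a whole field, and prints the mean of that column.
col_avg() {
    awk -v name="$1" '
        col == 0 {
            # still looking for the header row
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            next
        }
        # skip rule lines such as "-----"; average only numeric cells
        $col ~ /^[0-9.]+$/ { sum += $col; n++ }
        END { if (n > 0) printf "%.1f\n", sum / n }
    '
}
```

For example, lparstat 2 5 | col_avg vcsw averages the virtual context switches over the five samples.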

An important flag worth a mention is the -H flag, which shows detailed POWER
Hypervisor call statistics:

# lparstat -H 2 5
System configuration:
type=Shared mode=Uncapped smt=On lcpu=4 mem=3072 psize=16 ent=0.40

Detailed information on Hypervisor Call

Hypervisor    Number of   %Total Time   %Hypervisor   Avg Call   Max Call
Call          Calls       Spent         Time Spent    Time(ns)   Time(ns)
remove              0         0.0           0.0              1        656
read                0         0.0           0.0              1          0
nclear_mod          0         0.0           0.0              1          0
page_init         265         0.0           0.9            604       6593
clear_ref           0         0.0           0.0              1          0
protect             0         0.0           0.0              1          0
put_tce             0         0.0           0.0              1          0
xirr              565         0.1           2.4            758       1406

Hypervisor information includes:
● Number of calls — The number of Hypervisor calls
● %Total Time Spent — Percentage of total time spent on call
● %Hypervisor Time Spent — Percentage of Hypervisor time spent on
call
● Avg Call Time — Average call time for this type of call (in nanoseconds)
● Max Call Time — Maximum call time for this type of call (in nanoseconds)

For partitions running AIX 5.2 or AIX 5.3, either in a dedicated environment or in shared and capped mode, the overall CPU utilization is based on
the user, sys, wait, and idle values. In AIX 5.3 partitions running in uncapped mode, the utilization is based on the entitled capacity percentage.

mpstat (AIX-specific)
mpstat [ { -a | -d | -i | -s | -h } ] [ -w ] [ interval [ count ] ]

The mpstat command (part of the bos.acct fileset) was introduced in AIX
5.3. This tool displays overall performance numbers for all logical CPUs
on your partitioned system. When you run the command, two sections
of statistics are displayed. The first section shows system configuration
information, which is displayed when the command starts and whenever
a change in the system configuration occurs; the second section, which is
displayed at user-specified intervals, shows utilization statistics:
# mpstat 1 2
System configuration: lcpu=4 ent=0.4 mode=Uncapped

cpu  min  maj  mpc  int   cs  ics   rq  mig  lpa sysc  us  sy  wa  id    pc   %ec  lcs
  0   18    0    0  524  125   56    1    0  100  100   8  58   0  34  0.01   2.1  465
  1    0    0    0  108    0    0    0    0    -    0   0  36   0  64  0.00   0.5  108
  2    0    0    0   10    0    0    0    0    -    0   0  32   0  68  0.00   0.0   10
  3    0    0    0   10    0    0    0    0    -    0   0  29   0  71  0.00   0.0   10
  U    -    -    -    -    -    -    -    -    -    -   -   -   0  97  0.39  97.3    -
ALL   18    0    0  652  125   56    1    0  100  100   0   1   0  98  0.01   2.7  593
--------------------------------------------------------------------------------------
  0    3    0    0  392  127   58    1    0  100   67   5  56   0  38  0.01   1.4  331
  1    0    0    0   70    0    0    0    0    -    0   0  34   0  66  0.00   0.4   70
  2    0    0    0   10    0    0    0    0    -    0   0  32   0  68  0.00   0.0   10
  3    0    0    0   10    0    0    0    0    -    0   0  29   0  71  0.00   0.0   10
  U    -    -    -    -    -    -    -    -    -    -   -   -   0  98  0.39  98.2    -
ALL    3    0    0  482  127   58    1    0  100   67   0   1   0  99  0.01   1.8  421

Information given includes:
● cpu — Logical CPU processor ID
● min — Minor page faults
● maj — Major page faults
● mpc — Total number of interprocessor calls
● int — Total number of interrupts
● cs — Total number of voluntary context switches
● ics — Total number of involuntary context switches
● rq — Total run queues
● mig — Total number of thread migrations
● lpa — Logical processor affinity
● sysc — Total number of system calls
● us — CPU time spent on user activity
● sy — CPU time spent on system activity
● wa — CPU time spent waiting on I/O
● id — CPU time idle
● pc — Fraction of processor consumed
● %ec — Percentage of entitled capacity consumed
● lcs — Total number of logical context switches

The mpstat command is a very useful command because it reports collection information for each logical CPU on your partition in a format that is
clearly illustrated. You can even view SMT utilization by specifying the -s
flag:
# mpstat -s 1
System configuration: lcpu=4 ent=0.4 mode=Uncapped

      Proc0               Proc1
      1.01%               0.02%
  cpu0      cpu1      cpu2      cpu3
  0.85%     0.16%     0.01%     0.01%
------------------------------------------
      Proc0               Proc1
      0.74%               0.02%
  cpu0      cpu1      cpu2      cpu3
  0.56%     0.18%     0.01%     0.01%

topas (AIX-specific)
IBM has improved the topas command (part of the bos.perf.tools fileset)
substantially in AIX 5.3. Before these changes, topas did not have the
ability to capture historical data, nor was it enhanced for use in shared
partitioned environments. (The command’s -L flag now reports partitioned
information.) By incorporating these changes to let you collect performance data from multiple partitions, IBM has really simplified the capability of topas as a performance management and capacity planning tool. The
command’s look and feel is quite similar to top and monitor (used in other
Unix variants).
The topas utility displays all kinds of information on your screen in a textbased, graphical type of format. In its default mode, it provides a myriad of
CPU, memory, and I/O information. Some recent changes:
● As of TL_4 of AIX 5.3, topas uses a daemon named xmwlm, which
is automatically started from the inittab.
● As of TL_5 of AIX 5.3, the system keeps seven days of data as a
default and records almost all the topas data that is displayed interactively, except for process and Workload Manager (WLM) information. You can use the topasout command to generate text-based
reports. By specifying the -C flag, you can actually view monitoring
information across all partitions in an IBM POWER system.

nmon
My favorite of all performance monitoring tools is nmon, which until
recently was not an “officially” supported IBM tool; if you were going to
send data to IBM for analysis, this was not the tool you would use. nmon
is almost the perfect AIX analysis tool (it’s also available now for Linux
on POWER). The data it collects is available either from your screen or
through reports, which you can run from cron. In the words of nmon’s
creator, Nigel Griffiths, “Why use five or six tools when one free tool can
give you everything you need?”
What attracts most people to nmon is that not only does it have a very
efficient front-end monitor, but it also provides the ability (unlike topas)
to capture data to a text file for graphing reports because the output is in a
.csv (spreadsheet) format. In fact, moments after running an nmon session, you can actually view the nicely rendered charts in a Microsoft Excel
spreadsheet, which you can hand off to senior management or other technical
teams for further analysis. Further, in contrast to topas, I’ve never seen
any performance-type overhead with this utility.

Using nmon for Historical Analysis
First, we’ll tell nmon to create a file, name the run, and do data collection
every 30 seconds for one hour (120 intervals):
# ./nmon -f -t -r test3 -s 30 -c 120
AIX version 5.3.0.0 and starting up nmon nmon_aix5
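
The interval and count arithmetic is worth scripting so that longer runs are not computed by hand. A minimal sketch follows; nmon_count is a hypothetical helper name, and it relies only on the -s (seconds between snapshots) and -c (number of snapshots) flags used in the command above.

```shell
# nmon_count DURATION_SECONDS INTERVAL_SECONDS
# Prints the snapshot count to pass to nmon -c so that a run sampling
# every INTERVAL_SECONDS (nmon -s) lasts DURATION_SECONDS in total.
nmon_count() {
    echo $(( $1 / $2 ))
}
```

The one-hour run shown above then becomes ./nmon -f -t -r test3 -s 30 -c "$(nmon_count 3600 30)", which works out to the same 120 intervals.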

When monitoring is completed, we’ll sort the file:
# sort -A lpar30p682e_pub_071224_1411.nmon > lpar30p682e_pub_071224_1411.csv

Now, we can FTP the spreadsheet to a PC and open it up. Start the nmon
analyzer, and click on Analyze nmon data. Enter the location of the file,
wait about 20 seconds, and you’ll see your nmon data in all its glory!
Figure 5.1 shows some sample output from the nmon analyzer.

Figure 5.1: Sample nmon analyzer output

The nmon analyzer is an awesome tool, written by Stephen Atkins, that
graphically presents data (CPU, memory, network, or I/O) from an Excel
spreadsheet. Perhaps the only drawback that prevents it from being perceived as an enterprise type of tool is that it lacks the ability to gather
statistics about large numbers of LPARs at once (although it now has a
partition-viewing capability similar to that of topas). The analyzer is not
a database, nor was it meant to be. That is where a tool such as Ganglia
helps; this utility has actually received the blessing of Nigel Griffiths as the
tool that can integrate nmon analysis.
You can download the nmon analyzer for free from
http://www.ibm.com/developerworks/aix/library/au-nmon_analyser. For more information
about Ganglia, see http://ganglia.info.

ps (Unix-generic)
ps [-ANPaedfklmMZ] [-n namelist] [-F Format] [-o
specifier[=header],...] [-p proclist][-G|-g grouplist] [-t
termlist] [-U|-u userlist] [-c classlist] [ -T pid] [ -L pidlist]
ps [aceglnsuvwxU] [t tty] [processnumber]

The ps command shows the current status of processes. Upon viewing the
syntaxes shown above, the first question you may have is, why the two
sets of usage parameters? To make a long story short, the answer has to do
with the basic history of Unix — the old Berkeley versus System V (now
referred to as X/Open Standards) wars. As we discussed in Chapter 2, AIX
is a hybrid of sorts, and it contains both flavors of Unix. Most of you are
probably more familiar with the X/Open Standards usage of ps (e.g., ps
–ef), which is the first usage shown above.
How can you best use ps in CPU systems monitoring? In other words,
how can you identify processes that are taking an inordinate amount of
CPU time? If you can find these processes, you can take action on them. I
like using the Berkeley syntax better here; the information it provides is in
a nicer, more presentable format. Let’s look at ps ux, which displays the
CPU execution time of processes:
# ps ux | more
USER        PID %CPU %MEM   SZ  RSS TTY STAT    STIME  TIME COMMAND
root       8196  0.1  0.0  384  384     A    08:45:25  1:02 wait
root      53274  0.0  0.0  384  384     A    08:45:25  0:30 wait
root      86118  0.0  0.0  504  512     A    08:45:27  0:08 /usr/sbin/syncd
root     299158  0.0  0.0  472  500     A    08:45:44  0:06 /usr/sbin/getty
root      69666  0.0  0.0  960  960     A    08:45:25  0:04 gi
root          0  0.0  0.0  384  384     A    08:45:25  0:04 swappe
root      57372  0.0  0.0  384  384     A    08:45:25  0:02 wait
root      61470  0.0  0.0  384  384     A    08:45:25  0:02 wait
root     286880  0.0  0.0  900  928     A    08:45:44  0:01 /usr/bin/xmwlm
root     258190  0.0  0.0 1216 1216     A    08:45:35  0:01 rpc.lock
root     151642  0.0  0.0  512  512     A    08:45:27  0:01 rtcmd
root     233606  0.0  0.0  840  956     A    08:45:44  0:00 /usr/sbin/sshd
This ps command uses two key parameters:
● u displays user-oriented output about each process: the USER
(user), PID (process ID), %CPU (CPU time used), %MEM (memory
used), SZ (size of process core image), RSS (resident set size), TTY
(controlling terminal name), STAT (process state), STIME (start
time), TIME (total run time), and COMMAND (executed command)
fields.
● x displays processes without a controlling terminal in addition
to processes with a controlling terminal. To see processes that don’t
include daemons, substitute a for x.

For our purposes, the most important field of the ps output is %CPU. This
field reports the percentage of CPU time that the process has used since it
started.
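
If you capture ps ux output to a file, ranking processes by %CPU is easy to script. The sketch below is a minimal Python illustration, not an AIX tool: the sample text is modeled on the listing above, the TTY column is filled with a "-" placeholder so every row has the same number of fields (an assumption), and top_cpu_processes is a hypothetical helper name.

```python
# Sample modeled on the ps ux listing above; the "-" in the TTY column is a placeholder.
SAMPLE_PS_UX = """\
USER  PID %CPU %MEM  SZ RSS TTY STAT STIME TIME COMMAND
root 8196 0.1 0.0 384 384 - A 08:45:25 1:02 wait
root 86118 0.0 0.0 504 512 - A 08:45:27 0:08 /usr/sbin/syncd
root 233606 0.0 0.0 840 956 - A 08:45:44 0:00 /usr/sbin/sshd
"""


def top_cpu_processes(ps_text, limit=5):
    """Return up to `limit` rows (as dicts keyed by column name), highest %CPU first."""
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # COMMAND may contain spaces, so split only the leading columns.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        rows.append(dict(zip(header, fields)))
    return sorted(rows, key=lambda row: float(row["%CPU"]), reverse=True)[:limit]


if __name__ == "__main__":
    for row in top_cpu_processes(SAMPLE_PS_UX, limit=2):
        print(row["PID"], row["%CPU"], row["COMMAND"])
```

Real ps output can leave a column blank, which would shift the fields; a production script would need to parse by column position rather than by whitespace.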

Tracing Tools
Tracing tools come in handy when you want to drill down further to analyze processes that are causing bottlenecks. Among these tools are curt,
splat, tprof, trace, and trcrpt. We’ll use the tprof and trace tools here.

tprof
tprof [ -c ] [ -C { all | cpuidslist } ] [ -d ] [ -D ] [ -e ]
{ [ -E { ALIGNMENT | EMULATION | ISLBMISS | DSLBMISS | PM_<event> } ]
[ -f interval ] } [ -F ] [ -j ] [ -J profilehook ] [ -k ] [ -l ]
[ -L objectslist ] [ -m objectslist ] [ -M sourcepathlist ]
[ -p processlist ] [ -P { all | pidslist } ] [ -s ]
[ -S searchpathlist ] [ -t ] [ -T buffersize ] [ -u ] [ -v ]
[ -V verbosefilename ] [ -I ] [ -N ] { [-z] [-Z] | -R }
{ { -r rootstring } [ -X { xmloptions } ] |
{ { [ -A { all | cpuidslist } ] [-n] } [ -r rootstring ] -x command }
}

The tprof command reports CPU usage for both individual programs and
the system as a whole. The output provides an estimate of the amount of
CPU time spent for each process that was executing while tprof was running. It also estimates how that CPU time was divided among
the kernel address space, the user address
space, and the shared library address spaces.
You can use tprof to view a basic global program and thread-level summary by running the command in the following fashion:
# tprof -x sleep 20
Mon Dec 24 18:55:54 2
System: AIX 5.3 Node: lpar30p682e_pub Machine: 00CED82E4C0
Starting Command sleep 2
stopping trace collection.
Generating sleep.prof
root@lpar30p682e_pub[/]

Let’s view the file (sleep.prof) that we just created:
# more sleep.prof
Configuration information
=========================
System: AIX 5.3 Node: lpar30p682e_pub Machine: 00CED82E4C00

Next, let’s use the trace command to run a manual trace:

/usr/bin/trace -ad -M -L 109113753 -T 500000 -j
000,00A,001,002,003,38F,005,006,134,139,5A2,5A5,465,234, -o
Total Samples = 1088
Traced Time = 20.02s (out of a total execution time of 20.02s)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Process            Freq   Total  Kernel    User  Shared   Other
=======            ====   =====  ======    ====  ======   =====
wait                  4   99.82   99.82    0.00    0.00    0.00
swapper               1    0.09    0.09    0.00    0.00    0.00
/usr/bin/tprof        1    0.09    0.00    0.00    0.09    0.00
Total                 6  100.00   99.91    0.00    0.09    0.00

Process             PID     TID   Total  Kernel    User  Shared   Other
=======             ===     ===   =====  ======    ====  ======   =====
wait               8196    8197   44.58   44.58    0.00    0.00    0.00
swapper               0       3    0.09    0.09    0.00    0.00    0.00
/usr/bin/tprof   418000  688307    0.09    0.00    0.00    0.09    0.00
=======             ===     ===   =====  ======    ====  ======   =====
Total                            100.00   99.91    0.00    0.09    0.00
The tprof command is an excellent tool for identifying runaway processes
because these processes appear at the top of the output list.

Timing Tools
Two tools, time and timex, provide access to information about command
execution time.

time
time [ -p ] Command [ Argument ... ]

The time command returns the total execution time of your program,
including real time, user time, and system time. This information can be
useful when you’re trying to figure out the amount of time it takes for commands to execute. time works by counting the CPU ticks from the time the
command was first started until the time it ends:
# time find ./ -depth 1>/dev/null
real    0m23.30s
user    0m0.22s
sys     0m2.10s
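
A useful way to read these three numbers is to compare CPU time (user plus system) with elapsed (real) time. The sketch below is a minimal Python illustration using the values from the find run above; parse_elapsed and cpu_share are hypothetical helper names, and the parser assumes the "NmS.SSs" format shown in that output.

```python
def parse_elapsed(value: str) -> float:
    """Convert a value such as '0m23.30s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def cpu_share(real: str, user: str, system: str) -> float:
    """Fraction of elapsed time that was spent on the CPU (user + system)."""
    return (parse_elapsed(user) + parse_elapsed(system)) / parse_elapsed(real)


if __name__ == "__main__":
    share = cpu_share("0m23.30s", "0m0.22s", "0m2.10s")
    print(f"CPU share of elapsed time: {share:.1%}")
```

For the run above, the share works out to roughly 10 percent, which suggests the command spent most of its elapsed time waiting rather than computing. A multithreaded program can legitimately show a share above 100 percent.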

timex
timex [ -s ][ -o ][ -p [ -fhkmrt ] ] cmd

Without any flags, the timex command provides the same type of information as time, but with a prettier view. Used with the –s flag, it summarizes
all system activity while the command is being executed. This spares you
the task of starting up a sar or vmstat process while running a timing. For
this reason alone, I like to use timex, and I’ve found it a very useful tool
through the years.
# timex -s find ./ -depth 1>/dev/null
real 21.69
user 0.20
sys

AIX lpar30p682e_pub 3 5 00CED82E4C00    12/26/07

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08    %usr    %sys    %wio   %idle   physc   %entc
08:40:30       5      33       0      62    0.17    43.2

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
08:40:30       0       0       0       0       0       0       0       0

System configuration: lcpu=4 mem=3072MB ent=0.40 mode=Uncapped
08:40:08   slots cycle/s fault/s  odio/s
08:40:30  392358    0.00   18.11    0.00

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s
08:40:30       0       0       0       0       0       0

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08 scall/s sread/s swrit/s  fork/s  exec/s rchar/s wchar/s
08:40:30   19659       8    5522    0.14    0.18   12407  308149

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08 cswch/s
08:40:30    5617

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08  iget/s lookuppn/s dirblk/s
08:40:30       0       8513        0

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08 runq-sz %runocc swpq-sz %swpocc
08:40:30     1.3      95

System configuration: mode=Uncapped
08:40:08   proc-sz   inod-sz   file-sz     thrd-sz
08:40:30 68/262144     0/170  387/1124  219/524288

System configuration: lcpu=4 ent=0.40 mode=Uncapped
08:40:08   msg/s  sema/s
08:40:30    0.00    0.00
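
In the first block of the timex -s output above, physc and %entc describe the same thing in two ways: physical processors consumed, and that consumption as a percentage of the partition's entitlement (ent). The sketch below is a minimal Python illustration of that relationship; the function names are hypothetical, and the small mismatch with the reported 43.2 is presumably due to physc being rounded to two decimals in the listing.

```python
def percent_of_entitlement(physc: float, ent: float) -> float:
    """Express consumed physical processors as a percentage of entitled capacity."""
    if ent <= 0:
        raise ValueError("entitlement must be positive")
    return 100.0 * physc / ent


def physical_from_percent(entc_percent: float, ent: float) -> float:
    """Inverse calculation: physical processors consumed for a given %entc."""
    return ent * entc_percent / 100.0


if __name__ == "__main__":
    # Values taken from the listing above: ent=0.40, physc=0.17, %entc=43.2.
    print(round(percent_of_entitlement(0.17, 0.40), 1))
    print(round(physical_from_percent(43.2, 0.40), 4))
```

In an uncapped partition this percentage can exceed 100, which is one reason to watch it alongside %usr and %sys rather than in isolation.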

Chapter 6

CPU: Tuning

This chapter identifies the AIX tools you’ll use to help resolve CPU system
bottlenecks and improve performance. Notice that I didn’t use the word
“tune” in the preceding sentence. You’ll find that improving CPU performance is less about tuning and more about improving workload utilization
and managing processes and threads more efficiently.

Process and Thread Management
A junior administrator might consider process management as little more
than monitoring active processes and killing zombie and/or runaway processes. In reality, there is a lot more to it than that.
Let’s start by addressing a fundamental question: how do processes relate
to threads? The answer is simple. While the process is the entity that AIX
uses to control the use of system resources, the threads control the actual
time consumption because each kernel thread is a single sequential flow
of control. Each process is made up of one or more threads. Controlling
thread use is where you can make a difference. To do this, you need to
understand the tools that let you work with threads to improve CPU performance. Although AIX Version 4 introduced the use of threads to control
processor time consumption, it was in AIX 5L that system management
tools really evolved to help monitor and analyze thread usage.

Chapter 6: CPU: Tuning

nice
nice [-n Increment] Command [Argument...]
nice [-Increment] Command [Argument...]

The nice command lets you run a command at an adjusted priority. The
default nice value for processes is 20, except for Korn shell (ksh) background
processes, which are set to 24. With nice, the larger the increment number
you specify, the lower the priority.
You can use the ps command with the –l (lowercase L) flag to view this
information. The NI column shows the nice value for each process:

# ps -l | more
       F S      UID    PID   PPID  C PRI NI     ADDR  SZ WCHAN   TTY  TIME CMD
  240001 A 20004773  90156 164038  0  60 20 30448400 724       pts/0  0:00 ksh
  200005 A        0 376960  90156  2  60 20 6045c400 736       pts/0  0:00 ksh
  200001 A        0 409730 376960  0  60 20   446400 724       pts/0  0:00 ps

Let’s start a new ksh with nice:
# nice -10 ksh
# ps -l | more
       F S      UID    PID   PPID  C PRI NI     ADDR  SZ WCHAN   TTY  TIME CMD
  240001 A 20004773  90156 164038  0  60 20 30448400 724       pts/0  0:00 ksh
  200005 A        0 311534 376960  1  80 30 48376400 688       pts/0  0:00 ksh
  200001 A        0 376960  90156  0  60 20 6045c400 736       pts/0  0:00 ksh

The preceding output shows the new shell (PID 311534), which was forked from
PID 376960. Its nice value (NI) has changed from the default of 20 to 30, and its
priority value (PRI) has risen from 60 to 80, which means a lower scheduling priority.
Watch the nice syntax — it can be a little confusing. The minus sign
(–) identifies the increment value, which is assumed to be positive. To
specify a negative increment, you must use two minus signs, with no
spaces in between. When you use the renice command (covered next), the
parameter following the command name is the value, whether it is positive
or negative.
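
Because the sign convention trips people up, here is a minimal Python sketch of how the nice flag maps to the NI value that ps -l reports. It is an illustration only: parse_nice_flag and resulting_nice are hypothetical helper names, and the 0 to 39 range for NI is an assumption based on the default of 20 described above.

```python
DEFAULT_NICE = 20            # default NI value shown by ps -l
MIN_NICE, MAX_NICE = 0, 39   # assumed range of the NI column


def parse_nice_flag(flag: str) -> int:
    """Interpret '-10' as an increment of +10 and '--10' as an increment of -10."""
    if not flag.startswith("-") or len(flag) < 2:
        raise ValueError("expected a flag such as -10 or --10")
    # The first minus sign only introduces the flag; what follows is the signed increment.
    return int(flag[1:])


def resulting_nice(flag: str, current: int = DEFAULT_NICE) -> int:
    """Return the NI value a command would start with, clamped to the assumed range."""
    return max(MIN_NICE, min(MAX_NICE, current + parse_nice_flag(flag)))


if __name__ == "__main__":
    print(resulting_nice("-10"))    # less favorable than the default
    print(resulting_nice("--10"))   # more favorable than the default
```

With the default of 20, nice -10 gives the NI value of 30 seen in the listing above, while nice --10 would give 10.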

renice
renice [ -n Increment ] [ -g|-p|-u ] ID . . .

The renice command dynamically reassigns a priority to a running process. Using renice can cause the system to assign either a higher or a lower
priority to a given process. When you use renice, you actually change the
value of the priority of a thread (default value of 40) by changing the nice
value of its process.
Assume that the following processes are currently running:
# ps -l
       F S      UID    PID   PPID  C PRI NI     ADDR  SZ WCHAN   TTY  TIME CMD
  240001 A 20004773  90156 164038  0  60 20 30448400 724       pts/0  0:00 ksh
  200005 A        0 311534 376960  0  80 30 48376400 688       pts/0  0:00 ksh
  200001 A        0 376960  90156  0  60 20 6045c400 736       pts/0  0:00 ksh
  200001 A        0 417842 311534  3  81 30 30468400 732       pts/0  0:00 ps

Let’s increase the priority of a thread by changing the nice value for the
process that contains it:
# renice -10 376960
# ps -l
       F S      UID    PID   PPID  C PRI NI     ADDR  SZ WCHAN   TTY  TIME CMD
  240001 A 20004773  90156 164038  0  60 20 30448400 724       pts/0  0:00 ksh
  200005 A        0 311534 376960  0  80 30 48376400 688       pts/0  0:00 ksh
  200001 A        0 376960  90156  0  50 10 6045c400 736       pts/0  0:00 ksh
  200001 A        0 417842 311534  3  81 30 30468400 732       pts/0  0:00 ps

It’s important to note that when not run as root, renice has some limitations. A non-root user can change only processes owned by the current user ID.
In addition, a non-root user can make a nice value less favorable but cannot
make it more favorable again afterward.

ps
In the preceding chapter, we looked at the ps command and how you can
use it to monitor CPUs. You’ll find that ps is one of the most versatile
commands in Unix. Specifying it with the –mo flag gives you a granular
look at your threads:
# ps -mo THREAD
    USER    PID   PPID    TID ST CP PRI SC WCHAN       F    TT BND COMMAND
u0004773  90156 164038      -  A                  240001 pts/0   - -ksh
       -      -      - 933995  S                   10400     -   - -
    root 311534 376960      -  A                  200005 pts/0   - ksh
       -      -      - 823311  S                   10400     -   - -
    root 376960  90156      -  A                  200001 pts/0   - -ksh
       -      -      - 835591  S                   10400     -   - -
    root 409778 311534      -  A                  200001 pts/0   - ps -mo THREAD
       -      -      - 880775  R                  400010     -   - -

The TID column lists the thread ID, while the BND column shows the
processor, if any, to which a process or thread is bound. Why do you need to know
this information? Because you can actually change the priority of threads
globally. To do so, you modify the CPU scheduling parameters (using the
schedo command) that are used to calculate the priority for each thread.

schedo
schedo -h [tunable] | {-L [tunable]} | {-x [tunable]}
schedo [-p|-r] (-a | {-o tunable})
schedo [-p|-r] (-D | ({-d tunable} {-o tunable=value}))

The schedo command manages the CPU scheduler tunable parameters; it
can be run only by root. Similar to other tunable commands (e.g., vmo),
schedo can make immediate changes or can defer the changes until the
next reboot, depending on the flags you use. Use of the –r flag defers the
change until the next reboot, while the –p flag applies it immediately and also preserves it across reboots.
First, let’s display the existing scheduling parameters by using schedo with
the –L flag:

# schedo -L
NAME                      CUR    DEF    BOOT   MIN    MAX     UNIT          TYPE
     DEPENDENCIES
%usDelta                                              100
affinity_lim                                          100     dispatches
allowMCMmigrate                                               boolean
big_tick_size                                         100     10 ms
ded_cpu_donate_thresh                                 100     % busy
fixed_pri_global                                              boolean
force_grq                                                     boolean
hotlocks_enable                                               boolean
idle_migration_barrier    4                           100     sixteenth
krlock_confer2self                                            boolean
krlock_conferb4alloc                                          boolean
krlock_enable                                                 boolean
krlock_spinb4alloc                                    2G-1
krlock_spinb4confer                                   2G-1
maxspin                   16K                         4G-1    spins
n_idle_loop_vlopri        100                         976K
pacefork                                              2G-1    clock ticks   D
sched_D
sched_R
search_globalrq_mload     256                         4095M
search_smtrunq_mload      256                         4095M
setnewrq_sidle_mload      384                         4095M
shed_primrunq_mload                                   4095M
sidle_S1runq_mload                                    4095M
     sidle_S2runq_mload
sidle_S2runq_mload        134                         4095M
     sidle_S1runq_mload
     sidle_S3runq_mload
sidle_S3runq_mload        134                         4095M
     sidle_S2runq_mload
     sidle_S4runq_mload
sidle_S4runq_mload        4095M  4095M  4095M  0      4095M
     sidle_S3runq_mload
slock_spinb4confer                                    2G-1
smt_snooze_delay                               -1     97656K  microsecs
smtrunq_load_diff                                     4095M
tb_balance_S0                                                 ticks
tb_balance_S1                                                 ticks
tb_threshold              100                         1000    ticks
timeslice                                             2G-1    clock ticks   D
unboost_inflih                                                boolean
v_exempt_secs                                         2G-1    seconds
v_min_process                                         2G-1    processes
v_repage_hi                                           2G-1
v_repage_proc                                         2G-1
v_sec_wait                                            2G-1    seconds
vpm_xvcpus                                     -1     2G-1    processors

--------------------------------------------------------------------------------
n/a means parameter not supported by the current platform or kernel
Parameter types:
S = Static: cannot be changed
D = Dynamic: can be freely changed
B = Bosboot: can only be changed using bosboot and reboot
R = Reboot: can only be changed during reboot
C = Connect: changes are only effective for future socket connections
M = Mount: changes are only effective for future mountings
I = Incremental: can only be incremented
d = deprecated: deprecated and cannot be changed
Value conventions:
K = Kilo: 2^10    G = Giga: 2^30    P = Peta: 2^50
M = Mega: 2^20    T = Tera: 2^40    E = Exa: 2^60

You can also display these parameters using the –a flag, although the information given is far less meaningful.
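
The abbreviated limits in the schedo -L listing (2G-1, 4095M, 16K, and so on) follow the value conventions printed at the bottom of that output. The sketch below is a minimal Python illustration that expands them into plain integers; parse_tunable_value is a hypothetical helper, and it assumes values have the shape "number, optional suffix, optional -1" seen in the listing.

```python
import re

# Exponents taken from the "Value conventions" legend of schedo -L.
SUFFIX_EXPONENT = {"K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}

VALUE_PATTERN = re.compile(r"(-?\d+)([KMGTPE]?)(-1)?")


def parse_tunable_value(text: str) -> int:
    """Expand strings such as '16K', '4095M', or '2G-1' into integers."""
    match = VALUE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized tunable value: {text!r}")
    number, suffix, minus_one = match.groups()
    value = int(number)
    if suffix:
        value *= 2 ** SUFFIX_EXPONENT[suffix]
    if minus_one:
        value -= 1
    return value


if __name__ == "__main__":
    for sample in ("16K", "4095M", "2G-1", "-1"):
        print(sample, "=", parse_tunable_value(sample))
```

Seen this way, a maximum of 2G-1 is simply the largest signed 32-bit integer, and 4G-1 the largest unsigned one.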

sched_R and sched_D
The sched_R and sched_D scheduling parameters relate to process priority
calculations. The scheduler’s priority calculations are based on sched_R
and sched_D, values that are expressed in thirty-seconds (1/32). I won’t
bore you here with the complex algorithms associated with these parameters. The net of it is that lowering sched_R has the effect of helping the
scheduler distinguish between background processes and processes running as interactive foreground processes, thereby enabling it to assign a
greater priority to foreground processes. The following example lowers
sched_R from its default value of 16 to 5:
# schedo -o sched_R=5
Setting sched_R to 5
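
For readers who do want to see the arithmetic, the following Python sketch models the priority calculation. Treat it as an assumption-laden illustration rather than the scheduler's actual code: the base priority of 40, the doubling of nice values above the default, and the (x_nice + 4)/64 factor follow my reading of IBM's AIX 5L performance documentation, and the function names are hypothetical. With sched_R at its default of 16, the model reproduces the PRI values in the ps -l listings earlier in this chapter (60, 80, 50, and 81).

```python
BASE_PRIORITY = 40  # base priority of a user thread, as noted in the renice section


def thread_priority(nice: int, recent_cpu: int, sched_r: int = 16) -> int:
    """Model of the priority value for a given NI value and recent CPU usage (C column)."""
    p_nice = BASE_PRIORITY + nice
    # Nice values above the default are weighted twice as heavily (assumption).
    x_nice = p_nice * 2 - 60 if p_nice > 60 else p_nice
    x_nice_factor = (x_nice + 4) / 64
    # sched_R is expressed in thirty-seconds, hence the division by 32.
    cpu_penalty = recent_cpu * (sched_r / 32) * x_nice_factor
    return int(x_nice + cpu_penalty)


def decay_recent_cpu(recent_cpu: int, sched_d: int = 16) -> int:
    """Model of the periodic decay of recent CPU usage; sched_D is also in thirty-seconds."""
    return int(recent_cpu * sched_d / 32)


if __name__ == "__main__":
    # A CPU-heavy thread at the default nice value, with sched_R at 16 and then at 5.
    print(thread_priority(20, 100, sched_r=16))
    print(thread_priority(20, 100, sched_r=5))
```

In this model, lowering sched_R from 16 to 5 shrinks the penalty that recent CPU usage adds to a thread's priority value, so the nice value accounts for a larger share of the result.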

fixed_pri_global
When a CPU is ready to dispatch a thread, the system checks the global run
queue before any of the others. When a thread completes its time slice
on a CPU, it is put back on that CPU’s run queue, which helps maintain something
called processor affinity. Processor affinity is defined as the probability of
dispatching a thread to a processor that previously executed it. To improve
overall thread performance, you can enable an environment variable called
RT_GRQ, which is set to off by default. Turning on RT_GRQ automatically
places the thread on the global run queue. If you change fixed_pri_global from
its default of 0 to 1, all fixed priority threads are placed on the global run queue.
Let’s use schedo to change the default value of fixed_pri_global:
# schedo -a | grep fixed_pri_global
fixed_pri_global = 0
# schedo -o fixed_pri_global=1
Setting fixed_pri_global to 1
# schedo -a | grep fixed_pri_global
fixed_pri_global = 1

The actual priority of the user processes varies over time, depending on the
amount of overall CPU time that a process has used most recently. Please
note that in some instances, this variable should be turned off because it can
impact SMT performance. Make sure that you test this in your environment to determine what works best for your application.

timeslice
Perhaps the most important schedo parameter is timeslice. This setting
represents the largest number of clock ticks that a thread can keep control
of the CPU before facing the possibility of being replaced by another thread. In
some cases, increasing the timeslice can improve system throughput by
reducing context switching.
Before changing the timeslice setting, make sure you run vmstat (or sar)
long enough to determine whether there really is a considerable amount of
context switching going on. If there is, the overhead of dispatching threads
may be costing more than letting them run for longer slices would.
The following example increases the timeslice from 1 to 2:

# schedo -p -o timeslice=2
Setting timeslice to 2 in nextboot file
Setting timeslice to 2

In this case, we’ve also used the –p flag, which applies the change to the running
system and also records it in the nextboot file so that it survives a reboot.

bindprocessor
bindprocessor { -q|-u ProcessID|-s SmtSetID|-b BindId|ProcessID
[ProcessorNum] }

CPU binding lets processes run on a specific processor, a capability that
relates to the processor affinity concept I defined earlier. Binding threads
to specific processors has many purposes; for example, you might bind
threads to a given processor to find the root cause of a hanging program.
More commonly, the technique is used when you’re trying to spread
around the wealth of a system — in a symmetric multiprocessing (SMP)
box, for example. To display the available (logical) processors on your
box, you would use the –q flag:

# bindprocessor -q
The available processors are:  0 1 2

Assuming that simultaneous multithreading (SMT) is enabled (it is by default), each hardware thread of the physical processor is listed
as a separate processor when you run the bindprocessor command. On
POWER5 chips, two hardware threads exist on each processor. With
shared processor logical partitions (LPARs), using this command binds
to virtual CPUs, so you must be careful because problems can result for
applications that are predisposed to run on a specific CPU. If you want to
bind a process to a particular CPU, it’s as simple as running this command:
# bindprocessor 12769 3

This example assigns process ID (PID) 12769 to logical CPU 3.

smtctl
smtctl [ -m off|on [ -w boot|now ] ]

The smtctl command (introduced in AIX 5.3) displays SMT information and, with the –m flag, turns SMT on or off.
To determine whether SMT is enabled, you simply run the command without any flags:
# smtctl
This system is SMT capable
SMT is currently enabled
SMT boot mode is not set
SMT threads are bound to the same virtual processor.
proc0 has 2 SMT thread
Bind processor 0 is bound with proc
Bind processor 1 is bound with proc
proc2 has 2 SMT threads.
Bind processor 2 is bound with proc2
Bind processor 3 is bound with proc2

System performance usually increases about 30 percent when SMT is
enabled, so you almost always want to activate this functionality. Processor affinity also occurs naturally. When a thread is running on a CPU and is
interrupted, it usually is placed back on the same CPU because the processor’s cache might still have lines belonging to the thread. If the thread were
to be dispatched to a different CPU, it might have to obtain information
from RAM, which would slow processing time dramatically. You can also
bind threads using subroutines, although I advise caution if you attempt
to do so. This technique binds all kernel threads in a process to a processor, which has the effect of forcing these threads to be run on that specific
processor until they are unbound.
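
Before binding anything, it helps to know which bind IDs share a physical (or virtual) processor, because two threads bound to sibling SMT threads still compete for the same core. The sketch below is a minimal Python illustration that extracts this mapping from smtctl output; the sample text reuses only the complete lines from the listing above, and bind_ids_by_processor is a hypothetical helper name.

```python
import re

# Sample built from the complete lines of the smtctl listing above.
SAMPLE_SMTCTL = """\
proc2 has 2 SMT threads.
Bind processor 2 is bound with proc2
Bind processor 3 is bound with proc2
"""

BIND_LINE = re.compile(r"^Bind processor (\d+) is bound with (\S+)$")


def bind_ids_by_processor(smtctl_text: str) -> dict:
    """Group bind IDs (the values bindprocessor accepts) by the processor they belong to."""
    mapping = {}
    for line in smtctl_text.splitlines():
        match = BIND_LINE.match(line.strip())
        if match:
            bind_id, processor = int(match.group(1)), match.group(2)
            mapping.setdefault(processor, []).append(bind_id)
    return mapping


if __name__ == "__main__":
    for processor, bind_ids in bind_ids_by_processor(SAMPLE_SMTCTL).items():
        print(processor, bind_ids)
```

With this mapping in hand, you can choose bind IDs on different processors when the goal is to spread work across cores rather than across SMT threads of one core.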

gprof
/usr/ccs/bin/gprof [-b] [ -c [filename] ] [-e Name] [-E Name] [-f
Name] [-g filename] [-i filename] [-p filename] [-F Name] [-PathName]
[-s] [-x [filename]] [-z] [a.out [gmon.out ...]]

The gprof command, used during programming, produces an execution
profile of your compiled programs in C, Fortran, Pascal, or even Cobol.
The command reports on flow control through all the subroutines of your
program and tells you the amount of CPU time each subroutine consumed.
This information is useful when you’re troubleshooting how processes use
CPU resources. You can use gprof to profile your program and determine
which functions are using the CPU. The profile data is taken from the call
graph profile file (gmon.out by default).
AIX 5.3 lets you assign a user-specified name to the profiling output files
by setting special environment variables. Version 5.3 also provides additional profiling support for threads and new options that affect the type of
profiling data collected.

Section II
Summary, Tips, and Quiz

Summary
● CPU monitoring tools you can use include iostat, lparstat, mpstat,
nmon, sar, topas, vmstat, and w.
● Tracing tools include curt, splat, tprof, trace, and trcrpt.
● The nice and renice commands are important utilities that can help you
prioritize your processes and threads.
● schedo is the command used to manage the CPU scheduler’s tunable
parameters.
● The smtctl command is used for simultaneous multithreading (SMT).
● ps is an extremely versatile command that can help you identify process
hogs, thread utilization, and nice priorities.
● lparstat and mpstat are important performance tools you should use in
a partitioned environment.

Tips
● Identifying workload is of paramount importance to improving CPU utilization. Running jobs and processes during off-peak hours, using cron
and/or other third-party types of scheduling tools (e.g., IBM’s Workload
Manager, CA’s AutoSys) can make a big difference in performance.
● Users usually will assume that your system’s bottleneck is with the CPU,
but more often than not the problem is either memory- or I/O-related.
Tune a subsystem only when you’re certain of the diagnosis.
● Before making any changes to production systems, make the changes
to either your test or development environment first so you can analyze
their effect. This advice is particularly important when using the schedo
command for AIX CPU tuning.
● Once you’ve determined that you’re experiencing a CPU bottleneck,
adding CPUs is always an option. With dynamic logical partitioning
(DLPAR), this solution is much easier to accomplish than it used to be
because you can just add or subtract CPUs dynamically. Tools such as
the DLPAR toolset and Partition Load Manager (PLM) can automate
the process, letting you add or subtract CPUs to or from your partition
based on variables you’ve already identified. Uncapping partitions in a
virtual environment can also alleviate CPU bottlenecks.
● Using nmon, ps, tprof, or any number of other tools, you might have
identified processes that are hogging CPU time. If you question whether
these processes are necessary, try contacting the process owners (if possible). You may find out that you can kill the processes. If you’re told
you can do so, be sure to kill them using kill -1 and not kill -9. Also, be
careful about zombie processes that can be created when you kill parent
processes and leave their children alone. It may not sound proper, but
make sure the children are dead, too; otherwise, you’re at risk for runaway and/or zombie processes.
● Starting with the POWER5, SMT is built into the POWER architecture.
This capability provides two independent threads of instruction execution for each processor. Enabling SMT makes one processor appear as
two processors on the partition. Always make sure SMT is enabled (by
running the smtctl command), except where an ISV explicitly states
that it is not recommended. SMT’s performance gain depends on many
variables, each of which you should analyze carefully. SMT is best suited
for multithreaded, I/O-intensive applications. It is not a good fit
for numerically intensive workloads.
● Use tools such as nmon (and the nmon analyzer) or topas to store
historical performance data for trending and analysis. Don’t wait to use
these tools until you have a problem. You should be using them when
you are first in production with your system.
● Other IBM utilities are available that don’t come standard with AIX and
have a cost associated with them, including:
❍ Performance Toolbox (PTX), which as of AIX 5.3 includes the
procmon utility (used for process management)
❍ IBM Tivoli Monitoring System Edition for System p for AIX 5L V5
❍ PM for System p, an IBM Global Technology Services offering

Quiz
Multiple Choice
1. Which iostat flag reports AIO information?
a. –A
b. –a
c. –v
d. –e
2. Which topas flag reports all partitioned information in your managed
system?
a. –c
b. –L
c. –j
d. –p
3. Which tool is best used to monitor performance numbers for all logical
CPUs on a partitioned system?
a. iostat
b. lparstat
c. mpstat
d. lvms

4. Which ps flag reports thread information?
a. –a
b. –u
c. –ef
d. –mo
5. Which of the following is not an example of a trace tool?
a. lprof
b. tprof
c. curt
d. splat
6. timex –s reports the total execution time of the program as well as the
a. Number of referenced inodes
b. Percentage of blocked processes
c. Number of threads
d. Summary of systems activity
7. Given the following results of the vmstat command, are you experiencing a CPU bottleneck?
# vmstat 2
System configuration: lcpu=4 mem=3072MB ent=0.
kthr      memory        page       faults              cpu
-----  --------------  ------  -------------  ----------------------
         avm     fre                          us sy id wa
   4  128826  641397           448  87 138    24  1 40 35  0.01  2.8
   7  128826  641397           385  10 136    35 14 20 31  0.01  2.2
   7  128826  641397           381  13 138        4 20 41  0.01  2.2
   4  128826  641397           364  40 138    40 17 16 27  0.01
a. Yes
b. No
c. Maybe
d. Not enough information

True or False
8. With nice, the larger the number, the lower the priority.
9. Lowering the schedo parameter sched_R has the effect of giving a higher preference to foreground processes than to background
processes.

Fill in the Blank(s)
10. Define processor affinity:
__________________________________________

Section III
Memory
This section provides an overview of the AIX Virtual Memory Manager
and other important memory-related concepts, including how to monitor
and tune your virtual memory. We also discuss best practices for virtual
memory monitoring, analysis, and tuning, given the various considerations
that can impact performance.

Chapter 7

Memory: Introduction

What, exactly, is involved in memory performance tuning? As a systems
administrator, you’re probably already familiar with the basics of memory,
such as the differences between physical and virtual memory. What we’ll
be discussing here is how the Virtual Memory Manager (VMM) works in
AIX and how it relates to overall systems performance. We’ll also review
some of the more important recent enhancements.
Let me reiterate that regardless of which subsystem you want to tune, you
should always think of the process as an ongoing one. Start monitoring
your system as soon as you put it into production and have it running well,
rather than when users are screaming about slow performance. Review
Chapter 1 on tuning methodology. I’m not saying that you must follow that
specific methodology, but without a plan, you won’t succeed in optimizing
the performance of your environments. Further, be sure to make only one
change at a time unless otherwise noted (as when changing related parameters, such as minperm% and maxperm%). In addition, capture and analyze
data as quickly as possible after making a change to determine what difference, if any, the change has really made.

Virtual Memory Manager
AIX newbies are sometimes surprised to hear that the Virtual Memory
Manager (VMM) services all memory requests from the system, not
just virtual memory. When the system accesses random access memory
(RAM), the VMM needs to allocate space, even when plenty of physical
memory is left on the system. It implements a process of early allocation
of paging space. Using this method, the VMM plays a vital role in helping
manage real memory, not just virtual memory. In AIX, all virtual memory
segments are partitioned into pages, with a default page size of 4K. Because virtual memory consists of real memory and paging space, allocated
virtual memory segments can be either RAM or paging space (virtual
memory stored on disk).
This is an important concept to understand, so read that last paragraph at
least twice.
VMM also maintains what is referred to as a free list, which is defined as
unallocated page frames. These are used to satisfy page faults. The free list
usually holds only a small number of unallocated page frames (a number that you configure). When it
runs low, the VMM selects virtual memory pages whose page frames are to be reassigned,
using its page replacement algorithm. The paging algorithm determines
which virtual memory pages currently in RAM ultimately have their page
frames returned to the free list. AIX uses all available memory, except
that which is configured to be unallocated (the free list).
To reiterate, the purpose of VMM is to manage the allocation of both
RAM and virtual pages. VMM’s objectives are to help minimize both the
response time of page faults and the use of virtual memory where it can.
Given the choice between RAM and paging space, the preference is to use
physical memory — if the RAM is available.
VMM also classifies virtual memory segments into two distinct categories, which are critical for you to understand. This concept is the most
important to grasp, and I’ll admit that when I first started working with
AIX, it took me a while to fully understand the concept and the tuning recommendations (which we’ll discuss later) behind it. The two
categories are working segments using computational memory and persistent segments using file memory. Simply put, without fully grasping
these concepts, you won’t be able to tune your systems to their optimum
capabilities.

Computational Memory
Computational memory is used while your processes are actually working
on computing information. These working segments are temporary (transitory) and exist only up until the time a process terminates or the page is
stolen. They have no permanent disk storage location. When a process
terminates, both its physical memory and its paging space are released in many cases.
You can actually see this happen while monitoring your system: it shows
up as a large spike in available pages.
In the world of virtual memory, when free physical memory starts getting
low, programs that have not been used recently are moved from RAM to
paging space to help release physical memory for more real work. Remember, virtual memory consists of real memory and paging space; it is not just paging
space. The most important point to remember about computational memory is that when the system pages, you do not want it to page out computational memory; you would rather it page out file memory.

File Memory
File memory (unlike computational memory) uses persistent segments (not
working segments), and it has a permanent storage location on the disk.
Data files or executable programs are mapped to persistent segments rather
than to working segments. The data files can relate to file systems, such as
the Journaled File System (JFS), Enhanced Journaled File System (JFS2),
or Network File System (NFS). These pages remain in memory until the
file system is unmounted, the page is stolen, or the file is unlinked. After a
data file is copied into RAM, VMM controls when these pages are overwritten or used to store other data. Given the alternative, you would much
rather have file memory paged to disk than computational memory.
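
To make that preference concrete, here is a toy model in Python. It is a sketch only, not AIX's actual page replacement algorithm, and the class, method, and page names are invented for illustration. Each resident page is tagged as computational or file. When a frame has to be stolen, the least recently used file page goes first, because it already has a permanent home on disk; a computational page is stolen (and would have to be written to paging space) only when no file pages remain.

```python
from collections import OrderedDict


class ToyVMM:
    """Toy model: fixed number of frames, LRU order, file pages stolen first."""

    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()  # page id -> "comp" or "file", oldest first
        self.paged_out = []            # computational pages sent to paging space

    def touch(self, page, kind):
        """Reference a page, bringing it into memory if necessary."""
        if page in self.resident:
            self.resident.move_to_end(page)  # now the most recently used page
            return
        if len(self.resident) >= self.frames:
            self._steal()
        self.resident[page] = kind

    def _steal(self):
        """Free one frame, preferring the least recently used file page."""
        victim = next((p for p, k in self.resident.items() if k == "file"), None)
        if victim is not None:
            del self.resident[victim]
            return
        # No file pages left: the oldest computational page must be paged out.
        victim, _ = self.resident.popitem(last=False)
        self.paged_out.append(victim)


vmm = ToyVMM(frames=3)
vmm.touch("heap-1", "comp")
vmm.touch("data.db:0", "file")
vmm.touch("heap-2", "comp")
vmm.touch("heap-3", "comp")  # memory is full, so the file page is stolen
print(sorted(vmm.resident))
print(vmm.paged_out)
```

In this run the file page is dropped even though heap-1 is older, and nothing is written to paging space. Only a further computational page, arriving when no file pages are left, would force a page-out.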

Paging and Swapping
When a process references a page on disk, the page must be paged in,
which could cause other pages to page out again. VMM is constantly
working in the background, stealing frames that have not been recently
referenced using the page replacement algorithm. It also helps detect
thrashing, which can occur when memory is extremely low and pages are
constantly being paged in and out to support processing. VMM actually
has a memory load control algorithm, which can detect whether the system
is thrashing and actually tries to remedy the situation. Unabashed thrashing
can literally cause a system to come to a standstill, as the kernel becomes
so concerned with making room for pages that it just can’t do anything
productive.
What about swapping? Although the terms are often used interchangeably, there is a subtle difference between paging and swapping. As we’ve
discussed, with paging, only parts of the process are moved back and forth
between disk and RAM. When swapping occurs, you are moving entire
processes back and forth. For this to happen, AIX would need to suspend
the entire process before moving it to paging space. The process could then resume only after it was swapped back into RAM at a later
time. The difference that is not subtle is this: while paging is often okay,
swapping is a very bad thing.

VMM Tuning Evolution
Before AIX 5L, you would have used the vmtune command to tune your
VMM system. Although the vmo command came around in AIX 5.2,
vmtune actually hung around until AIX 5.3. With AIX 5.3, vmtune is no
more. Although most of the actual parameters are the same (and remain
the same in AIX 6.1), there are some fundamental changes in the recommended tuning parameters. (AIX 5.3 also does away with the schedtune
command, whose function is now performed by schedo.)
One important change in AIX 5L relates to page frames. Starting with the
POWER4 processor, AIX supported page sizes of up to 16 MB. POWER5+
and later chips support four virtual memory page sizes: 4K, 64K, 16 MB, and 16
GB. With a simple vmo change that reflects these sizes, you can actually tune the system to provide for large page usage, which can improve
system performance substantially in very memory-intensive applications.
The recommendations for the minperm and maxperm settings have also
changed substantially. Furthermore, starting with AIX 5.2, we no longer
save our tunables in rc.tune but in /etc/tunables.
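
One way to see why larger pages can help is to count how many pages the system has to track. The sketch below is plain arithmetic, not a tuning tool; the 3,072 MB figure matches the sample system used later in this chapter, and the function name is my own.

```python
# Page sizes named in the text, in bytes.
PAGE_SIZES = {
    "4K": 4 * 1024,
    "64K": 64 * 1024,
    "16MB": 16 * 1024**2,
    "16GB": 16 * 1024**3,
}


def whole_pages(mem_bytes, page_bytes):
    """Number of whole pages of the given size that fit in mem_bytes."""
    return mem_bytes // page_bytes


mem = 3072 * 1024**2  # 3,072 MB of real memory
counts = {name: whole_pages(mem, size) for name, size in PAGE_SIZES.items()}
for name, n in counts.items():
    print(f"{name:>5}: {n} pages")
```

The same memory that takes several hundred thousand 4K pages is covered by a couple of hundred 16 MB pages, which is far less for the system to manage. The 16 GB size does not fit on a machine this small at all.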

Chapter 8

Memory: Monitoring

As with CPU monitoring, the AIX systems administrator has a myriad
tools at his or her disposal when tuning the Virtual Memory Manager
(VMM). Some of the tools are Unix-generic, while others are AIX-specific.
We’ll discuss these tools in the context of real performance issues and what
you can do to address them. IBM enhanced the following tools in AIX
5.3 to allow more accurate statistics on shared partitions using APV: sar,
topas, and vmstat.
Suppose that while you’re surfing the Internet and enjoying your coffee,
one of the DBAs knocks on the side of your cubicle (why Unix administrators never get an office, I’ll never know) and informs you that “We have
a real memory problem.” Although your first reaction might be to dismiss
the suggestion entirely (do you tell the DBA that the indexes need rebuilding?), the first thing I would do is ask why the person came to this conclusion. The more information you have at your disposal, the more effective
you’ll be in your efforts to resolve the alleged bottleneck.
More than likely, the DBA used a graphical tool such as nmon or topas that
indicated that free real memory was low. This is a common event. However, one
of the biggest incorrect assumptions is concluding that you have a memory
problem because free real memory is low. On the contrary, we want free real memory
to be low, because that means we’ve sized the system properly.
So, where do we begin to troubleshoot this issue? If you’ve read the CPU
monitoring chapter, you’ll know that I like to start with vmstat.

Chapter 8: Memory: Monitoring

vmstat (Unix-generic)
vmstat [-fsviItlw] [[-p|-P] pagesize|ALL] [Drives] [Interval [Count]]

In Chapter 6, we used vmstat to monitor CPU. In this chapter, we’ll look
at how to use this command for virtual memory analysis, which was actually the intended purpose of the tool (remember the “vm” in vmstat).
Here’s a summary of the relevant output fields:
● r: Average number of runnable kernel threads over a sampling
interval, which you specify when running the command. “Runnable”
includes threads that are ready but are waiting to run as well as those
that are already running. I start to become concerned when this number is three or four times greater than the number of processors on
the system.
● b: Average number of kernel threads placed in the VMM wait
queue that are waiting on I/O. This is an extremely important field; if
these numbers are higher than r (runnable threads), that is usually
symptomatic of I/O problems. Watch this field very carefully!
● avm: Contrary to what most people think, this field does not report
the average memory. It shows the number of active virtual pages,
the sum of virtual and real memory pages (remember this concept).
Each page is 4,096 bytes.
● fre: Size of the free list. It’s important to note that you shouldn’t
concern yourself too much if these numbers look really low, because
a large part of RAM is used as a cache for file system data. Applications people will always point out this field to you and say, “There is
no more memory left on the system.” If no bottlenecks are occurring,
it is just not a problem.
● re: Pager input/output list
● pi: Number of pages paged in from paging space. This field becomes populated when there are lots of processes starting up, which
can occur during a CPU or memory bottleneck.
● po: Number of pages paged out to paging space. If the numbers are
high here, paging is occurring, which can certainly signify a memory
bottleneck.
● fr: Number of pages freed (page replacement)
● sr: Number of pages scanned by the page replacement algorithm
● cy: Number of clock cycles executed by the page replacement
algorithm
● in: Device interrupts
● sy: System calls
● cs: Kernel thread context switches
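
The field list above can be turned into a small parser. The following Python sketch assumes the 19-column layout that vmstat prints on a shared partition (the fields above followed by us, sy, id, wa, pc, and ec); other flags change the layout, so treat the column list as an assumption. The function names, the key name syscalls (used to keep the two sy columns apart), the 3x run-queue threshold, and the sample row are all mine, chosen for illustration.

```python
# Assumed column order for a shared-partition vmstat data row.
FIELDS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
          "in", "syscalls", "cs", "us", "sy", "id", "wa", "pc", "ec"]
PAGE_BYTES = 4096  # avm and fre are counted in 4K pages


def parse_vmstat_row(line):
    """Split one vmstat data row into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return {name: float(v) if "." in v else int(v)
            for name, v in zip(FIELDS, values)}


def flag_row(row, lcpu):
    """Apply the rules of thumb from the field descriptions."""
    notes = []
    if row["r"] > 3 * lcpu:
        notes.append("run queue is more than 3x the logical CPU count")
    if row["b"] > row["r"]:
        notes.append("more blocked than runnable threads: look at I/O")
    if row["po"] > 0:
        notes.append("paging out to paging space: possible memory bottleneck")
    return notes


# Hypothetical sample row, not captured from a real system.
row = parse_vmstat_row(
    "2 19 173838 191 0 22 127 140 287 567 329 18229 9374 4 30 12 54 0.35 88.1")
print(f"avm = {row['avm'] * PAGE_BYTES / 1024**2:.1f} MB")
for note in flag_row(row, lcpu=4):
    print(note)
```

Note that a low fre value is deliberately not flagged on its own; as described above, a small free list is only interesting when paging is also taking place.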

The following output is a snapshot of a very well-behaved system. This
system is easily handling the number of runnable processes; there are no
blocked processes, no paging going on, nor any waiting on I/O. I love this
system.
# vmstat 2 5
System configuration: lcpu=4 mem=3072MB ent=0.40
kthr   memory          page   faults       cpu
-----  --------------  -----  -----------  ------------------
       173838 576044          365 87 144   0 1 99  0.01  2.4
       173837 576045          297 13 149     1 99  0.01  1.9
       173838 576044          329 37 143     1 99  0.01  2.2
       173838 576044          337 10 143     1 99  0.01  2.0
       173838 576044          364 13 143     1 99  0.01  2.1

Here is a snapshot of the system that the DBA was looking at:
# vmstat 2 5
System configuration: lcpu=4 mem=3072MB ent=0.40
kthr   memory       page                  faults           cpu
-----  -----------  --------------------  ---------------  -------------
4 19   173838 123   92 104 208 417        365 12001 6004   4 30 54  2.4
9 36   173837 567   53 109 229            297 17002 8124   9 12 58  1.9
2 19   173838 191   0 22 127 140 287 567  329 18229 9374   41 23    2.1
2 34

At first glance, it appears that there are memory problems. In this case,
we’re looking at the free (fre) list because there is paging going on. If there
were not, I wouldn’t even give the low numbers a second glance. Oftentimes, one bottleneck will be the cause of another.
In this case, it appears that significant I/O problems are causing other
bottlenecks to occur. There are many blocked processes (b); and the wait
time (wa) in the CPU section is also extremely high. Preliminary analysis
shows us that the system just cannot keep up with the workload. The CPU
can’t work hard because of the I/O problems. The paging is occurring
because of the excessive I/O, which appears to have also caused a memory
bottleneck.
Let’s change just a few numbers around. What do we see now?
kthr   memory       page                  faults           cpu
-----  -----------  --------------------  ---------------  -------------
       173838 123   92 104 208 417        365 12001 6004   81 19    2.4
       173837 567   53 109 229            297 17002 8124   74 25    1.9
       173838 191   0 22 127 140 287 567  329 18229 9374   41 23    2.1

Clearly, there are no I/O problems to speak of; the wait times and blocked
processes are not there. The CPU obviously is running hot and heavy. Do
we have a CPU bottleneck? Sure we do, because the CPU is running at
almost 100 percent busy. But what is causing that bottleneck? Because
excessive paging is going on and the numbers in the fre column are low, I would
guess that in this case a memory bottleneck is causing the CPU bottleneck, not the reverse. In fact, this snapshot could have been taken after we
fixed the I/O bottleneck in the previous snapshot. Remember, fixing one
bottleneck often causes others, but that’s okay; it’s just part of the circle of
tuning. In any case, the system is at least processing data here, where in the
prior example, it was just stuck in the mud. If we can tune the memory accordingly here, the CPU bottleneck may break through, or it may continue.
In the latter event, we might have to throw more iron at the box or manage
our workload more efficiently.
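
The reasoning in these two readings can be written down as a rough triage function. This is a sketch of the rules of thumb only: the thresholds are assumptions of mine rather than AIX recommendations, and the input values are illustrative, loosely modeled on the two snapshots rather than copied from them.

```python
def triage(r, b, fre, po, us, sy, wa, *, wa_high=25, busy_high=90, fre_low=1000):
    """Return suspected bottlenecks, likeliest root cause first."""
    suspects = []
    # Many blocked threads plus high I/O wait: suspect I/O before anything else.
    if b > r and wa >= wa_high:
        suspects.append("io")
    # A low free list only matters when the system is also paging out.
    if po > 0 and fre < fre_low:
        suspects.append("memory")
    # User plus system time near 100 percent means the CPU is saturated.
    if us + sy >= busy_high:
        suspects.append("cpu")
    return suspects


# Illustrative values for the two situations described above.
first = triage(r=4, b=19, fre=123, po=208, us=4, sy=30, wa=54)
second = triage(r=4, b=0, fre=123, po=208, us=81, sy=19, wa=0)
print(first)
print(second)
```

The ordering of the checks reflects the argument in the text: an I/O bottleneck is treated as the likelier cause of a memory problem, and a memory bottleneck as the likelier cause of a busy CPU, rather than the other way around.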

Virtual Memory Summary
You’ll be interested to know that AIX 5.3 introduced a new vmstat flag,
-v, that summarizes overall virtual memory statistics:
# vmstat -v
               786432 memory pages
               748478 lruable pages
               574790 free pages
                    5 memory pools
               110858 pinned pages
                 80.0 maxpin percentage
                 20.0 minperm percentage
                 80.0 maxperm percentage
                  4.4 numperm percentage
                33524 file pages
                  0.0 compressed percentage
                    0 compressed pages
                  4.4 numclient percentage
                 80.0 maxclient percentage
                33524 client pages
                    0 remote pageouts scheduled
                    0 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                    0 external pager filesystem I/Os blocked with no fsbuf
                    0 Virtualized Partition Memory Page Faults
                 0.00 Time resolving virtualized partition memory page faults
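
Because every line of this report is a value followed by a label, it is easy to pick apart in a script. The Python sketch below is one possible approach, not an IBM-supplied tool; the function name is my own, and the sample text reuses a few of the lines from the output above.

```python
def parse_vmstat_v(text):
    """Map each 'value label' line of vmstat -v output to {label: number}."""
    stats = {}
    for line in text.strip().splitlines():
        value, label = line.split(None, 1)  # split at the first run of spaces
        stats[label.strip()] = float(value)
    return stats


SAMPLE = """
   20.0 minperm percentage
   80.0 maxperm percentage
    4.4 numperm percentage
   2484 filesystem I/Os blocked with no fsbuf
      0 paging space I/Os blocked with no psbuf
"""

stats = parse_vmstat_v(SAMPLE)

# Any non-zero "blocked with no ..." counter deserves a closer look.
blocked = {label: int(count) for label, count in stats.items()
           if "blocked with no" in label and count > 0}

# Where does numperm sit relative to the minperm setting?
below_minperm = stats["numperm percentage"] < stats["minperm percentage"]

print(blocked)
print(below_minperm)
```

On this sample, the only blocked-I/O counter that is not zero is the filesystem fsbuf line, and numperm sits well below minperm.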

sar (Unix-generic)
sar { -A [-M] | [-a][-b][-c][-d][-k][-m][-q][-r][-u][-v][-w][-y][-M] }
[-s hh[:mm[:ss]]] [-e hh[:mm[:ss]]] [-P processor_id[,...] | ALL]
[-f file] [-i seconds] [-o file] [-X file] [interval [number]]

Let’s turn our attention now to the sar command and try using it to examine data that can impact virtual memory performance. In the following
view, we’ll use the -r and -m flags together (-rm), which enables us to view paging statistics (-r)
and message and semaphore information (-m). The output reports the following fields:
● cycle/s: Number of page replacement cycles per second
● fault/s: Number of page faults per second
● slots: Number of free pages on the paging spaces
● odio/s: Number of non-paging disk I/Os per second
● msg/s: Number of Interprocess Communication (IPC) message
primitives
● sema/s: Number of IPC semaphore primitives

# sar -rm 1 5
AIX lpar30p682e_pub 3 5 00CED82E4C00    12/30/07

System configuration: lcpu=4 mem=3072MB ent=0.40 mode=Uncapped

15:21:14   slots cycle/s fault/s  odio/s
           msg/s  sema/s