Tuning Linux, the tools... a brief overview
Files
We start with files because that is where Linux itself starts: before boot the kernel isn't running,
it's just a file sitting on a drive. When we install Linux, we install the
files first. So we start with files.
Putting the files on the media is an important first step, because once they're on the media, it's a bit
of a challenge to move them again. Not impossible, just not necessarily easy.
The first step is determining how much space you need for your files. You pick a hard drive, or a LUN (logical unit)
on a RAID or SAN, a thumb drive, a floppy, whatever it is that you will work from.
Looking at a typical Linux filesystem, using "du" (disk usage) we can see the summary (-s) in human readable form (-h):
--> sudo du -sh *
9.7M bin
46M boot
4.0K cdrom
4.0K dev
25M etc
171G home
0 initrd.img
308M lib
4.0K lib64
16K lost+found
12K media
4.0K mnt
386M opt
du: cannot access ‘proc/16007/task/16007/fd/4’: No such file or directory
du: cannot access ‘proc/16007/task/16007/fdinfo/4’: No such file or directory
du: cannot access ‘proc/16007/fd/4’: No such file or directory
du: cannot access ‘proc/16007/fdinfo/4’: No such file or directory
0 proc
315M root
du: cannot access ‘run/user/1000/gvfs’: Permission denied
1.7M run
15M sbin
4.0K srv
0 sys
48K tmp
3.9G usr
1.4G var
0 vmlinuz
------------------------------------------------
177G .
NOTE: we were seeking how much space we needed. The total suggests something more than 177G; however,
examine /home, as this could be mounted separately. /home is using 171G, which means the rest of Linux
should fit within 6G. Keeping /home separate from the root filesystem is a very good idea for a variety of
reasons: the OS can be upgraded or replaced without touching user data, security is better, backups are easier, and
user activity does not fill or slow the root filesystem. This is all part of the design for tuning.
Separating the more active filesystems, such as /var and /home, is often a good idea. The catch is that a separately
mounted /var is not available early in boot, so recent distributions add /run, a tmpfs (memory-backed) filesystem
(see the df output below), to hold runtime state such as PID files, locks and sockets. More clutter, but allegedly
with some benefit. The /run directory seems to have a lot of empty files and directories as well. Again, the idea is
to "visualize" running services as files under /run, available from early boot onward.
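The 177G minus 171G subtraction above can be scripted rather than done by eye. This is a minimal sketch, not a definitive tool: the function names usage_kib and usage_minus are made up for illustration, and it assumes both directories are on the same filesystem (with /home on its own mount, "du -x" on / would already leave it out).

```shell
# Print the usage of directory $1 in KiB.
# -x keeps du on one filesystem, so /proc, /sys and /run are not walked;
# 2>/dev/null hides "cannot access" noise from files that vanish mid-scan.
usage_kib() {
    du -x -s -k "$1" 2>/dev/null | awk '{ print $1 }'
}

# Print the usage of directory $1 minus the usage of its subdirectory $2.
# Assumption: $2 lives on the same filesystem as $1.
usage_minus() {
    total=$(usage_kib "$1")
    part=$(usage_kib "$2")
    echo $(( total - part ))
}
```

Run as root, "usage_minus / /home" prints the KiB used by everything except /home, which is the number needed when sizing a root filesystem.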
Using df (disk free) in human readable form (-h), we can see several entries that are not disk partitions at all: udev, tmpfs and the "none" entries are virtual filesystems that live in memory.
--> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 227G 177G 39G 83% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 1.4G 4.0K 1.4G 1% /dev
tmpfs 288M 1.4M 287M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 1.5G 256K 1.5G 1% /run/shm
none 100M 16K 100M 1% /run/user
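For tuning, the Use% column is usually the one to watch. A minimal sketch of a filter over df output follows; the function name over_threshold is hypothetical, and it assumes the POSIX output format of "df -P" and mount points without spaces.

```shell
# Read "df -P" output on stdin and print the mount point and Use% of every
# filesystem at or above the percentage given as $1.
# Assumption: mount points contain no spaces, so the mount point is field 6.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}
```

Usage: "df -P | over_threshold 80". Against the listing above, only the root filesystem at 83% would be reported.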
One of the other 0 byte items was sys. A listing of sys shows hardware-related items; drilling down to power
and resume, we see an ASCII file that contains 0:0. That value is the major:minor device number of the resume
(hibernation) device, and 0:0 means none is set. This filesystem (sysfs) exposes the state of devices and drivers
as the kernel sees them.
These files reflect in file form what we might learn using various system tools that we'll describe later. The
hardware vendor might be able to use this information to help a sys admin better tune a system.
More research on this feature might be useful for advanced tuning practices.
--> ll sys
total 0
drwxr-xr-x 2 root root 0 Jul 7 18:43 block
drwxr-xr-x 28 root root 0 Jul 7 18:43 bus
drwxr-xr-x 57 root root 0 Jul 7 18:43 class
drwxr-xr-x 4 root root 0 Jul 7 18:43 dev
drwxr-xr-x 13 root root 0 Jul 7 18:43 devices
drwxr-xr-x 4 root root 0 Jul 7 18:43 firmware
drwxr-xr-x 7 root root 0 Jul 7 18:43 fs
drwxr-xr-x 2 root root 0 Jul 7 22:02 hypervisor
drwxr-xr-x 7 root root 0 Jul 7 18:43 kernel
drwxr-xr-x 136 root root 0 Jul 7 18:43 module
drwxr-xr-x 2 root root 0 Jul 7 18:43 power
------------------------------------------------
--> ll sys/power
total 0
-rw-r--r-- 1 root root 4096 Jul 7 18:43 disk
-rw-r--r-- 1 root root 4096 Jul 7 22:02 image_size
-rw-r--r-- 1 root root 4096 Jul 7 22:02 pm_async
-rw-r--r-- 1 root root 4096 Jul 7 22:02 pm_freeze_timeout
-rw-r--r-- 1 root root 4096 Jul 7 22:02 pm_print_times
-rw-r--r-- 1 root root 4096 Jul 7 22:02 pm_test
-rw-r--r-- 1 root root 4096 Jul 7 22:02 pm_trace
-rw-r--r-- 1 root root 4096 Jul 7 22:02 pm_trace_dev_match
-rw-r--r-- 1 root root 4096 Jul 7 22:02 reserved_size
-rw-r--r-- 1 root root 4096 Jul 7 22:02 resume
-rw-r--r-- 1 root root 4096 Jul 7 18:43 state
-rw-r--r-- 1 root root 4096 Jul 7 22:02 wakeup_count
------------------------------------------------
--> file sys/power/resume
sys/power/resume: ASCII text
------------------------------------------------
--> cat sys/power/resume
0:0
The /dev directory is where device nodes reside: tty sessions and other character and block special files.
--------------------------------------------------------------------------------------
Then there were some errors when we ran "du -sh" that raised some questions...
--------------------------------------------------------------------------------------
1) du: cannot access ‘proc/16007/task/16007/fd/4’: No such file or directory
Running ls -al /proc/16007 results in file not found, and /proc has no 16007 entry. That likely
means the process terminated before "du" was able to calculate its usage.
2) du: cannot access ‘run/user/1000/gvfs’: Permission denied
This is the annoying "gvfs" file system that automounts devices, the "GNOME virtual file system". Its mount point
is a FUSE mount owned by user 1000, and by default a FUSE mount is accessible only to the user who mounted it,
which is why even root gets "Permission denied". While convenient, it is likely a bad idea
for security... think Stuxnet. To learn more about
it, cut and paste this link to the wiki: https://en.wikipedia.org/wiki/GVFS I often disable this tool
by removing FUSE or whatever else supports it. When I mount a USB device I prefer to define
the path and mount it manually. Clearly this is beyond the capability of Grandma Moses or the average
computer user who is just trying to read the news or send email, but there are tools they can click on
that perform the same function. This is one of many bad ideas that have found their way into recent distributions
and that impact system performance and security. Keep this in mind when examining processes.
3) Then there are the 0 byte entries. Let's look at the files using "ls -al":
--> ls -al
total 112
drwxr-xr-x 23 root root 4096 May 4 18:52 .
drwxr-xr-x 23 root root 4096 May 4 18:52 ..
drwxr-xr-x 2 root root 4096 Jun 25 08:49 bin
drwxr-xr-x 3 root root 4096 Jun 25 08:49 boot
drwxr-xr-x 2 root root 4096 May 4 18:49 cdrom
drwxr-xr-x 15 root root 4140 Jul 7 21:38 dev
drwxr-xr-x 152 root root 12288 Jul 7 18:43 etc
drwxr-xr-x 3 root root 4096 May 4 18:50 home
lrwxrwxrwx 1 root root 33 May 4 18:52 initrd.img -> boot/initrd.img-3.13.0-37-generic
drwxr-xr-x 25 root root 4096 May 4 22:29 lib
drwxr-xr-x 2 root root 4096 May 4 22:29 lib64
drwx------ 2 root root 16384 May 4 18:38 lost+found
drwxr-xr-x 4 root root 4096 May 18 11:33 media
drwxr-xr-x 2 root root 4096 Apr 10 2014 mnt
drwxr-xr-x 4 root root 4096 May 13 00:56 opt
dr-xr-xr-x 166 root root 0 Jul 7 18:43 proc
drwx------ 12 root root 4096 Jun 10 00:53 root
drwxr-xr-x 25 root root 920 Jul 7 18:48 run
drwxr-xr-x 2 root root 12288 Jun 25 08:48 sbin
drwxr-xr-x 2 root root 4096 Nov 26 2014 srv
dr-xr-xr-x 13 root root 0 Jul 7 18:43 sys
drwxrwxrwt 9 root root 12288 Jul 7 22:04 tmp
drwxr-xr-x 10 root root 4096 Nov 26 2014 usr
drwxr-xr-x 11 root root 4096 Nov 26 2014 var
lrwxrwxrwx 1 root root 30 May 4 18:52 vmlinuz -> boot/vmlinuz-3.13.0-37-generic
This tells us why initrd.img and vmlinuz were 0: they're symbolic links. /proc also registers as 0
because it is a virtual filesystem. The kernel provides "files" there to represent processes and system state,
and generates their contents only when they are read. If you look in this directory
you will see mostly 0 bytes for file sizes. For example, from within /proc, let's first use "file" to determine
what vmstat is, then "cat" it:
1) "file" says it's empty
2) when viewed with cat, the kernel spills its guts and reveals its virtual memory counters.
vmstat is also the name of one of the tuning tools we can use to analyze a system.
------------------------------------------------
--> file vmstat
vmstat: empty
------------------------------------------------
john@silver [/proc]
------------------------------------------------
--> cat vmstat
nr_free_pages 28869
nr_alloc_batch 1290
nr_inactive_anon 105886
nr_active_anon 310861
nr_inactive_file 52981
nr_active_file 83159
nr_unevictable 0
nr_mlock 0
nr_anon_pages 390988
...
...
thp_fault_alloc 730226
thp_fault_fallback 3204
thp_collapse_alloc 1650
thp_collapse_alloc_failed 82
thp_split 1673
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 2
nr_tlb_remote_flush 488793
nr_tlb_remote_flush_received 488212
nr_tlb_local_flush_all 8
nr_tlb_local_flush_one 2047273
---------------------------------------------------
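The nr_* counters in /proc/vmstat are counts of pages, so they need to be multiplied by the page size to become memory sizes (other counters, such as thp_fault_alloc, count events instead). A minimal sketch of that conversion follows; the function name pages_to_kib is hypothetical, and the 4096-byte default page size is an assumption that "getconf PAGESIZE" can confirm.

```shell
# Read /proc/vmstat-style "name value" lines on stdin and print counter $1
# converted from pages to KiB.
# $2 is the page size in bytes (assumed 4096 when omitted).
pages_to_kib() {
    awk -v name="$1" -v psize="${2:-4096}" '$1 == name { print $2 * psize / 1024 }'
}
```

Usage: pages_to_kib nr_free_pages "$(getconf PAGESIZE)" < /proc/vmstat. With the nr_free_pages value shown above (28869) and 4096-byte pages, that is 115476 KiB, roughly 113 MiB free.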
du, df, cat, ls, more, strings and file
The commands above were used to examine files and how they are represented on media. Using the
/proc, /sys and /run filesystems we can also examine system states and processes, as well as file usage.
- basic commands to view files and usage: du, df, cat, more, ls, strings and file
- to examine actual performance of how data is moved around we could use: iostat, vmstat and netstat
- iostat - used to analyze input/output performance. It breaks CPU time down into user, system, iowait and idle,
and for each device it shows transfers per second (tps), kB read and written per second, and total kB read and written.
-> iostat (MINT DOESN'T INCLUDE IT)
The program 'iostat' is currently not installed. You can install it by typing:
sudo apt-get install sysstat
--> iostat (SuSE does...)
Linux 3.11.10-29-desktop (JohnMeister) 07/07/2015 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.56 0.01 0.26 0.30 0.00 98.86
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.13 16.22 26.82 17611960 29126804
sdb 0.04 0.03 13.45 33997 14600448
Example of using iostat while executing a dd of a 4TB drive:
using "dd" to copy, block by block, one 4TB SATA drive to another
attached via USB in an enclosure.
--> dd if=/dev/sdb of=/dev/sdc
Started this over 24 hrs ago...
load average: 2.66, 2.72, 2.97 so it's working; top shows dd and usb loaded.
One of the 4TB drives shows up as 1.8T, but the block size is the same.
That drive was in an enclosure and should be new. I don't know why it is
showing as 1.8TB when queried with fdisk -l. The drive is likely out of warranty;
that's what I get for buying bleeding edge consumer grade hardware.
But if it weren't a 4TB drive the copy would have puked some time ago. fdisk, which
can't be trusted on drives over 2TiB (the limit of a dos/MBR partition table with 512-byte sectors),
shows that the partition table copied, and parted shows
the copied partition from /dev/sdb1 on /dev/sdc1. That tells me the table moved over.
I'm not sure why the drive reports as 1.8TB and won't know until the dd is complete. The iostat
info will disclose the rate at which the data is being transferred: basically,
divide the total kB on the 4TB drive by the kB read/s to get seconds, divide by 60 to get
minutes, and divide by 60 again to get hours. Note that iostat run without an interval reports
averages since boot, so the rate to use is one sampled while dd is running.
I'm afraid to do the math. I should have unmounted the 4TB drive and
put it in my disk "toaster", a USB device that allows direct copy within that
unit between two SATA drives.
So, looking at iostat I can see that data is being written... older
hardware... USB 2.0 port... I hope. If it's
not done in another 24hrs I may pull the plug.
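The arithmetic described above is short enough to script. This is a minimal sketch; the function name eta_hours is made up for illustration, and both arguments are assumed to be in the same kB units that iostat reports.

```shell
# Hours needed to move $1 kB at a rate of $2 kB per second:
# kB / (kB/s) = seconds; / 60 = minutes; / 60 = hours.
eta_hours() {
    awk -v total="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", total / rate / 60 / 60 }'
}
```

The rate should come from an interval sample such as "iostat -d /dev/sdc 5 2" (use the second report), because the first report is an average since boot. For scale: the 4TB drive holds 7814037168 sectors of 512 bytes, or 3907018584 kB, and the 5.4 MB/s that dd eventually reported is about 5273 kB/s, so "eta_hours 3907018584 5273" comes to roughly 206 hours, about eight and a half days. The record counts dd printed also show it ran with its default 512-byte block size; a larger bs= value is commonly faster, though that was not tested here.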
--> iostat /dev/sdc
Linux 3.11.10-29-desktop 07/14/15 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.01 0.01 0.95 3.05 0.00 94.98
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdc 66.20 256.16 255.84 433073443 432530688
------------------------------------------------
--> iostat /dev/sdc
Linux 3.11.10-29-desktop 07/14/15 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.01 0.01 0.95 3.11 0.00 94.92
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdc 67.65 261.78 261.46 443042771 442498968
--> iostat /dev/sdc
Linux 3.11.10-29-desktop 07/14/15 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.01 0.01 0.95 3.12 0.00 94.91
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdc 67.91 262.77 262.45 444804343 444259608
------------------------------------------------------------------------
killed dd after 2 days: - disk still reporting incorrectly:
------------------------------------------------------------------------
--> dd if=/dev/sdb of=/dev/sdc | tee -a dd-if-dev-sdb-of-dev-sdc-2015-jul-13.txt
^C
1725601961+0 records in
1725601961+0 records out
883508204032 bytes (884 GB) copied, 163363 s, 5.4 MB/s
------------------------------------------------------------------------
Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes, 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdb1 1 4294967295 2147483647+ ee GPT
Partition 1 does not start on physical sector boundary.
Disk /dev/sdc: 1801.8 GB, 1801763774464 bytes, 3519069872 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdc1 1 4294967295 2147483647+ ee GPT
------------------------------------------------------------------------
--> mount /dev/sdc1 4tb-USB/
mount: special device /dev/sdc1 does not exist
------------------------------------------------------------------------
--> fdisk /dev/sdc
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): p
Disk /dev/sdc: 1801.8 GB, 1801763774464 bytes, 3519069872 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdc1 1 4294967295 2147483647+ ee GPT
Command (m for help):
------------------------------------------------------------------------
--> mount /dev/sdc 4tb-USB/
mount: /dev/sdc is write-protected, mounting read-only
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
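This drive looks damaged: fdisk, parted and dd did not change its reporting. However, the 3519069872 sectors
reported for /dev/sdc are exactly the 7814037168 sectors of /dev/sdb minus 4294967296 (2^32), which may instead
point to a USB enclosure that cannot address more than 2^32 sectors.
I will attempt to use GParted on it later, but first I will contact the vendor; there is a possibility
it is still under warranty. In the meantime, my server data cleanup project is forced to use a 2TB and a 3TB drive.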
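One thing worth checking before writing the drive off is whether the odd size is the real size wrapped around at 32 bits, which is what happens when a USB-to-SATA bridge can only count 2^32 sectors. This is a sketch of that check using the sector counts from the fdisk output above; the function name wrapped_sectors is hypothetical, and a match is only a hint, not a diagnosis.

```shell
# Sector count a device would report if its real count ($1) were truncated
# to $2 bits (assumed 32 when omitted), i.e. the real count modulo 2^bits.
wrapped_sectors() {
    echo $(( $1 % (1 << ${2:-32}) ))
}
```

"wrapped_sectors 7814037168" prints 3519069872, which is exactly the sector count fdisk reports for /dev/sdc, and 3519069872 sectors of 512 bytes is the 1801763774464 bytes (1801.8 GB) shown. Connecting the drive directly to a SATA port would be one way to test this.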
- vmstat - used to analyze memory and how it is being used, along with run queue, swap, block I/O and CPU activity, e.g.
--> vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
5 0 46624 150264 12764 598780 0 2 79 227 546 1323 57 8 34 0 0
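When tuning memory, the si and so columns (swap in and swap out) are the quickest sign of trouble: sustained non-zero values mean the system is swapping. A minimal sketch that pulls those two columns out of vmstat output follows; the function name swap_activity is hypothetical, and it assumes the column layout shown above with the header printed only once.

```shell
# Read vmstat output on stdin and print si + so for each sample line.
# The two header lines are skipped; columns 7 and 8 are si and so
# in the layout shown above.
swap_activity() {
    awk 'NR > 2 { print $7 + $8 }'
}
```

Usage: "vmstat -n 5 3 | swap_activity" (-n prints the header only once). The first sample vmstat prints is an average since boot, so the later lines are the ones that reflect current activity. For the single line shown above the result is 2.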
- netstat - used to analyze network devices and traffic. Analysis looks for collisions, amount of traffic and sockets used.
--> netstat --help
usage: netstat [-vWeenNcCF] [<Af>] -r netstat {-V|--version|-h|--help}
netstat [-vWnNcaeol] [<Socket> ...]
netstat { [-vWeenNac] -i | [-cWnNe] -M | -s }
-r, --route display routing table
-i, --interfaces display interface table
-g, --groups display multicast group memberships
-s, --statistics display networking statistics (like SNMP)
-M, --masquerade display masqueraded connections
-v, --verbose be verbose
-W, --wide don't truncate IP addresses
-n, --numeric don't resolve names
--numeric-hosts don't resolve host names
--numeric-ports don't resolve port names
--numeric-users don't resolve user names
-N, --symbolic resolve hardware names
-e, --extend display other/more information
-p, --programs display PID/Program name for sockets
-c, --continuous continuous listing
-l, --listening display listening server sockets
-a, --all, --listening display all sockets (default: connected)
-o, --timers display timers
-F, --fib display Forwarding Information Base (default)
-C, --cache display routing cache instead of FIB
<Socket>={-t|--tcp} {-u|--udp} {-w|--raw} {-x|--unix} --ax25 --ipx --netrom
<AF>=Use '-6|-4' or '-A <af>' or '--<af>'; default: inet
List of possible address families (which support routing):
inet (DARPA Internet) inet6 (IPv6) ax25 (AMPR AX.25)
netrom (AMPR NET/ROM) ipx (Novell IPX) ddp (Appletalk DDP)
x25 (CCITT X.25)
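The "sockets used" part of netstat analysis can be summarized by counting TCP sockets per state. This is a minimal sketch, assuming the usual net-tools column layout in which the state is the sixth field; the function name count_tcp_states is made up for illustration.

```shell
# Read "netstat -tan" output on stdin and count TCP sockets per state.
# Lines whose first field starts with "tcp" (tcp, tcp6) are counted;
# the state is assumed to be field 6.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Usage: "netstat -tan | count_tcp_states". The -t, -a and -n options are the ones listed in the help text above (TCP only, all sockets, no name resolution). A large and growing count in one state is a cue to look at the application holding those sockets.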
The analysis of files focuses on space used, amount of data moved, and traffic, whether on the internal bus or
on the network. Distributing data can improve overall performance, which is one of the reasons clusters were chosen for
projects like big data and databases. Analyzing data movement allows one to segregate it and balance loads
for the best overall performance for the system and the users.
Processes
Once the files are on the system and used to begin computing operations, the processes become the next element
of analysis and tuning. The idea is to make effective use of the resources. The resources include the CPU (its
registers and ALU), interprocess communication, and I/O paths such as buses and device drivers. Distributing
files across various devices improves movement of data and increases throughput, and also allows better utilization
of the internal hardware by balancing the load across it.
- some of the key commands to understand processes are: ps, w, uptime, top, sar, nice, kill as well as iostat and vmstat
- ps - process status. shows PID and PPID, along with other attributes
--> ps -ef
UID PID PPID C STIME TTY TIME CMD
john 2233 2065 0 Jul07 pts/8 00:00:00 bash
------------------------------------------------
--> ps -ef | grep bash
john 2215 2065 0 Jul07 pts/0 00:00:00 bash
john 2221 2065 0 Jul07 pts/2 00:00:00 bash
john 2228 2065 0 Jul07 pts/7 00:00:00 bash
john 2233 2065 0 Jul07 pts/8 00:00:00 bash
john 2238 2065 0 Jul07 pts/9 00:00:00 bash
john 23675 2215 0 00:08 pts/0 00:00:00 grep --colour=auto bash
------------------------------------------------
--> ps -help
error: unsupported SysV option
Usage:
ps [options]
Try 'ps --help <simple|list|output|threads|misc|all>'
or 'ps --help <s|l|o|t|m|a>'
for additional help text.
For more details see ps(1).
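Since ps -ef shows both PID and PPID, the parent/child relationship can be filtered by column instead of by grepping command text. A minimal sketch follows; the function name children_of is hypothetical, and it assumes the ps -ef column order shown above (PID second, PPID third).

```shell
# Read "ps -ef" output on stdin and print the PID of every process whose
# parent PID (third column) equals $1.
children_of() {
    awk -v ppid="$1" '$3 == ppid { print $2 }'
}
```

Usage: "ps -ef | children_of 2065". Against the listing above this prints the five bash PIDs started by PID 2065 (2215, 2221, 2228, 2233, 2238) and leaves out the grep line, whose parent is 2215.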
- w - shows who is logged in and uptime
-> w
00:16:04 up 5:32, 6 users, load average: 0.36, 0.32, 0.40
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
john tty8 :0 18:43 5:32m 5:46 0.39s x-session-manager
john pts/0 :0 18:43 4.00s 0.37s 0.00s w
- uptime - shows current time, how long in hours, or days the system has been up, number of users,
and the system load averages for the past 1, 5, and 15 minutes.
-> uptime
00:18am up 12 days 15:08, 8 users, load average: 0.00, 0.01, 0.05
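A load average only means something relative to the number of CPUs, so a common rule of thumb is to divide one by the other. This is a minimal sketch of that division; the function name load_per_cpu is hypothetical.

```shell
# Divide a load average ($1) by the number of CPUs ($2).
# A result near or above 1.00 suggests the CPUs are saturated or that
# processes are stuck waiting on I/O.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}
```

Usage: load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". The load average of 2.66 seen during the dd copy on the 2 CPU machine works out to 1.33 per CPU. On Linux the load average also counts processes in uninterruptible I/O wait, so a figure like that can come from a slow disk rather than from a busy CPU.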
- ping - used to measure network performance as round-trip response times
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=56 time=2517 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=56 time=6665 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=56 time=7126 ms
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 2517.208/5436.214/7126.359/2072.621 ms, pipe 3
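For logging or comparing runs, the average round-trip time can be pulled out of the summary line. A minimal sketch, assuming the iputils summary format shown above ("rtt min/avg/max/mdev = ..."); the function name ping_avg_ms is made up for illustration.

```shell
# Read ping output on stdin and print the average round-trip time in ms.
# Splitting the "rtt min/avg/max/mdev = a/b/c/d ms" line on "/" puts the
# average (b) in field 5.
ping_avg_ms() {
    awk -F'/' '/^rtt / { print $5 }'
}
```

Usage: "ping -c 3 8.8.8.8 | ping_avg_ms". For the run above it prints 5436.214, which at more than five seconds per reply points to a badly congested or failing link.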
- top - identifies the current processes along with a summary of system performance. top can be
customized using a .toprc file.
Typing "man top" will provide hours of entertainment and enjoyment for your systems administrator.
--> top --help
top: inappropriate '-help'
Usage:
top -hv | -bcHiOSs -d secs -n max -u|U user -p pid(s) -o field -w [cols]
--> top -b -d 3
top - 00:22:31 up 5:39, 7 users, load average: 0.18, 0.20, 0.31
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
%Cpu(s): 50.8 us, 7.4 sy, 0.0 ni, 41.6 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 2946556 total, 2431928 used, 514628 free, 21656 buffers
KiB Swap: 3005436 total, 233628 used, 2771808 free. 555028 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2065 john 20 0 708424 13488 7640 S 6.3 0.5 0:33.96 mate-terminal
4868 john 20 0 1710912 237000 8444 S 6.3 8.0 287:33.83 plugin-containe
24543 john 20 0 24812 1492 1076 R 6.3 0.1 0:00.01 top
1 root 20 0 33780 2672 1268 S 0.0 0.1 0:03.06 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
- sar - "The sar command writes to standard output the contents of selected cumulative activity counters in the operating system. "
--> sar -h
Usage: sar [ options ] [ <interval> [ <count> ] ]
Main options and reports:
-b I/O and transfer rate statistics
-B Paging statistics
-d Block device statistics
-H Hugepages utilization statistics
-I { <int> | SUM | ALL | XALL }
Interrupts statistics
-m { <keyword> [,...] | ALL }
Power management statistics
Keywords are:
CPU CPU instantaneous clock frequency
FAN Fans speed
FREQ CPU average clock frequency
IN Voltage inputs
TEMP Devices temperature
USB USB devices plugged into the system
-n { <keyword> [,...] | ALL }
Network statistics
Keywords are:
DEV Network interfaces
EDEV Network interfaces (errors)
NFS NFS client
NFSD NFS server
SOCK Sockets (v4)
IP IP traffic (v4)
EIP IP traffic (v4) (errors)
ICMP ICMP traffic (v4)
EICMP ICMP traffic (v4) (errors)
TCP TCP traffic (v4)
ETCP TCP traffic (v4) (errors)
UDP UDP traffic (v4)
SOCK6 Sockets (v6)
IP6 IP traffic (v6)
EIP6 IP traffic (v6) (errors)
ICMP6 ICMP traffic (v6)
EICMP6 ICMP traffic (v6) (errors)
UDP6 UDP traffic (v6)
-q Queue length and load average statistics
-r Memory utilization statistics
-R Memory statistics
-S Swap space utilization statistics
-u [ ALL ]
CPU utilization statistics
-v Kernel table statistics
-w Task creation and system switching statistics
-W Swapping statistics
-y TTY device statistics
- nice - nice is not something that should be attempted without understanding the system. By default, nice-ing a process
raises its nice value, which lowers its priority. Only root can give a negative nice value, which raises priority,
and this can starve other work, including system processes, if overused.
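The safe direction for nice is downward in priority: a higher nice value means the process yields to others. A minimal sketch of a wrapper for long-running jobs follows; the function name run_low_priority and the increment of 10 are choices made for illustration.

```shell
# Run a command with its nice value raised by 10, i.e. at lower priority,
# so it gives way to interactive and system work.
run_low_priority() {
    nice -n 10 "$@"
}
```

Usage: "run_low_priority" followed by the command and its arguments, for example a long copy or backup job. With GNU coreutils, running "nice" with no arguments prints the current nice value, which is a quick way to confirm the effect.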
- kill - kill sends a signal to a process. The default signal, SIGTERM, asks the process to stop.
EXAMPLES
kill -9 -1
Kill all processes you can kill.
kill -l 11
Translate number 11 into a signal name.
kill -L
List the available signal choices in a nice table.
kill 123 543 2341 3453
Send the default signal, SIGTERM, to all those processes.
SEE ALSO
kill(2), killall(1), nice(1), pkill(1), renice(1), signal(7), skill(1)
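Since SIGTERM lets a process clean up and SIGKILL (-9) does not, a common pattern is to try the first and fall back to the second only after a wait. This is a minimal sketch of that pattern, not part of the kill man page; the function name stop_gracefully and the default wait of 5 seconds are assumptions made for illustration.

```shell
# Ask process $1 to exit with SIGTERM, wait up to $2 seconds (default 5),
# then send SIGKILL only if the process is still there.
stop_gracefully() {
    pid=$1
    tries=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # already gone
    while [ "$tries" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        tries=$(( tries - 1 ))
    done
    kill -KILL "$pid" 2>/dev/null                # last resort
    return 0
}
```

Usage: "stop_gracefully 2341 10" gives PID 2341 ten seconds to exit before it is killed outright. "kill -0" sends no signal at all; it only tests whether the process can still be signalled.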