Systems Administration Overview
Systems Administration skills
- System hardware setup
- Systems software installation and configuration
- user accounts
- file system management
- system backups
- system upgrades
- Troubleshooting systems and fixing the problem
Mon Jan 6 22:37:54 2003
Below is probably the best article on Systems Administration I've seen...
I've selected a few passages below, but the entire article is a must read.
copied here for training purposes, found on link above, active as of 9/12/2015:
Life in the trenches: a sysadmin speaks
By Sam Varghese December 27 2002
Craig Sanders: "A sysadmin who doesn't have strong experience-based opinions about how things should be
done probably isn't very confident in their own ability to do the job."
As recently as a decade ago, a systems administrator wasn't really needed in every medium- or large-sized corporation. There were motley assemblages of computers which were used for this task and that and if one or two broke down, then the supplier came in and fixed them.
But as use of the Internet spread, offices began to be increasingly networked, servers appeared in numbers and men and women were needed on-site to keep these metallic objects - which had slowly assumed tremendous importance as data repositories - going. Uptime became important.
Early on, the men and women - and lots of pimply-faced teenagers - who took on these jobs were considered a breed apart. They weren't exactly flavour of the month - and seemed to return the compliment by sticking to themselves as much as possible.
But as geeks became more and more socially accepted, it came to be known as a cool profession - though most people never knew what these IT folk really did.
Some migrated to this line out of a genuine liking for what they would be doing; as the tech boom gathered momentum, many others with dollar signs in their eyes joined what looked like a never-ending job queue.
Craig Sanders belongs to the former category. Around the time when IBM put out its first PC, he was already working as a programmer - at 14.
In 1982, he went into a support/sysadmin role and has stayed in that line ever since. Says he: "I guess this job was inevitable for me since I discovered computers at the age of 11. The only job I've ever had that wasn't in the computer industry was a brief stint selling hotdogs outside a pub while I was at university, which lasted until I found a part-time programming job."
From the early 1990s onwards, Sanders began to focus on Unix systems administration almost exclusively. From 1994, his focus has been Linux. He is a developer for the free Linux distribution, Debian.
Sanders currently works at Vicnet, an Internet Service Provider focusing on community groups and libraries. He started as a systems administrator in November 1997, and was promoted to senior systems administrator a year or so later. Most of the Vicnet servers run Debian GNU/Linux; some run Sun Microsystems' Solaris operating system, and there are also a few Windows NT servers.
Sanders inspires strong emotions - he is convinced about what he believes in and does not suffer fools gladly. He is forthright in his opinions but is rarely technically challenged on them. He is probably one of the few 35-year-olds in the country who until recently did not have a television set because he hates advertising. He now has one but uses it only to watch DVDs and the news on the ABC.
He was interviewed by email.
What are your fundamental tasks as a sysadmin?
* To keep the systems running.
* To plan and implement upgrades and new services.
* To plan for disaster, minimising the risk and the potential damage, including backups and disaster recovery planning.
* To resolve any systems problems that crop up or, better yet, to see the warning signs and head them off before they become a problem.
* To keep my skills up-to-date.
* To be a knowledge resource for the company.
What qualities do you rate as essential for a good sysadmin?
In rough order of importance:
* Ability to learn and understand complex subjects quickly.
* Ability to hold a mental model of How Things Work.
* Caution and knowing how to make changes in a way that you can quickly and easily undo if you need to, i.e. revision management skills.
* Communications skills - you need to not only know something, you need to be able to explain it to others in plain English so that reasonably intelligent non-experts can understand it.
Note that training and formal qualifications aren't on that list. They're useful, but only in addition to the above traits, not as a substitute for them.
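As an illustration of making a change "in a way that you can quickly and easily undo", here is a minimal Python sketch. It is not from the interview: the helper names `backup_then_write` and `restore` and the timestamped `.bak` naming scheme are assumptions made for this example, and a real site would more likely keep its configuration files under a revision control system.

```python
import shutil
import time
from pathlib import Path


def backup_then_write(path, new_text):
    """Save a timestamped copy of `path`, then write `new_text` to it.

    Returns the backup's path so the change can be undone later.
    (Hypothetical helper; the naming scheme is an assumption.)
    """
    path = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    shutil.copy2(path, backup)   # copy2 also keeps the mode and timestamps
    path.write_text(new_text)
    return backup


def restore(path, backup):
    """Undo a change by copying the saved backup over the edited file."""
    shutil.copy2(backup, path)
```

The habit matters more than the tool: keep the old version before touching the file, and know exactly which step puts it back.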
Sysadmins are often accused of being control freaks. They are also accused of being vengeful people, who use their technical knowledge to harass users and keep upper management in check. Your comment?
I can understand why some people might feel this way, but I don't agree. There is an inherent tension between maintaining a system's current functionality and developing new functionality. Part of a sysadmin's role is to manage the impact of development projects so that they don't negatively affect the existing systems. This is often interpreted as being adversarial.
A sysadmin has to know not only what can be done but also what cannot (or should not) be done. Sometimes that means stopping people from doing the wrong thing and sometimes it means making sure that they do the right thing. This can annoy people or lead them to believe that they are being deliberately thwarted, but it's really just the sysadmin doing the job they were hired to do.
It's difficult to put it in more general terms than that, because it is highly situational - for most tasks, there are several ways to do it. Some ways are obviously better and anyone can see them; others are not so obvious, it requires a lot of experience to be able to foresee how subtle differences and even subtler interactions between different components can have an enormous impact on the final outcome; and some ways are obviously wrong to an experienced tech but may appear to be right to someone blinded by glossy marketing brochures or a slick sales-pitch for whatever the latest snake-oil buzzword is.
Also, a sysadmin who doesn't have strong experience-based opinions about how things should be done probably isn't very confident in their own ability to do the job... and if they're not confident, why should you be? Sometimes this strength of will and confidence may be interpreted as being a "control freak", especially by people who don't have the background to understand the reasons why a sysadmin has made particular decisions.
Does life as a sysadmin really end after you leave work? Or are you on edge, waiting for your mobile to ring?
The job never really ends, but I'm certainly not on edge. I'm on call 24/7 but if I've done my job right I generally don't have to worry about being called in the middle of the night.
Have you ever been in the position where you had to act as mentor for someone in this line? If so, how did you go about it?
Yes, I have had (and still have) several junior system admins. Part of my job is to train them. I do that by setting an example, setting standards (e.g. of quality) for how things should be done, teaching them how to do something and, most importantly, teaching them how or why it works. Then I gradually give them responsibility for their own systems or service areas.
I think that having a good understanding of how something works is far more valuable than having a specific rote procedure to follow. If you understand it, you can deal with situations that haven't been pre-scripted i.e. you can deal with unplanned emergencies. If all you know is a set of rote procedures then you're in serious trouble when something crops up for which you don't have a set procedure.
What's been the biggest crisis you've faced as a sysadmin? How did you resolve it?
The worst disaster I can recall was when a rack shelf fell apart (the builder put it together the wrong way) and dropped a few servers on the floor from about two metres high. One of our Web servers died, the disk heads crashed. I had to build a replacement from spare parts and restore the data from backup. It was back up and running the same day, and we only lost a few hours' worth of Web server log files.
Do you find that your IT involvement cuts you off from people? Has it affected you in any way?
No, not really. I have noticed that until the Internet became popular in the mid-90s it was social death to admit to any interest in computers, and it was certainly not acceptable to talk about them at parties. That's changed now. It's still considered "geeky" but it's not the unforgivable social crime that it once was. You still have to pretend not to know much about computers, but these days it's so you don't waste the entire party solving someone's computer problems for them.
I think, though, that to be any good at this job you have to have a particular way of thinking and looking at the world. For those who like personality tests, Myers-Briggs personality types INTJ and INTP typically make good systems admins. These personality types are fairly uncommon (less than five percent of the population), and the worldview is moderately alien to most people... so, while there may be some level of "cut off" from other people, the job isn't the cause.
This is not to say you have to be INTJ/INTP to be a good system admin, just that the percentage within sysadmin and related professions is many times higher than the percentage within the general population.
What is your partner's reaction to the line you have chosen (and love)?
The flippant answer is that I solved that problem by training her to be a systems administrator too :-). My partner's response is: "It's good, it keeps him out of my hair while I'm programming". Actually, we both work in the Internet industry. Her skill set is slightly different to mine. She's better at programming and much better at management tasks, whereas I'm better at systems administration and don't have much interest at all in taking on management roles.
How much input would a good sysadmin have into choice of platforms in a company? Or is this solely a matter for management?
Management should set the budget and the overall needs. Systems staff need free rein to implement a solution that meets those needs within the budget.
Otherwise, what you end up with is a system that doesn't work very well because it was designed by people who are not qualified to design it. Managers are skilled at management tasks, they know what the business needs of the company are but, as a general rule, they do not have the knowledge or experience required to make technical decisions.
In my experience, it's an iterative process where management sets the budget and outlines the requirements. The sysadmin does the research and comes back with a list of options that may meet those needs, detailing the pros and cons of each option. A few rounds of this narrows down the options under consideration until only one or two are left. Then a decision is made and implementation planning begins.
How would you go about introducing new technology in a company - stuff which you know will make life easier for both users and admins but which has no support from a management team which views change as disruptive?
As a general rule, it's best to talk about feature sets and not about particular brands of technology. That's a good way to look at it anyway, because a good design is modular and any component should be easily replaceable by a similar component that does the same job.
I guess you're asking about Linux and other Open Source software here, so I'll use Linux and Samba as an example: when a need comes up for a new file or print server, don't talk about installing a Linux box, talk about installing a new file or print server. As long as what you implement does the job and works reliably, no one will care how it's done as long as it works.
Otherwise, a generally cautious approach is the best way. Don't introduce sweeping changes overnight - migrate to them gradually. Start with small, narrowly-defined services, e.g. take some of the workload off your NT file server by adding a Linux print-server or two (you can do this at effectively no cost by recycling an obsolete desktop machine). Or protect your MS Exchange server by hiding it behind a firewall and using Linux and postfix as a safe, anti-spam, virus-scanning email gateway between Exchange and the Internet.
And finally, you need to be able to recognise when it isn't a good idea to change something. Even though the new technology may be better, the workflow and routine of your site may be too closely tied to the existing product. No amount of superior technology is going to justify disrupting a routine that works. If you can introduce the new technology without disruption, then do it. Otherwise, don't.
What's your biggest complaint about the profession?
I don't have much to complain about. I like the job, I enjoy the challenges, and I get a real sense of accomplishment from making sure that the systems I'm responsible for work reliably 24 hours a day, 7 days a week.
The biggest issue would be that often there is no clear distinction between work and non-work hours - it's very easy to work 12 or 15 hours or more per day when you have a difficult or interesting problem to work on.
This is true for the job in general, but telecommuting makes it even more so. OTOH (on the other hand), telecommuting is one of the major benefits of the job.
And the biggest plus point?
Telecommuting. I can do at least 70 percent of my job from home at any hour of the day or night. With appropriate encryption, it makes no difference whether I am sitting at the console or at my desk at the office or at my desk at home - or anywhere for that matter.
I've logged in to my systems at work while away at conferences and fixed things. I've even logged in from an Internet cafe while on holiday in India, although the lag on that link was too slow to get much done.
Systems Administration is the kind of job that nobody notices if you're doing it well. People only take notice of their systems when they're not working, and they tend to forget that a lot of work and expertise goes into making sure that they continue working.
But that's as it should be - computer networks are infrastructure that you should be able to rely on, to take for granted, just like telephones and electricity. If you can't do that, then there's something wrong, something that can and should be fixed.
Troubleshooting - things to do, questions to ask, in no particular order
- first... gather the details
- What was the configuration? User, Shell, Command or Application
- What happened? Error message? Bright flash? Smoke?
- what did the error message, if any, say? (remember errors may report a line before, or after... )
- did you grep for the "key" word in the error message?
- did you look at the man page or --help?
- did you search for the error message online?
- did you look at the file if it was a script?
- did you try it again?
- what were the environment variables
- what versions?
- how much disk space is available?
- how big was the file?
- is the system using swap?
- are there adequate system resources and memory?
- have you tried "top" to see what processes are using the cpu?
- is there a defunct process?
- is the service desired enabled? running?
- can you view the log files?
- can you view the configuration files, any changes from when it worked?
- did it ever work?
- are the file permissions correct?
- is the network working ok?
- is this app licensed and is the license current?
- is there a website that this needs to access to work, is it pingable?
- are you using the correct account for this script or app?
- are all the dependencies correct for this application?
- any recent updates?
- does this use a specific version of... java? html? bash? firefox?
- has anyone else experienced this problem?
- how many files are in /tmp?
- did you see what children the process has?
- did you examine the process tree of the parents?
- what is the workload of the system?
- what is the priority/nice value of the app?
- are there different versions on the system? if so, is this one pointing to all of the correct versions?
- what desktop are you using and have you logged off and back on recently?
- is there a dotfile directory for this app? did you see how large it is? du -sh .dotfile
- has this app ever failed before?
- are all services running that are necessary for this app?
- if this app was started with an init script, is the system using systemd? if so, you may need to convert the init script into a systemd service.
- if the system has been running for a long time, has it been updated?
- if all else fails, and everything else is correct, and it's been up for months, a reboot might be tried... last resort...
- rebooting will not solve a problem generally unless it was caused by a failed dependency... LEARN how to fix it before rebooting.
- understand the tool, what does it do... what does it use for libraries, commands, variables.
- it is highly unlikely that the application's code has become corrupted; the cause is more likely a variable, a dependency, or the version of some other process or code.
- understand the dependencies, be careful of updates and suspect them if there is a problem.
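Several of the questions above (disk space, files in /tmp, environment variables, system workload) can be answered in one pass before anything is changed. The following is a minimal read-only sketch in Python, offered as an illustration rather than a standard tool. The function name `gather_facts` and the choice of facts are assumptions made for this example.

```python
import os
import shutil


def gather_facts(path="/", tmp_dir="/tmp"):
    """Collect a few read-only facts from the checklist above.

    `path` is the filesystem to check for free space and `tmp_dir` is
    the directory whose entries are counted (both are assumptions).
    """
    usage = shutil.disk_usage(path)          # total, used, free in bytes
    used_pct = 100 * usage.used / usage.total if usage.total else 0.0
    facts = {
        "disk_free_gib": round(usage.free / 2**30, 1),
        "disk_used_pct": round(used_pct, 1),
        # Counts top-level entries only (files and directories alike).
        "tmp_entries": len(os.listdir(tmp_dir)),
        "path_dirs": os.environ.get("PATH", "").split(os.pathsep),
    }
    # Load average is not available on every platform; record None there.
    try:
        facts["load_avg"] = os.getloadavg()
    except (AttributeError, OSError):
        facts["load_avg"] = None
    return facts
```

Calling `gather_facts()` and printing the result gives a quick snapshot to attach to a trouble ticket. It does not replace `top`, `df`, or the log files, but it records the state of the system at the moment the problem was reported.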
Systems Administration Descriptions
Systems Administration is the kind of job that nobody notices if you're
doing it well. People only take notice of their systems when they're not
working, and they tend to forget that a lot of work and expertise goes into
making sure that they continue working.
Craig Sanders: "A sysadmin who doesn't have strong experience-based
opinions about how things should be done probably isn't very confident in
their own ability to do the job." Sometimes this strength of will and
confidence may be interpreted as being a "control freak", especially by
people who don't have the background to understand the reasons why a
sysadmin has made particular decisions.
How much input would a good sysadmin have into choice of platforms in a
company? Or is this solely a matter for management?
Management should set the budget and the overall needs. Systems staff need
free rein to implement a solution that meets those needs within the budget.
Otherwise, what you end up with is a system that doesn't work very well
because it was designed by people who are not qualified to design it.
Managers are skilled at management tasks, they know what the business needs
of the company are but, as a general rule, they do not have the knowledge
or experience required to make technical decisions.
In my experience, it's an iterative process where management sets the
budget and outlines the requirements. The sysadmin does the research and
comes back with a list of options that may meet those needs, detailing the
pros and cons of each option. A few rounds of this narrows down the options
under consideration until only one or two are left. Then a decision is made
and implementation planning begins.
What are your fundamental tasks as a sysadmin?
* To keep the systems running.
* To plan and implement upgrades and new services.
* To plan for disaster, minimising the risk and the potential damage, including backups and disaster recovery planning.
* To resolve any systems problems that crop up or, better yet, to see the warning signs and head them off before they become a problem.
* To keep my skills up-to-date.
* To be a knowledge resource for the company.
A summary of technical areas to be mastered by a Sys Admin
UNIX and Linux Systems Administration Overview
- Unix/Linux History - understand how and why commands were created as they are
- Operating system - kernel, drivers, libraries, shell, application
- man pages, gnu --help
- supporting users
- Hardware Requirements
- Installation methods
- Disk configuration (SCSI, IDE, SATA)
- Host Table
- starting and stopping services
- system run levels
Software and Patch administration
- Package Information
- Adding Packages/Packages via Scripts or the Command Line
- Removing Packages
- Displaying installed Packages/Patches
- The /etc/passwd ; /etc/shadow File ; /etc/group files (/etc/skel and bashrc or profile)
- using command line tools or GUI - or manual using vi
- archiving removed users
- password security (shadow, crack, aging, etc.)
- ssh configuration
- log files - failed attempts, system access via network
- sudoers (visudo or vi /etc/sudoers)
- chown, chmod and permissions
- disallowing ftp, or using vsftpd
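To make the /etc/passwd item above concrete: each line of that file holds seven colon-separated fields (name, password placeholder, UID, GID, comment, home directory, shell). The Python sketch below parses text in that format and flags any account other than root that has UID 0. The function names and the sample accounts (`alice`, `toor`) are invented for illustration; on a real system the text would come from reading /etc/passwd.

```python
from collections import namedtuple

Account = namedtuple("Account", "name uid gid comment home shell")


def parse_passwd(text):
    """Parse passwd-style text into a list of Account records."""
    accounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue                      # skip malformed lines
        name, _pw, uid, gid, comment, home, shell = fields
        try:
            accounts.append(
                Account(name, int(uid), int(gid), comment, home, shell))
        except ValueError:
            continue                      # non-numeric UID/GID, e.g. NIS "+" lines
    return accounts


def extra_root_accounts(accounts):
    """Return names other than 'root' that have UID 0 (worth reviewing)."""
    return [a.name for a in accounts if a.uid == 0 and a.name != "root"]


# Made-up sample data, not taken from any real system.
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
toor:x:0:0:spare root:/root:/bin/sh
"""
```

With the sample above, `extra_root_accounts(parse_passwd(SAMPLE))` reports the `toor` entry, the kind of account a security review would want explained.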
System Service and/or INIT files
- /etc/inittab or related to show run level
- /etc/init.d (rc5.d)
- /boot: grub.cfg, menu.lst and other related files
File Systems, Disks and Directory Structure
- /dev Device drivers
- mount, umount and /etc/fstab (mount points and drives)
- partitions, fdisk, fsck, du, df, parted
- LVM - volume groups, physical volumes, logical volumes
- dd, rsync, tar, dump, zip, compress, and backups
- NFS, samba
- swap and virtual filesystems
- RAID (0, 1, 1/0, 5)
- ps, top, kill, xkill
- vmstat, iostat, netstat
- crontab, at and nohup
- /run, /proc and other directories representing processes
- log files
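For the /etc/fstab item above, it helps to see how one entry breaks down. Each non-comment line has whitespace-separated fields: device, mount point, filesystem type, mount options, and optionally the dump flag and the fsck pass number (both treated as 0 when absent). The Python sketch below parses that layout. The function name `parse_fstab` and the sample devices are assumptions made for illustration.

```python
def parse_fstab(text):
    """Parse fstab-style text into a list of dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue                       # skip malformed lines
        device, mount_point, fstype, options = fields[:4]
        entries.append({
            "device": device,
            "mount_point": mount_point,
            "fstype": fstype,
            "options": options.split(","),
            # The last two fields are optional and default to 0.
            "dump": int(fields[4]) if len(fields) > 4 else 0,
            "passno": int(fields[5]) if len(fields) > 5 else 0,
        })
    return entries


# Made-up sample data, not taken from any real system.
SAMPLE = """\
# <device>     <mount point>  <type>  <options>   <dump>  <pass>
/dev/sda1      /              ext4    defaults    0       1
/dev/sda2      none           swap    sw          0       0
server:/export /mnt/nfs       nfs     ro,noauto
"""
```

Filtering the result, for example `[e["device"] for e in parse_fstab(SAMPLE) if e["fstype"] == "swap"]`, is a quick way to compare what the file says should be mounted with what `mount` or `df` reports.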
Backup and Recovery
- Full and Incremental Backups
- Tape Devices, SANs, filesystem snapshots, rsync
- ifconfig, netstat, route
- ssh, scp, ftp, telnet and "r" commands
- xinetd and inetd
- /etc/hosts, /etc/resolv.conf, /etc/nsswitch.conf
- NIS, LDAP, MS_DNS, DNS, DHCP and other network services
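On the "Full and Incremental Backups" item above: a full backup copies everything, while an incremental backup copies only what has changed since the previous backup. The Python sketch below shows one simple way to pick the files for an incremental run by comparing modification times against the time of the last backup. It is an assumption-laden illustration: the function name `changed_since` is invented, deletions are not detected, and real tools such as tar, dump, or rsync use their own mechanisms.

```python
import os


def changed_since(root, cutoff):
    """Return sorted relative paths of files under `root` whose
    modification time is later than `cutoff` (seconds since the epoch).
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(full).st_mtime
            except OSError:
                continue        # file vanished between listing and stat
            if mtime > cutoff:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)
```

Passing a cutoff of 0 selects every file, which corresponds to a full backup. Passing the timestamp recorded at the end of the previous run selects only the newer files.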
Shell Scripting (cron, at)
- Comments, echo, variable declaration, loops, if, while, case
John Meister - Adjunct Faculty - Computer Science Dept.
City University Circa 2003