Top 10 System Administrator Truths
Vo0k writes "What are your top ten system administrator truths? We all know them already, but it's still fun re-telling them. Stuff like "90% of all hardware-related problems come from loose connectors", even though you already know it's true, may save you from replacing the "faulty" motherboard if you recall it at the right time."
Re:95% of all problems.... (Score:5, Interesting)
Truth... (Score:5, Interesting)
Top 3 (Score:5, Interesting)
2) Always ask the dumb questions: is it switched on?
3) Reboot cures most things EXCEPT rm -r * when logged in as root
After that, things could get tricky.
My own list (Score:5, Interesting)
-Make sure you can leave it exactly like it was before you touched it.
-Don't fix what ain't broken.
-Start from a known state of the system (switch off - switch on).
-Even if you are a genius-level techie, follow the manual. RTFM.
-Don't reinvent the wheel. Compare with something that's working.
-Cables are not perfect. If something doesn't connect, check lower levels first.
-If it's there, there must be a reason. Never ever delete anything. Rename instead.
-Your memory is not infinite. Write down what you do.
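The "rename instead of delete" rule can be sketched as a small shell function. This is only an illustration: the function name `retire` and the `.retired-<timestamp>` suffix are made up here and are not part of the list above.

```shell
#!/bin/sh
# retire: hypothetical helper that renames a file or directory instead of
# deleting it, so the change can be undone by renaming it back.
# Note: plain sh has no local variables, so these names are global.
retire() {
    target=${1%/}                     # drop one trailing slash, if any
    if [ ! -e "$target" ]; then
        echo "retire: $target: no such file or directory" >&2
        return 1
    fi
    # The suffix format is an arbitrary choice for this sketch.
    new_name="$target.retired-$(date +%Y%m%d%H%M%S)"
    mv -- "$target" "$new_name" && echo "$new_name"
}
```

Calling `retire old.conf` moves the file aside and prints the new name, which is worth pasting into your notes so the change can be reversed later.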
Re:In no particular order.... (Score:4, Interesting)
Google is your best friend. Ever. Period.
This goes for admins, programmers, and just about every other profession, especially in IT.
Good managers ask for something in 5 days, but need it in 6.
Such a basic thing, but so so important. I always try to pad estimates for our department, but I should be sure to pad my requirements for my staff as well.
Reboots, are you kidding? (Score:3, Interesting)
When I started working at my job, we had several servers that would reboot on a cron for the sole reason that someone was too lazy to figure out the problem. We eliminated all but one of these reboots, mainly because we don't care about the last one.
My holy grail would have to be strace/truss/tusc. I would take that tool over reboot any day. It doesn't always fix the problem, but at least you will know what it is, instead of rebooting like a coward.
4 Rules (Score:5, Interesting)
Rule number 1. People are stupid. This one is true of all people. Tech support, highways, shopping, whatever. This rule has been extended to cover just about any stupid thing that anyone does.
"Why did that guy just..."
"Rule number 1."
"Did she think she could get away with that?"
"Rule number 1."
Rule number 2. People lie.
Me: "Has the computer been restarted since the problem started?"
Them: "Yes..."
Me: "OK. Let's try restarting the computer now and see what happens."
Them: "What do you mean by restart?"
And when you add 1 and 2 together, you get 3. Sometimes, people are so stupid, they don't know that they're lying. You know these people. They're the ones who have "Windows 2000 XP" or "2000 ME." They're the people for whom "Nothing happens when I try to check my email. Nothing! Just this error message..." Not realizing that the error message is *exactly* what I was looking for. An error message is *not* nothing. Grr.
There is a fourth rule that also shows up from time to time:
Rule number 4. No good deed goes unpunished.
In the famous words of the leader of the Uruk Hai from his battle call at Helm's Deep in The Two Towers: "Grr."
Re:From the user's side... (Score:3, Interesting)
It would be really interesting to see a study to determine whether changing passwords frequently actually increases or decreases your vulnerability.
One step at a time fool! (Score:3, Interesting)
Disk... (Score:3, Interesting)
I know, those are all corollaries of Murphy's law, but hey.
Is the monitor plugged in? (Score:3, Interesting)
They always claim there is only one socket the monitor will plug into, and without fail so far there has been an onboard one, which they are using, and one on a card, which is the one they should be using and have completely missed.
MS has permanently brain damaged all IT workers (Score:3, Interesting)
If you believe this or if you need this, you are running a POS operating system and it's probably from Microsoft. That this would even be considered a rule by a professional IT worker is all the proof we need that Bill Gates has caused more damage than he can ever hope to make up for.
What utter crap.
Re:In no particular order.... (Score:5, Interesting)
Anyway, for the benefit of those who haven't seen this (very old and long, but somewhat entertaining) email that was doing the rounds a while ago... disclaimer: someone else wrote it, and I don't know who.
KNOW YOUR UNIX SYSTEM ADMINISTRATOR - A FIELD GUIDE
There are four major species of Unix sysad:
1) The TECHNICAL THUG. Usually a systems programmer who has been forced into system administration; writes scripts in a polyglot of the Bourne shell, sed, C, awk, perl, and APL.
2) The ADMINISTRATIVE FASCIST. Usually a retentive drone (or rarely, a harridan ex-secretary) who has been forced into system administration.
3) The MANIAC. Usually an aging cracker who discovered that neither the Mossad nor Cuba are willing to pay a living wage for computer espionage. Fell into system administration; occasionally approaches major competitors with indesp schemes.
4) The IDIOT. Usually a cretin, morphodite, or old COBOL programmer selected to be the system administrator by a committee of cretins, morphodites, and old COBOL programmers.
HOW TO IDENTIFY YOUR SYSTEM ADMINISTRATOR:
-- SITUATION: Low disk space. --
TECHNICAL THUG: Writes a suite of scripts to monitor disk usage, maintain a database of historic disk usage, predict future disk usage via least squares regression analysis, identify users who are more than a standard deviation over the mean, and send mail to the offending parties. Places script in cron. Disk usage does not change, since disk-hogs, by nature, either ignore script-generated mail, or file it away in triplicate.
ADMINISTRATIVE FASCIST: Puts disk usage policy in motd. Uses disk quotas. Allows no exceptions, thus crippling development work. Locks accounts that go over quota.
MANIAC:
# cd
# rm -rf `du -s * | sort -rn | head -1 | awk '{print $2}'`;
IDIOT:
# cd
# cat `du -s * | sort -rn | head -1 | awk '{ printf "%s/*\n", $2}'` | compress
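For anyone tempted by the Technical Thug's approach, the "more than a standard deviation over the mean" test is only a few lines of awk. This is a rough sketch and not taken from the original email: the function name `disk_hogs` is invented, and it assumes directory names without spaces.

```shell
#!/bin/sh
# disk_hogs: hypothetical filter. Reads "SIZE NAME" lines (the format that
# `du -sk *` prints) on stdin and prints the names whose size is more than
# one population standard deviation above the mean.
# Limitation: only the first word of each name is kept.
disk_hogs() {
    awk '
        { size[NR] = $1; name[NR] = $2; sum += $1; sumsq += $1 * $1 }
        END {
            if (NR == 0) exit
            mean = sum / NR
            var = sumsq / NR - mean * mean
            if (var < 0) var = 0            # guard against rounding error
            limit = mean + sqrt(var)
            for (i = 1; i <= NR; i++)
                if (size[i] > limit) print name[i]
        }'
}
```

Typical use would be something like `cd /home && du -sk * | disk_hogs`. As the guide points out, knowing who the disk hogs are is the easy part.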
-- SITUATION: Excessive CPU usage. --
TECHNICAL THUG: Writes a suite of scripts to monitor processes, maintain a database of CPU usage, identify processes more than a standard deviation over the norm, and renice offending processes. Places script in cron. Ends up renicing the production database into oblivion, bringing operations to a grinding halt, much to the delight of the xtrek freaks.
ADMINISTRATIVE FASCIST: Puts CPU usage policy in motd. Uses CPU quotas. Locks accounts that go over quota. Allows no exceptions, thus crippling development work, much to the delight of the xtrek freaks.
MANIAC:
# kill -9 `ps -augxww | sort -rn +8 -9 | head -1 | awk '{print $2}'`
IDIOT:
# compress -f `ps -augxww | sort -rn +8 -9 | head -1 | awk '{print $2}'`
-- SITUATION: New account creation. --
TECHNICAL THUG: Writes perl script that creates home directory, copies in incomprehensible default environment, and places entries in
ADMINISTRATIVE FASCIST: Puts new account policy in motd. Since people without accounts cannot read the motd, nobody ever fulfills the bureaucratic requirements; and so, no new accounts are ever created.
MANIAC: "If you're too stupid to break in and create your own account, I don't want you on the system. We've got too many goddamn sh*t-for-brains a**holes on this box anyway."
IDIOT:
# cd
# echo "Bob Simon:gandalf:0:0::/dev/tty:compress -f" >
-- SITUATION: Root disk fails. --
TECHNICAL THUG: Rep
Geek aura (Score:5, Interesting)
Seriously, anthropomorphizing machines is a powerful technique. It gives you an approximate but effective mental model of a complex system. "Primitive" cultures are not dumb when they attribute personalities to objects. Our brains are wired to use personality to predict complex behaviour.
My Mother had no technical skills or knowledge - but she treated the automobile like a pet. She was alert to the tiniest change in sound or vibration of the machine, and very often alerted my Dad to problems long before he was aware of anything. One time, driving across country, my Mom said the right front wheel "didn't sound right". We were cruising along at 70, and everything seemed fine. But she insisted, so my Dad pulled over and checked all the tires. No sign of a problem. He pulled the hub cap off the right front wheel - and noticed that the cotter pin had broken! A few more miles and the wheel would have come off. My Dad panicked, since we didn't have any cotter pins in his repair kit. But my Mom dug in her purse and offered a bobby pin. My Dad didn't want to use it, because it was the wrong kind of metal and would break easily. My Mom said she had more, so he put it in. That bobby pin took us another 5000 miles.
My Dad does all his own work on his cars - at least he did until he ruined the valves on his Honda Accord a few years ago. Now he lets a mechanic do some stuff for him. I learned to be in tune with machines from my Mom, and learned to fix them from my Dad. When designing file system software back in the '70s, the rhythmic sounds of the disk access mechanism were my best feedback on its efficiency. Those were the days of 14" disk platters.
Re:Geek aura (Score:2, Interesting)
So true. Frequently in an office environment somebody will come to me and say "I tried to do foo and it didn't work". My previous starting point was always "what happened?", now I usually say "Show me.".
Nine times out of ten they'll attempt to do whatever it was they were doing and it will work perfectly. I assume they did something wrong the first time.
Re:In no particular order.... (Score:5, Interesting)
1. Adobe products and antivirus cause the most software problems, but you cannot live without either.
2. Most computer hardware problems are the result of sticky rolls, janitors cleaning, computers being accidentally kicked, or power failures. In that order.
3. When calling HP or Dell about anything other than servers, you will get bad tech support.
4. Three year warranties on individual PCs do not matter. On a system with dozens of computers, they pay for themselves.
5. There will always be a lower price. Get over it.
6. Phones cannot fail. Five nines of reliability is not good enough.
7. Documented organization of the network and supplies will save you more time than the knowledge a thousand certifications brings (which isn't that much anyways).
8. Researching and backing up information before beginning a project is the sign of a professional. So is spelling.
9. Soft operating expenses are always more expensive than hard operating expenses.
10. When working on a project, document everything. It is almost never needed, but if your coworkers know you have it, they will not try to screw you.
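For a sense of scale on item 6, the downtime that a given number of nines allows per year is simple arithmetic. A rough sketch (the function name is invented for illustration, and a 365-day year is assumed):

```shell
#!/bin/sh
# downtime_minutes: hypothetical helper. Prints the minutes of downtime per
# 365-day year allowed by an availability given as a percentage.
downtime_minutes() {
    awk -v pct="$1" 'BEGIN {
        minutes_per_year = 365 * 24 * 60          # 525600
        printf "%.1f\n", minutes_per_year * (1 - pct / 100)
    }'
}

downtime_minutes 99.999   # five nines
downtime_minutes 99.9     # three nines
```

Five nines works out to a little over five minutes of outage a year, against nearly nine hours for three nines.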
Re:Diabetic Shock in 3, 2, 1... (Score:5, Interesting)
Do not befriend the users. Do not tell them what is actually going wrong. Never accept blame. Do not rush to complete requests.
Here are the reasons why:
If you befriend them, they will cease to be able to do the simplest thing without your help. This is fine if they're hot, but not if they're not.
If you tell them what is actually wrong, they will get it more wrong when they report it up the line, and you will be blamed for something. Instead tell the users something hugely general that will fit into that comfortable place in their minds.
If you accept blame, users will view this as a sign of weakness, and assign blame the next time, without waiting for you to volunteer.
If you rush to complete non-critical, non-IT projects, users will use this as a performance benchmark, and you'll be forced to complete all of their projects first to avoid the appearance of slacking off. In the course of this you will have to ignore critical maintenance, which can get you in real trouble later.
3 from me (Score:3, Interesting)
Power cord story (Score:3, Interesting)
I went through three power supplies before I discovered the fact that I actually had a power cable that was going bad.
I used to work for a company that developed a very highly customized package for our customers, put it on the *NIX of their choice, and installed it in their data centers. Although based in the US, one customer, whose site I was working on, was in Basingstoke, England.
The client was (and probably still is) a hard-core Big Blue shop, so the *NIX of choice was AIX, running on a two-piece RS6K machine. One piece was the server itself, and the other piece was an 8-disc SSA drive tower.
The drive tower had three power supplies, allegedly for redundancy, but these, in turn, were connected together via a three-way IEC Y cable. This then plugged into a normal IEC cable that then had the monster 13A plug they use in the UK on the other end. (If you haven't seen one of these, they're huge. If we used these in the US, we'd probably rate them for 50A).
The plug had a fuse in it.
I'll say that again, because this is important, but not something that you typically see outside the UK: The plug had a fuse in it.
After we hardware guys left the customer site, and left it in the capable hands of our software guys, we got a frantic call from the software guys that the discs had "just disappeared from the system".
To make a long story short (if it's not too late for that), the fuse in the plug had blown, thus killing power to all three power supplies, in turn killing power to the discs. Once we figured that out, we had our software guys get the customer's IT guy on the phone, he got out two more IEC to 13A cords and a fuse, and the problem was fixed in ten minutes plus reboot time. The Y cable was relegated to the scrap heap.
Re:Just to PO the "Don't post your list here" folk (Score:3, Interesting)
I'm very good at what I do, not even 5% of my peers are as good as I am (admittedly I work on the helpdesk so the bar isn't necessarily too high in some cases). I know my stuff in a lot of detail (I'm a geek) and am usually the most intelligent person in any room I'm in. These are plain simple facts and even my employer wouldn't deny them, I am however (despite the seeming arrogance of the preceding statements) willing to learn and depressingly aware that I don't know everything (I generally find the more I learn the more I realise I don't know). I treat users as human beings and enjoy the problem solving parts of my job. Ok, so repairing an oversized .pst for the nth time is less than fun but I usually get all the difficult stuff no-one else knows what to do with. Fortunately my employer recognises this and my pay slip is suitably well padded. Getting someone with my level of knowledge who actually enjoys helpdesk work is worth the extra shekels to them, it means the systems and comms teams can get on with taking things forward while I make sure the current setup keeps ticking over.
Most users are perfectly capable of firing up a command line and following instructions if they're given clearly and unambiguously. Obviously you want to keep it simple (ipconfig, set etc) but it's the quickest way to get their IP address (assuming you don't have central login histories built in to your call logging software or it's not working).
This one makes me shudder. Repairing the damage done by those who went before me and rebuilding the permission structures ("user in the global, global in the local", it's not rocket science for crying out loud!) once the directory structure is sane (and incidentally only allowing list access to the root file share) has eaten up more of my time than I want to even think about.
And don't forget that accurate backup reporting is just as critical. Finding out the backup has failed the last 2 weeks and the software didn't report it is not something you ever want to go through (fortunately we also do manual checks). This is a sore point with me, one of those head->wall things I don't want to talk about.
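One cheap independent check, in the spirit of the manual checks mentioned above, is to confirm that the newest file in the backup directory is actually recent, whatever the backup software reports. A minimal sketch, assuming backups land as files under a single directory; the function name, the path in the usage line, and the 26-hour threshold are all made up for illustration.

```shell
#!/bin/sh
# backup_is_fresh: hypothetical check. Succeeds if DIR contains at least one
# regular file modified within the last MAX_MINUTES minutes.
backup_is_fresh() {
    dir=$1
    max_minutes=$2
    # -mmin is not in POSIX, but GNU and BSD find both support it.
    recent=$(find "$dir" -type f -mmin "-$max_minutes" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}
```

Run from cron, something like `backup_is_fresh /srv/backups 1560 || echo "no backup in the last 26 hours" >&2` would complain even when the backup software itself stays silent. It only proves a file was written recently, not that the backup can be restored.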
This is the core of my job. I have to balance network integrity and security with user needs, frequently the "obvious" (to the user) solution is not acceptable in some way or other (wireless for example is an absolute no go area on our network) so I have to work out one that is. I'm here to enable users to achieve their tasks and goals, not to get in the way.
See above, it just doesn't happen on anything connected to the core network.
#7 - No One Ever Got Fired For Buying Microsoft (Score:3, Interesting)
Contrast that with: "#9 - Know Your Needs:
"This one could also be called 'Learn Linux.'...When you want a spam solution, before looking at $5,000 servers and huge licensing fees for Windows Server software take a look at one of those old 'junk' PCs you have in the closet, download your favorite distro of Linux, and install procmail and spamassassin. You (and your budget) will thank me later."
Ok...., so which is it?
Re:yes, but not the aura (Score:2, Interesting)
The first thing I do in every single problem is 'attempt to replicate it'. (You know that joke about the computer scientist and the brake failure? So true.)
I will admit that often times it's pointless, you technically should probably recheck your work and then try again, but it always amazes me when someone has a problem and then goes and involves someone else before trying it a few more times.
The next step is 'change a few minor things and try again'. Again, it always trips me up when 'The printer doesn't work' and no one's tried to reseat the cable, or turn it off and back on. I do that shit automatically.
The problem isn't people who think like this, it is school systems and offices where no one understands technology, and thus grants technology some sort of mystical 'Don't ever do anything unless you know exactly what you're doing' field.
These people get exposed to this attitude for a decade and they are scared to death to push any button they do not understand, even if it's obviously the right one. You've basically turned their problem solving ability off WRT those things.
You sit them down in their car, and if it fails to start, they'll try again and be able to tell you if it's a dead battery or no fuel. You hand them their cellphone on the wrong screen and they're sunk.
Most people on here have either not been exposed to that field, or ignored it when they were. And thus we can do trivial things without even realizing it that solve this problem. Don't congratulate yourself too much, however, because a man from the 1500s could do basically the same thing once he understood the concept, just like I can figure out basic problems with a water pump...our problem solving ability is turned on.
As for why this field exists? The basic principle that people do not know how incompetent they are. Somewhere, at every institution, there really is someone who should not, under any circumstances, touch any computer in any way, because they will probably cause a nuclear meltdown. (I don't understand it! There wasn't any nuclear material in the truck!) At some point, they did touch one, and from them, everyone has learned to never touch a computer.
And this is why it is okay to kill incompetent people.
Also it's why you should never start drinking in the middle of a post.
Re:95% of all problems.... (Score:3, Interesting)