primary drive failing... how to rescue system without a backup

  1. Monday morning... email from research scientist... the electromagnetic analysis software license server is down...
  2. log into the system... ps -ef | grep license-server-process ... nothing...
  3. dmesg | more... i/o error..., try ls -al... command not found..., (not good)... init 6... command not found... (not good... worse...)
  4. press power button... wait for the electrons to settle to the bottom of the capacitors... press power again
  5. system boots up... check license process... happiness... check dmesg (system log files)
  6. find an odd entry... search for the string... find it is an indication of a failing drive, learn of the new command below
  7. try the command: --> sudo /usr/sbin/smartctl -i /dev/sda and for /dev/sdb... details presented as shown below
  8. --> sudo /usr/sbin/smartctl -H /dev/sda
       smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
       Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
       === START OF READ SMART DATA SECTION ===
       Log Sense failed, IE page [scsi response fails sanity test]
     (NOT GOOD... again)
  9. about this time commands stop working, i/o errors... been here, done that, pull the power cord, yank the drive, toss it in the freezer
  10. pull disk from freezer, find another drive of the same size, connect both to the SATA power and data
  11. insert Gparted drive (see http://johnmeister.com/linux/Notes/Gparted-for-Recovery/ALL.html)
  12. instead of using the GUI tool to rework partitions, open a terminal window, change colors (black text on yellow), type sudo su -
  13. type fdisk -l - note that the failing drive (or so I thought) was /dev/sda, new drive, /dev/sdb
  14. in the terminal window I type: dd if=/dev/sda of=/dev/sdb ; place a fan aimed at the drives with the side cover off...
  15. I watch the blinking light showing disk activity... email users, and let it run... overnight
  16. come in, it's done, execute fdisk /dev/sdb... minor error... type "w" to write, it prints... type fdisk -l, both drives read the same
  17. success, cloned the 500GB drive perfectly, file systems match, mount to the magical "a" mount point: mkdir a ; mount /dev/sdb1 a
  18. cd a, ls -al... all good... about this time I notice that there's no swap... wait... the 750GB drive was the primary... uh oh...
  19. I look at the 750GB drive, look in cabinets, drawers, etc... no happiness... the failing drive was the 750, not the 500!!! (too rushed, or senior moment?)
  20. at this point I also realize that the workstation is obsolete, and this is the 3rd or 4th time I've rescued it...
  21. problem is this device provides samba, web and license support (tied to hardware!)... the OS is old... but it works... PLAN B
  22. I realize that the 750GB drive is failing, fading fast... so... the 750GB goes into the freezer, this time...
  23. I remove Gparted from the drive, unplug the old workstation and raid another device at my disposal that was wasted by using Microsoft on it
  24. I set it up, install Gparted, bring it up and check BIOS... then get into Gparted and NUKE the drive... it was encrypted.
  25. I create a new partition using ext3, apply it, and eject Gparted and insert the SuSE 13.2 DVD, reboot to DVD
  26. install 13.2, configure with the failing system's IP, etc. THEN pull the failing 750GB drive out of the freezer, power down the system and connect it.
  27. I bring up the new system, do some basic config stuff so I can use the system... sudo su -
  28. make sure the system has the correct IP, gateway, network, netmask, hostname
  29. as luser, update /home/luser/.ssh and .bashrc and .History... then sudo su -
  30. cd to the primary user, mkdir OLD-SYSTEM-FILES and TMP... mount the drive: mount /dev/sdb1 /home/luser/TMP
  31. then begin to look at system files in TMP and cp -rp directories such as etc, var/log, home/luser, srv/www, and so on to OLD-SYSTEM-FILES
  32. all the while watching the condition of the 750GB failing drive... making sure the /etc gets copied with the primary user...
  33. it's clear a few files are corrupted... ls -al shows up ??? in a few places...
  34. once the key original /etc files are present start backing up the "default" files, mv /etc/apache2 /etc/apache2-original-2015-06-16
  35. copy /home/luser/OLD-SYSTEM-FILES/etc/apache2 /etc (as superuser of course)... verify that the config files are correct.
  36. try to start apache2... /etc/init.d/apache2 configcheck ; /etc/init.d/apache2 start ; ps -ef | grep http
  37. repeat process for NFS/Samba and other tools...
  38. verify that samba, web server is working... if something starts going sideways, check the gateway again...
  39. after validating that the web server works, cgi-bin scripts work and that the system is on the network, email users, and head home and type notes.
  40. to bring up the license server from the failing 750GB drive the plan will be to copy directly from the 750GB disk to a 500GB disk, excluding luser files.
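
The mix-up in steps 13 through 19 (cloning the healthy 500GB drive instead of the failing 750GB one) is the kind of thing a serial number check before dd might catch, since /dev/sda and /dev/sdb can swap when drives are moved between machines. What follows is only a sketch, not what was done that day: the function names are made up for illustration, and it assumes smartctl can still read the drive's identity page. The serial number is the one from the smartctl -i output for the 750GB drive.

```shell
#!/bin/sh
# Sketch only: serial_from_smartctl_info and confirm_drive are
# hypothetical helper names, not part of smartmontools.

# Print the "Serial Number:" field from `smartctl -i` output read on stdin.
serial_from_smartctl_info() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Succeed only if the device reports the serial number we expect.
# Usage: confirm_drive /dev/sdX EXPECTED-SERIAL   (run as root)
confirm_drive() {
    actual=$(/usr/sbin/smartctl -i "$1" | serial_from_smartctl_info)
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Example (not run here): clone only if /dev/sda really is the 750GB WD drive.
# conv=noerror,sync asks dd to continue past read errors and pad short reads.
#   confirm_drive /dev/sda WD-WCAPT0448703 &&
#       dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync
```

If the drive is too far gone to answer smartctl -i, the check simply fails and it is back to reading the label on the drive itself.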

smartctl info

SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 90 to 87
http://www.linuxjournal.com/content/know-when-your-drives-are-failing-smartd
Use smartctl -X to abort test.

--> /usr/sbin/smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sda failed: Permission denied
------------------------------------------------
--> sudo /usr/sbin/smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue (SATA)
Device Model:     WDC WD7500AAKS-00RBA0
Serial Number:    WD-WCAPT0448703
LU WWN Device Id: 5 0014ee 255de83fa
Firmware Version: 30.04G30
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Mon Jun 15 13:52:23 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
------------------------------------------------
--> sudo /usr/sbin/smartctl -i /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9
Device Model:     ST3500641AS
Serial Number:    3PM1VB7X
Firmware Version: 3.ADG
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Mon Jun 15 13:53:10 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
------------------------------------------------
--> sudo /usr/sbin/smartctl -H /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
------------------------------------------------
--> sudo /usr/sbin/smartctl -H /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

--> sudo smartctl -t short /dev/sda
sudo: smartctl: command not found
------------------------------------------------
luser@linuxBox [/home/luser/CONFIG]
------------------------------------------------
--> sudo /usr/sbin/smartctl -t short /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Jun 15 14:00:49 2015
Use smartctl -X to abort test.
------------------------------------------------
luser@linuxBox [/home/luser/CONFIG]
------------------------------------------------
--> sudo /usr/sbin/smartctl -t short /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Jun 15 14:01:02 2015
Use smartctl -X to abort test.
------------------------------------------------
--> sudo /usr/sbin/smartctl -H /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Log Sense failed, IE page [scsi response fails sanity test]
------------------------------------------------
--> sudo /usr/sbin/smartctl -H /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
------------------------------------------------
--> sudo /usr/sbin/smartctl -t long /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Extended Background Self Test has begun
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
Use smartctl -X to abort test
-------------------------------------------------------------------------
------------------------------------------------
--> sudo /usr/sbin/smartctl --help
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.10-29-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Usage: smartctl [options] device

============================================ SHOW INFORMATION OPTIONS =====

  -h, --help, --usage
        Display this help and exit
  -V, --version, --copyright, --license
        Print license, copyright, and version information and exit
  -i, --info
        Show identity information for device
  --identify[=[w][nvb]]
        Show words and bits from IDENTIFY DEVICE data (ATA)
  -g NAME, --get=NAME
        Get device setting: all, aam, apm, lookahead, security, wcache, rcache, wcreorder
  -a, --all
        Show all SMART information for device
  -x, --xall
        Show all information for device
  --scan
        Scan for devices
  --scan-open
        Scan for devices and try to open each device

================================== SMARTCTL RUN-TIME BEHAVIOR OPTIONS =====

  -q TYPE, --quietmode=TYPE (ATA)
        Set smartctl quiet mode to one of: errorsonly, silent, noserial
  -d TYPE, --device=TYPE
        Specify device type to one of: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, cciss,N, auto, test
  -T TYPE, --tolerance=TYPE (ATA)
        Tolerance: normal, conservative, permissive, verypermissive
  -b TYPE, --badsum=TYPE (ATA)
        Set action on bad checksum to one of: warn, exit, ignore
  -r TYPE, --report=TYPE
        Report transactions (see man page)
  -n MODE, --nocheck=MODE (ATA)
        No check if: never, sleep, standby, idle (see man page)

============================== DEVICE FEATURE ENABLE/DISABLE COMMANDS =====

  -s VALUE, --smart=VALUE
        Enable/disable SMART on device (on/off)
  -o VALUE, --offlineauto=VALUE (ATA)
        Enable/disable automatic offline testing on device (on/off)
  -S VALUE, --saveauto=VALUE (ATA)
        Enable/disable Attribute autosave on device (on/off)
  -s NAME[,VALUE], --set=NAME[,VALUE]
        Enable/disable/change device setting: aam,[N|off], apm,[N|off], lookahead,[on|off], security-freeze, standby,[N|off|now], wcache,[on|off], rcache,[on|off], wcreorder,[on|off]

======================================= READ AND DISPLAY DATA OPTIONS =====

  -H, --health
        Show device SMART health status
  -c, --capabilities (ATA)
        Show device SMART capabilities
  -A, --attributes
        Show device SMART vendor-specific Attributes and values
  -f FORMAT, --format=FORMAT (ATA)
        Set output format for attributes: old, brief, hex[,id|val]
  -l TYPE, --log=TYPE
        Show device log. TYPE: error, selftest, selective, directory[,g|s], xerror[,N][,error], xselftest[,N][,selftest], background, sasphy[,reset], sataphy[,reset], scttemp[sts,hist], scttempint,N[,p], scterc[,N,M], devstat[,N], ssd, gplog,N[,RANGE], smartlog,N[,RANGE]
  -v N,OPTION , --vendorattribute=N,OPTION (ATA)
        Set display OPTION for vendor Attribute N (see man page)
  -F TYPE, --firmwarebug=TYPE (ATA)
        Use firmware bug workaround: none, nologdir, samsung, samsung2, samsung3, xerrorlba, swapid
  -P TYPE, --presets=TYPE (ATA)
        Drive-specific presets: use, ignore, show, showall
  -B [+]FILE, --drivedb=[+]FILE (ATA)
        Read and replace [add] drive database from FILE [default is +/etc/smart_drivedb.h and then /usr/share/smartmontools/drivedb.h]

============================================ DEVICE SELF-TEST OPTIONS =====

  -t TEST, --test=TEST
        Run test. TEST: offline, short, long, conveyance, force, vendor,N, select,M-N, pending,N, afterselect,[on|off]
  -C, --captive
        Do test in captive mode (along with -t)
  -X, --abort
        Abort any non-captive test on device

=================================================== SMARTCTL EXAMPLES =====

  smartctl --all /dev/hda
        (Prints all SMART information)
  smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
        (Enables SMART on first disk)
  smartctl --test=long /dev/hda
        (Executes extended disk self-test)
  smartctl --attributes --log=selftest --quietmode=errorsonly /dev/hda
        (Prints Self-Test & Attribute errors)
  smartctl --all --device=3ware,2 /dev/sda
  smartctl --all --device=3ware,2 /dev/twe0
  smartctl --all --device=3ware,2 /dev/twa0
  smartctl --all --device=3ware,2 /dev/twl0
        (Prints all SMART info for 3rd ATA disk on 3ware RAID controller)
  smartctl --all --device=hpt,1/1/3 /dev/sda
        (Prints all SMART info for the SATA disk attached to the 3rd PMPort of the 1st channel on the 1st HighPoint RAID controller)
  smartctl --all --device=areca,3/1 /dev/sg2
        (Prints all SMART info for 3rd ATA disk of the 1st enclosure on Areca RAID controller)

---------------------- OOPS…
------------------------------------------------
--> ps -ef | grep smart
-bash: /usr/bin/ps: Input/output error

THE SYSTEM WAS FAILING DURING THE TEST!!!
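
The log above shows three different answers from smartctl -H: a PASSED line, no health line at all ("Log Sense failed..."), and, on a drive that admits it is dying, a health line that says something other than PASSED. A rough way to sort those out in a script might look like the sketch below. The function name classify_health is made up for illustration, and it only knows the ATA wording seen in this log. Note that /dev/sda reported PASSED minutes before it stopped answering, so "passed" is not a clean bill of health.

```shell
#!/bin/sh
# Sketch only: classify_health is a hypothetical helper name.
# Reads `smartctl -H` output on stdin and prints one word:
#   passed      the health line says PASSED
#   failed      a health line is present but does not say PASSED
#   unreadable  no health line at all (e.g. "Log Sense failed ...")
classify_health() {
    result=$(grep 'overall-health self-assessment test result:' || true)
    case $result in
        *PASSED*) echo passed ;;
        '')       echo unreadable ;;
        *)        echo failed ;;
    esac
}

# Example (not run here), as root:
#   for d in /dev/sda /dev/sdb; do
#       printf '%s: ' "$d"
#       /usr/sbin/smartctl -H "$d" | classify_health
#   done
```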
note: when cutting and pasting the info below, for some reason extra blank lines were added, to get rid of them the following steps were taken:
  1. >esc< :set nu
  2. >esc< :13,542g/^$/d (266 fewer lines)
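
The same cleanup can be done outside the editor with sed. This is only a sketch: strip_blank_lines is a made-up helper name and notes.txt is a placeholder file name, not the actual file used.

```shell
#!/bin/sh
# Sketch only: strip_blank_lines is a hypothetical helper name.
# Delete empty lines between line numbers $1 and $2 (inclusive) of stdin,
# the same job as :13,542g/^$/d inside vi.
strip_blank_lines() {
    sed "${1},${2}{/^\$/d;}"
}

# Example (not run here; notes.txt is a placeholder name):
#   strip_blank_lines 13 542 < notes.txt > notes-clean.txt
```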
hard drive reliability by manufacturer: Statistics Based on 49,056 Hard Drives
