Three or so error mails like these had arrived in my local mailbox.

Subject: SMART error (FailedReadSmartErrorLog) detected on host

Subject: SMART error (ErrorCount) detected on host

The following warning/error was logged by the smartd daemon:

Device: /dev/sda, ATA error count increased from 0 to 19

For details see host's SYSLOG (default: /var/log/syslog).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

Subject: SMART error (Health) detected on host

The following warning/error was logged by the smartd daemon:

Device: /dev/sda, FAILED SMART self-check. BACK UP DATA NOW!

For details see host's SYSLOG (default: /var/log/syslog).

Apparently the HDD is in bad shape, and these mails come from S.M.A.R.T. monitoring (smartd, the daemon that ships with smartmontools).
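
The interesting lines in these mails all have the shape "Device: <path>, <message>", so they are easy to pick apart if you want to collect them. A minimal sketch in Python, assuming the mail body is already available as a string; the function name `parse_smartd_lines` is made up for illustration and is not part of smartmontools.

```python
import re

# Matches lines such as "Device: /dev/sda, ATA error count increased from 0 to 19".
DEVICE_LINE = re.compile(r"^Device:\s*(\S+),\s*(.+)$")


def parse_smartd_lines(body):
    """Return (device, message) pairs found in a smartd warning mail body."""
    pairs = []
    for line in body.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


sample = """The following warning/error was logged by the smartd daemon:

Device: /dev/sda, ATA error count increased from 0 to 19

For details see host's SYSLOG (default: /var/log/syslog).
"""

for device, message in parse_smartd_lines(sample):
    print(device, "->", message)
```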

This PC mirrors two hard disk drives (/dev/sda, /dev/sdb) as RAID 1, and one of them (/dev/sda) looks like it is in trouble.

I used smartctl from smartmontools to display the disk's S.M.A.R.T. information (partially redacted).


# smartctl -a /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    XXXXXXXX
Firmware Version: 3.AAK
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed May 25 05:17:33 2011 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 107) minutes.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 7589
  7 Seek_Error_Rate         0x000f   084   054   030    Pre-fail  Always       -       18170249169
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30460
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       130
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   047   045    Old_age   Always       -       47 (Lifetime Min/Max 35/53)
194 Temperature_Celsius     0x0022   047   053   000    Old_age   Always       -       47 (0 27 0 0)
195 Hardware_ECC_Recovered  0x001a   051   044   000    Old_age   Always       -       86361790
197 Current_Pending_Sector  0x0012   078   077   000    Old_age   Always       -       464
198 Offline_Uncorrectable   0x0010   078   077   000    Old_age   Offline      -       464
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
 
SMART Error Log Version: 1
ATA Error Count: 130 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 130 occurred at disk power-on lifetime: 30459 hours (1269 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 3d c4 a5 e0  Error: UNC at LBA = 0x00a5c43d = 10863677
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 38 c4 a5 e0 00      06:58:05.038  READ DMA EXT
  27 00 00 00 00 00 e0 00      06:58:03.119  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      06:58:02.610  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      06:58:02.593  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      06:58:02.526  READ NATIVE MAX ADDRESS EXT
 
Error 129 occurred at disk power-on lifetime: 30459 hours (1269 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 3d c4 a5 e0  Error: UNC at LBA = 0x00a5c43d = 10863677
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 38 c4 a5 e0 00      06:58:00.633  READ DMA EXT
  25 00 08 30 c4 a5 e0 00      06:58:03.119  READ DMA EXT
  25 00 08 28 c4 a5 e0 00      06:58:02.610  READ DMA EXT
  25 00 08 20 c4 a5 e0 00      06:58:02.593  READ DMA EXT
  25 00 08 18 c4 a5 e0 00      06:58:02.526  READ DMA EXT
 
Error 128 occurred at disk power-on lifetime: 30459 hours (1269 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 3d c4 a5 e0  Error: UNC at LBA = 0x00a5c43d = 10863677
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 f8 08 c4 a5 e0 00      06:57:55.852  READ DMA EXT
  27 00 00 00 00 00 e0 00      06:57:53.566  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      06:57:53.509  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      06:57:53.508  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      06:57:58.322  READ NATIVE MAX ADDRESS EXT
 
Error 127 occurred at disk power-on lifetime: 30459 hours (1269 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 3d c4 a5 e0  Error: UNC at LBA = 0x00a5c43d = 10863677
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 f8 08 c4 a5 e0 00      06:57:55.852  READ DMA EXT
  27 00 00 00 00 00 e0 00      06:57:53.566  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      06:57:53.509  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      06:57:53.508  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      06:57:53.508  READ NATIVE MAX ADDRESS EXT
 
Error 126 occurred at disk power-on lifetime: 30459 hours (1269 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 3d c4 a5 e0  Error: UNC at LBA = 0x00a5c43d = 10863677
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 f8 08 c4 a5 e0 00      06:57:47.568  READ DMA EXT
  27 00 00 00 00 00 e0 00      06:57:53.566  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      06:57:53.509  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      06:57:53.508  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      06:57:53.508  READ NATIVE MAX ADDRESS EXT
 
SMART Self-test log structure revision number 1
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
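
Each error log entry above prints the register values together with the LBA they decode to. As a worked check, here is a minimal Python sketch of that decoding for the 28-bit layout: SN, CL and CH carry LBA bits 0-7, 8-15 and 16-23, and the low four bits of DH carry bits 24-27. The function name `lba28_from_registers` is only for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values from the "ER ST SC SN CL CH DH" line: 40 51 00 3d c4 a5 e0
lba = lba28_from_registers(sn=0x3D, cl=0xC4, ch=0xA5, dh=0xE0)
print(hex(lba), lba)  # 0xa5c43d 10863677

# The "power-on lifetime" line is plain integer division by 24.
days, hours = divmod(30459, 24)
print(days, "days +", hours, "hours")  # 1269 days + 3 hours
```

All five logged errors decode to the same LBA, so the reads kept failing on the same sector.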

Looking at "Vendor Specific SMART Attributes with Thresholds", the HDD looks like it is in bad shape.

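ID#             Attribute ID
ATTRIBUTE_NAME  Attribute name
FLAG            Attribute handling flags (hexadecimal)
VALUE           Current normalized value (scaled by the firmware into the range 1-254)
WORST           Lowest normalized value recorded so far
THRESH          Threshold (the attribute counts as failed when VALUE is at or below it)
TYPE            Pre-fail: failing signals imminent drive failure / Old_age: failing signals the end of normal product life
UPDATED         Always: updated during normal operation as well / Offline: updated only during offline data collection
WHEN_FAILED     Shows "FAILING_NOW" while the attribute is failing ("In_the_past" if it failed earlier)
RAW_VALUE       Raw value before normalization; vendor-specific, with no standardized format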

smartctl の出力についてまとめてみた - BitWalker
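
To make the column meanings concrete, here is a small Python sketch that splits one attribute row into those ten columns and applies the rule from the smartctl man page that an attribute has failed when its normalized VALUE is less than or equal to THRESH. The names `SmartAttribute`, `parse_attribute_row` and `has_failed` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    id: int
    name: str
    flag: int
    value: int
    worst: int
    thresh: int
    type: str
    updated: str
    when_failed: str
    raw_value: str


def parse_attribute_row(row):
    """Split one row of the smartctl attribute table into its ten columns."""
    # RAW_VALUE may contain spaces, so split at most nine times.
    parts = row.split(None, 9)
    return SmartAttribute(
        id=int(parts[0]),
        name=parts[1],
        flag=int(parts[2], 16),
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        type=parts[6],
        updated=parts[7],
        when_failed=parts[8],
        raw_value=parts[9],
    )


def has_failed(attr):
    """True when the normalized VALUE is at or below THRESH."""
    return attr.value <= attr.thresh


rows = [
    "  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 7589",
    "190 Airflow_Temperature_Cel 0x0022   053   047   045    Old_age   Always       -       47 (Lifetime Min/Max 35/53)",
]
for row in rows:
    attr = parse_attribute_row(row)
    print(attr.id, attr.name, "FAILED" if has_failed(attr) else "ok")
```

Only the rows of the table are handled; the header line and the rest of the smartctl output would have to be skipped by the caller.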

The particularly important attributes:

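  1 Raw_Read_Error_Rate       Rate of errors that occurred while reading data
  5 Reallocated_Sector_Ct     Number of bad sectors whose data was moved to the spare area
196 Reallocation Event Count  Number of times sector reallocation has occurred
197 Current_Pending_Sector    Number of currently abnormal sectors waiting to be reallocated
198 Offline_Uncorrectable     Number of uncorrectable sectors found during offline testing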

smartctl の出力についてまとめてみた - BitWalker
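
For the sector-related attributes in this list the raw counts are what matter, so a simple watch rule is "any non-zero count deserves attention". A rough Python sketch under that assumption (the non-zero rule and the function name `sector_warnings` are mine, not something smartmontools defines), fed with the RAW_VALUE numbers from the /dev/sda output above:

```python
# Attribute IDs from the list above whose raw value is a plain sector/event count.
# ID 1 (Raw_Read_Error_Rate) is left out because its raw format varies by vendor.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocation Event Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def sector_warnings(raw_values):
    """Return warning strings for watched attributes with a non-zero raw count."""
    warnings = []
    for attr_id, name in WATCHED.items():
        count = raw_values.get(attr_id, 0)  # attribute may be absent on some drives
        if count > 0:
            warnings.append(f"{attr_id} {name}: raw count {count}")
    return warnings


# RAW_VALUE numbers from the /dev/sda output above (this drive does not report ID 196).
sda = {5: 7589, 197: 464, 198: 464}
for line in sector_warnings(sda):
    print(line)
```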

Wikipedia explains the other attributes as well.

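Each inspection item (attribute) has four fields: the current value (Value), the threshold (Threshold), the worst value (Worst), and the raw value (data). If the current value or the worst value ever falls below the threshold, appropriate action should be taken, such as backing up the data or replacing the hard disk. However, how these values are calculated differs from vendor to vendor, so it is not possible to say categorically which value should be in what state. There are also attributes, such as Temperature (C2) and Reallocated Sectors Count (05), for which the raw value (data) is what matters.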

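The following is a list of the main attributes reported by S.M.A.R.T. Note that the ATA specification does not define what each attribute ID means, so this table is essentially just an interpretation of vendor-specific meanings. Particularly important attributes are annotated in bold. However, the attributes that can be examined differ somewhat between HDD vendors, so not every attribute can necessarily be examined. Vendors may also define their own attributes, use different IDs, or use their own names; those cases are not covered here.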

Self-Monitoring, Analysis and Reporting Technology - Wikipedia
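
Note that the Wikipedia text writes attribute IDs in hexadecimal (C2, 05), while smartctl prints them in decimal in the ID# column. A tiny Python check of the correspondence:

```python
# Wikipedia-style hexadecimal IDs mapped to the decimal ID# that smartctl prints.
for hex_id in ("05", "C2"):
    print(hex_id, "->", int(hex_id, 16))
# 05 -> 5   (Reallocated_Sector_Ct)
# C2 -> 194 (Temperature_Celsius)
```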

While I was at it I checked /dev/sdb too; it shows no errors and is healthy.


# smartctl -a /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    XXXXXXXX
Firmware Version: 3.AAK
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed May 25 05:17:38 2011 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 115) minutes.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       985270603
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30409
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       16
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   052   046   045    Old_age   Always       -       48 (Lifetime Min/Max 37/54)
194 Temperature_Celsius     0x0022   048   054   000    Old_age   Always       -       48 (0 28 0 0)
195 Hardware_ECC_Recovered  0x001a   059   054   000    Old_age   Always       -       215794996
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

For reference, the OS environment:


# uname -mrsv
Linux 2.6.26-2-amd64 #1 SMP Tue Jan 25 05:59:43 UTC 2011 x86_64

The smartctl version:


# smartctl -V
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
smartctl comes with ABSOLUTELY NO WARRANTY. This
is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License Version 2.
See http://www.gnu.org for further details.
 
CVS version IDs of files used to build this code are:
Module: atacmdnames.cpp  revision: 1.16  date: 2008/03/04
  uses: atacmdnames.h    revision: 1.6   date: 2008/03/04
Module: atacmds.cpp      revision: 1.190 date: 2008/03/04
  uses: atacmds.h        revision: 1.90  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: extern.h         revision: 1.54  date: 2008/03/04
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: scsiata.h        revision: 1.2   date: 2006/07/01
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: ataprint.cpp     revision: 1.185 date: 2008/03/04
  uses: atacmdnames.h    revision: 1.6   date: 2008/03/04
  uses: atacmds.h        revision: 1.90  date: 2008/03/04
  uses: ataprint.h       revision: 1.31  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: extern.h         revision: 1.54  date: 2008/03/04
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: knowndrives.h    revision: 1.18  date: 2008/03/04
  uses: smartctl.h       revision: 1.25  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: knowndrives.cpp  revision: 1.166 date: 2008/02/02
  uses: atacmds.h        revision: 1.90  date: 2008/03/04
  uses: ataprint.h       revision: 1.31  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: extern.h         revision: 1.54  date: 2008/03/04
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: knowndrives.h    revision: 1.18  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: os_linux.cpp     revision: 1.100 date: 2008/03/04
  uses: atacmds.h        revision: 1.90  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: os_linux.h       revision: 1.27  date: 2008/03/04
  uses: scsicmds.h       revision: 1.66  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: scsicmds.cpp     revision: 1.96  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: extern.h         revision: 1.54  date: 2008/03/04
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: scsicmds.h       revision: 1.66  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: scsiprint.cpp    revision: 1.121 date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: extern.h         revision: 1.54  date: 2008/03/04
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: scsicmds.h       revision: 1.66  date: 2008/03/04
  uses: scsiprint.h      revision: 1.21  date: 2008/03/04
  uses: smartctl.h       revision: 1.25  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: smartctl.cpp     revision: 1.169 date: 2008/03/04
  uses: atacmds.h        revision: 1.90  date: 2008/03/04
  uses: ataprint.h       revision: 1.31  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: extern.h         revision: 1.54  date: 2008/03/04
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: knowndrives.h    revision: 1.18  date: 2008/03/04
  uses: scsicmds.h       revision: 1.66  date: 2008/03/04
  uses: scsiprint.h      revision: 1.21  date: 2008/03/04
  uses: smartctl.h       revision: 1.25  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
Module: utility.cpp      revision: 1.65  date: 2008/03/04
  uses: configure.in     revision: 1.135 date: 2008/03/10
  uses: int64.h          revision: 1.17  date: 2008/03/04
  uses: utility.h        revision: 1.51  date: 2008/03/04
 
smartmontools release 5.38 dated 2008/03/10 at 10:44:07 GMT
smartmontools build host: x86_64-unknown-linux-gnu
smartmontools build configured: 2009/03/15 22:48:08 UTC
smartctl compile dated Mar 15 2009 at 22:48:32
smartmontools configure arguments:  '--prefix=/usr' '--sysconfdir=/etc' '--mandir=/usr/share/man' '--with-initscriptdir=/etc/init.d' '--with-docdir=/usr/share/doc/smartmontools' 'CXXFLAGS=-g -O2' 'LDFLAGS=' 'CPPFLAGS=' 'CFLAGS=-fsigned-char -Wall -O2'

The help output of smartctl (smartmontools):


# smartctl --help
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Usage: smartctl [options] device
 
============================================ SHOW INFORMATION OPTIONS =====
 
  -h, --help, --usage
         Display this help and exit
 
  -V, --version, --copyright, --license
         Print license, copyright, and version information and exit
 
  -i, --info
         Show identity information for device
 
  -a, --all
         Show all SMART information for device
 
================================== SMARTCTL RUN-TIME BEHAVIOR OPTIONS =====
 
  -q TYPE, --quietmode=TYPE                                           (ATA)
         Set smartctl quiet mode to one of: errorsonly, silent, noserial
 
  -d TYPE, --device=TYPE
         Specify device type to one of: ata, scsi, marvell, sat, 3ware,N
 
  -T TYPE, --tolerance=TYPE                                           (ATA)
         Tolerance: normal, conservative, permissive, verypermissive
 
  -b TYPE, --badsum=TYPE                                              (ATA)
         Set action on bad checksum to one of: warn, exit, ignore
 
  -r TYPE, --report=TYPE
         Report transactions (see man page)
 
  -n MODE, --nocheck=MODE                                             (ATA)
         No check if: never, sleep, standby, idle (see man page)
 
============================== DEVICE FEATURE ENABLE/DISABLE COMMANDS =====
 
  -s VALUE, --smart=VALUE
        Enable/disable SMART on device (on/off)
 
  -o VALUE, --offlineauto=VALUE                                       (ATA)
        Enable/disable automatic offline testing on device (on/off)
 
  -S VALUE, --saveauto=VALUE                                          (ATA)
        Enable/disable Attribute autosave on device (on/off)
 
======================================= READ AND DISPLAY DATA OPTIONS =====
 
  -H, --health
        Show device SMART health status
 
  -c, --capabilities                                                  (ATA)
        Show device SMART capabilities
 
  -A, --attributes
        Show device SMART vendor-specific Attributes and values
 
  -l TYPE, --log=TYPE
        Show device log. TYPE: error, selftest, selective, directory,
                               background, scttemp[sts,hist]
 
  -v N,OPTION , --vendorattribute=N,OPTION                            (ATA)
        Set display OPTION for vendor Attribute N (see man page)
 
  -F TYPE, --firmwarebug=TYPE                                         (ATA)
        Use firmware bug workaround: none, samsung, samsung2,
                                     samsung3, swapid
 
  -P TYPE, --presets=TYPE                                             (ATA)
        Drive-specific presets: use, ignore, show, showall
 
============================================ DEVICE SELF-TEST OPTIONS =====
 
  -t TEST, --test=TEST
        Run test. TEST: offline short long conveyance select,M-N
                        pending,N afterselect,[on|off] scttempint,N[,p]
 
  -C, --captive
        Do test in captive mode (along with -t)
 
  -X, --abort
        Abort any non-captive test on device
 
=================================================== SMARTCTL EXAMPLES =====
 
  smartctl --all /dev/hda                    (Prints all SMART information)
 
  smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
                                              (Enables SMART on first disk)
 
  smartctl --test=long /dev/hda          (Executes extended disk self-test)
 
  smartctl --attributes --log=selftest --quietmode=errorsonly /dev/hda
                                      (Prints Self-Test & Attribute errors)
  smartctl --all --device=3ware,2 /dev/sda
  smartctl --all --device=3ware,2 /dev/twe0
  smartctl --all --device=3ware,2 /dev/twa0
          (Prints all SMART info for 3rd ATA disk on 3ware RAID controller)
  smartctl --all --device=hpt,1/1/3 /dev/sda
          (Prints all SMART info for the SATA disk attached to the 3rd PMPort
           of the 1st channel on the 1st HighPoint RAID controller)
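
Rather than dumping everything with -a, the options listed in the help can be combined to fetch only what is needed. Below is a sketch in Python that only builds the argument list from flags shown in the help text above (-H, -A, -l TYPE). The helper name `smartctl_argv` is hypothetical, and actually running the command (for example via `subprocess.run`, typically as root) is deliberately left out.

```python
def smartctl_argv(device, health=False, attributes=False, log=None):
    """Build a smartctl command line from options listed in `smartctl --help`."""
    argv = ["smartctl"]
    if health:
        argv.append("-H")          # Show device SMART health status
    if attributes:
        argv.append("-A")          # Show vendor-specific Attributes and values
    if log is not None:
        argv.extend(["-l", log])   # Show device log, e.g. "error" or "selftest"
    argv.append(device)
    return argv


print(" ".join(smartctl_argv("/dev/sda", health=True, attributes=True, log="error")))
# smartctl -H -A -l error /dev/sda
```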

Ref.
- smartctl の出力についてまとめてみた - BitWalker
- smartmontoolsで取得できるSMART情報一覧
- Self-Monitoring, Analysis and Reporting Technology - Wikipedia
- S.M.A.R.T. - Wikipedia, the free encyclopedia
- S.M.A.R.T. : ハードディスクの故障時期を推測するS.M.A.R.T.という技術 (SelfMonitoringAnalysisAndReportingTechnology - MemoWiki)
- ヅラッシュ! - [Linux]FailedReadSmartData
- ヅラッシュ! - SMART error (FailedReadSmartSelfTestLog)

Update (2011-05-26)

As of 5/25:


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 7589
197 Current_Pending_Sector  0x0012   078   077   000    Old_age   Always       -       464
198 Offline_Uncorrectable   0x0010   078   077   000    Old_age   Offline      -       464

As of 5/26:


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 18703
197 Current_Pending_Sector  0x0012   072   072   000    Old_age   Always       -       586
198 Offline_Uncorrectable   0x0010   072   072   000    Old_age   Offline      -       586

The VALUE of Reallocated_Sector_Ct has not changed (it cannot go any lower...), but its RAW_VALUE went from 7589 to 18703, more than double. The overture to collapse.
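
The day-over-day change can be worked out directly from the two snapshots. A small Python sketch using only the RAW_VALUE numbers quoted above:

```python
# RAW_VALUE snapshots of /dev/sda taken on 5/25 and 5/26.
may25 = {"Reallocated_Sector_Ct": 7589, "Current_Pending_Sector": 464, "Offline_Uncorrectable": 464}
may26 = {"Reallocated_Sector_Ct": 18703, "Current_Pending_Sector": 586, "Offline_Uncorrectable": 586}

for name, before in may25.items():
    after = may26[name]
    # Absolute increase and growth factor over roughly one day.
    print(f"{name}: {before} -> {after} (+{after - before}, x{after / before:.2f})")
```

In one day the reallocated count grew by 11114 sectors, and the pending and offline-uncorrectable counts each grew by 122.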

tags: debian hdd smart

Posted by NI-Lab. (@nilab)