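Patch Name: PHKL_34815
Patch Summary: s700_800 11.11 SCSI IO Cumulative Patch
Creation Date: 07/03/14
Post Date: 07/04/16
Hardware Platforms and OS Releases:
s700: 11.11
s800: 11.11
Symptoms:
PHKL_34815:
(SR:8606469200 CR:JAGag24419)
1. When a write to a device configured in an Oracle
Consistency Group trips, the Oracle Consistency Group
functionality of EMC symmetrix returns a SCSI Illegal
Request error (SCSI sense key/code/qualifier:
0x05/0x20/0x00). On receiving this error, however, SCSI
services returns EPOWERF to the upper layer, so the
Consistency Group does not trip and the error is treated
as a software error.
2. When an SPC-2 compliant device returns an Oracle Checksum
error, SCSI services incorrectly returns EPOWERF instead
of EINVAL to the upper layer. As a result, EMC's Powerpath
retries the I/O request on other paths to that SPC-2
device.
Defect Description:
PHKL_34815:
(SR:8606469200 CR:JAGag24419)
1. SCSI services did not handle the Oracle Consistency Group
error differently from other errors. Hence, it returned
EPOWERF.
Resolution:
SCSI services has been modified to handle the Oracle
Consistency Group error (sense key/code/qualifier)
differently from other errors. On receiving this error,
SCSI services now returns EINVAL to the upper layer
instead of EPOWERF.
2. When an Oracle Checksum error was returned, SCSI services
checked only ANSI-2 and ANSI-3 devices. However, SPC-2
devices are ANSI-4 devices, so Oracle Checksum errors from
SPC-2 devices were not detected.
Resolution:
SCSI services has been modified to detect Oracle Checksum
errors for all devices whose ANSI field is greater than 1
and to return EINVAL to the upper layer.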
-----------------------------------------------------------------------------
Patch Name: PHKL_34815
Patch Description: s700_800 11.11 SCSI IO Cumulative Patch
Creation Date: 07/03/14
Post Date: 07/04/16
Hardware Platforms - OS Releases:
s700: 11.11
s800: 11.11
Products: N/A
Filesets:
OS-Core.ADMN-ENG-A-MAN,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP
ProgSupport.C-INC,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP
Automatic Reboot?: Yes
Status: General Release
Critical:
No (superseded patches were critical)
PHKL_34187: OTHER
Performance degradation
PHKL_33371: HANG
PHKL_32090: HANG PANIC
PHKL_31134: HANG
PHKL_30510: PANIC
PHKL_29365: OTHER PANIC HANG
JAGae77351: The SCSI c720 driver fails to resume
after repeated suspend and resume operations.
The card may show as UNCLAIMED on an ioscan(1M)
due to this failure.
PHKL_29047: PANIC
PHKL_29039: MEMORY_LEAK HANG
PHKL_28513: PANIC CORRUPTION
PHKL_28096: HANG
PHKL_27579: PANIC HANG
PHKL_27563: HANG
PHKL_26519: PANIC HANG
PHKL_25896: ABORT HANG
PHKL_25509: HANG
PHKL_25165: OTHER PANIC HANG
Boot and ioscan time is improved on large
configurations.
Enhancements are made to support c8xx ioctl's.
PHKL_23313: PANIC HANG CORRUPTION MEMORY_LEAK
PHKL_24441: PANIC HANG
Category Tags:
defect_repair hardware_enablement enhancement
general_release critical panic halts_system corruption
memory_leak
Path Name: /hp-ux_patches/s700_800/11.X/PHKL_34815
Symptoms:
PHKL_34815:
( SR:8606469200 CR:JAGag24419 )
1. When writes to a device configured in an Oracle
Consistency Group trip, the Oracle Consistency Group
functionality of EMC symmetrix returns a SCSI Illegal
Request error with SCSI sense key/code/qualifier of
0x05/0x20/0x00. On receiving this error, SCSI services
returns EPOWERF to the upper layer, which prevents the
Consistency Group from tripping, and the error is treated
as a software failure.
2. SCSI services returns EPOWERF instead of EINVAL to the
upper layers when SPC-2 compliant devices return an Oracle
Checksum error. This causes EMC's Powerpath to retry the
I/O requests on other paths to these SPC-2 devices.
PHKL_34187:
( SR:8606386083 CR:JAGaf46237 )
EMC configuration utility, "symcfg", appears unresponsive.
PHKL_33371:
( SR:8606382044 CR:JAGaf42255 )
System hangs due to SCSI card failure.
PHKL_32090:
( SR:8606268500 CR:JAGae32738 )
Allocating kernel memory with the WAIT OK flag
in interrupt context may cause a panic.
( SR:8606352805 CR:JAGaf13603 )
System panics with the following stack trace:
Lock held - 0xe000000155f94500X
panic: wait_for_lock: Already owns lock: SCSI LUN
Stack Trace:
wait_for_lock+0x300
spinlock_wait+0x30
scsi_iodone+0x190
scsi_handle_q_full+0x750
biodone+0x180
bpcheck+0x230
sf_strategy+0x100
scsi_fast_cbfn+0x990
ioforw_sched+0x440
scsi_strategy+0x2b0
physio+0x6a0
sf_read+0x160
scsi_read+0x370
sflop_read+0x30
spec_rdwr+0x4a0
vno_rw+0x1e0
read+0x2b0
syscall+0x930
End of Stack Trace
( SR:8606363788 CR:JAGaf24443 )
An incorrect value in the cdb length of the sioc_io
structure for the SIOC_IO ioctl call can cause a buffer
overrun.
( SR:8606358449 CR:JAGaf19148 )
XP and VA devices on PA systems are left in open state.
( SR:8606368292 CR:JAGaf28854 )
In a mirrored disk environment on rp8400/rp7410
systems, if one of the internal disks fails and
is hot-swapped, the alternate disk does not
spin up.
( SR:8606368809 CR:JAGaf29371 )
I/O hang due to a pending I/O request in the lun disk queue.
( SR:8606376917 CR:JAGaf37195 )
When the hot-swap disk in an rp7410/rp8400/8420 is
configured as a mirrored LVM disk and the disk is
pulled out and put back in, the disk still appears
as unavailable in vgdisplay(1M) output.
( SR:8606374079 CR:JAGaf34466 )
Connectivity of devices to HPUX is impacted when
some targets return BUSY on inquiry.
PHKL_31134:
( SR:8606364148 CR:JAGaf24802 )
I/Os from a volume do not fail over to the mirror/alternate
volume and hang due to continuous SCSI bus resets.
PHKL_30510:
( SR:8606349130 CR:JAGaf09949 )
If an rp8400 or rp7410 system with mirrored VxVM root disk
configuration is powered off and powered on again, then
after booting up from the primary disk, the secondary disk
is shown in the "failed" state:
# vxdisk -g rootdg list
DEVICE TYPE DISK GROUP STATUS
c0t6d0 simple rootdisk01 rootdg online
- - rootmirror rootdg failed was:c2t6d0
( SR:8606342351 CR:JAGaf03258 )
When an FC array is in shutdown state, the system panics
with the following stack trace:
panic+0x6c
report_trap_or_int_and_panic+0x94
interrupt+0x208
$ihndlr_rtn+0x0
sd_retry_after_start_done+0x68
scsi_iodone+0x278
scsi_cbfn+0x520
fcparray_scsi_comp+0x1c8
td_io_done+0x44
td_cdb_cbfn_start+0xac
td_isr+0x2d4
sapic_interrupt+0x2c
mp_ext_interrupt+0x2b0
ivti_patch_to_nop3+0x0
falloc+0x148
( SR:8606344298 CR:JAGaf05149 )
open(2) takes a long time for SPC-2 compliant devices.
( SR:8606351535 CR:JAGaf12340 )
open(2) on the device special file corresponding to the
SCSI initiator id of the card fails with errno set to
ENXIO.
PHKL_29365:
( SR:8606185203 CR:JAGad54405 )
System panics with a Data Page Fault when a read command is
issued on a SCSI pass through driver and the read failed
with a check condition on a deferred error.
panic+0x60
report_trap_or_int_and_panic+0x5c
interrupt+0x238
$ihndlr_rtn+0x0
lbcopy_gr_method+0x12c
privlbcopy+0x1c
scsi_fix_alignment_done+0x58
scsi_iodone+0x170
scsi_cbfn+0x4f4
scsi_fast_cbfn+0x190
c720_call_cbfns+0x60
c720_isr+0x1d0
dino_isr+0xcc
inttr_emulate_save_fpu+0x100
spinunlock+0x2c
vx_mapdbd+0xec
vm_no_io_required+0x104
vx_do_pagein+0x13c
vx_pagein+0xd8
virtual_fault+0x194
vfault+0x120
thandler+0xae0
( SR:8606295123 CR:JAGae58817 )
System panics with the following stack trace when an
asynchronous write request fails.
panic+0x60
assfail+0x30
scsi_iodone_error+0x2b0
scsi_iodone+0x4dc
scsi_cbfn+0x888
scsi_fast_cbfn+0x23c
c720_call_cbfns+0xa4
c720_invalid_req_done+0x180
invoke_callouts_for_self+0x1b8
sw_service+0xe8
inttr_emulate_save_fpu+0x100
prevpc+0x78
spinlock+0x50
spinlockx+0x40
breassignbuf+0x68
bdwrite+0x154
vx_bdwrite+0x88
vx_fbwrite+0x5c
vx_write_default+0x288
vx_write1+0xc84
vx_rdwr+0x164
vno_rw+0xb0
4_74e1_cl_rwuio+0x1e8
write+0x78
syscall+0x700
$syscallrtn+0x0
( SR:8606314587 CR:JAGae77351 )
The SCSI c720 driver fails to resume after repeated suspend
and resume operations. The syslog file contains the
following message:
C720_init: c720_map failed for request sense buf
( SR:8606299275 CR:JAGae62769 )
The logging subsystem does not display the recovered error
events from disks.
( SR:8606322906 CR:JAGae85372 )
When an external single-ended wide SCSI disk is connected
to the narrow 50 pin SCSI connector on a C3750 model
workstation, the following unexpected behavior may be
observed:
- Hang during I/O request to disk
- Disk not accessible
- diskinfo(1M) showing incorrect output
- ioscan(1M) showing invalid description strings
( SR:8606333146 CR:JAGae94241 )
During device open, unnecessary logs appear in the syslog
file.
( SR:8606335728 CR:JAGae96782 )
System Performance monitoring tools such as glance(1)/gpm(1)
show incorrect values for byte-count disk metrics.
( SR:8606338809 CR:JAGae99756 )
LUNs sometimes are not visible during ioscan(1M).
PHKL_29047:
( SR:8606304724 CR:JAGae68058 )
I/O requests to a disk device are slow and sar(1M) shows
large avwait and avque values.
( SR:8606266268 CR:JAGae30517 )
The system panics with the following stack trace,
...
...
scsi_frequency+0x1ac
scsi_ioctl+0x1344
sdisk_ioctl+0x1c
spec_ioctl+0x168
vno_ioctl+0x88
ioctl+0x108
syscall+0x1bc
$syscallrtn+0x0
...
...
( SR:8606304019 CR:JAGae67368 )
SIOC get/set ioctls will not function properly with the
SCSI interface card supporting Ultra320 speed.
( SR:8606312429 CR:JAGae75245 )
During the device open, if the Test Unit Ready (TUR)
command results in "Reservation Conflict", then after
the device open, reads and writes to the device will
fail with EINVAL.
PHKL_29039:
( SR:8606244397 CR:JAGae10884 )
When an error is encountered in the bus or target
open routines, the system exhausts its memory due
to memory leak.
( SR:8606290867 CR:JAGae54710 )
Incorrect block device activity data is reported
by sar(1M) when the disks have failed operations.
( SR:8606267129 CR:JAGae31372 )
When I/O errors happen, the syslog file
shows negative values for the buffer
pointer fields.
( SR:8606266450 CR:JAGae30698 )
Default SCSI diagnostic logging is severely limited by
PHKL_25165.
( SR:8606160884 CR:JAGad30203 )
Devices belonging to class SCSI_PROCESSOR
are reported as UNCLAIMED by ioscan(1M).
( SR:8606298657 CR:JAGae62156 )
Application hang due to error returned by the SCSI
driver.
( SR:8606307922 CR:JAGae70957 )
I/O requests failing with 'Reservation Conflict' are retried
forever, resulting in an application hang.
PHKL_28513:
( SR:8606226043 CR:JAGad95114 )
Data integrity issues or HPMC with Channel B of A5159A and
Core I/O FWD SCSI HBA on the following systems:
rp24xx (A-class), rp54xx (L-class), rp7400 (N-class).
Description field in ioscan output for affected Core I/O
FWD SCSI cards will contain string 'C87x'.
( SR:8606286272 CR:JAGae50215 )
SCSI controllers with the 896 chip (revision 4) may, under
certain circumstances, send wrong data on the SCSI bus
after a bus reset.
( SR:8606289589 CR:JAGae53519 )
Panic in SCSI stack with the following trace:
panic+0x14
wait_for_lock+0x2cc
call_wait_for_lock+0x20
scsi_start+0x50
scsi_free_scb+0xac
scsi_strategy_real+0xcd4
ioforw_sched+0xa4
scsi_cmd+0x3a4
scsi_probe+0x444
parallel_scsi_probe+0x1b4
wsio_probe+0xe0
wsio_find_it+0x34
wsio_scan+0x70
gio_scan_subtree+0x188
gio_scan_subtree+0x1c4
gio_scan_subtree+0x1c4
gio_scan_subtree+0x1c4
gio_scan_subtree+0x1c4
gio_scan_subtree+0x1c4
io_scan+0x9c
do_io_scan+0x48
dev_config_ioctl+0xd8
spubind_cdev_ioctl+0x94
spec_ioctl+0xac
vno_ioctl+0x90
ioctl+0x1f4
syscall+0x28c
$syscallrtn+0x0
PHKL_28096:
( SR:8606271035 CR:JAGae35271 )
In a mirrored disk environment on rp8400/rp7410
systems, if one of the internal disks fails and
is hot-swapped, the alternate disk does not
spin up, thus resulting in an application hang.
PHKL_27579:
( SR:8606245156 CR:JAGae11630 )
The system will HPMC when a bus reset occurs
on an A5838A SCSI HBA.
( SR:8606242143 CR:JAGae09397 )
The system may experience intermittent bus
hangs followed by resets on the ports of the
A5159A card and Core I/O FWD SCSI HBA
on the following systems: rp24xx (A-class),
rp54xx (L-class), rp7400 (N-class), rp8400,
when connected to a disk enclosure.
( SR:8606241873 CR:JAGae09130 )
The ioscan may hang and on the following reboot
the system panics with a stack trace that is
not consistent.
( SR:8606216118 CR:JAGad85288 )
When the SCSI bus is being opened, if an
interrupt is serviced at the same time, the
system panics with the following stack trace:
panic+0x14
report_trap_or_int_and_panic+0x84
interrupt+0x1d4
$ihndlr_rtn+0x0
c720_isr+0x890
sapic_interrupt+0x2c
mp_ext_interrupt+0x318
ivti_patch_to_nop3+0x0
bz_pre_sl_loop+0x4
c720_if_bus_open+0x318
scsi_lun_open+0x12d4
sctl_open+0x24
scsi_probe+0x370
parallel_scsi_probe+0x1a8
wsio_probe+0xe0
wsio_find_it+0x34
wsio_scan+0x70
gio_scan_subtree+0x188
gio_scan_subtree+0x1c4
gio_scan_subtree+0x1c4
io_scan+0x9c
do_io_scan+0x48
dev_config_ioctl+0xd8
spubind_cdev_ioctl+0x94
spec_ioctl+0xac
vno_ioctl+0x90
ioctl+0x1f4
syscall+0x480
$syscallrtn+0x0
( SR:8606257328 CR:JAGae21633 )
Application may hang after an OLAR card resume
operation if the card has been suspended while
a SCSI bus reset was in progress.
( SR:8606204859 CR:JAGad74037 )
The SCSI driver cannot communicate with a target
(nCipher encryption device) that initiates speed
and width negotiation. This results in parity
errors on the SCSI bus and, in turn, SCSI bus
resets.
( SR:8606232873 CR:JAGae02101 )
I/O errors may occur when attempting to do more
than one backup on tape.
( SR:8606238711 CR:JAGae07734 )
LVM is not switching to an alternate path
due to an error returned by the SCSI driver.
( SR:8606264850 CR:JAGae29181 )
open() on CDROM drive takes too long when no
CD is present.
PHKL_27563:
( SR:8606265990 CR:JAGae30243 )
I/O hang due to a pending I/O request in the lun disk
queue.
PHKL_26519:
( SR:8606236118 CR:JAGae05183 )
When an LVM I/O request to a SCSI device fails or times out,
any subsequent failed I/O requests to same LUN are returned
with error without being retried by the SCSI disk driver.
( SR:8606226361 CR:JAGad95431 )
Applications may hang due to incorrect SCSI error handling
introduced in patch PHKL_24441.
( SR:8606135832 CR:JAGad04964 )
Enhancement: This product update enables the support for 16
byte CDBs (Command Descriptor Block) in the SCSI driver.
( SR:8606214047 CR:JAGad83238 )
The system may experience a HPMC when a SCSI adapter is
suspended through use of the OLA/R functions accessible
through rad(1M) and sam(1M).
( SR:8606236116 CR:JAGae05181 )
When PHKL_24441 is installed and LVM is trying to switch
from the primary path to an alternate path, the SCSI
subsystem may report false read errors to LVM.
PHKL_25896:
( SR:8606228002 CR:JAGad97060 )
Some disks report a capacity of zero bytes at cold boot or
cold install. This causes the initial boot or install after
a cold start to fail.
PHKL_25509:
( SR:8606203627 CR:JAGad72800 )
There is no way to turn off c720 interface driver
vmunix: scb->cdb: %x %x %x %x %x %x
messages in syslog.log, while other related messages can be
suppressed.
( SR:8606186960 CR:JAGad56170 )
A high number of
SCSI: asense data-done -- lbolt %d, dev: %x, tag: %x
messages may be logged in syslog.log when using Plasmon
optical drives.
( SR:8606201476 CR:JAGad70652 )
The following informative message on the console and in
syslog.log causes unnecessary alarm among customers:
SCSI: Attempt to access partially open device -- dev: %x
( SR:8606199892 CR:JAGad69078 )
On HP SureStore E Disk Array 12 (A3586A), processes can hang
with the following message in the syslog.log:
Device violation of Contingent Allegiance
( SR:8606194472 CR:JAGad63680 )
On workstation model C3700, the external narrow SCSI bus is
setup incorrectly. The 'diskinfo' command returns invalid
information and I/O's on this bus hang.
( SR:8606177456 CR:JAGad46688 )
It takes an unreasonable amount of time to import Disk
Groups with the VxVM volume manager on a FC60 array.
PHKL_25165:
( SR:8606207855 CR:JAGad77032 )
The ioctl system call returns invalid values if called with
SIOC_GET_TGT_LIMITS or SIOC_GET_TGT_PARMS parameters for a
SCSI device controlled by the c8xx driver.
( SR:8606170140 CR:JAGad39404 )
Well-functioning systems with Fibre Channel devices
generate an excessive number of logs. This clutters the log
files, overruns the diag2 daemon, or exhausts free space in
the /var filesystem.
( SR:8606172682 CR:JAGad41942 )
With the per lun queue depth feature, queue depth
modification on a lun that does not support tag queueing is
not rejected, even though the queue depth on such a device
cannot be modified.
Queue depth can be changed only once on devices supporting
tag queueing.
( SR:8606166652 CR:JAGad35939 )
If an application uses the sctl/ioctl passthrough interface
with the read/write data size mismatching the buffer size,
the system experiences a Data Page Fault panic with the
following routines on the stack:
panic+0x14
report_trap_or_int_and_panic+0x4c
interrupt+0x1e8
$ihndlr_rtn+0x0
lbcopy_pcxu_method+0xc
privlbcopy+0x1c
( SR:8606192639 CR:JAGad61851 )
The system boot time and ioscan command (without -k option)
time are too long, especially in a system with a large Fibre
Channel Array configuration.
( SR:8606189054 CR:JAGad58270 )
If many processes access the same bus, some processes might
become unkillable. This error condition has been experienced
only on systems with a hundred or more luns on the same bus.
( SR:8606166664 CR:JAGad35951 )
A system with 2 ALT 8-series DLT (Quantum 4000) on the same
card showed the following panic:
panic: (display==0xb800, flags==0x0) Data page fault 1111
The stack trace was:
scsi_start+0x18
scsi_retry+0xd8
invoke_callouts+0x160
softclock+0x38
sw_service+0x154
mp_ext_interrupt+0x2a0
$RDB_int_patch+0x58
mpn_splx_free_lock_ul4_brn_target+0x4
net_callout+0x90
netisr_netisr+0x1bc
netisr_daemon+0x68
PHKL_23313:
( SR:8606174670 CR:JAGad43916 )
Compiling kernel-intrusive programs (such as drivers and
programs that access /dev/mem or /dev/kmem) on HP-UX 11.11
may result in compiler warnings or errors due to namespace
violations. Executing such programs may cause kernel
structure data corruption, resulting in memory leaks, hangs,
or panics.
PHKL_24441:
( SR:8606173682 CR:JAGad42939 )
High Availability systems hang when under heavy load
and many I/O errors are being returned by the scsi
driver (possibly due to a hardware problem).
( SR:8606175843 CR:JAGad45083 )
A defective SCSI bus controller generates continuous
SCSI bus resets and causes the system to panic.
The panic results in the following stack trace:
panic+0x14
settimeout_for_cpu+0x174
Ktimeout+0x3c
c720_reset_chip+0x129c
c720_isrRST+0x94
c720_isr+0x15cc
sapic_interrupt+0x2c
( SR:8606176606 CR:JAGad45845 )
If device tracing is enabled (with appropriate values
for scsi_trace_dev and scsi_trace_mask, typically for
debugging) and when the system experiences errors
during I/O through the passthrough driver, the system
panics.
panic+0x14
report_trap_or_int_and_panic+0x80
interrupt+0x1d4
$ihndlr_rtn+0x0
scsi_dmesg_log_io+0xf8
scsi_action+0x1b8
scsi_status_action+0x6c
scsi_cbfn+0x41c
scsi_fast_cbfn+0x1b0
c720_call_cbfns+0x60
c720_isr+0x5bc
epic_isr+0x58
mp_ext_interrupt+0x34c
ivti_patch_to_nop3+0x0
idle+0x164
swidle_exit+0x0
( SR:8606173887 CR:JAGad43140 )
There are various symptoms as described under the
following CRs.
( SR:8606169027 CR:JAGad38305 )
Disk I/O hangs even when LVM PV-Link is configured.
The system could report a "DIAGNOSTIC SYSTEM WARNING".
The on-line diagnostic log would show an I/O Error.
( SR:8606178152 CR:JAGad47379 )
Process hang can result during a device open. The system
log (/var/adm/syslog/syslog.log) shows Queue Full status
and a large retry count on an Inquiry request.
( SR:8606168578 CR:JAGad37858 )
The process hangs if an ioctl is issued to a non-existent
hardware path.
( SR:8606178041 CR:JAGad47268 )
"vgchange -a n <VG Name>" command hangs when the cable is
disconnected on the alternate link, if immediate
reporting (IR) is true.
( SR:8606167814 CR:JAGad37097 )
ioscan -fn command hangs when there is a bad disk present.
( SR:8606139670 CR:JAGad08981 )
The system panics when a certain type of SCSI error
occurs while doing writes on hfs filesystem.
The /var/adm/syslog/syslog.log reports Check Condition
status with sense key: (03) Medium Error.
( SR:8606166721 CR:JAGad36008 )
When a bus is shared between two systems, if one of the
systems continuously sends out bus resets, the I/Os from
the other system on this bus hang, consequently the
PV-Link switch would not occur.
Defect Description:
PHKL_34815:
( SR:8606469200 CR:JAGag24419 )
1. SCSI services did not handle the Oracle Consistency
Group error case differently from other errors. Hence,
EPOWERF was returned.
2. SCSI services validates only ANSI-2 and ANSI-3 devices
for Oracle Checksum error. Since SPC-2 devices are ANSI-4
devices, Oracle Checksum errors are not detected by SCSI
services.
Resolution:
1. SCSI services has been modified to handle Oracle
Consistency Group error ( sense key/code/qualifier )
differently. SCSI services will now return EINVAL to the
upper layers instead of EPOWERF for this error.
2. SCSI services now detects Oracle Checksum errors for all
devices with ANSI field greater than 1 and returns
EINVAL to the upper layers.
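The two resolutions above amount to a small decision rule, which
can be sketched as follows. This is an illustration only, not the
HP-UX kernel code: the function name, its parameters, and the use
of errno names as strings are assumptions, and the text does not
say how an Oracle Checksum error is recognized, so it is passed in
as a flag. EPOWERF stands in here for the pre-existing default
handling.

```python
# Hypothetical sketch of the patched error mapping; not kernel code.

# Sense key/code/qualifier for an Oracle Consistency Group trip (from the text).
CONSISTENCY_GROUP_SENSE = (0x05, 0x20, 0x00)


def errno_for_error(sense, ansi_field, is_oracle_checksum_error=False):
    """Return the errno name handed to the upper layers (illustrative)."""
    # Resolution 1: the Consistency Group sense triplet maps to EINVAL.
    if sense == CONSISTENCY_GROUP_SENSE:
        return "EINVAL"
    # Resolution 2: the old check covered only ANSI-2 and ANSI-3; the new
    # check covers every device with an ANSI field greater than 1, which
    # includes ANSI-4 (SPC-2) devices.
    if is_oracle_checksum_error and ansi_field > 1:
        return "EINVAL"
    # Simplification: everything else keeps the earlier behaviour.
    return "EPOWERF"
```

With this rule, an SPC-2 (ANSI-4) device reporting an Oracle
Checksum error yields EINVAL, so a multipathing layer has no reason
to retry the request on another path.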
PHKL_34187:
( SR:8606386083 CR:JAGaf46237 )
During open operations on devices configured in
Business Continuance Volume (BCV) state, all commands
except the inquiry command report the following check
condition "Not Ready - Manual Intervention Required".
This causes SCSI Services to retry the open I/Os
5 times with a delay of 4 seconds between retries,
even if the O_NDELAY flag was set during the open
operation. Hence, the EMC configuration utility,
"symcfg", takes a long time to open the large number
of devices in BCV state that are present on a high
end storage configuration. Thus, "symcfg" appears
unresponsive.
Resolution:
SCSI Services is modified not to retry open I/Os for the
check condition "Not Ready - Manual Intervention Required"
when O_NDELAY flag is set.
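The cost of the retries described above can be estimated with a
short sketch. The retry count and delay come from the text; the
function name, the boolean parameters, and the device count in the
usage note are assumptions made for illustration.

```python
# Hypothetical estimate of the extra open() delay per BCV-state device.

OPEN_RETRIES = 5         # retries stated in the text
RETRY_DELAY_SECONDS = 4  # delay between retries stated in the text


def worst_case_open_delay(o_ndelay_set, patched):
    """Upper estimate, in seconds, of retry delay for one device open
    that keeps reporting "Not Ready - Manual Intervention Required"."""
    if patched and o_ndelay_set:
        # Patched behaviour: no retries when O_NDELAY is set.
        return 0
    return OPEN_RETRIES * RETRY_DELAY_SECONDS
```

For example, with an assumed 1000 devices in BCV state, the
unpatched upper estimate is 1000 * worst_case_open_delay(True, False)
seconds in total, while the patched value with O_NDELAY set is zero.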
PHKL_33371:
( SR:8606382044 CR:JAGaf42255 )
The c720 SCSI driver loops continuously, waiting for a
specific hardware register to be initialized on the LSI
53c895/6 card. On a bad SCSI card, this register is never
initialized, resulting in a system hang.
Resolution:
The SCSI c720 driver is modified to leave the loop and
return a status if the wait times out.
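The general shape of such a fix is a polling loop bounded by a
deadline. The sketch below shows that pattern only; the function
name, the callback, and the timeout values are assumptions and do
not reflect the actual c720 driver code.

```python
import time


def wait_for_register(read_ready, timeout_s, poll_interval_s=0.001):
    """Poll read_ready() until it returns True or timeout_s elapses.

    Returns True if the register became ready, False on timeout, so
    the caller can report a failed card instead of hanging.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A caller would treat a False return as a hardware failure and
abandon initialization of that card.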
PHKL_32090:
( SR:8606268500 CR:JAGae32738 )
The kernel memory allocation routine may sleep when
invoked with the WAIT OK flag. Sleeping in interrupt
context can cause a panic in the kernel subsystem.
Resolution:
SCSI services code is modified to invoke kernel memory
allocation routine without WAIT OK flag on
the interrupt context.
( SR:8606352805 CR:JAGaf13603 )
Under certain failure conditions, SCSI completion
routine can be invoked with the SCSI LUN LOCK held.
Resolution:
SCSI completion routine is modified to verify
if SCSI LUN LOCK is held on entry.
( SR:8606363788 CR:JAGaf24443 )
The cdb_length field is not validated for the 32-bit
version of the SIOC_IO ioctl call. Hence, it can cause
a buffer overrun during the bcopy operation.
Resolution:
SIOC_IO ioctl call is modified to validate the
cdb length for the 32 bit version of the SIOC_IO
calls.
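A minimal sketch of this kind of length validation follows. It is
illustrative only: the 16-byte limit is an assumption for the sketch
(16-byte CDB support is mentioned elsewhere in this patch text), and
the function and constant names are hypothetical.

```python
MAX_CDB_BYTES = 16  # assumed size of the destination CDB buffer


def copy_cdb(user_cdb, cdb_length):
    """Copy cdb_length bytes of user_cdb into a fixed-size buffer,
    rejecting lengths that would overrun either buffer."""
    if not 0 < cdb_length <= MAX_CDB_BYTES:
        raise ValueError("cdb_length out of range")
    if cdb_length > len(user_cdb):
        raise ValueError("cdb_length exceeds supplied CDB")
    kernel_cdb = bytearray(MAX_CDB_BYTES)
    # Safe only because cdb_length was validated above.
    kernel_cdb[:cdb_length] = user_cdb[:cdb_length]
    return bytes(kernel_cdb)
```

The point of the check is its position: it runs before the copy, so
an oversized length is rejected rather than written past the end of
the destination.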
( SR:8606358449 CR:JAGaf19148 )
Under certain error conditions, the device node is not
closed during the disk partition open routine.
Resolution:
cpd driver's open routine is modified to close the
device node under such error conditions.
( SR:8606368292 CR:JAGaf28854 )
Under certain conditions, the Start Unit command
was not issued on the hot-swap disks.
Resolution:
SCSI services code is modified to issue the Start Unit
command on getting the Sense data of ASC = 0x4 and
ASCQ = 0x2.
( SR:8606368809 CR:JAGaf29371 )
Under certain error conditions on an LVM disk, all the
LVM I/Os from the SCSI services internal LUN disk
queue are returned to the upper layer. The non-LVM I/Os
remain in the LUN disk queue waiting to be serviced.
Because the LUN disk queue is never restarted, the
result is an I/O hang.
Resolution:
SCSI services code has been modified to start
processing the LUN disk queue, if there are
any pending I/Os remaining in it.
( SR:8606376917 CR:JAGaf37195 )
LUN inquiry data maintained in the internal
data structure of SCSI services is corrupted
due to invalid DMA operations.
Resolution:
SCSI services code is modified to ignore the inquiry
data on receiving errors.
( SR:8606374079 CR:JAGaf34466 )
SCSI services do not retry the ioscan(1M) probe I/Os,
if the SCSI status returned is BUSY.
Resolution:
SCSI services code is modified to retry the
ioscan(1M) probe I/Os on getting the SCSI status as BUSY.
PHKL_31134:
( SR:8606364148 CR:JAGaf24802 )
Due to continuous SCSI bus resets observed
with the defective hardware, the I/Os are not sent
on the wire (SCSI Bus) and are kept hanging in the c720
driver internal queue, waiting for this reset condition
to clear. As the reset condition is not cleared,
the I/Os are never returned to the upper layer
for the error recovery management.
Resolution:
On detecting a SCSI bus reset condition, the
I/Os from the C720 interface driver internal
queue are returned to the upper layer for
the error recovery management.
PHKL_30510:
( SR:8606349130 CR:JAGaf09949 )
When VxVM does an open(2) on the secondary disk with
O_NDELAY flag, the SCSI disk driver does not retry
the start unit command to the disk even when a
retryable error is seen.
Resolution:
The SCSI disk driver is modified to retry start unit
command when a retryable error is seen, even in
O_NDELAY mode.
( SR:8606342351 CR:JAGaf03258 )
When an FC60 array is in shutdown state and one of
its luns is closed, it results in the driver trying
to use data structures that have already been freed
by it.
Resolution:
The problem is fixed by not closing the lun until pending
driver command has been completed.
( SR:8606344298 CR:JAGaf05149 )
A SCSI sense key of illegal request returned by
some SPC-2 compliant devices was being
treated incorrectly as a transient error thus
resulting in retries.
Resolution:
The code is modified to handle SPC-2 compliant devices
correctly.
( SR:8606351535 CR:JAGaf12340 )
The management processor on MSA30 disk enclosure does not
have a dedicated SCSI target id. It responds to the SCSI id
of the initiator. In order to communicate with the
management processor, open(2) on the device special file
corresponding to the SCSI initiator id should be allowed.
Resolution:
The code has been modified to allow open(2) on the device
special file corresponding to the SCSI initiator id.
PHKL_29365:
( SR:8606185203 CR:JAGad54405 )
When a SCSI pass-through read fails with a check condition
on a deferred error, the SCSI function used to process the
completion of the I/O is incorrectly called twice. Since
the SCSI function is called twice, the number of bytes to
be copied from the kernel space buffer to the user space
buffer in the kernel is incorrectly being incremented.
This resulted in writing past the end of the user space
buffer causing the system to panic with a Data Page Fault.
Resolution:
The fix is not to call the SCSI function twice when there
is a deferred error.
( SR:8606295123 CR:JAGae58817 )
When an asynchronous write request fails, the SCSI device
driver did not log the error. The system panicked due to a
related assertion check in the code.
Resolution:
The SCSI device driver has been modified to log the error
appropriately.
( SR:8606314587 CR:JAGae77351 )
During the online resume (OLR) operation, the SCSI c720
driver's init routine is invoked to allocate and map memory
for the sense buffer. This allocated memory was never freed
or unmapped, thereby causing the system to run out of
mapping space, causing further mapping to fail, which leads
to resume failure.
Resolution:
The SCSI C720 driver code has been modified to use the
already mapped memory for the sense buffer during the
OLR operation.
( SR:8606299275 CR:JAGae62769 )
SCSI Driver does not log the recovered error events reported
by disks to the logging subsystem.
Resolution:
SCSI Driver has been modified to log the recovered error
events. To enable logging, set the 0x40 flag in
scsi_log_mask.
This may be done as below.
adb -w /stand/vmunix /dev/kmem
scsi_log_mask/X <===== Get the value of current log mask
scsi_log_mask:
scsi_log_mask: 1F238B10
Add 0x40 to current mask
scsi_log_mask/W 0x1F238B50
scsi_log_mask: 1F238B10 = 1F238B50
scsi_log_mask?W 0x1F238B50
scsi_log_mask: 1F238B10 = 1F238B50
<Ctrl D> to exit adb
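The new mask value in the adb session above can be computed rather
than added by hand. The sketch below is only a helper for that
arithmetic; the function name is hypothetical, and the starting mask
is the example value from the session, which will differ per system.

```python
RECOVERED_ERROR_LOG_FLAG = 0x40  # flag value given in the text


def enable_recovered_error_logging(current_mask):
    """Return current_mask with the 0x40 flag set."""
    return current_mask | RECOVERED_ERROR_LOG_FLAG


new_mask = enable_recovered_error_logging(0x1F238B10)
print(f"scsi_log_mask/W 0x{new_mask:08X}")
```

A bitwise OR is used instead of addition so that the result is
unchanged if the flag is already set.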
( SR:8606322906 CR:JAGae85372 )
SCSI driver incorrectly sets up the external narrow SCSI
bus to wide mode.
Resolution:
The SCSI c720 driver code has been modified to set up the
SCSI bus correctly for narrow or wide mode.
( SR:8606333146 CR:JAGae94241 )
During device open, several unnecessary messages were
getting logged if any of the I/O requests completed with
Reservation Conflict.
Resolution:
SCSI driver code has been modified to disable
Reservation Conflict log message during device open.
( SR:8606335728 CR:JAGae96782 )
SCSI subsystem does not pass the updated values for
residual byte count to kmetric subsystem.
Resolution:
SCSI subsystem has been modified to pass updated values for
the residual byte count.
( SR:8606338809 CR:JAGae99756 )
The SCSI probe code does not probe for any LUNs beyond
a SCSI PROCESSOR type LUN which is not LUN 0. This
causes LUNs after SCSI PROCESSOR LUN to be missed/not found.
Resolution:
The SCSI probe code was modified to probe for further LUNs
on target even if the SCSI PROCESSOR type LUN was not LUN 0.
PHKL_29047:
( SR:8606304724 CR:JAGae68058 )
When a large number of I/O requests are sent to the device
and this number exceeds the queue length limit of the
device, the device returns QUEUE FULL status. To avoid
frequent QUEUE FULL status messages from the device, the
driver lowers its own queue length so that fewer I/O
requests are sent to the device. The driver increases the
queue length again when a certain number of I/Os have
completed successfully at the current queue length. If the
driver gets QUEUE FULL status while operating at a queue
length of 1, it switches to untagged mode (no queuing). The
driver does not return to tagged mode once this happens
(even when the I/O load is reduced later). This results in
slow I/O to the device and large avwait and avque values
reported by sar(1M).
Resolution:
The code is modified to return to tagged queuing when
a certain number of I/O requests complete successfully in
the untagged mode.
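The throttling behaviour and the fix can be sketched as a small
state machine. This is an illustration under stated assumptions:
the class and method names are hypothetical, the queue length is
assumed to change by 1 per step, and the success threshold is a
parameter because the text only says "a certain number".

```python
class LunQueue:
    """Illustrative model of per-LUN queue throttling (not driver code)."""

    def __init__(self, max_depth, successes_to_grow):
        self.max_depth = max_depth
        self.depth = max_depth
        self.tagged = True
        self.successes_to_grow = successes_to_grow
        self._successes = 0

    def on_queue_full(self):
        """Device returned QUEUE FULL: back off."""
        self._successes = 0
        if self.depth > 1:
            self.depth -= 1
        else:
            self.tagged = False  # fall back to untagged mode

    def on_success(self):
        """An I/O completed successfully: recover after enough of them."""
        self._successes += 1
        if self._successes < self.successes_to_grow:
            return
        self._successes = 0
        if not self.tagged:
            self.tagged = True  # the fix: return to tagged queuing
        elif self.depth < self.max_depth:
            self.depth += 1
```

Before the fix, the branch that sets tagged back to True did not
exist, so a LUN that fell back to untagged mode stayed there.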
( SR:8606266268 CR:JAGae30517 )
The system panicked because of a divide-by-zero operation
in one of the SCSI routines.
Resolution:
The code is changed to handle the divide-by-zero case.
( SR:8606304019 CR:JAGae67368 )
The present code needs to be modified to
handle Ultra320 rate in the SIOC get/set ioctls.
Resolution:
The code is modified to handle Ultra320 rate
in the SIOC get/set ioctls.
( SR:8606312429 CR:JAGae75245 )
When the Test Unit Ready (TUR) command fails during the
device open, the open succeeds without performing the
"Read Capacity" command, so the device size information is
not obtained. Hence, after the device is opened, reads and
writes to the device fail immediately without the command
even being sent to the device. This is not acceptable on
devices that have been reserved with the "Write exclusive"
type, where reads should not fail.
Resolution:
The code has been changed to ignore the "Reservation
Conflict" error for TUR during open.
PHKL_29039:
( SR:8606244397 CR:JAGae10884 )
The system exhausts its memory because a data structure is
not freed in the error path, which leaks memory.
Resolution:
Free the data structure when the error path
is taken in the bus or target open routines
to plug the memory leak.
( SR:8606290867 CR:JAGae54710 )
The disk failure paths do not ensure that
kmetric completion routines are called
to update the block device activity data.
This results in incorrect data being reported
for block device activity.
Resolution:
The kmetric completion routines are called in the
disk driver error paths to ensure that the
correct values are reported for block device activity.
( SR:8606267129 CR:JAGae31372 )
The I/O routine prints the buffer pointer fields as signed
values. This results in reporting of negative values to
syslog output.
Resolution:
The I/O error routine is modified to
print the buffer pointer fields as
unsigned values.
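The effect of printing a pointer-like field as signed can be shown
with a short sketch. The 32-bit width and the example address are
assumptions for illustration; the text does not state the field
width.

```python
def as_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def as_unsigned32(value):
    """Interpret the low 32 bits of value as an unsigned integer."""
    return value & 0xFFFFFFFF


addr = 0xC0001000  # hypothetical buffer address with the top bit set
print(as_signed32(addr))
print(f"0x{as_unsigned32(addr):08x}")
```

Any address with the top bit set prints as a negative number under
the signed interpretation, which matches the symptom seen in syslog.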
( SR:8606266450 CR:JAGae30698 )
The SCSI logging policy introduced in PHKL_25165
restricted the SCSI logs to prevent unwanted messages.
This limits the debugging efforts of test engineers.
Resolution:
A new flag is introduced in the scsi_log_mask variable
that can be set through adb(1). Setting this flag
restores the original logging behavior, if required.
( SR:8606160884 CR:JAGad30203 )
Devices which report their class as SCSI_PROCESSOR
are not claimed by any driver and hence are
reported as UNCLAIMED.
Resolution:
Devices which belong to class SCSI_PROCESSOR
are now claimed by the pass-through driver and
hence are shown as CLAIMED by ioscan.
( SR:8606298657 CR:JAGae62156 )
When a non-LVM I/O request fails with
a sense key of "Illegal Request", the disk driver
retries the I/O forever instead of returning failure.
This causes the application to hang indefinitely.
Resolution:
The disk driver is modified to return an error
EINVAL for a non-LVM I/O request when we have
a sense key of "Illegal Request".
( SR:8606307922 CR:JAGae70957 )
The driver treats the 'Reservation Conflict' error
status as a transient error, and the I/O resulting
in that status is retried until it completes successfully.
In the case of 'Persistent Reservation', the
reservations persist across a power cycle
of the device, so the error cannot be treated as
transient.
Resolution:
I/O requests failing with 'Reservation Conflict' status
are now failed immediately without a retry.
PHKL_28513:
( SR:8606226043 CR:JAGad95114 )
In extremely rare conditions, single byte writes to
onboard memory (SCRIPT RAM) may not complete on Channel B
of A5159A and Core I/O FWD SCSI HBA on rp24xx, rp54xx,
and rp7400 systems.
This may result in the following problems:
a. Data integrity issues
b. System crash due to HPMC
Resolution:
The driver is changed to perform word writes instead of
byte writes.
( SR:8606286272 CR:JAGae50215 )
To avoid data corruption, the Disable Pipe Request (DPR)
bit must be set during SCSI operations. In the present code
it is set only once, in the chip initialization
routine, and it is cleared by a successful
chip reset operation.
Resolution:
Set the DPR bit in the chip reset routine instead of the
chip initialization routine. This ensures that the DPR bit
is set again on every chip reset.
( SR:8606289589 CR:JAGae53519 )
The SCSI LUN pointer is invalid for the bus SCSI control
block (SCB) and therefore can cause a recursive bus lock
held panic in a multi-LUN configuration.
Resolution:
The LUN pointer is reset to zero for the bus pool SCB
before it is freed. Hence, recursive holding of the
bus lock is avoided.
PHKL_28096:
( SR:8606271035 CR:JAGae35271 )
The internal hot-swap disks are not configured
to spin up at power-on. Hence, any access to
the hot-swap disks returned sense data
of ASC=0x04, ASCQ=0x02. This resulted in the
I/O being retried forever, causing an
application hang.
Resolution:
The SCSI driver is modified to send a Start Unit
command to spin up the drives when sense
data of ASC=0x04, ASCQ=0x02 is returned by
the target. The original I/O is resumed after the
Start Unit command is completed.
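The recovery flow described above can be sketched roughly as follows.
This is an illustrative Python model, not the driver code: the function
name and the two callbacks are hypothetical, and only the ASC/ASCQ pair
comes from the text.

```python
# ASC/ASCQ pair quoted in the resolution text above.
NEEDS_START_UNIT = (0x04, 0x02)


def handle_sense(asc, ascq, send_start_unit, resume_io):
    """Spin the drive up and resume the I/O when the target asks for it.

    send_start_unit and resume_io are hypothetical callbacks standing in
    for the driver's own routines.
    """
    if (asc, ascq) == NEEDS_START_UNIT:
        send_start_unit()   # spin the drive up first
        resume_io()         # then resume the original request
        return "resumed"
    return "unhandled"      # other sense data takes the normal error path


events = []
print(handle_sense(0x04, 0x02,
                   lambda: events.append("start_unit"),
                   lambda: events.append("resume")))
```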
PHKL_27579:
( SR:8606245156 CR:JAGae11630 )
The SCSI controller chip on the A5838A card
was reset when it was doing DMA. After reset,
some of the DMA transactions were not getting
claimed by the SCSI controller chip. This
caused the timer on the card bridge to expire
and HPMC the system.
Resolution:
The problem is resolved by aborting the ongoing
DMA before resetting the card.
( SR:8606242143 CR:JAGae09397 )
The Disable Overlapped Arbitration bit in
the Control register Zero is used for
gaining access to the PCI bus while another
function is executing a PCI cycle. The register
bit was not set and hence caused intermittent
bus hangs and bus resets on the cards
containing the 53C876 chip.
Resolution:
The Disable Overlapped Arbitration bit is now
set on the cards containing the 53C876 chip
whenever the chip is reset. This avoids the hang
and subsequent resets.
( SR:8606241873 CR:JAGae09130 )
The scsi_isc array maximum limit is 255. If the instance
number is greater than 255, an overflow occurs and leads to
memory corruption and a subsequent panic. The c720 interface
driver was not validating the maximum limit of the instance
number.
Resolution:
The driver init routine was changed to check if
the instance numbers were greater than 256 and if so return
an error. This causes the bus instance numbers
greater than 256 to become unclaimed and hence avoids
the system panic.
( SR:8606216118 CR:JAGad85288 )
The system panics because of a race condition between the
SCSI bus open and the interrupt being serviced. The
interrupt was getting serviced before the internal
data structures in the bus open routine were completely
initialized.
Resolution:
The fix is to set a flag after initializing the data
structures in the SCSI bus open routine. In the ISR
routine, a check is made to verify if this flag is set.
The interrupt is serviced only if this flag is set. The
flag is unset in the SCSI bus close routine.
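A minimal sketch of the flag-gating idea described above, written in
Python for readability. The class and attribute names are hypothetical;
the real driver is kernel code and its structures are not shown in this
document.

```python
class BusModel:
    """Toy model of a bus whose ISR is gated on an 'initialized' flag."""

    def __init__(self):
        self.initialized = False   # the flag described in the resolution
        self.state = None          # stands in for internal data structures
        self.serviced = 0

    def open(self):
        self.state = {"pending": []}   # initialize the data structures first
        self.initialized = True        # set the flag only afterwards

    def close(self):
        self.initialized = False       # unset the flag on close
        self.state = None

    def isr(self):
        if not self.initialized:
            return False               # interrupt arrived too early: ignore
        self.state["pending"].append("irq")
        self.serviced += 1
        return True


bus = BusModel()
print(bus.isr())   # before open: not serviced
bus.open()
print(bus.isr())   # after open: serviced
```

The ordering inside open() is the point of the sketch: the flag is set
only after every structure the ISR touches is in place.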
( SR:8606257328 CR:JAGae21633 )
When a SCSI bus reset is in progress, a flag is set.
This flag is cleared by a timer routine that is
executed 3 seconds after the bus is reset.
If the card is suspended before this routine executes,
the suspend routine cancels the timeout and
the flag is never cleared. Even after the
resume, the flag remains set, and while this flag is set
no I/O is possible. This causes an application hang.
Resolution:
The flag is now cleared in the suspend routine
before canceling the timeout routine.
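The sequence above can be modeled with a small Python sketch. All names
are hypothetical, and the 3-second timer is represented only by a stored
callback rather than a real timer.

```python
class CardModel:
    """Toy model of the reset flag and its delayed clear routine."""

    def __init__(self):
        self.reset_in_progress = False
        self.pending_timeout = None    # callback that would fire later

    def bus_reset(self):
        self.reset_in_progress = True
        # In the driver this would be scheduled to run about 3 seconds later.
        self.pending_timeout = self._clear_reset_flag

    def _clear_reset_flag(self):
        self.reset_in_progress = False
        self.pending_timeout = None

    def suspend(self):
        # Behavior after the fix: clear the flag first, then cancel the
        # timeout. Without the first line, the flag would stay set forever.
        self.reset_in_progress = False
        self.pending_timeout = None

    def can_issue_io(self):
        return not self.reset_in_progress


card = CardModel()
card.bus_reset()
card.suspend()             # suspend arrives before the timer fires
print(card.can_issue_io())
```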
( SR:8606204859 CR:JAGad74037 )
The SCSI driver does not distinguish between speed/width
negotiations initiated by the target or the driver.
The mismatch in the speed setting on the host and the target
resulted in Parity Error on the bus.
Resolution:
The SCSI driver now tracks whether a message from the target
is a response to a host-initiated negotiation or an
unsolicited request from the target.
( SR:8606232873 CR:JAGae02101 )
The driver used to map request sense buffer for every
I/O (and unmap them on completion). During heavy load the
mapping failed and caused the system to panic. The problem
was fixed by mapping the request sense buffer during bus
open and re-using the physical address for each I/O until
the bus is closed. However, while re-using the request
sense buffers between I/O, the driver was not invalidating
the buffer.
Resolution:
Modify the request sense buffer handling code in c720
driver as below
1. Allocate and map one request sense buffer in
initialization function and re-use it during the life
of the card.
2. Invalidate the buffer after every completion status
receipt from device.
( SR:8606238711 CR:JAGae07734 )
The disk driver returns EINVAL for I/O requests
to LVM under certain hardware conditions.
LVM was not retrying the I/O requests even
when an alternate path to the LUN existed.
This resulted in filesystem and system
hangs.
Resolution:
The disk driver is modified to return an error
of EPOWERF when an EINVAL condition
is reported by the device for an LVM I/O
except for ASC=0x0C, ASCQ=0xA0
(Oracle Hard Integrity error).
LVM will retry the I/O on an alternate path
due to EPOWERF returned by the disk driver.
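The mapping described in this entry can be sketched as a small function.
This is an illustrative model of this one resolution only, not the
driver's code; the function name is hypothetical, and error codes are
represented as strings because EPOWERF is an HP-UX specific errno.

```python
# ASC/ASCQ pair quoted above for the Oracle Hard Integrity error.
ORACLE_HARD_INTEGRITY = (0x0C, 0xA0)


def errno_for_lvm_io(device_errno: str, asc: int, ascq: int) -> str:
    """Return the error reported to LVM for a failed LVM I/O request."""
    if device_errno == "EINVAL" and (asc, ascq) != ORACLE_HARD_INTEGRITY:
        return "EPOWERF"    # lets LVM retry on an alternate path
    return device_errno     # the exception and other errors pass through


print(errno_for_lvm_io("EINVAL", 0x24, 0x00))
print(errno_for_lvm_io("EINVAL", 0x0C, 0xA0))
```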
( SR:8606264850 CR:JAGae29181 )
An open(2) on a CDROM drive without a CD in it takes
considerably longer than an open with a CD
in the drive. As a result, the boot time increased
with VxVM. The delay comes from the driver sending a
"Start Stop Unit" (SSU) command to spin up the disc
and retrying the command 5 times even when the
O_NDELAY flag is set.
Resolution:
The "Start Stop Unit" command is no longer retried
if the O_NDELAY flag is set, and is retried
5 times if the flag is not set, so that the operating
system detects sooner that there is no media in
the CDROM drive.
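A minimal sketch of the retry policy above. The function and callback
names are hypothetical, and reading "retry 5 times" as five retries
after the initial attempt is an assumption.

```python
MAX_SSU_RETRIES = 5   # retry count quoted in the text


def ssu_attempts(o_ndelay: bool, send_ssu) -> int:
    """Issue Start Stop Unit and return how many attempts were made.

    send_ssu is a hypothetical callback returning True on success.
    """
    retries = 0 if o_ndelay else MAX_SSU_RETRIES
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        if send_ssu():
            break          # the unit started, no further retries needed
    return attempts


# An empty drive modeled as a command that always fails.
print(ssu_attempts(True, lambda: False))
print(ssu_attempts(False, lambda: False))
```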
PHKL_27563:
( SR:8606265990 CR:JAGae30243 )
The I/O subsystem hang occurred because an I/O remained
in the lun disk queue. The I/O remained in the queue
because of a failure in allocating the resource.
Resolution:
The code has been modified to take care that the
I/O subsystem hang does not happen when allocation
of the resource fails.
PHKL_26519:
( SR:8606236118 CR:JAGae05183 )
Upon detecting a timed out I/O request, the driver sets a
flag in the LUN data structure indicating "do not retry any
requests for this LUN". After successful completion of a
subsequent I/O request, this flag should be cleared.
However, when the subsequent I/O request completes
successfully, the driver's normal completion path (in which
this flag is cleared) is not executed and hence the flag
remains set. If any subsequent I/O requests do not complete
successfully, they are failed immediately without performing
a retry.
Resolution:
Ensure that the driver follows the normal completion path
for the first successful completion of an I/O request
following a failed I/O request.
( SR:8606226361 CR:JAGad95431 )
When I/O requests from LVM fail or time-out due to bad
disks, the SCSI disk driver returns an incorrect error code
to LVM causing the LVM to retry the I/O request forever
instead of returning failure. This causes the application
which has issued the I/O request to hang indefinitely.
Resolution:
Ensure that an I/O request that failed due to MEDIUM ERROR
is reported back to LVM with the EMEDIA error.
( SR:8606135832 CR:JAGad04964 )
This product update contains minor enhancements required to
enable the support for 16 byte CDBs (Command Descriptor
Block) in the SCSI driver.
Resolution:
The SCSI driver has been modified to support 16
byte CDBs (Command Descriptor Block).
( SR:8606214047 CR:JAGad83238 )
The SCSI driver accesses the adapter registers in its
interrupt service routine (ISR). Although card interrupts
are disabled during the suspend operation, if spurious
interrupts are delivered to the SCSI driver, the driver
would attempt to read the card registers, resulting in a
HPMC.
Resolution:
The SCSI driver has been modified so that it does not
attempt to process spurious interrupts when it is in a
suspended state.
( SR:8606236116 CR:JAGae05181 )
After an LVM I/O times out, the flag L_FAIL_QUEUE_IO can
remain set and prevent LVM probes from being sent to the
device to see if it has returned on-line. Also, many SCSI
read error messages will be seen in syslog.
Resolution:
Only set the flag (L_FAIL_QUEUE_IO) if there are I/O
requests queued to be sent to the device.
PHKL_25896:
( SR:8606228002 CR:JAGad97060 )
Some disks do not start spinning automatically. The SCSI
subsystem retry policy changed with PHKL_24441 and, as a
result, the Start Unit command is not retried. These two
changes combined cause the first Start Unit command to these
disks to fail. Successive commands issued to these disks
also fail. Later, a Read Capacity command to the device
returns a capacity of zero bytes.
Resolution:
The SCSI subsystem retry policy is refined to retry the
Start Unit command five times. With these successive
retries, the disks start spinning and later return the
correct capacity size.
PHKL_25509:
( SR:8606203627 CR:JAGad72800 )
All messages logged by the c720 driver can be suppressed by
setting appropriate value for a c720 driver global variable.
This message was not controlled by this variable.
Resolution:
The value of the global variable is now checked before
printing the mentioned message.
( SR:8606186960 CR:JAGad56170 )
The Plasmon devices return 256 bytes of Sense data. The
allocated buffer for sense data is also 256 bytes, causing
the residue to be zero. When the residue is zero, the
mentioned message is printed.
Resolution:
The code that prints the mentioned message is removed since
a zero residue is not an error.
( SR:8606201476 CR:JAGad70652 )
This informative message is logged while trying to access a
device with zero capacity (i.e. a placeholder LUN when LUNS
are not defined contiguously) or a device without media.
Resolution:
This message is no longer logged by default for partially
opened devices. This message log can be enabled for
debugging purposes by setting an appropriate value in
scsi_log_mask.
( SR:8606199892 CR:JAGad69078 )
Devices of HP SureStore E Disk Array 12 sometimes
successfully complete I/Os when in Contingent Allegiance
(CHECK CONDITION state of SCSI devices). These I/O
completions were not reported to the requesting layer,
causing the process to hang.
Resolution:
If an I/O completes successfully when the device is in
Contingent Allegiance condition, the I/O is returned with an
error to the requesting layer for further action (typically
an I/O retry).
( SR:8606194472 CR:JAGad63680 )
While determining whether the SCSI bus is wide or narrow,
the C3700 model was not considered.
Resolution:
The model string for the C3700 was added to the list of
models with narrow external SCSI buses.
( SR:8606177456 CR:JAGad46688 )
On the last close of a device, the SCSI driver attempts to
synchronize the cache of the device. With an FC60 array,
this is unnecessary and it also causes an Automatic LUN
Transfer (ALT). The unneeded ALT operations cause a
significant delay when attempting to import Disk groups.
Resolution:
The disk driver no longer synchronizes the cache on FC60
array devices when the device is closed.
PHKL_25165:
( SR:8606207855 CR:JAGad77032 )
The SCSI services did not support these ioctls for the c8xx
driver.
Resolution:
SCSI services is enhanced to support these ioctls for the
c8xx driver.
( SR:8606170140 CR:JAGad39404 )
Many noncritical conditions generate log entries. For
example, adding or removing devices in the loop generates
many resets.
Resolution:
The logging mechanism is adapted to record only critical
error conditions. The following conditions are recorded:
Unit Attention and Deferred errors
I/O retried unsuccessfully
I/O unsuccessful and not retried
Also, if the SLOG_SUCCESS_RETRY is set in scsi_log_mask,
I/Os completing successfully after prior unsuccessful
attempts are recorded. This helps to identify devices
repeatedly returning errors before being successful.
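The logging policy above amounts to a small filter, sketched here in
Python. The event names and the bit value chosen for SLOG_SUCCESS_RETRY
are hypothetical; the document names the flag but does not give its
value.

```python
SLOG_SUCCESS_RETRY = 0x1   # hypothetical bit value, not taken from the patch

# Conditions the text says are always recorded.
ALWAYS_LOGGED = {
    "unit_attention",
    "deferred_error",
    "retried_unsuccessfully",
    "failed_not_retried",
}


def should_log(event: str, log_mask: int) -> bool:
    """Decide whether an event is recorded under the described policy."""
    if event in ALWAYS_LOGGED:
        return True
    if event == "retried_successfully":
        return bool(log_mask & SLOG_SUCCESS_RETRY)
    return False   # noncritical conditions are no longer logged


print(should_log("unit_attention", 0))
print(should_log("retried_successfully", 0))
print(should_log("retried_successfully", SLOG_SUCCESS_RETRY))
```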
( SR:8606172682 CR:JAGad41942 )
The nature of the lun is not checked when modifying the
queue depth of the lun.
When modifying the queue depth of a lun, the new value
is overwritten by the previous value.
Resolution:
The nature of a lun is checked while modifying its queue
depth. If the lun does not support tag queueing, an error
code is returned.
The code is adapted to support multiple queue depth changes.
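A rough sketch of the two fixes above, assuming a simple LUN object.
The class, the function, and the boolean return value are hypothetical;
the text says only that "an error code is returned" without naming it.

```python
class LunModel:
    """Toy LUN with a tagged-queueing capability and a queue depth."""

    def __init__(self, supports_tagged_queueing: bool, queue_depth: int = 1):
        self.supports_tagged_queueing = supports_tagged_queueing
        self.queue_depth = queue_depth


def set_queue_depth(lun: LunModel, depth: int) -> bool:
    """Apply a new queue depth; return False where the driver would fail."""
    if not lun.supports_tagged_queueing:
        return False               # the nature of the LUN is checked first
    lun.queue_depth = depth        # the latest value always wins
    return True


lun = LunModel(supports_tagged_queueing=True)
set_queue_depth(lun, 8)
set_queue_depth(lun, 16)           # a second change is not overwritten
print(lun.queue_depth)
```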
( SR:8606166652 CR:JAGad35939 )
If a SCSI I/O is initiated using the sctl/ioctl passthrough
function and the transfer size is greater than the size of
the malloc'd buffer for this transfer, the system panics.
Resolution:
Check the access permissions of the buffer supplied by the
user before using it. This ensures the system won't panic
if the size of the I/O is greater than the size of the
buffer.
( SR:8606192639 CR:JAGad61851 )
While probing for each LUN, the corresponding bus is opened
and closed. The bus open takes a long time, especially in
large Fibre Channel Array configurations. Since all
possible LUNs are probed on each target whether or not they
are present, the probe time can be quite long on large Fibre
Channel Array configurations, resulting in long boot times
and long ioscan times (without the -k option).
Resolution:
The bus is kept open until all the LUNs corresponding to all
the targets on that bus are probed.
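To make the saving concrete, the following Python sketch counts bus
opens under both probing strategies. It is a toy model with hypothetical
names; the target and LUN counts are arbitrary example values.

```python
class ProbeBus:
    """Toy bus that only counts how often it is opened."""

    def __init__(self):
        self.open_count = 0

    def open(self):
        self.open_count += 1

    def close(self):
        pass


def probe_reopening(bus, targets, luns_per_target):
    """Old behavior: open and close the bus around every LUN probe."""
    for _target in range(targets):
        for _lun in range(luns_per_target):
            bus.open()
            bus.close()


def probe_keeping_open(bus, targets, luns_per_target):
    """New behavior: keep the bus open until every LUN has been probed."""
    bus.open()
    try:
        for _target in range(targets):
            for _lun in range(luns_per_target):
                pass   # the probe itself would happen here
    finally:
        bus.close()


old, new = ProbeBus(), ProbeBus()
probe_reopening(old, targets=16, luns_per_target=8)
probe_keeping_open(new, targets=16, luns_per_target=8)
print(old.open_count, new.open_count)
```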
( SR:8606189054 CR:JAGad58270 )
Under heavy I/O load on the same bus, when some per bus
resource (tag, nexus) becomes unavailable, I/Os are stored
in specific queues, waiting for the resource to become
available. Under some conditions, the queues are not
checked once the resource is once again available, leaving
the I/O requests unserviced. The corresponding processes
remain in an unkillable state, waiting for I/O completion
or failure that never occurs.
Resolution:
Additional tests were added to check if I/Os are pending in
the queues, and to process them if the resources are now
available.
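The fix above can be pictured as draining the wait queue whenever a
resource is released. The following Python sketch models this with a
single pool of tags; the class and method names are hypothetical.

```python
from collections import deque


class BusResources:
    """Toy model of a per-bus resource pool with a wait queue."""

    def __init__(self, tags: int):
        self.free_tags = tags
        self.waiting = deque()
        self.started = []

    def submit(self, io):
        if self.free_tags > 0:
            self.free_tags -= 1
            self.started.append(io)
        else:
            self.waiting.append(io)    # no tag available: park the request

    def complete(self, io):
        self.free_tags += 1
        # The added check: restart parked requests once a tag is free.
        while self.waiting and self.free_tags > 0:
            self.free_tags -= 1
            self.started.append(self.waiting.popleft())


res = BusResources(tags=1)
res.submit("io-a")
res.submit("io-b")       # parked, because the only tag is in use
res.complete("io-a")     # releasing the tag restarts io-b
print(res.started, list(res.waiting))
```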
( SR:8606166664 CR:JAGad35951 )
While retrying an I/O that has timed out, the system may
access a previously freed target pointer, resulting in a
panic.
Resolution:
Timed-out requests are sent to a temporary queue. This
prevents them from being processed and started until the
target pointer is once again valid.
PHKL_23313:
( SR:8606174670 CR:JAGad43916 )
A number of program header files were delivered in their
"debug" forms rather than in their intended "performance"
forms. Kernel-intrusive programs (such as drivers and
programs that access /dev/mem or /dev/kmem) compiled with
these header files may contain internal structures that are
not aligned with the actual kernel structures.
Resolution:
A set of patches redeliver the header files in their
intended forms. All kernel-intrusive programs originally
compiled using the header files included on the HP-UX 11.11
Operating Environment OE Install and Recovery media dated
December 2000 must be recompiled using the corrected header
files.
All of the corrected header file patches are included in the
BUNDLE11i bundle on the HP-UX 11.11 Operating Environment
Core OE Install and Recovery media dated February 2001 or
later.
The complete list of header file patches is:
PHNE_23288 /usr/conf/net/netmp.h
/usr/include/net/netmp.h
/usr/conf/sys/socketvar.h
/usr/include/sys/socketvar.h
/usr/conf/sys/unpcb.h
/usr/include/sys/unpcb.h
PHNE_23289 /usr/include/sio/mux4.h
PHKL_23290 /usr/conf/space.h.d/system_space.h
PHKL_23291 /usr/conf/graf/gpu_data.h
PHKL_23292 /usr/conf/io/scsi_surface.h
PHKL_23293 /usr/conf/sys/assert.h
PHKL_23294 /usr/conf/sys/buf.h
/usr/include/sys/buf.h
PHKL_23295 /usr/conf/sys/debug.h
/usr/include/sys/debug.h
PHKL_23296 /usr/conf/sys/dnlc.h
/usr/include/sys/dnlc.h
PHKL_23297 /usr/conf/sys/io.h
/usr/include/sys/io.h
PHKL_23298 /usr/include/sys/ki_iface.h
PHKL_23299 /usr/conf/sys/pfdat.h
/usr/include/sys/pfdat.h
PHKL_23300 /usr/include/sys/proc_debug.h
PHKL_23301 /usr/conf/sys/proc_iface.h
/usr/include/sys/proc_iface.h
PHKL_23302 /usr/conf/sys/rw_lock.h
/usr/include/sys/rw_lock.h
PHKL_23303 /usr/conf/sys/sem_alpha.h
/usr/include/sys/sem_alpha.h
PHKL_23304 /usr/conf/sys/sem_beta.h
/usr/include/sys/sem_beta.h
PHKL_23305 /usr/conf/sys/sem_sync.h
/usr/include/sys/sem_sync.h
PHKL_23306 /usr/conf/sys/sem_utl.h
/usr/include/sys/sem_utl.h
PHKL_23307 /usr/conf/sys/spinlock.h
/usr/include/sys/spinlock.h
PHKL_23308 /usr/conf/sys/vas.h
/usr/include/sys/vas.h
PHKL_23309 /usr/conf/sys/vfd.h
/usr/include/sys/vfd.h
PHKL_23310 /usr/conf/sys/vnode.h
/usr/include/sys/vnode.h
PHKL_23311 /usr/conf/ufs/inode.h
/usr/include/sys/inode.h
PHKL_23312 /usr/conf/wsio/pci.h
/usr/include/sys/pci.h
PHKL_23313 /usr/include/sys/scsi_ctl.h
PHKL_23314 /usr/conf/pa/sync/spinlock.h
/usr/include/pa/sync/spinlock.h
PHKL_23315 /usr/conf/pa/cpu.h
/usr/include/pa/cpu.h
PHKL_23316 /usr/conf/pa/sys/map.h
PHKL_24441:
( SR:8606173682 CR:JAGad42939 )
I/Os that timed out were not always being returned with
an error but were being retried indefinitely. This
resulted in timed-out requests getting stuck
in the disk driver's queue, resulting in a hang.
Resolution:
The requests are tracked and those that time out are
returned to the upper layer, thus allowing it to switch
to an alternate path if one is configured.
( SR:8606175843 CR:JAGad45083 )
On every SCSI bus reset, a new timer was set for further
processing. This led to a timer table overflow that
caused the system to panic.
Resolution:
Every time a bus reset occurs, the previously set timer
(if one exists) is cancelled and a new timer is set.
This ensures that only one timer exists per bus at a given
time, thus preventing the timer table overflow.
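A minimal sketch of the one-timer-per-bus rule, using a Python list as a
stand-in for the kernel timer table. All names are hypothetical.

```python
class TimerTable:
    """Toy timer table that keeps at most one reset timer per bus."""

    def __init__(self):
        self.active = []      # stands in for the kernel timer table
        self.per_bus = {}     # bus name -> handle of its current timer

    def on_bus_reset(self, bus: str):
        old = self.per_bus.pop(bus, None)
        if old is not None:
            self.active.remove(old)    # cancel the previously set timer
        handle = object()              # opaque handle for the new timer
        self.active.append(handle)
        self.per_bus[bus] = handle


table = TimerTable()
for _ in range(1000):                  # a storm of resets on one bus
    table.on_bus_reset("bus0")
print(len(table.active))
```

Without the cancel step, each reset would leave another entry in the
table, which is the overflow the entry describes.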
( SR:8606176606 CR:JAGad45845 )
In the passthrough driver I/O path, the logging function
referenced a NULL pointer while trying to generate a
hardware path string for the device, causing the system
to panic.
Resolution:
For I/Os through the passthrough driver, the logging
function now checks for NULL pointer and the hardware
path information for the device is not logged for such
I/Os.
( SR:8606173887 CR:JAGad43140 )
( SR:8606169027 CR:JAGad38305 )
( SR:8606178152 CR:JAGad47379 )
( SR:8606168578 CR:JAGad37858 )
( SR:8606178041 CR:JAGad47268 )
( SR:8606167814 CR:JAGad37097 )
( SR:8606139670 CR:JAGad08981 )
( SR:8606166721 CR:JAGad36008 )
A few error conditions were retried indefinitely, causing
a process hang or preventing the PVLink switch from
occurring. In the case of error-intolerant upper layers
(like the hfs filesystem), the error returns caused file
system panics.
Resolution:
Depending on where the I/O is issued from:
1. Device open/ioctl,
2. I/Os from an error-intolerant upper layer or
3. I/Os from LVM-like upper layers,
various error conditions are now handled appropriately.
Enhancement:
No (superseded patches contained enhancements)
PHKL_28513:
Enhancements were delivered in a patch this one has
superseded. Please review the Defect Description
text for more information.
SR:
8606469200 8606386083 8606135832 8606139670 8606160884
8606166652 8606166664 8606166721 8606167814 8606168578
8606169027 8606170140 8606172682 8606173682 8606173887
8606174670 8606175843 8606176606 8606177456 8606178041
8606178152 8606185203 8606186960 8606189054 8606192639
8606194472 8606199892 8606201476 8606203627 8606204859
8606207855 8606214047 8606216118 8606226043 8606226361
8606228002 8606232873 8606236116 8606236118 8606238711
8606241873 8606242143 8606244397 8606245156 8606257328
8606264850 8606265990 8606266268 8606266450 8606267129
8606268500 8606271035 8606286272 8606289589 8606290867
8606295123 8606298657 8606299275 8606304019 8606304724
8606307922 8606312429 8606314587 8606322906 8606333146
8606335728 8606338809 8606342351 8606344298 8606349130
8606351535 8606352805 8606358449 8606363788 8606364148
8606368292 8606368809 8606374079 8606376917 8606382044
Patch Files:
OS-Core.ADMN-ENG-A-MAN,fr=B.11.11,fa=HP-UX_B.11.11_32/64,
v=HP:
/usr/share/man/man7.Z/scsi.7
ProgSupport.C-INC,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP:
/usr/include/sys/scsi_ctl.h
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP:
/usr/conf/lib/libio.a(cpd.o)
/usr/conf/lib/libwsio.a(scsi_c720.o)
/usr/conf/lib/libwsio.a(scsi_ctl.o)
/usr/conf/lib/libwsio.a(scsi_disk.o)
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP:
/usr/conf/lib/libio.a(cpd.o)
/usr/conf/lib/libwsio.a(scsi_c720.o)
/usr/conf/lib/libwsio.a(scsi_ctl.o)
/usr/conf/lib/libwsio.a(scsi_disk.o)
what(1) Output:
OS-Core.ADMN-ENG-A-MAN,fr=B.11.11,fa=HP-UX_B.11.11_32/64,
v=HP:
/usr/share/man/man7.Z/scsi.7:
None
ProgSupport.C-INC,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP:
/usr/include/sys/scsi_ctl.h:
scsi_ctl.h $Date: 2003/12/07 21:04:10 $Revision: r11
.11/7 PATCH_11.11 (PHKL_29365) */
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP:
/usr/conf/lib/libio.a(cpd.o):
cpd.c $Date: 2004/12/01 21:58:26 $Revision: r11.11/1
PATCH_11.11 (PHKL_32090)
/usr/conf/lib/libwsio.a(scsi_c720.o):
scsi_c720.c $Date: 2005/06/22 22:32:09 $Revision: r1
1.11/11 PATCH_11.11 (PHKL_33371)
/usr/conf/lib/libwsio.a(scsi_ctl.o):
scsi_ctl.c $Date: 2004/12/01 21:58:26 $Revision: r11
.11/10 PATCH_11.11 (PHKL_32090)
/usr/conf/lib/libwsio.a(scsi_disk.o):
scsi_disk.c $Date: 2006/12/21 02:06:12 $Revision: r1
1.11/13 PATCH_11.11 (PHKL_34815)
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP:
/usr/conf/lib/libio.a(cpd.o):
cpd.c $Date: 2004/12/01 21:58:26 $Revision: r11.11/1
PATCH_11.11 (PHKL_32090)
/usr/conf/lib/libwsio.a(scsi_c720.o):
scsi_c720.c $Date: 2005/06/22 22:32:09 $Revision: r1
1.11/11 PATCH_11.11 (PHKL_33371)
/usr/conf/lib/libwsio.a(scsi_ctl.o):
scsi_ctl.c $Date: 2004/12/01 21:58:26 $Revision: r11
.11/10 PATCH_11.11 (PHKL_32090)
/usr/conf/lib/libwsio.a(scsi_disk.o):
scsi_disk.c $Date: 2006/12/21 02:06:12 $Revision: r1
1.11/13 PATCH_11.11 (PHKL_34815)
cksum(1) Output:
OS-Core.ADMN-ENG-A-MAN,fr=B.11.11,fa=HP-UX_B.11.11_32/64,
v=HP:
3159783876 5603 /usr/share/man/man7.Z/scsi.7
ProgSupport.C-INC,fr=B.11.11,fa=HP-UX_B.11.11_32/64,v=HP:
336921635 67285 /usr/include/sys/scsi_ctl.h
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_32,v=HP:
990800061 13012 /usr/conf/lib/libio.a(cpd.o)
3167280970 155960 /usr/conf/lib/libwsio.a(scsi_c720.o)
1561224706 108444 /usr/conf/lib/libwsio.a(scsi_ctl.o)
130907897 28448 /usr/conf/lib/libwsio.a(scsi_disk.o)
OS-Core.CORE2-KRN,fr=B.11.11,fa=HP-UX_B.11.11_64,v=HP:
3446646258 29664 /usr/conf/lib/libio.a(cpd.o)
1012605850 316144 /usr/conf/lib/libwsio.a(scsi_c720.o)
1062788528 277728 /usr/conf/lib/libwsio.a(scsi_ctl.o)
2743012094 62840 /usr/conf/lib/libwsio.a(scsi_disk.o)
Patch Conflicts: None
Patch Dependencies:
s700: 11.11: PHKL_30511
s800: 11.11: PHKL_30511
Hardware Dependencies: None
Other Dependencies: None
Supersedes:
PHKL_24441 PHKL_29365 PHKL_29047 PHKL_29039 PHKL_28513 PHKL_28096
PHKL_27579 PHKL_27563 PHKL_26519 PHKL_25896 PHKL_25509 PHKL_25165
PHKL_23313 PHKL_34187 PHKL_33371 PHKL_32090 PHKL_31134 PHKL_30510
Equivalent Patches:
PHKL_35731:
s700: 11.23
s800: 11.23
Patch Package Size: 510 KBytes
Installation Instructions:
Please review all instructions and the Hewlett-Packard
SupportLine User Guide or your Hewlett-Packard support terms
and conditions for precautions, scope of license,
restrictions, and limitation of liability and warranties,
before installing this patch.
------------------------------------------------------------
1. Back up your system before installing a patch.
2. Login as root.
3. Copy the patch to the /tmp directory.
4. Move to the /tmp directory and unshar the patch:
cd /tmp
sh PHKL_34815
5. Run swinstall to install the patch:
swinstall -x autoreboot=true -x patch_match_target=true \
-s /tmp/PHKL_34815.depot
By default swinstall will archive the original software in
/var/adm/sw/save/PHKL_34815. If you do not wish to retain a
copy of the original software, include the patch_save_files
option in the swinstall command above:
-x patch_save_files=false
WARNING: If patch_save_files is false when a patch is installed,
the patch cannot be deinstalled. Please be careful
when using this feature.
For future reference, the contents of the PHKL_34815.text file are
available in the product readme:
swlist -l product -a readme -d @ /tmp/PHKL_34815.depot
To put this patch on a magnetic tape and install from the
tape drive, use the command:
dd if=/tmp/PHKL_34815.depot of=/dev/rmt/0m bs=2k
Special Installation Instructions:
For patch PHKL_29365 and later:
JAGae62769:
The SCSI driver has been modified to log recovered
error events. To enable logging, set the 0x40 flag
in scsi_log_mask.
This may be done as follows:
adb -w /stand/vmunix /dev/kmem
scsi_log_mask/X <===== Get the value of current
log mask
scsi_log_mask:
scsi_log_mask: 1F238B10
Add 0x40 to current mask
scsi_log_mask/W 0x1F238B50
scsi_log_mask: 1F238B10 = 1F238B50
scsi_log_mask?W 0x1F238B50
scsi_log_mask: 1F238B10 = 1F238B50
<Ctrl D> to exit adb
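The new mask value used in the adb session above can be derived with a
short calculation. This Python sketch is only a helper for working out
the number to type; the function name is hypothetical, and it uses a
bitwise OR rather than addition so that the result stays correct if the
0x40 flag is already set.

```python
RECOVERED_ERROR_LOG_FLAG = 0x40   # flag value given in the text above


def enable_recovered_error_logging(mask: int) -> int:
    """Return the mask with the recovered-error logging flag set."""
    return mask | RECOVERED_ERROR_LOG_FLAG


current = 0x1F238B10   # example value shown in the adb session above
new_mask = enable_recovered_error_logging(current)
print(f"scsi_log_mask/W 0x{new_mask:08X}")
```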
JAGae85372:
For model C3750 workstations that have external
single-ended wide SCSI disks connected to the
narrow 50 pin SCSI connector, it is imperative
that these disks be powered down and then powered
up after this patch is installed. If these
devices are not power-cycled, they will not be
accessible and any I/O requests to them may hang.