Patch Name: PHSS_34824
Patch Description: s700_800 11.23 Serviceguard eRAC A.11.17.00
Creation Date: 07/02/14
Post Date: 07/03/18
Hardware Platforms - OS Releases:
s700: 11.23
s800: 11.23
Symptoms:
PHSS_34824:
1. In rare cases, cmgmsd cannot complete the validation of its
clients. When this happens, cmgmsd retries the validation
repeatedly, so cmcld eventually kills cmgmsd and the node TOCs.
Syslog messages such as the following are generated:
Apr 28 11:27:22 see cmcld[1069552606]: Failed to deliver
cdb callback to port=16 pid=1069552613, path=
/cluster/groups/gms/*, Connection timed out
Apr 28 11:27:22 see cmcld[1069552606]: Dropping lcomm
connection to client on port=16.
Apr 28 11:27:22 see cmcld[1069552606]: Sending SIGABRT
to external process responsible for timedout
transaction (pid = 1069552613)
Apr 28 11:27:22 see cmcld[1069552606]: Taking cmcld
core also to see if the message was ever sent to
the client
Apr 28 11:27:22 see cmcld[1069552606]: Aborting:
cdb/cdb_proxy_server.c 1316 (Aborting, because an
external process has timed out in the middle of a txn
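2. When too little memory is available, cmgmsd cannot validate the
process ID of a client process. In this case, cmgmsd logs a
misleading error message in the cmgmsd.log file that says "port
number does not belong to the process ID" instead of reporting
that not enough memory is available: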
Sep 20 14:12:41 [14720] Process id: 14912 registers as
primary member with id: 86
Sep 20 14:12:41 [14720] The port 53052 doesn't belong
to 14914
Sep 20 14:12:41 [14720] ERROR: The pid received is not
that of the client(14914)
Sep 20 14:12:41 [14720] Request for primary member 87
on node 4 to join group DG7
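3. The cmgmsd daemon communicates over port number 5408, which is
already registered by another application. As a result, SGeRAC
cannot be used on servers that run that application.
Defect Description:
PHSS_34824:
1. When cmgmsd receives a register request from a client, it
attempts to validate the client. cmgmsd uses the process ID sent
by the client to check whether the client really owns the port
from which the register request was sent to cmgmsd. In rare
cases, however, cmgmsd could not query the attributes of the
client's sockets and retried the validation repeatedly. As a
result, cmcld eventually killed cmgmsd and the node TOCed.
Resolution:
cmgmsd has been changed to reject the register request after a
fixed number of validation attempts.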
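2. The cmgmsd daemon logged a misleading error message when it could
not validate the process ID of a client because it could not
allocate memory (ENOBUFS).
Resolution:
The cmgmsd daemon has been modified to log an error message that
states the real reason the client's process ID could not be
validated. If validation fails because of a memory shortage, the
message "No buffer space available" is logged.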
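3. The cmgmsd daemon communicated over port number 5408, which was
already registered by another application. As a result, SGeRAC
could not be used on servers that run that application.
Resolution:
The group membership APIs and the cmgmsd daemon have been changed
to use UNIX domain sockets to communicate with clients instead of
Internet sockets and a port number. This makes the communication
more secure and avoids any conflict with the application that has
registered port number 5408.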
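To illustrate why moving from Internet sockets to UNIX domain sockets
removes the port 5408 conflict, here is a minimal sketch of a
request/reply exchange over an AF_UNIX socket. The socket path, the
function names, and the message content are hypothetical and are not
taken from cmgmsd or the group membership API; the point is only that
the endpoint is a filesystem path, so no TCP/UDP port is consumed.

```python
import os
import socket
import tempfile
import threading


def serve_once(path, ready):
    """Accept one client on a UNIX domain socket and echo its request."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)               # the address is a file path, not a port
        srv.listen(1)
        ready.set()                  # tell the client it may connect now
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))


def register(path, message):
    """Send one request to the daemon socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(path)
        cli.sendall(message)
        return cli.recv(1024)


def demo():
    """Run a server thread and one client against a temporary socket path."""
    path = os.path.join(tempfile.mkdtemp(), "gms.sock")  # hypothetical name
    ready = threading.Event()
    server = threading.Thread(target=serve_once, args=(path, ready))
    server.start()
    ready.wait(5)
    reply = register(path, b"register pid=14914")
    server.join(5)
    return reply
```

Because the socket lives in the filesystem, access can also be
restricted with ordinary file permissions on the path or its
directory.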
-----------------------------------------------------------------------------
Patch Name: PHSS_34824
Patch Description: s700_800 11.23 Serviceguard eRAC A.11.17.00
Creation Date: 07/02/14
Post Date: 07/03/18
Hardware Platforms - OS Releases:
s700: 11.23
s800: 11.23
Products:
SGeRAC A.11.17.00
Filesets:
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP
Automatic Reboot?: No
Status: General Release
Critical:
Yes
PHSS_34824: PANIC
This is critical because cmgmsd might be terminated,
potentially leading to a reboot of the node.
Category Tags:
defect_repair general_release critical panic
Path Name: /hp-ux_patches/s700_800/11.X/PHSS_34824
Symptoms:
PHSS_34824:
1. In certain rare scenarios, cmgmsd may not be able to
successfully validate its clients, which can result in
cmgmsd looping continuously while it retries the
validation. When this happens, cmgmsd is eventually
killed by cmcld, which causes the node to TOC. The
following syslog messages are generated:
Apr 28 11:27:22 see cmcld[1069552606]: Failed to deliver
cdb callback to port=16 pid=1069552613, path=
/cluster/groups/gms/*, Connection timed out
Apr 28 11:27:22 see cmcld[1069552606]: Dropping lcomm
connection to client on port=16.
Apr 28 11:27:22 see cmcld[1069552606]: Sending SIGABRT
to external process responsible for timedout
transaction (pid = 1069552613)
Apr 28 11:27:22 see cmcld[1069552606]: Taking cmcld
core also to see if the message was ever sent to
the client
Apr 28 11:27:22 see cmcld[1069552606]: Aborting:
cdb/cdb_proxy_server.c 1316 (Aborting, because an
external process has timed out in the middle of a txn
2. When not enough memory is available to cmgmsd, it cannot
validate the process ID of client processes. When this
happens, cmgmsd logs a misleading error message in the
cmgmsd.log file, stating that the port number does not
belong to the process ID rather than that not enough
memory is available:
Sep 20 14:12:41 [14720] Process id: 14912 registers as
primary member with id: 86
Sep 20 14:12:41 [14720] The port 53052 doesn't belong
to 14914
Sep 20 14:12:41 [14720] ERROR: The pid received is not
that of the client(14914)
Sep 20 14:12:41 [14720] Request for primary member 87
on node 4 to join group DG7
3. The cmgmsd daemon uses port number 5408 for its
communication, but that port has already been registered
by another application. Therefore, SGeRAC cannot be used
on servers that run that application.
PHSS_33838:
1. When Oracle CRS is shut down with the command "crsctl stop
crs" in Oracle 10g, many SIGKILL messages like the
following are generated in the syslog file:
Sending SIGKILL to process /u01/crs/oracle/product
/10.2.0/crs/jdk/jre/bin/PA_RISC2.0/java (pid: 9183)
after communication problem detected.
...
Sending SIGKILL to process /u01/crs/oracle/product
/10.2.0/crs/jdk/jre/bin/PA_RISC2.0/java (pid: 9183)
after communication problem detected.
2. When an Oracle RAC instance is shut down, the node may
TOC, and the SG/SGeRAC daemons log the following messages
in syslog. The probability of such an occurrence is quite
low.
cmgmsd: Primary process: 0 deregisters
cmgmsd: Sending SIGKILL to process 0 after communication
problem detected.
cmcld: Service cmgmsd terminated due to a signal(9).
cmcld: Halting <node> to preserve data integrity
cmcld: Reason: CMGMSD daemon failed
3. Oracle development determined that one of the internal
group membership API functions was not behaving
according to the specification on HP-UX.
The skgxnqgrp function was returning a SKGXN_FAIL error
rather than returning an empty bitmap when the
specified group did not exist.
The external symptoms of this problem are unknown.
Defect Description:
PHSS_34824:
1. When cmgmsd receives a register request from a
client, it attempts to validate the client. cmgmsd
uses the process ID sent by the client to check whether
the client indeed owns the port from which the register
request was sent to cmgmsd. In certain rare scenarios,
cmgmsd may not be able to query the attributes of the
client's sockets and will loop continuously trying to
validate the client. cmcld will eventually kill cmgmsd
and cause the node to TOC.
Resolution: cmgmsd has been changed to attempt
validation only a fixed number of times and then reject
the register request.
2. The cmgmsd daemon logs a misleading error message when
it is unable to validate the process ID of a client
because it cannot allocate memory (ENOBUFS).
Resolution: The cmgmsd daemon has been modified to log
an error message that states the actual cause of the
failure to verify a client's process ID. If process ID
verification fails due to a memory shortage, the message
"No buffer space available" is logged.
3. The cmgmsd daemon uses port number 5408 for its
communication, but that port has already been registered
by another application. Therefore, SGeRAC cannot be used
on servers that run that application.
Resolution: The group membership APIs and the cmgmsd
daemon have been changed to use UNIX domain sockets
to interact with clients instead of Internet sockets
and a port number. This implementation is more secure and
no longer conflicts with the application that has
registered port number 5408.
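The resolutions for items 1 and 2 above can be sketched together: a
validation loop that gives up after a fixed number of attempts and
reports the real error instead of a port mismatch. This is only an
illustration under stated assumptions, not the cmgmsd source. The
callable query_socket_owner, the attempt limit of 3, and the return
convention are all hypothetical; the mismatch message text is copied
from the cmgmsd.log sample above.

```python
import errno
import os


def validate_client(query_socket_owner, pid, port, max_attempts=3):
    """Check that `pid` owns `port`, giving up after `max_attempts` tries.

    `query_socket_owner(port)` is a hypothetical callable that returns
    the owning process ID or raises OSError when the socket attributes
    cannot be queried. Returns a tuple (accepted, reason).
    """
    last_reason = "validation not attempted"
    for _ in range(max_attempts):
        try:
            owner = query_socket_owner(port)
        except OSError as exc:
            # Keep the real cause (for example ENOBUFS) for the log
            # instead of reporting a port/pid mismatch, then retry.
            last_reason = exc.strerror or str(exc)
            continue
        if owner == pid:
            return True, "ok"
        # A successful query with a different owner is a true mismatch.
        return False, f"The port {port} doesn't belong to {pid}"
    # Bounded retries exhausted: reject the request rather than loop.
    return False, last_reason


def failing_query(port):
    """Stand-in query that always fails with ENOBUFS, for demonstration."""
    raise OSError(errno.ENOBUFS, os.strerror(errno.ENOBUFS))
```

Calling validate_client(failing_query, 14914, 53052) returns after
three attempts with the ENOBUFS message text as the reason, so the
caller can reject the register request and log the true cause.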
PHSS_33838:
1. When cmgmsd detects a communication problem with a
client, cmgmsd kills the client and then sends
additional SIGKILL signals until the client has
died.
Because cmgmsd has high priority and the kill signal
may not be processed immediately, cmgmsd may send many
kill signals and log each one in syslog.
Resolution: After sending the kill signal to the client,
cmgmsd waits for a short period of time before sending
another kill signal. This allows the client more time
to process the signal and exit before another kill
signal is sent.
2. The problem occurred because cmgmsd tried to access
freed memory after a group member deregistered from
cmgmsd.
Resolution: Changed the source code so that the group
member is no longer processed once its deregistration
has succeeded.
3. When the specified group does not exist, skgxnqgrp
returns SKGXN_FAIL instead of an empty bitmap. The
implementation of skgxnqgrp does not match the
specification.
Resolution: Changed the source code to return an
empty bitmap if the specified group is not found.
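The resolution for PHSS_33838 item 1 (wait briefly after a kill signal
before sending another) can be sketched as follows. This is an
illustration only, not the cmgmsd implementation: the delay of 0.2
seconds and the cap of 5 signals are assumed values, and the patch text
does not state that cmgmsd caps the number of signals at all.

```python
import signal
import subprocess
import sys


def kill_with_wait(proc, delay=0.2, max_signals=5):
    """Send SIGKILL, then wait `delay` seconds for exit before resending.

    `proc` is a subprocess.Popen object. Returns the number of signals
    sent. `delay` and `max_signals` are illustrative assumptions.
    """
    sent = 0
    while sent < max_signals:
        if proc.poll() is not None:       # already exited, nothing to do
            break
        proc.send_signal(signal.SIGKILL)
        sent += 1
        try:
            proc.wait(timeout=delay)      # give the client time to die
            break
        except subprocess.TimeoutExpired:
            continue                      # still alive, signal again
    return sent


def demo():
    """Start a sleeping child process and terminate it with kill_with_wait."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    sent = kill_with_wait(child)
    return sent, child.returncode
```

Without the wait, a high-priority sender can issue many signals (and
log lines) before the target is ever scheduled to act on the first one.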
Enhancement:
No
SR:
8606443147 8606462012 8606472718 8606427015 8606435963
8606439160
Patch Files:
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP:
/usr/lbin/cmgmsd
/opt/nmapi/nmapi2/lib/libnmapi2.1
/opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP:
/usr/lbin/cmgmsd
/opt/nmapi/nmapi2/lib/hpux32/libnmapi2.so.1
/opt/nmapi/nmapi2/lib/hpux64/libnmapi2.so.1
what(1) Output:
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP:
/usr/lbin/cmgmsd:
$Revision: 92453-07 linker linker crt0.o B.11.16.01
030415 $
Build date: Thu Feb 15 17:11:30 PST 2007
Build id: ibld_sgerac_a1117patch_1123_product
Build platform: hpux
HP92453-02A.11.00 HP-UX SYMBOLIC DEBUGGER (END.O ILP
32) $Revision: 75.04 $
A.11.17.00 Date: 02/15/07 Patch: PHSS_34824
CMGMSD
CMGMSD
/opt/nmapi/nmapi2/lib/libnmapi2.1:
Build date: Thu Feb 15 17:11:24 PST 2007
Build id: ibld_sgerac_a1117patch_1123_product
Build platform: hpux
A.11.17.00 Date: 02/15/07 Patch: PHSS_34824
NMAPI2
/opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1:
NMAPI2
A.11.17.00 Date: 02/15/07 Patch: PHSS_34824
Build date: Thu Feb 15 17:14:29 PST 2007
Build id: ibld_sgerac_a1117patch_1123_product
Build platform: hpux - 64 bit
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP:
/usr/lbin/cmgmsd:
A.11.17.00 Date: 02/15/07 Patch: PHSS_34824
CMGMSD
CMGMSD
Build date: Thu Feb 15 17:15:17 PST 2007
Build id: ibld_sgerac_a1117patch_1123_product
Build platform: hpux
/opt/nmapi/nmapi2/lib/hpux32/libnmapi2.so.1:
NMAPI2
A.11.17.00 Date: 02/15/07 Patch: PHSS_34824
Build date: Thu Feb 15 17:15:08 PST 2007
Build id: ibld_sgerac_a1117patch_1123_product
Build platform: hpux
/opt/nmapi/nmapi2/lib/hpux64/libnmapi2.so.1:
NMAPI2
A.11.17.00 Date: 02/15/07 Patch: PHSS_34824
Build date: Thu Feb 15 17:19:27 PST 2007
Build id: ibld_sgerac_a1117patch_1123_product
Build platform: hpux - 64 bit
cksum(1) Output:
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP:
3668416854 368312 /usr/lbin/cmgmsd
641279329 471040 /opt/nmapi/nmapi2/lib/libnmapi2.1
2496652585 294592 /opt/nmapi/nmapi2/lib/pa20_64/libnmapi2.1
SG-NMAPI.CM-NMAPI,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP:
3349193792 305492 /usr/lbin/cmgmsd
2635594499 580496 /opt/nmapi/nmapi2/lib/hpux32/libnmapi2.so.1
583002233 591768 /opt/nmapi/nmapi2/lib/hpux64/libnmapi2.so.1
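The entries above use the cksum(1) output format: CRC, size in bytes,
and path. As a hedged sketch of how an installed file could be compared
with an entry, the following computes the POSIX cksum CRC in pure
Python. The function names are hypothetical, and on a real system
running cksum(1) on the file and comparing the first two columns is the
simpler route.

```python
def posix_cksum(data: bytes) -> int:
    """Return the POSIX cksum CRC of `data` as an unsigned 32-bit integer."""
    stream = bytearray(data)
    length = len(data)
    while length:                      # append the length, low byte first
        stream.append(length & 0xFF)
        length >>= 8
    crc = 0
    for byte in stream:
        crc ^= byte << 24
        for _ in range(8):             # MSB-first CRC, polynomial 0x04C11DB7
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF            # final one's complement


def parse_cksum_line(line):
    """Split one 'CRC SIZE PATH' entry into (crc, size, path)."""
    crc, size, path = line.split(None, 2)
    return int(crc), int(size), path.strip()


def matches(line, data: bytes) -> bool:
    """True if `data` has the CRC and size recorded in `line`."""
    crc, size, _path = parse_cksum_line(line)
    return (posix_cksum(data), len(data)) == (crc, size)
```

For example, matches(entry, open(path, "rb").read()) checks one file
against its entry; the bit-by-bit loop favors clarity over speed.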
Patch Conflicts: None
Patch Dependencies: None
Hardware Dependencies: None
Other Dependencies: None
Supersedes:
PHSS_33838
Equivalent Patches: None
Patch Package Size: 670 KBytes
Installation Instructions:
Please review all instructions and the Hewlett-Packard
SupportLine User Guide or your Hewlett-Packard support terms
and conditions for precautions, scope of license,
restrictions, and limitation of liability and warranties
before installing this patch.
------------------------------------------------------------
1. Back up your system before installing a patch.
2. Login as root.
3. Copy the patch to the /tmp directory.
4. Move to the /tmp directory and unshar the patch:
cd /tmp
sh PHSS_34824
5. Run swinstall to install the patch:
swinstall -x autoreboot=true -x patch_match_target=true \
-s /tmp/PHSS_34824.depot
By default swinstall will archive the original software in
/var/adm/sw/save/PHSS_34824. If you do not wish to retain a
copy of the original software, include the patch_save_files
option in the swinstall command above:
-x patch_save_files=false
WARNING: If patch_save_files is false when a patch is installed,
the patch cannot be deinstalled. Please be careful
when using this feature.
For future reference, the contents of the PHSS_34824.text file are
available in the product readme:
swlist -l product -a readme -d @ /tmp/PHSS_34824.depot
To put this patch on a magnetic tape and install from the
tape drive, use the command:
dd if=/tmp/PHSS_34824.depot of=/dev/rmt/0m bs=2k
Special Installation Instructions:
For Serviceguard Clusters:
1) Halt Serviceguard on the node the patch is to be
installed on.
2) Install this patch on that node.
3) Restart Serviceguard on that node.
4) Repeat steps 1 to 3 on every other node; the patch
must be installed on all nodes in the cluster.
The above instructions apply to any A.11.17.00
Serviceguard cluster in any configuration, for example
those using SGeRAC, SGeSAP, or Metrocluster.
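The node-by-node order above can be written down as a small plan
builder. This sketch only produces the ordered command strings and
does not execute anything. The patch text does not name the commands
for halting and restarting Serviceguard; cmhaltnode and cmrunnode are
assumed here as the usual Serviceguard commands, and any options they
need (for example, to move running packages) are site-specific. The
node names are hypothetical.

```python
def rolling_patch_plan(nodes, depot="/tmp/PHSS_34824.depot"):
    """Return (node, command) pairs for patching one node at a time."""
    install = (
        "swinstall -x autoreboot=true -x patch_match_target=true "
        "-s " + depot
    )
    plan = []
    for node in nodes:
        plan.append((node, "cmhaltnode"))   # 1) halt Serviceguard (assumed command)
        plan.append((node, install))        # 2) install the patch
        plan.append((node, "cmrunnode"))    # 3) restart Serviceguard (assumed command)
    return plan


for node, command in rolling_patch_plan(["node1", "node2"]):
    print(f"{node}: {command}")
```

Each node is halted, patched, and restarted before the next one is
touched, so the rest of the cluster stays up during the installation.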