PHSS_36636 s700_800 11.23 Serviceguard A.11.17.00

Patch Name:   PHSS_36636

Patch Description: s700_800 11.23 Serviceguard A.11.17.00

Creation Date:  07/08/22

Post Date:  07/08/30

Hardware Platforms - OS Releases:

	s700: 11.23
	s800: 11.23

Symptoms:

	PHSS_36636:

	1. Defect: JAGag34293 SR: 8606480162
	Other applications on the network using the Serviceguard reserved
	network ports (hacl-cfg ports 5302/udp and 5302/tcp) can cause
	cmquerycl to exit with misleading error messages. The same applies
	to the cmcheckconf, cmviewcl, cmgetconf, cmrunnode, cmapplyconf and
	cmruncl commands.

	# cmquerycl
	Unable to receive a datagram from the configuration
	daemon (cmclconfd): No message of desired type
	cmquerycl: Unable to find any configuration information

	# cmapplyconf -v -C cluster.ascii
	Checking cluster file: cluster.ascii
	Checking nodes ... Done
	Checking existing configuration ... Done
	Node <node1> is refusing Serviceguard communication.
	Please make sure that the proper  access is
	configured on node <node1> through either file-based
	access (pre-A.11.16 version) or role-based access
	version A.11.16 or higher) and/or that the host name
	lookup on node <node1> resolves the IP address
	correctly.
	cmapplyconf: Failed to gather configuration information

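	2. Defect: JAGag34599 SR: 8606480516
	Switching back from the standby lan to the primary lan can cause
	the lan failover message to be logged hundreds of times in syslog.
	This typically happens when two subnets, one with a package ip
	configured and the other without, recover from failure at about
	the same time. The following messages are logged in syslog:

	cmcld: lan2 switched to lan1
	(above message repeats 440 times)
	cmcld: lan2 switched to lan1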

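	3. Defect: JAGag37580 SR: 8606484458
	If cmsrvassistd terminates for any reason, a system TOC occurs,
	but no information is displayed to determine the cause of the
	problem. A message like the following is logged in syslog: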

	cmcld: Service assistant daemon died unexpectedly!
	It may be due to a pending reboot or panic.
	cmcld: Exiting with status 1.

	4. Defect: JAGag41341 SR: 8606488693
	If HEARTBEAT_INTERVAL is not set in the cluster configuration
	file, cmapplyconf succeeds but cmruncl dumps core.

	Running cmviewconf shows that the heartbeat interval is set
	to 0.

	# cmviewconf
	Cluster information:
	  cluster name:			  abc
	  heartbeat interval:		  0.00 (seconds)

	5. Defect: JAGag42544 SR: 8606490065
	On an HP-UX 11.31 system with VxFS 4.1 or later installed,
	cmclconfd displays the following false syslog message each time
	the logical volume groups are queried:

	cmclconfd: Cannot recognize version 6 or later VxFS file
	systems. Make sure that libc patch PHCO_32488 or later
	is installed if such file systems are used.

	This problem does not occur with Serviceguard on HP-UX 11.23
	systems.

	6. Defect: JAGag36170 SR: 8606482272
	Very rarely, cmcld can dump core with a segmentation violation.
	The last 2 frames of the stack trace show the following
	functions:

	#0  0x175b60 in cl_list_remove+0xbc ()
	#1  0x161024 in st_delete_callback_private+0x548 ()

	7. Defect: JAGag37994 SR: 8606484939
	When a cmhaltnode or cmhaltcl command is executing at the same
	time, cmviewcl displays the status of system multinode packages
	as "starting".

	8. Defect: JAGag42785 SR: 8606490333
	Serviceguard commands may terminate abnormally because the udp
	cmclconfd daemon dumps core with a SIGSEGV. The following is an
	example of the error message displayed when this happens:

	# cmcheckconf -v -C ./cmclconf.ascii
	Checking cluster file: ./cmclconf.ascii
	Checking nodes ... Done
	Checking existing configuration ... Done
	Warning: Can not find configuration for cluster
	<cluster_name>
	Error: Unable to establish communication to node
	<node_name>: 19
	cmcheckconf : Failed to gather configuration
	information

	The following message is logged in syslog:

	inetd: hacl-cfg/udp: Died on signal 11

	The stack trace looks like this:

	#0  0x60000000c0320300:0 in
	    T_19_f81_cl___doprnt_main+0x99b0 ()
	    from /usr/lib/hpux32/libc.so.1
	#1  0x60000000c030e570:0 in _doprnt+0x30 ()
	    from /usr/lib/hpux32/libc.so.1
	#2  0x60000000c03341c0:0 in snprintf+0x140 ()
	    from /usr/lib/hpux32/libc.so.1
	#3  0x432b920:0 in add_alias_ip_addrs+0xe0 ()
	#4  0x432e110:0 in
	    sg_sec_check_filebased_security+0x10b0 ()
	#5  0x432fa40:0 in sg_get_security_privilege+0x220 ()
	#6  0x40d0c20:0 in get_udp_message+0x850 ()
	#7  0x40d37f0:0 in main+0x27a0 ()

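	9. Defect: JAGag43673 SR: 8606491388
	Importing VxVM disk groups in a Metrocluster/SRDF environment can
	fail during package startup. This typically happens when nodes on
	the SRDF R1 side are rebooted or restarted while the RDF device
	groups on the SRDF R2 side are being reconfigured. During RDF
	reconfiguration, the devices on the R1 side can be in a state that
	causes the "VxVM disk scan" command run at system boot to mark
	those devices as "offline". When a package on the R1 side later
	attempts to import the VxVM disk groups, the import fails.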

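	10. Defect: JAGag42796 SR: 8606490345
	When cluster services are halted on a node that has external IP
	addresses configured, cmcld continues to consume memory even after
	the node has left the cluster, and dumps core once its memory
	usage reaches the kernel limit. As a result, the external IP
	addresses are not removed and must be removed manually after
	cmcld dumps core.

	The stack traces of the core files vary, but they often include
	the function "add_netsen_shutdown_links_to_chain".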

	#0  0x60000000c035d830:0 in _brk+0x30 ()
	    from /usr/lib/hpux32/libc.so.1
	#1  0x60000000c036cb00:0 in sbrk+0xf0 ()
	    from /usr/lib/hpux32/libc.so.1
	#2  0x60000000c0231980:0 in malloc_sbrk+0x280 ()
	    from            /usr/lib/hpux32/libc.so.1
	#3  0x60000000c0232590:0 in grow_arena+0x210 ()
	    from             /usr/lib/hpux32/libc.so.1
	#4  0x60000000c022fa40:0 in real_malloc+0x920 ()
	    from        /usr/lib/hpux32/libc.so.1
	#5  0x60000000c022ef20:0 in _malloc+0x800 ()
	    from        /usr/lib/hpux32/libc.so.1
	#6  0x60000000c023c950:0 in malloc+0x140 ()
	    from        /usr/lib/hpux32/libc.so.1
	#7  0x41c2ae0:0 in add_netsen_shutdown_links_to_chain
	    ()      at netsen/ns_shutdown_chain.c:222
	#8  0x41c3540:0 in ns_start_shutdown_chain () at
	    netsen/ns_shutdown_chain.c:342
	#9  0x41c3880:0 in ns_shutdown () at
	    netsen/ns_shutdown_chain.c:372
	#10 0x42bc760:0 in cl_chain_link_done () at
	    utils/cl_chain.c:121
	#11 0x436b690:0 in cm_shutdown_event_handler ()
	    at cm/utils.c:708
	#12 0x42c18c0:0 in cl_event_loop () at
	    utils/cl_event.c:460
	#13 0x60000000c00c7420:0 in
	    __pthread_bound_body+0x170 ()
	    from /usr/lib/hpux32/libpthread.so.1

	11. Defect: JAGag44706 SR: 8606492535
	When many packages are configured in the cluster, cmcld
	frequently logs the following message:

	cmcld: Unable to set socket buffer size to 360448
	bytes (No buffer space available), continuing anyway.

	However, Serviceguard handles this situation properly, so these
	messages do not indicate an error.

	12. Defect: JAGag43289 SR: 8606490922
	"cmviewcl -f line" always displays the os_status of remote nodes
	in the cluster as "unknown".

	13. Defect: JAGag45533 SR: 8606493360
	cmviewconf displays the service fail fast flag as "disabled"
	even when it is "enabled".

	14. Defect: JAGag45718 SR: 8606493785
	In the cluster ASCII configuration file, if the quorum server
	hostname, QS_HOST, is specified with an invalid backslash
	character ("\"), cmapplyconf dumps core. The stack trace shows
	the following functions:

	#0 0x60000000c0345510:0 in kill+0x30 ()
	from /usr/lib/hpux32/libc.so.1
	#1 0x60000000c023bd50:0 in raise+0x30 ()
	   from /usr/lib/hpux32/libc.so.1
	#2 0x60000000c02ff250:0 in abort+0x190 ()
	   from /usr/lib/hpux32/libc.so.1
	#3 0x40a7510:0 in cdb_db_commit+0x890 ()
	#4 0x40af270:0 in cdb_external_access+0x890 ()
	#5 0x40bc660:0 in cl_config_commit_transaction+0x1560()
	#6 0x4181520:0 in cf_configure_cluster+0x2ab0 ()
	#7 0x4133b10:0 in config_main+0x5480 ()
	#8 0x4146590:0 in main+0x900 ()

	15. Defect: JAGag41937 SR: 8606489376
	If a node in the cluster experiences multiple hangs during
	cluster reformation, the hanging node and the candidate for
	coordinator die when the safety timer expires.

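	16. Defect: JAGag46086 SR: 8606494153
	cmapplyconf displays an inappropriate error message when multiple
	QS_HOST entries are specified.

	cmquerycl may dump core if the -q option is specified as the
	last parameter of the cmquerycl command.

	The "cmviewcl -v -f line" command displays the quorum server ip
	address in an invalid format: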

	quorum_server:<node_name>|ip_address:192.76.1.2|name=192.76.1.2

	It should be displayed as follows:

	quorum_server:<node_name>|ip_address=192.76.1.2|name=192.76.1.2

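	17. Defect: JAGag46092 SR: 8606494159
	The cmapplyconf command appears to accept changes to the
	QS_POLLING_INTERVAL and QS_TIMEOUT_EXTENSION values while the
	cluster is running, but these values are not actually changed in
	the cluster configuration. cmapplyconf should prevent these
	parameters from being modified while the cluster is running.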

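	18. Defect: JAGag27798 SR: 8606473093
	When a halted node cannot communicate with the other nodes of
	the cluster, cmviewcl on the halted node displays the package
	STATUS and STATE as "down" and "halted" respectively, as shown
	below. However, the packages are running normally on the other
	nodes, so the package STATE should be displayed as UNKNOWN.

	When the halted node node2 cannot communicate with the other
	cluster nodes, cmviewcl on that node returns the following: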

	# cmviewcl

	 CLUSTER      STATUS
	 cluster1    unknown

	 NODE          STATUS      STATE
	 node1        unknown      unknown
	 node2          down        unknown

	 UNOWNED_PACKAGES
	 PACKAGE  STATUS  STATE  AUTO_RUN  NODE

	  pkg1    down    halted  enabled  unowned

	19. Defect: JAGag38581 SR: 8606485608
	Non-root users with monitor or admin access privileges can view
	all of the cluster's configured access control policies.

Defect Description:

	PHSS_36636:

	1. Defect: JAGag34293 SR: 8606480162
	Serviceguard displayed an error message and exited whenever it
	received an invalid message.

	Resolution:
	The code has been modified to ignore invalid messages.

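	2. Defect: JAGag34599 SR: 8606480516
	A variable was shared between different bridged networks, so
	when two subnets recovered at about the same time, the shared
	variable was left in an inconsistent state. As a result, the
	above message was logged for each bridged network.

	Resolution:
	The code has been modified so that the variable is no longer
	shared between different bridged networks.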

	3. Defect: JAGag37580 SR: 8606484458
	In A.11.17, the exit status code of cmsrvassistd was discarded
	in all cases.

	Resolution:
	Code has been added to handle the exit status of cmsrvassistd.

	4. Defect: JAGag41341 SR: 8606488693
	The case where HEARTBEAT_INTERVAL is not set in the cluster
	configuration file was not anticipated, so the default value was
	set to 0.

	Resolution:
	The code has been modified to handle this case properly.

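	5. Defect: JAGag42544 SR: 8606490065
	The above syslog message is specific to HP-UX 11.23 systems and
	does not apply to HP-UX 11.31 systems. However, before logging
	it, the code did not check whether the OS version running on the
	machine was HP-UX 11.23.

	Resolution:
	The code has been modified so that the above syslog message is
	not displayed on HP-UX 11.31 systems.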

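	6. Defect: JAGag36170 SR: 8606482272
	Serviceguard locks the mutex before updating the callback
	structure, but it was unlocking the mutex before removing the
	callback from the list.

	Resolution:
	The code has been modified to unlock the mutex after the
	callback has been removed from the list.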
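
The locking order described for defect 6 can be illustrated with a minimal Python sketch. It is not Serviceguard code: the class name `CallbackRegistry`, its methods, and the dictionary-based callback records are hypothetical, chosen only to show the update and the list removal happening inside the same critical section.

```python
import threading


class CallbackRegistry:
    """Hypothetical registry; illustrates the lock ordering only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []

    def add(self, callback):
        with self._lock:
            self._callbacks.append(callback)

    def delete(self, callback):
        # Both the update of the callback record and its removal from
        # the list happen before the lock is released, so no other
        # thread can observe a half-deleted entry in the list.
        with self._lock:
            callback["active"] = False
            self._callbacks.remove(callback)

    def snapshot(self):
        with self._lock:
            return list(self._callbacks)
```

Releasing the lock between the two steps inside `delete` would reopen the timing window the defect describes, because another thread could walk or modify the list while it still contains the entry being deleted.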

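	7. Defect: JAGag37994 SR: 8606484939
	When a node or the cluster was being halted, the status of
	system multinode packages was displayed as "starting".

	Resolution:
	The code has been modified to display the status of system
	multinode packages as "changing" while cmhaltnode/cmhaltcl is
	executing.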

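	8. Defect: JAGag42785 SR: 8606490333
	While the hostent structure was in use, it was being modified by
	a subsequent gethost*() call.

	Resolution:
	The code has been modified to use a copy of the hostent
	structure, rather than the actual structure, to protect it from
	subsequent gethost*() calls.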

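	9. Defect: JAGag43673 SR: 8606491388
	During the disk operation that refreshes data from the remote
	side (R2) to the primary side (R1), the disks are in a "Not
	Ready" state (not visible to the host) for a very short time. If
	the primary node on R1 reboots during this window and VxVM starts
	as part of the boot sequence, VxVM cannot access those disks and
	marks the "Not Ready" disks as "offline". As a result, the VxVM
	disk groups could not be imported, and the package failed to
	start on failback.

	Resolution:
	A new parameter, "VXVM_DG_RETRY", which can be set to "YES" or
	"NO", has been introduced in the Serviceguard package control
	script. When this parameter is set to "YES", "vxdisk scandisks"
	is run on the disks belonging to the failed disk group.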

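	10. Defect: JAGag42796 SR: 8606490345
	When cmhaltcl/cmhaltnode was run on a cluster configured with
	packages that use relocatable IP addresses on subnets not in the
	cluster configuration, cmcld grew in size and dumped core. The
	array elements were not being extended properly when the IP
	addresses were removed.

	Resolution:
	The code has been modified to extend the array elements
	properly.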

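	11. Defect: JAGag44706 SR: 8606492535
	When the default buffer size of the unix domain socket was
	insufficient, a message was logged at the default log level.
	However, cl_msg flow control already handles adjusting the size
	and sending the message.

	Resolution:
	The log level of the message has been raised and its log
	category changed, so that the message is no longer logged at the
	default log level.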

	12. Defect: JAGag43289 SR: 8606490922
	cmviewcl cannot obtain the os_status value unless it probes the
	nodes, and the nodes are probed only when the verbose option is
	specified. Therefore, cmviewcl must not display os_status when
	the verbose option is not specified.

	Resolution:
	cmviewcl has been modified to display os_status only when the
	verbose option is specified.

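	13. Defect: JAGag45533 SR: 8606493360
	When displaying the status of the service fail fast flag, the
	cmviewconf command compared the flag using values in different
	byte orders, so an incorrect flag status was displayed.

	Resolution:
	The code has been modified to perform the comparison in the
	correct byte order so that the correct status of the service
	fail fast flag is obtained.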
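
As an illustration of how a byte-order mismatch can turn an enabled flag into "disabled", here is a small Python sketch. The flag value `FAIL_FAST_ENABLED`, the 32-bit field width, and the function names are assumptions made for the example; the patch text does not describe the actual data layout.

```python
import struct

# Hypothetical flag bit; the real Serviceguard value is not documented here.
FAIL_FAST_ENABLED = 0x00000001


def encode_flags(flags):
    """Store the flags as a 32-bit big-endian (network order) field."""
    return struct.pack(">I", flags)


def fail_fast_mismatched(raw):
    """Decode with the wrong byte order before testing the bit."""
    (flags,) = struct.unpack("<I", raw)
    return bool(flags & FAIL_FAST_ENABLED)


def fail_fast_consistent(raw):
    """Decode with the same byte order that was used to store the field."""
    (flags,) = struct.unpack(">I", raw)
    return bool(flags & FAIL_FAST_ENABLED)
```

With these definitions, an enabled flag stored by `encode_flags` is read back as enabled by `fail_fast_consistent` but as disabled by `fail_fast_mismatched`, which mirrors the symptom reported for cmviewconf.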

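	14. Defect: JAGag45718 SR: 8606493785
	In the cluster ASCII configuration file, if the QS_HOST value
	was specified with an invalid backslash character ("\"),
	cmapplyconf dumped core.

	Resolution:
	The code has been modified to check whether the QS_HOST/QS_ADDR
	values specified in the cluster configuration file contain an
	invalid backslash character and to display an error message if
	they do.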
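
The kind of up-front check described for defect 14 might look like the following Python sketch. The function name `validate_quorum_value` and the error text are hypothetical; the only behavior taken from the patch description is that a backslash in a QS_HOST or QS_ADDR value is rejected with an error message instead of being passed on.

```python
def validate_quorum_value(parameter, value):
    """Reject a QS_HOST/QS_ADDR value that contains a backslash.

    Returns the value unchanged when it is acceptable; raises
    ValueError with a readable message otherwise.
    """
    if "\\" in value:
        raise ValueError(
            f"{parameter}: invalid backslash character in value {value!r}"
        )
    return value
```

Validating the value when the configuration file is read means the bad input is reported to the administrator before any later processing step has to cope with it.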

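	15. Defect: JAGag41937 SR: 8606489376
	A hung node could not update its safety timer, so it died when
	the safety timer expired. Meanwhile, the candidate for
	coordinator was waiting for a heartbeat from the hung node in
	order to update its own safety timer, so it likewise died when
	its safety timer expired.

	Resolution:
	The code has been modified so that a hung node is forcibly
	terminated before it can affect the other nodes in the cluster.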

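	16. Defect: JAGag46086 SR: 8606494153
	cmapplyconf displayed an inappropriate error message when
	multiple QS_HOST entries were specified.

	When reading the -q option from the command line, cmquerycl did
	not increment the array index correctly.

	"cmviewcl -v -f line" displayed the quorum server ip address in
	an invalid format.

	Resolution:
	cmapplyconf has been modified to display an appropriate error
	message.

	cmquerycl has been modified to read the -q option from the
	command line arguments correctly.

	"cmviewcl -v -f line" has been modified to display ip_addresses
	in the correct format.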
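
The cmquerycl part of defect 16 is a classic argument-indexing problem. The Python sketch below shows one way to advance the index safely; it assumes, for illustration only, that -q is followed by a single value, and the function name `parse_args` and the returned dictionary layout are hypothetical rather than taken from cmquerycl.

```python
def parse_args(argv):
    """Parse a minimal argument list in which -q takes one value.

    The index is advanced past both the option and its value, and the
    presence of the value is checked before it is read, so "-q" as the
    last argument produces an error instead of an out-of-range access.
    """
    options = {"q": None, "rest": []}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-q":
            if i + 1 >= len(argv):
                raise ValueError("-q requires a value")
            options["q"] = argv[i + 1]
            i += 2  # skip the option and its value
        else:
            options["rest"].append(arg)
            i += 1
    return options
```

For example, `parse_args(["-v", "-q", "qs1"])` records `qs1` as the -q value, while `parse_args(["-v", "-q"])` raises `ValueError` rather than reading past the end of the list.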

	17. Defect: JAGag46092 SR: 8606494159
	cmapplyconf did not check for online changes to the
	QS_POLLING_INTERVAL/QS_TIMEOUT_EXTENSION values.

	Resolution:
	cmapplyconf has been modified to prohibit online changes to
	QS_POLLING_INTERVAL/QS_TIMEOUT_EXTENSION. If an online change is
	attempted, cmapplyconf exits with an error.

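	18. Defect: JAGag27798 SR: 8606473093
	cmviewcl displayed the STATUS and STATE of all unowned packages
	as "down" and "halted" respectively by default, without checking
	the cluster reachability status.

	Resolution:
	cmviewcl has been modified to display the package STATUS as
	"unknown" when the cluster nodes are not reachable.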

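	19. Defect: JAGag38581 SR: 8606485608
	Commands that display cluster information showed all configured
	access control policies to non-root users who have permission to
	run the commands. This is not a problem in itself, but it was
	decided to modify the commands so that these policies are
	displayed only to users with the same or a higher level of
	access.

	Resolution:
	The roles displayed now match the privilege level of the user
	running the command, based on the user name and the host (on
	which the command is run).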

-----------------------------------------------------------------------------
Patch Name: PHSS_36636

Patch Description: s700_800 11.23 Serviceguard A.11.17.00

Creation Date: 07/08/22

Post Date: 07/08/30

Hardware Platforms - OS Releases: 
	s700: 11.23
	s800: 11.23

Products: 
	Serviceguard A.11.17.00

Filesets: 
	Cluster-Monitor.CM-CORE,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP
	Package-CVM-CFS.CM-CVM-CFS,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP
	Package-Manager.CM-PKG,fr=A.11.17.00,fa=HP-UX_B.11.23_IA,v=HP
	Cluster-Monitor.CM-CORE,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP
	Package-CVM-CFS.CM-CVM-CFS,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP
	Package-Manager.CM-PKG,fr=A.11.17.00,fa=HP-UX_B.11.23_PA,v=HP
	Cluster-Monitor.CM-CORE-COM,fr=A.11.17.00,fa=HP-UX_B.11.23_IA/PA,v=HP
	Package-CVM-CFS.CM-CVM-CFS-COM,fr=A.11.17.00,fa=HP-UX_B.11.23_IA/PA,v=HP
	Package-Manager.CM-PKG-MAN,fr=A.11.17.00,fa=HP-UX_B.11.23_IA/PA,v=HP
	Cluster-Monitor.CM-CORE-MAN,fr=A.11.17.00,fa=HP-UX_B.11.23_IA/PA,v=HP

Automatic Reboot?: No

Status: General Release

Critical: 
	Yes
	PHSS_36636: ABORT HANG PANIC

	   If HEARTBEAT_INTERVAL is not set in cluster
	   configuration file cmapplyconf succeeds but cmruncl
	   dumps core.

	   There is a very small timing window when cmcld can dump
	   core with a segmentation violation. The last 2 frames
	   in the stack trace will show the following functions:

	   #0  0x175b60 in cl_list_remove+0xbc ()
	   #1  0x161024 in st_delete_callback_private+0x548 ()

	   Serviceguard commands may fail due to the udp cmclconfd
	   daemon core dumping with a SIGSEGV. Below is an example
	   of the error message for one such command when this
	   happens:

	   # cmcheckconf -v -C ./cmclconf.ascii
	   Checking cluster file: ./cmclconf.ascii
	   Checking nodes ... Done
	   Checking existing configuration ... Done
	   Warning: Can not find configuration for cluster
	   <cluster_name>
	   Error: Unable to establish communication to node
	   <node_name>: 19
	   cmcheckconf : Failed to gather configuration
	   information

	   The syslog would look like:
	   inetd: hacl-cfg/udp: Died on signal 11

	   The stack trace would look like:
	   #0  0x60000000c0320300:0 in
	   T_19_f81_cl___doprnt_main+0x99b0 () from
	       /usr/lib/hpux32/libc.so.1
	   #1  0x60000000c030e570:0 in _doprnt+0x30 () from
	       /usr/lib/hpux32/libc.so.1
	   #2  0x60000000c03341c0:0 in snprintf+0x140 ()
	       from /usr/lib/hpux32/libc.so.1
	   #3  0x432b920:0 in add_alias_ip_addrs+0xe0 ()
	   #4  0x432e110:0 in
	       sg_sec_check_filebased_security+0x10b0 ()
	   #5  0x432fa40:0 in sg_get_security_privilege+0x220 ()
	   #6  0x40d0c20:0 in get_udp_message+0x850 ()
	   #7  0x40d37f0:0 in main+0x27a0 ()

	   When the cluster services are halted on a node,
	   if external IP addresses are configured cmcld
	   will continue to consume memory after the node
	   has left the cluster until it reaches the kernel
	   limit at which point it will core dump. Due to this
	   it takes a long time to halt, since it will be trying
	   to clean up the ip resources which could not be
	   removed by cmmodnet during halt procedure.

	   The stack traces of the resulting core files can vary
	   but often include the function
	   "add_netsen_shutdown_links_to_chain".

	   #0  0x60000000c035d830:0 in _brk+0x30 ()
	       from /usr/lib/hpux32/libc.so.1
	   #1  0x60000000c036cb00:0 in sbrk+0xf0 ()
	       from /usr/lib/hpux32/libc.so.1
	   #2  0x60000000c0231980:0 in malloc_sbrk+0x280 ()
	       from /usr/lib/hpux32/libc.so.1
	   #3  0x60000000c0232590:0 in grow_arena+0x210 ()
	       from /usr/lib/hpux32/libc.so.1
	   #4  0x60000000c022fa40:0 in real_malloc+0x920 ()
	       from /usr/lib/hpux32/libc.so.1
	   #5  0x60000000c022ef20:0 in _malloc+0x800 ()
	       from /usr/lib/hpux32/libc.so.1
	   #6  0x60000000c023c950:0 in malloc+0x140 ()
	       from /usr/lib/hpux32/libc.so.1
	   #7  0x41c2ae0:0 in add_netsen_shutdown_links_to_chain
	       ()  at netsen/ns_shutdown_chain.c:222
	   #8  0x41c3540:0 in ns_start_shutdown_chain () at
	         netsen/ns_shutdown_chain.c:342
	   #9  0x41c3880:0 in ns_shutdown () at
	       netsen/ns_shutdown_chain.c:372
	   #10 0x42bc760:0 in cl_chain_link_done () at
	       utils/cl_chain.c:121
	   #11 0x436b690:0 in cm_shutdown_event_handler ()
	       at cm/utils.c:708
	   #12 0x42c18c0:0 in cl_event_loop () at
	       utils/cl_event.c:460
	   #13 0x60000000c00c7420:0 in
	       __pthread_bound_body+0x170 ()
	       from /usr/lib/hpux32/libpthread.so.1

	   In the cluster ASCII configuration file, if the QS_HOST
	   value is specified with an invalid backslash character
	   ('\'), cmapplyconf dumps core. The stack trace
	   will show the following functions:

	   #0  0x60000000c0345510:0 in kill+0x30 ()
	       from /usr/lib/hpux32/libc.so.1
	   #1  0x60000000c023bd50:0 in raise+0x30 ()
	       from /usr/lib/hpux32/libc.so.1
	   #2  0x60000000c02ff250:0 in abort+0x190 ()
	       from /usr/lib/hpux32/libc.so.1
	   #3  0x40a7510:0 in cdb_db_commit+0x890 ()
	   #4  0x40af270:0 in cdb_external_access+0x890 ()
	   #5  0x40bc660:0 in cl_config_commit_transaction+0x1560()
	   #6  0x4181520:0 in cf_configure_cluster+0x2ab0 ()
	   #7  0x4133b10:0 in config_main+0x5480 ()
	   #8  0x4146590:0 in main+0x900 ()

	   A node in a cluster experiencing multiple hangs during
	   cluster reformation can cause the node experiencing the
	   hangs and the candidate for coordinator to die when
	   safety timer expires.

	PHSS_35427: ABORT HANG PANIC
	   cmcld aborts when the select() system call is
	   interrupted by a signal. This results in the node
	   being reset by the safety timer.

	   cmsrvassistd will loop when the script or program
	   specified in a package SERVICE_CMD parameter does not
	   exist or does not have execute permission, attempting to
	   restart the service until the defined maximum service
	   restart count has been reached. If the count is infinite
	   cmsrvassistd will take large amounts of CPU effectively
	   taking over a single cpu system.

	   System repeatedly TOC's when AUTOSTART_CMCLD is set
	   to 1 if system multinode package is unable to start.

	   pthreads patch PHCO_34944 or later exposes a defect in
	   Serviceguard on uniprocessor systems which can lead to
	   cmcld consuming 100% of cpu resulting in a hang or system
	   TOC. This does not apply to multi-processor systems.

	   When cmgmsd cannot be halted correctly within timeout,
	   cmcld hits an assertion and the node will TOC if the
	   safety timer is still enabled. But, there is no core
	   from cmgmsd to understand the reason why it could not
	   halt.

	   With CVM4.1, failover time could be increased by a few
	   seconds if a FAILFAST service fails while the cluster is
	   reforming. This can delay the TOC of the local node and
	   eventually cluster reformation.

	   The Serviceguard daemon cmlvmd terminates upon receipt
	   of SIGHUP. This causes cmcld to abort and a potential
	   node TOC.

	   Improved integration with Distributed Systems
	   Administration Utilities (DSAU). Intermittent command
	   hangs in cmapplyconf and cmdeleteconf have been seen
	   when DSAU is running on nodes in a Serviceguard cluster.

	PHSS_35371: ABORT PANIC
	   Reuse of memory during reprobe of DGC disks can lead
	   cmclconfd to SIGSEGV resulting in command failures.

	   Invalid data can be specified in the USER_NAME field for
	   the access control policies in the cluster ascii file
	   and a cmapplyconf will complete without error. When a
	   cmapplyconf is re-executed to correct this, and if
	   the cluster is running, cmcld will abort, resulting
	   in a node TOC.

	PHSS_34337: ABORT HANG PANIC CORRUPTION
	   Corruption in link level messages can lead to a cmcld
	   SIGSEGV even with checksummed messages. The stack traces
	   of the resulting core files can vary but often include
	   one of the functions ns_if_setgood or dlpi_recv.

	   A socket call failure due to insufficient available
	   memory causes cmcld to abort.

	   UDP messages were not marked as invalid even if there
	   were invalid values for length and offset fields in the
	   message, causing cmclconfd to exit without receiving
	   the message and/or cmviewcl to spin indefinitely.

	   The Serviceguard NMAPI interface fails if the file
	   descriptor used to connect to cmgmsd is greater than the
	   default FD_SETSIZE (i.e. 24576), causing data corruption
	   of the client process.

	   Formation of 2 clusters may potentially result in
	   packages running on 2 nodes at the same time and may
	   potentially result in data corruption issues.

	   When no buffer space is available, the LVM daemon
	   aborts, causing the cluster daemon to also abort,
	   leading to a TOC.

	   When the timer loop thread is stuck (not holding
	   cm_lock) or the system clock is not advancing, cmcld
	   threads will not be scheduled. This prevents cmcld
	   timeout and prevents the safety timer being updated
	   resulting in all nodes being TOC'd.

	   The cmviewconf command can core dump if it cannot get
	   node information (for example, if it cannot contact the
	   cmclconfd daemon).

	   When the connection between cmcld and the quorum server
	   fails frequently and in quick succession, cmcld may dump
	   core.

	   Memory being freed twice during cluster reformation may
	   cause cmcld to dump core.

	   Removing a node from a cluster when a package is
	   running on that node causes cmquerycl to dump core.

	PHSS_33840: HANG ABORT
	   After deleting a node from the cluster, the
	   configuration daemon (cmclconfd) on the deleted
	   node goes into an infinite loop.

	   When multiple nodes are started and join an existing
	   cluster at the same time, a node may abort.

	   cmcld aborts because cmapplyconf incorrectly passes
	   rather than failing as it should when 'detected a
	   partition of IP subnet' and 'minimum network
	   configuration requirement for the cluster have not
	   been met.'

	   cmapplyconf can core dump when one of the members has
	   a network interface with intermittent problems, where
	   the device's availability turns on and off. If the
	   administrator runs cmapplyconf to modify the cluster
	   configuration in this situation, cmapplyconf will fail
	   and core dump.

Category Tags: 
	defect_repair enhancement general_release critical panic
	halts_system corruption manual_dependencies

Path Name: /hp-ux_patches/s700_800/11.X/PHSS_36636

Symptoms: 
	PHSS_36636:

	1. Defect: JAGag34293 SR: 8606480162
	   Other applications on the network using Serviceguard
	   reserved network ports (hacl-cfg ports 5302/udp
	   and 5302/tcp) can cause cmquerycl to fail unexpectedly
	   with misleading error messages. This is also applicable
	   to cmcheckconf, cmviewcl, cmgetconf, cmrunnode,
	   cmapplyconf and cmruncl commands.

	   # cmquerycl
	   Unable to receive a datagram from the configuration
	   daemon (cmclconfd): No message of desired type
	   cmquerycl: Unable to find any configuration information

	   # cmapplyconf -v -C cluster.ascii
	   Checking cluster file: cluster.ascii
	   Checking nodes ... Done
	   Checking existing configuration ... Done
	   Node <node1> is refusing Serviceguard communication.
	   Please make sure that the proper  access is
	   configured on node <node1> through either file-based
	   access (pre-A.11.16 version) or role-based access
	   version A.11.16 or higher) and/or that the host name
	   lookup on node <node1> resolves the IP address
	   correctly.
	   cmapplyconf: Failed to gather configuration information

	2. Defect: JAGag34599 SR: 8606480516
	   Switching back from standby lan to primary lan can
	   cause lan failover message to be displayed multiple
	   times in the syslog. This is seen typically when
	   two subnets, one with package ip configured and the
	   other without, recover from failure at about the
	   same time. Messages similar to the following can
	   be seen in syslog.

	   cmcld: lan2 switched to lan1
	   above message repeats 440 times
	   cmcld: lan2 switched to lan1

	3. Defect: JAGag37580 SR: 8606484458
	   If cmsrvassistd terminates for any reason, there will
	   be a system TOC, but no information to determine the
	   cause of the problem. The following message is
	   displayed in syslog:

	   cmcld: Service assistant daemon died unexpectedly!
	   It may be due to a pending reboot or panic.
	   cmcld: Exiting with status 1.

	4. Defect: JAGag41341 SR: 8606488693
	   If HEARTBEAT_INTERVAL is not set in cluster
	   configuration file, cmapplyconf succeeds but cmruncl
	   dumps core.

	   Running cmviewconf shows that the heartbeat interval
	   is set to zero.

	   # cmviewconf
	   Cluster information:
	     cluster name:			  abc
	     heartbeat interval:		  0.00 (seconds)

	5. Defect: JAGag42544 SR: 8606490065
	   Each time the logical volume groups are queried by
	   cmclconfd on a HP-UX 11.31 system which has VxFS 4.1
	   or later installed, the following false syslog message
	   is displayed:

	   cmclconfd: Cannot recognize version 6 or later VxFS file
	   systems. Make sure that libc patch PHCO_32488 or later
	   is installed if such file systems are used.

	   This defect is not applicable to Serviceguard on HP-UX
	   11.23 systems.

	6. Defect: JAGag36170 SR: 8606482272
	   There is a very small timing window when cmcld can dump
	   core with a segmentation violation. The last 2 frames
	   in the stack trace will show the following functions:

	   #0  0x175b60 in cl_list_remove+0xbc ()
	   #1  0x161024 in st_delete_callback_private+0x548 ()

	7. Defect: JAGag37994  SR: 8606484939
	   cmviewcl displays the status of system multinode
	   packages as "starting" when a cmhaltnode or cmhaltcl
	   command is executing at the same time.

	8.  Defect: JAGag42785  SR: 8606490333
	    Serviceguard commands may fail due to the udp cmclconfd
	    daemon core dumping with a SIGSEGV. Below is an example
	    of the error message for one such command when this
	    happens:

	    # cmcheckconf -v -C ./cmclconf.ascii
	    Checking cluster file: ./cmclconf.ascii
	    Checking nodes ... Done
	    Checking existing configuration ... Done
	    Warning: Can not find configuration for cluster
	    <cluster_name>
	    Error: Unable to establish communication to node
	    <node_name>: 19
	    cmcheckconf : Failed to gather configuration
	    information

	    The syslog for this would look like:
	    inetd: hacl-cfg/udp: Died on signal 11

	    The stack trace would look like:
	    #0  0x60000000c0320300:0 in
	        T_19_f81_cl___doprnt_main+0x99b0 ()
	    from /usr/lib/hpux32/libc.so.1
	    #1  0x60000000c030e570:0 in _doprnt+0x30 ()
	        from /usr/lib/hpux32/libc.so.1
	    #2  0x60000000c03341c0:0 in snprintf+0x140 ()
	        from /usr/lib/hpux32/libc.so.1
	    #3  0x432b920:0 in add_alias_ip_addrs+0xe0 ()
	    #4  0x432e110:0 in
	        sg_sec_check_filebased_security+0x10b0 ()
	    #5  0x432fa40:0 in sg_get_security_privilege+0x220 ()
	    #6  0x40d0c20:0 in get_udp_message+0x850 ()
	    #7  0x40d37f0:0 in main+0x27a0 ()

	9.  Defect: JAGag43673 SR: 8606491388
	    The importing of VxVM disk groups in
	    Metrocluster/SRDF environment can fail during
	    package start up.  This typically happens when
	    nodes on the SRDF R1 side are rebooted or
	    restarted while SRDF is in the process of
	    reconfiguring the RDF device groups on the SRDF R2
	    side. During RDF reconfiguration, the devices on the
	    R1 side can be in a state which causes the VxVM disk
	    scan done at system boot up to label the devices as
	    offline, then later when a package on the R1 side
	    attempts to import the VxVM disk groups, the import
	    will fail.

	10. Defect: JAGag42796 SR: 8606490345
	    When the cluster services are halted on a node,
	    if external IP addresses are configured cmcld
	    will continue to consume memory after the node
	    has left the cluster until it reaches the kernel
	    limit at which point it will core dump. The external
	    IP addresses do not get removed and need to be removed
	    manually after cmcld dumps core.

	    The stack traces of the resulting core files can vary
	    but often include the function
	    "add_netsen_shutdown_links_to_chain".

	     #0  0x60000000c035d830:0 in _brk+0x30 ()
	     from /usr/lib/hpux32/libc.so.1
	     #1  0x60000000c036cb00:0 in sbrk+0xf0 ()
	     from /usr/lib/hpux32/libc.so.1
	     #2  0x60000000c0231980:0 in malloc_sbrk+0x280 ()
	     from            /usr/lib/hpux32/libc.so.1
	     #3  0x60000000c0232590:0 in grow_arena+0x210 ()
	     from             /usr/lib/hpux32/libc.so.1
	     #4  0x60000000c022fa40:0 in real_malloc+0x920 ()
	     from        /usr/lib/hpux32/libc.so.1
	     #5  0x60000000c022ef20:0 in _malloc+0x800 ()
	     from        /usr/lib/hpux32/libc.so.1
	     #6  0x60000000c023c950:0 in malloc+0x140 ()
	     from        /usr/lib/hpux32/libc.so.1
	     #7  0x41c2ae0:0 in add_netsen_shutdown_links_to_chain
	        ()      at netsen/ns_shutdown_chain.c:222
	     #8  0x41c3540:0 in ns_start_shutdown_chain () at
	           netsen/ns_shutdown_chain.c:342
	     #9  0x41c3880:0 in ns_shutdown () at
	         netsen/ns_shutdown_chain.c:372
	     #10 0x42bc760:0 in cl_chain_link_done () at
	         utils/cl_chain.c:121
	     #11 0x436b690:0 in cm_shutdown_event_handler ()
	         at cm/utils.c:708
	     #12 0x42c18c0:0 in cl_event_loop () at
	         utils/cl_event.c:460
	     #13 0x60000000c00c7420:0 in
	         __pthread_bound_body+0x170 ()
	            from /usr/lib/hpux32/libpthread.so.1

	11. Defect: JAGag44706 SR: 8606492535
	    cmcld logs the message below frequently when the number
	    of packages configured in the cluster is high.

	    cmcld: Unable to set socket buffer size to 360448
	    bytes (No buffer space available), continuing anyway.

	    These messages are not an indication of a failure, as
	    Serviceguard properly handles this situation.

	12. Defect: JAGag43289  SR: 8606490922
	    "cmviewcl -f line" always yields os_status as 'unknown'
	    for any remote node in a cluster.

	13. Defect: JAGag45533  SR: 8606493360
	    cmviewconf displays the service fail fast flag
	    as "disabled" even though the flag was enabled.

	14. Defect: JAGag45718 SR: 8606493785
	    In the cluster ASCII configuration file, if the quorum
	    server hostname, QS_HOST, is specified with an invalid
	    backslash character ('\'), cmapplyconf dumps core.
	    The stack trace will show the following functions:

	    #0  0x60000000c0345510:0 in kill+0x30 ()
	    from /usr/lib/hpux32/libc.so.1
	    #1  0x60000000c023bd50:0 in raise+0x30 ()
	    from /usr/lib/hpux32/libc.so.1
	    #2  0x60000000c02ff250:0 in abort+0x190 ()
	    from /usr/lib/hpux32/libc.so.1
	    #3 0x40a7510:0 in cdb_db_commit+0x890 ()
	    #4 0x40af270:0 in cdb_external_access+0x890 ()
	    #5 0x40bc660:0 in cl_config_commit_transaction+0x1560()
	    #6 0x4181520:0 in cf_configure_cluster+0x2ab0 ()
	    #7 0x4133b10:0 in config_main+0x5480 ()
	    #8 0x4146590:0 in main+0x900 ()

	15. Defect: JAGag41937 SR: 8606489376
	    A node in a cluster experiencing multiple hangs during
	    cluster reformation can cause the node experiencing
	    the hangs and the candidate for coordinator to die
	    when the safety timer expires.

	16. Defect: JAGag46086 SR: 8606494153
	    cmapplyconf displays an inappropriate error message
	    when multiple QS_HOST entries are specified.

	    cmquerycl may dump core if the -q option is specified
	    as the last parameter on the cmquerycl command line.

	    The 'cmviewcl -v -f line' command displays quorum
	    server IP addresses in an invalid format.
	    The 'cmviewcl -v -f line' output should be

	    quorum_server:<node_name>|ip_address=192.76.1.2|
	    name=192.76.1.2
	    instead of
	    quorum_server:<node_name>|ip_address:192.76.1.2|
	    name=192.76.1.2

	17. Defect: JAGag46092 SR: 8606494159
	    While the cmapplyconf command appears to allow the
	    QS_POLLING_INTERVAL and QS_TIMEOUT_EXTENSION values
	    to be modified while the cluster is running, these
	    values are not actually changed in the cluster
	    configuration.
	    cmapplyconf should prevent these parameters from being
	    modified while the cluster is running.

	18. Defect: JAGag27798 SR: 8606473093
	    When a halted node cannot communicate with other
	    nodes of the cluster, cmviewcl on the halted node
	    displays the package status and state as "down"
	    and "halted" respectively, as shown below. But the
	    packages are running fine on the other nodes, so
	    the package state should be displayed as UNKNOWN.

	    When node2 is halted and cannot communicate with the
	    other cluster nodes, cmviewcl on that node returns the
	    following:

	    # cmviewcl

	     CLUSTER      STATUS
	      cluster1    unknown

	     NODE          STATUS      STATE
	     node1        unknown      unknown
	     node2          down        unknown

	     UNOWNED_PACKAGES
	     PACKAGE  STATUS  STATE  AUTO_RUN  NODE

	      pkg1    down    halted  enabled  unowned

	19. Defect: JAGag38581 SR: 8606485608
	    Non-root users having monitor or admin access
	    privileges can view all configured access control
	    policies for the cluster.

	PHSS_35427:

	1.  Defect: JAGag21443 SR: 8606465899
	    cmcld aborts when the select() system call is
	    interrupted by a signal. This results in the node
	    being reset by the safety timer. The following
	    messages will be logged in the syslog file:

	    cmcld[2257]: Aborting! select failed  (file:
	    lcomm/local_server.c, line: 1165)
	    cmcld[2257]: select for port 46356 failed with
	    Interrupted system call
	    cmcld[2257]: select for port 46100 failed with
	    Interrupted system call
	    cmcld[2257]: 29, 95774e60, 8ef
	    cmcld[2257]: 17 (read)
	    cmcld[2257]: 19 (read)
	    cmcld[2257]: 20 (read)
	    cmcld[2257]: 21 (read)
	    cmcld[2257]: 26 (read)
	    cmcld[2257]: 27 (read)
	    cmcld[2257]: 28 (read)
	    cmcld[2257]: 29 (read)
	    cmcld[2257]: Aborting! select failed (file:
	    rcomm/comm_ip.c, line: 443)
	    cmcld[2257]: 33, 95774e60, 8f0
	    cmcld[2257]: 22 (read)
	    cmcld[2257]: 23 (read)
	    cmcld[2257]: 24 (read)
	    cmcld[2257]: 25 (read)
	    cmcld[2257]: 30 (read)
	    cmcld[2257]: 31 (read)
	    cmcld[2257]: 32 (read)
	    cmcld[2257]: 33 (read)
	    cmcld[2257]: Aborting! select failed (file:
	    rcomm/comm_ip.c, line: 443)
	    cmclconfd[2256]: The Serviceguard daemon,
	    /opt/cmcluster/bin/cmcld[2257], died upon receiving
	    signal number 6.
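
	    The "Interrupted system call" text in the log above
	    corresponds to select() failing with EINTR. A common
	    defensive pattern is to retry the call rather than
	    treat the interruption as fatal. The Python sketch
	    below illustrates that pattern only; it is not the
	    cmcld fix, and select_retrying is a hypothetical name.
	    Python 3.5 and later already retries interrupted
	    select() calls internally in most cases, so the
	    explicit loop is shown purely for illustration.

```python
import select


def select_retrying(rlist, timeout):
    """Wait until a descriptor in rlist is readable, retrying on EINTR.

    Illustrative only: each retry restarts the full timeout.
    """
    while True:
        try:
            readable, _, _ = select.select(rlist, [], [], timeout)
            return readable
        except InterruptedError:
            # A signal interrupted the call; retry instead of aborting.
            continue
```

	    A production version would normally subtract the time
	    already waited before retrying, so that repeated
	    signals cannot extend the wait indefinitely.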

	2.  Defect: JAGag20225 SR: 8606464542
	    Defect: JAGag29645 SR: 8606475212
	    The cmquerycl, cmcheckconf and cmapplyconf commands log
	    errors in syslog if CD/DVD drives from TEAC and other
	    manufacturers are present in a node, even though the
	    commands succeed.

	    The following messages may be seen in syslog.log:
	    cmclconfd[9660]: Error looking  up device
	    /dev/dsk/c17t1d0: /dev/config is not open.
	    cmclconfd[3730]: Unable to open
	    disk /dev/rdsk/c0t0d0: Error 0

	3.  Defect: JAGag06135 SR: 8606448943
	    The following warning message will be logged in the
	    flight recorder log, even if the kernel ticks since
	    boot are advancing.

	    FAILURE : Kernel ticks_since_boot has not been
	    advanced for 4.00 seconds, which is greater than or
	    equal to maximum allowable interval of 10.00 seconds.

	4.  Defect: JAGag14977 SR: 8606458777
	    Defect: JAGag33746 SR: 8606479578
	    The Serviceguard daemons cmcld and cmnetassistd
	    do not terminate upon receipt of SIGILL.

	5.  Defect: JAGag11719 SR: 8606455144
	    When cmgmsd cannot be halted correctly within the
	    timeout, cmcld hits an assertion and the node will TOC
	    if the safety timer is still enabled. However, cmgmsd
	    leaves no core file from which to determine why it
	    could not halt. This is applicable only for SGeRAC
	    installations.

	6.  Defect: JAGag25946 SR: 8606470887
	    When activating multiple volume groups at the same time
	    in a very heavily loaded system, if the parameter
	    CONCURRENT_VGCHANGE_OPERATIONS in the package
	    configuration file is set to greater than 1, some of
	    the vgchange commands might fail with the following
	    error message in the package log:

	    vgchange: Failed to establish a connection with cmlvmd
	    for volume group /dev/vg1

	7.  Defect: JAGag12644 SR: 8606456223
	    cmsrvassistd will loop when the script or program
	    specified in a package SERVICE_CMD parameter does not
	    exist or does not have execute permissions, attempting
	    to restart the service until the defined maximum service
	    restart count has been reached. If the count is
	    infinite, cmsrvassistd will consume large amounts of
	    CPU, effectively taking over a single-CPU system.
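
	    The condition that triggers the loop (a SERVICE_CMD
	    whose program is missing or not executable) can be
	    detected before a package is applied. The Python
	    sketch below is one way to do that under stated
	    assumptions: the helper name service_cmd_is_runnable is
	    hypothetical, and it assumes the first word of the
	    command string is an absolute path to the program.

```python
import os
import shlex


def service_cmd_is_runnable(service_cmd: str) -> bool:
    """Return True if the program named by service_cmd exists and is executable.

    Assumes the first shell word of service_cmd is the program path.
    """
    words = shlex.split(service_cmd)
    if not words:
        return False
    program = words[0]
    return os.path.isfile(program) and os.access(program, os.X_OK)
```

	    The check is advisory only, since the file could be
	    removed or its permissions changed after the check is
	    made.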

	8.  Defect: JAGag05782 SR: 8606448540
	    The system repeatedly TOCs when AUTOSTART_CMCLD is set
	    to 1 if a system multi-node package is unable to start.

	9.  Defect: JAGag27672 SR: 8606472905
	    pthreads patch PHCO_34944 or later exposes a defect in
	    Serviceguard on uniprocessor systems which can lead to
	    cmcld consuming 100% of cpu resulting in a hang or
	    system TOC. This does not apply to multi-processor
	    systems.

	10. Defect: JAGag20034 SR: 8606464337
	    Serviceguard does not fail over IPv6 addresses when the
	    standby is configured on a lower-index interface such
	    as lan1 and the primary is configured on a higher-index
	    interface such as lan2.

	11. Defect: JAGag25522 SR: 8606470431
	    The CMGMSD_LOG_FILE parameter is defined in
	    /etc/cmcluster.conf.  As a result, the cmgmsd daemon
	    logs into the location specified by CMGMSD_LOG_FILE
	    instead of /var/adm/syslog/syslog.log, which is
	    supposed to be the default.

	12. Defect: JAGag13439 SR: 8606457100
	    Defect: JAGag25508 SR: 8606470417
	    The permissions for the Serviceguard SNMP subagent log
	    file /var/adm/SGsnmpsuba.log are 666 instead of 644.
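
	    The practical difference between mode 666 and mode 644
	    is that 666 lets any user write to the file. As a
	    small sketch, the Python helper below tests for that
	    "other write" bit on a given path. The helper name
	    is_world_writable is an assumption for illustration.

```python
import os
import stat


def is_world_writable(path: str) -> bool:
    """Return True if users other than the owner and group may write to path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)
```

	    Applied to /var/adm/SGsnmpsuba.log, a True result would
	    correspond to the 666 case described above and a False
	    result to the intended 644 case.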

	13. Defect: JAGag18064 SR: 8606462172
	    In a Serviceguard cluster with CFS and HP Integrity
	    Virtual Machine nodes as Serviceguard nodes,
	    cmapplyconf will allow the first virtual machine node
	    to be added to the cluster, or the last virtual
	    machine node to be removed from the cluster while the
	    System Multi-Node package, SG-CFS-pkg is up on other
	    nodes. This should not be allowed.

	14. Defect: JAGag11227 SR: 8606454589
	    In extremely rare circumstances, if cmcld dies or is
	    killed while it is halting, it may not be restartable
	    on that node.  The following message would appear in
	    syslog.
	    cmcld: It appears that package applications or
	    cmcld: resources may be active on this node.
	    cmcld:  Re-starting the cluster could cause data
	    corruption.
	    cmcld: To recover from this situation
	    cmcld: reboot this system:
	    cmcld: shutdown -r   (stops package components)
	    cmcld: After ensuring that no package applications
	    cmcld: or resources are active, you can override this
	    data
	    cmcld: integrity protection by issuing the following
	    commands
	    cmcld: (which allow the daemon to start without
	    rebooting):
	    cmcld:     rm /var/adm/cmcluster/.cm_start_time
	    cmcld:     touch /var/adm/cmcluster/.cm_start_time
	    cmcld: For CFS customers, it is highly recommended that
	    cmcld: they reboot the node instead of using the data
	    cmcld: override mechanism

	15. Defect: JAGaf91648 SR: 8606432206
	    cmquerycl does not recognize JFS filesystems created
	    with the default layout version 6 and does not report
	    them.

	    In addition to this Serviceguard patch (or its
	    superseding patch), the libc patch PHCO_32488 or its
	    superseding patch should be installed.

	    If the libc patch PHCO_32488 or its superseding patch
	    is not installed, cmquerycl will still be unable to
	    recognize JFS filesystems created with the default
	    layout version 6 and will not report them.

	16. Defect: JAGag30170 SR: 8606475859
	    The Serviceguard daemon cmlvmd terminates upon receipt
	    of SIGHUP. This causes cmcld to abort and a potential
	    node TOC.

	17. Defect: JAGag31538 SR: 8606477058
	    cmapplyconf or cmcheckconf of a package with incorrect
	    syntax for "resource_up_value" might succeed.

	18. Defect: JAGag34015 SR: 8606479869
	    Serviceguard with CVM 4.1 does not support APA or
	    Infiniband heartbeat interfaces. The Serviceguard
	    configuration commands currently allow this type of
	    configuration, which will cause CVM/CFS to be unable to
	    initialize successfully. This can increase perceived
	    failover times. As this is an unsupported configuration,
	    undiscovered symptoms are possible.

	19. Defect: JAGag21411 SR: 8606465861
	    Under rare circumstances, cmrunnode will core dump.
	    The stack trace of cmrunnode is:

	    #0  0x60000000c04d0410:0 in kill+0x30 ()
	        from /usr/lib/hpux32/libc.so.1
	    #1  0x60000000c03c7430:0 in raise+0x30 ()
	        from /usr/lib/hpux32/libc.so.1
	    #2  0x60000000c0489370:0 in abort+0x190 ()
	        from /usr/lib/hpux32/libc.so.1
	    #3  0x40f2600:0 in cl_cassfail ()
	        at utils/cl_clog.c:230
	    #4  0x4345800:0 in cf_start_post_rba_nodes ()
	        at config/config_start.c:338
	    #5  0x4348450:0 in cf_start_cluster ()
	        at config/config_start.c:716
	    #6  0x4178d10:0 in cmd_private_fork_daemon ()
	        at cmd/cmd_utils.c:103
	    #7  0x4172260:0 in node_main () at cmd/cmd_node.c:403
	    #8  0x416a2a0:0 in main () at cmd/cmd_main.c:180

	20. Defect: JAGaf82011 SR: 8606422187
	    Improved integration with Distributed Systems
	    Administration Utilities (DSAU). Intermittent command
	    hangs in cmapplyconf and cmdeleteconf have been seen
	    when DSAU is running on nodes in a Serviceguard cluster.

	21. Defect: JAGag34316 SR: 8606480189
	    cmquerycl, cmapplyconf, cmcheckconf do not enforce the
	    supported limit of 8 nodes for clusters using
	    CVM 4.1/CFS.

	22. Defect: JAGag34191 SR: 8606480056
	    With CVM 4.1, failover time could be increased by
	    a few seconds if a FAILFAST service fails while the
	    cluster is reforming. This can delay the TOC of the
	    local node and eventually cluster reformation.

	23. Defect: JAGag26716 SR: 8606471740
	    Unused functions in /etc/cmcluster/cfs/SG-CFS-util.sh
	    are obsoleted. This is not a defect, so there are no
	    symptoms.

	24. Enhancement: JAGag08750 SR: 8606451844
	    Serviceguard did not support APA's LACP mode and only
	    supported up to 4 ports per link aggregate for FEC and
	    MANUAL mode.

	    This is an enhancement to support Serviceguard with
	    APA's LACP mode and up to 8 ports per link aggregate
	    for FEC and MANUAL modes, 32 ports per link aggregate
	    for LACP mode.

	    This enhancement requires either the 11.23 December
	    2005 HP-UX 11i v2 fusion release or the APA patch
	    PHNE_34774 to be installed. The enhancement is
	    disabled if neither is installed.
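
	    The per-mode limits stated above can be captured in a
	    small lookup. The Python sketch below does so; the
	    names MAX_PORTS and aggregate_is_supported are
	    hypothetical, and the check reflects only the port
	    counts given in this item, not any other APA or
	    Serviceguard requirement.

```python
# Maximum ports per link aggregate with this enhancement enabled,
# as stated in the text above.
MAX_PORTS = {"FEC": 8, "MANUAL": 8, "LACP": 32}


def aggregate_is_supported(mode: str, ports: int) -> bool:
    """Return True if a link aggregate of this mode and size is within limits."""
    limit = MAX_PORTS.get(mode.upper())
    return limit is not None and 1 <= ports <= limit
```

	    For example, a 32-port LACP aggregate is within the
	    stated limit, whereas a 9-port FEC aggregate is not.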

	25. Enhancement: JAGag36461 SR: 8606482593
	    Enhancement to allow Serviceguard Extension for
	    Faster Failover to be supported with Serviceguard
	    Storage Management Suite bundles containing CFS.

	26. Enhancement: JAGaf87266 SR: 8606427785
	    Hostnames in Serviceguard and Serviceguard Extension
	    for RAC cluster nodes are supported only up to 31
	    characters long.

	    This is an enhancement to support hostnames in
	    Serviceguard and Serviceguard Extension for RAC cluster
	    nodes up to 39 characters long.

	    This enhancement requires the bundle NodeHostNameXpnd,
	    available in Software pack media release:
	    SPK0505-11.23, Part Number: 5013-3681.
	    This enhancement is disabled if the bundle
	    NodeHostNameXpnd is not installed.

	27. Enhancement: JAGaf93937 SR: 8606435509
	    With the release of Quorum Server A.03.00.00, multiple
	    IP addresses for the quorum server can be specified.
	    This is an enhancement that allows Serviceguard to
	    support configuration of multiple IP addresses for
	    the quorum server.

	    For this enhancement it is required to upgrade the
	    quorum server version to A.03.00.00. For more
	    information on how to install and configure Quorum
	    Server version A.03.00.00, please refer to the release
	    notes for Quorum Server A.03.00.00. Note that this
	    release document is expected to be released April or
	    May 2007. This enhancement remains disabled if
	    quorum server version is not upgraded to A.03.00.00.

	    When this patch is used with versions of quorum server
	    earlier than A.03.00.00, only one quorum server IP
	    address is supported.

	PHSS_35371:

	1. Defect: JAGag13927  SR: 8606457625
	   cmquerycl command aborts when the cluster configuration
	   contains DGC devices having long hardware paths
	   with the output as given below. Similar abort may
	   be experienced with other Serviceguard commands like
	   cmgetconf, cmapplyconf.
	   ........
	   Gathering storage information
	   Found 23 devices on node omztcl2
	   Analysis of 23 devices should take approximately 5
	   seconds
	   0%----10%----20%----30%----40%----50%----60%----70%
	   ----80%----90%----100%
	   Unable to receive device query message from omztcl2:
	   Software caused connection abort
	   Could not send message to node omztcl2: Software caused
	   connection abort
	   Assertion failed: conn->inuse, file:
	   config/config_storage.c, line:2399

	2. Defect: JAGag08257  SR: 8606451287
	   In a cluster with a package configured with a dependent
	   EMS resource, if you issue cmrunnode on one node while
	   the cluster is down, cmrunnode will fail and cmcld
	   will exit cleanly. However, the EMS resource for the
	   package is not deregistered before cmcld exits.
	   A subsequent cmruncl will fail to start the package
	   that depends on the EMS resource on this node and
	   will log the following error.

	   Jul  7 11:44:20 bit cmcld[12586]:
	   Resource /net/interfaces/lan/status/lan0 does
	   not meet package RESOURCE_UP_VALUE for package snarf.
	   Jul  7 11:44:20 bit cmcld[12586]: Package snarf cannot
	   run on this node.

	3. Defect: JAGaf69163 SR: 8606409265
	   Invalid data can be specified in the USER_NAME field for
	   the access control policies in the cluster ascii file
	   and a cmapplyconf will complete without error. When
	   cmapplyconf is re-executed to correct this while the
	   cluster is running, cmcld will abort, resulting in a
	   node TOC.
	   The following message will be logged in syslog when an
	   invalid username is applied:
	       Jul 12 11:34:08 sly cmcld: ERROR:
		  Invalid user name in RBA
		  Privilege lookup
	   The following messages will be logged
	   in syslog when the invalid username
	   is corrected:
		Jul 12 11:35:06 sly cmcld:
		   cdb_db_handle_lookup - More than
		   one found
		Jul 12 11:35:06 sly cmcld: CDB Prepare -
		   Unable to delete /acps/sly/*, object
		   does not exist
		Jul 12 11:35:06 sly cmcld: CDB Prepare -
		   Unable to perform configuration
		   operation 2.   Return value is 22.
		Jul 12 11:35:06 sly cmcld: Aborting:
		   cdb/cdb_db_server.c 1937 (Failed to
		   roll back config change
		Jul 12 11:35:06 sly cmcld:
		   cdb_db_handle_lookup - More than one
		   found
		Jul 12 11:35:10 sly cmclconfd[6699]: The
		   Serviceguard daemon, /usr/lbin/cmcld[6700],
		   died upon receiving signal number 6.

	4. Defect: JAGag09971 SR: 8606453198
	   With Serviceguard Extension for RAC, when the filesystem
	   where /etc/cmcluster resides becomes full and Oracle
	   requests a group membership change, messages like the
	   following will appear in syslog:

	   cmgmsd[1997]: Unable to apply the configuration change
	   due to insufficient disk space.
	   cmgmsd[1997]: ERROR:  commit_cdb_txn: Failed to commit
	   transaction(28,No space left on device)

	   This could ultimately manifest itself in various Oracle
	   failures.

	5. Defect: JAGag13268 SR: 8606456893
	   If cmviewcl fails, a package that uses VxVM disk groups
	   will fail to start and will report that a disk group
	   may be imported on another node even if it is not.
	   The following will be seen in the package log file:

	   check_dg: Error DG may still be imported on HOST

	6. Defect: JAGag11741 SR: 8606455170
	   A system can become unresponsive during a cmquerycl if
	   there are a large number of logical volumes configured
	   on the system. For example a system configured with
	   1400 logical volumes was unresponsive for 10 minutes
	   while cmquerycl was running. Similar delay may be
	   experienced with other Serviceguard commands like
	   cmgetconf, cmapplyconf.

	7. Defect: JAGag11992 SR: 8606455475
	   The Serviceguard boot script /sbin/init.d/cmcluster
	   can take a very long time to execute, resulting in a
	   long system boot time after a Metrocluster/SRDF
	   package failover. The following error might show
	   up in /etc/rc.log:

	   VxVM vxdisk ERROR V-5-1-531 Device c14t12d1:
	   clearimport failed: Disk write failure

	   If the Metrocluster/SRDF package is then restarted on
	   the R1 node, the import of the VxVM data group fails
	   with

	   VxVM vxdisk ERROR V-5-1-539 Device c14t12d1:
	   get_contents failed: Disk device is offline
	   VxVM vxdg ERROR V-5-1-587 Disk group dgpkgEMC-1: import
	   failed: No valid disk found containing disk group

	PHSS_34337:

	1.  In a reforming cluster that has the
	    NETWORK_FAILURE_DETECTION parameter set to
	    INONLY_OR_INOUT, full network polling is not
	    performed even if the primary lan has missed the
	    maximum number of inbound polling packets, so a
	    local lan failover to the standby lan does not occur.

	2.  Sometimes a package using a psmmon EMS resource may
	    not come up when Serviceguard is restarted.

	3.  cmapplyconf incorrectly allows a cluster lock volume
	    group to be unclustered (VOLUME_GROUP line removed
	    from cluster ascii file).

	4.  Serviceguard automatically plumbs standby network
	    interfaces for IPv6 use, even when IPv6 is not being
	    used in the cluster configuration.

	5.  Corruption in link level messages can lead to a cmcld
	    SIGSEGV even with checksummed messages. The stack
	    traces of the resulting core files can vary but often
	    include one of the functions ns_if_setgood or dlpi_recv.

	6.  cmcld aborts with "Not enough space" on socket
	    allocation. The following message will be logged in the
	    syslog:
	    vmunix: Failed to allocate a socket: Not enough space
	    vmunix: Service Guard Aborting!
	    vmunix: Cause: socket failed
	    vmunix: (File: rcomm/comm_ip_setup.c, Line: 538)
	    vmunix: Aborting! socket failed

	7.  syslog shows the following diagnostic message from
	    cmcld:
	    connect to 192.77.1.5 for port 5300 failed with
	    Invalid argument

	8.  In the package control script templates, the
	    explanation fields for the VGCHANGE examples are
	    inaccurate.  The examples and log messages from the
	    control script do not take shared volume group
	    activation mode into account.

	9.  Serviceguard A.11.17 provides a SCRIPT_LOG_FILE
	    parameter to set the log file for a package. Messages
	    from the control script are written to this log file,
	    but service command output is not; it still goes to
	    the default log file. The default log file is named by
	    adding .log onto the control script name.

	10. cmhaltnode fails when executed in parallel on
	    multiple nodes. The error from the failed cmhaltnode
	    command may look something like the following. The
	    error message is not unique; other error messages are
	    also possible.
	    $ cmhaltnode -f
	    Disabling package switching to all nodes being halted.
	    Warning:  Do not modify or enable packages until the
	    halt operation is completed.
	    Failed to query the package information

	11. Sometimes a package using a simple package dependency
	    may not start even though the package that it depends
	    on has started.

	12. Error message output by cmcheckconf and cmapplyconf is
	    not helpful if the bridged network assignment changes.

	13. Under very rare circumstances all the nodes in the
	    cluster may TOC at the same time, when the timer loop
	    thread is stuck (not holding cm_lock) or the system
	    clock is not advancing. This prevents cmcld timeout and
	    prevents the safety timer from being updated resulting
	    in a TOC.

	14. If the hacl-cfg UDP port is scanned by Linux utilities
	    like nmap and amap, Serviceguard commands potentially
	    fail for ten minutes. If inetd logging is enabled the
	    following message is logged to syslog:
	    "inetd[27802]: hacl-cfg/udp: Server failing
	    (looping), service terminated."
	    Sometimes cmviewcl ends up spinning forever with the
	    following output from cmviewcl:
	    "Protocol failure talking with cmclconfd on
	    10.144.196.135 (5)"

	15. cmviewconf core dumps if it is unable to communicate
	    with cmclconfd for any reason.

	16. cmapplyconf does not provide correct information when
	    run in a VERITAS Cluster Volume Manager 4.1
	    environment. Serviceguard does not provide correct
	    information about minimum LAN requirements when they
	    are not met. Also the message "Need not have to look
	    for shared VGs" logged by cmapplyconf is unclear.

	17. cmrunnode times out after approximately 35 seconds
	    rather than waiting for AUTO_START_TIMEOUT to expire
	    even though cmcld is still trying to form a cluster
	    in the background.

	18. When cmviewcl is run with the options '-v -f line',
	    the value displayed for the cluster_formation_time
	    attribute is incorrect.

	19. The Serviceguard NMAPI interface fails if the file
	    descriptor used to connect to cmgmsd is greater than
	    the default FD_SETSIZE, i.e. 24576 causing data
	    corruption of the client process. This is applicable
	    only for SGeRAC and Oracle client processes.

	20. Incorrectly formatted IP addresses in the cluster
	    ascii file are not correctly detected by cmcheckconf
	    and cmapplyconf resulting in confusing error messages.
	    The IP addresses are reported as 255.255.255.255 rather
	    than the text that was entered in the ascii file. For
	    example the entry:

	       HEARTBEAT_IP        16.113.153.bad

	    results in the following error:
	    Network interface lan0 on node ogre has a different IP
	    address (16.113.153.12 != 255.255.255.255)
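
	    The value 255.255.255.255 is what the classic
	    inet_addr() routine returns for text it cannot parse,
	    which would be consistent with the symptom above,
	    although this document does not state the cause. As a
	    minimal sketch of validating such an entry up front,
	    the Python helper below uses the standard ipaddress
	    module. The helper name parse_ipv4 is an assumption
	    for illustration.

```python
import ipaddress


def parse_ipv4(text: str):
    """Return an IPv4Address for text, or None if it is not a valid dotted quad."""
    try:
        return ipaddress.IPv4Address(text.strip())
    except ipaddress.AddressValueError:
        # Malformed input is reported as None, never as a sentinel address.
        return None
```

	    With this approach the HEARTBEAT_IP value
	    16.113.153.bad is rejected as malformed, so the error
	    can quote the text that was entered rather than a
	    placeholder address.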

	21. A 2-node Serviceguard cluster with a cluster lock may
	    form two clusters if all heartbeat networks experience
	    prolonged heavy network congestion and if there are
	    frequent kernel hangs during a cluster reformation.
	    This will result in a data integrity problem.

	22. For Serviceguard cluster configurations that do not
	    have the SGeRAC product installed a SCSI bus reset is
	    not issued at the appropriate time for exclusively
	    activated volume groups.

	23. The cmcheckconf and cmapplyconf commands may fail with
	    a misleading error message when a Standby LAN in the
	    cluster configuration has been disconnected or has
	    failed.

	24. Too many "Unable to stat /etc/cmcluster/cmclconfig,
	    No such file or directory" messages fill up syslog.

	25. A quorum server going up and down at times causes cmcld
	    to dump core. cmcld aborts with signal 6.
	    The stack trace of cmcld is:
	    #0 0x60000000c04a9690:0 in kill+0x30 ()
	    from /usr/lib/hpux32/libc.so.1
	    #1 0x60000000c03a0430:0 in raise+0x30 ()
	    from /usr/lib/hpux32/libc.so.1
	    #2 0x60000000c04625f0:0 in abort+0x190 ()
	    from /usr/lib/hpux32/libc.so.1
	    #3 0x4351700:0 in cl_cassfail (clog_handle=0x0,
	    module=11,assertion=0x400402f0 "FALSE",
	    file=0x40029630 "utils/cl_select.c", line=482) at
	    utils/cl_clog.c:228
	    #4 0x4382900:0 in cl_select_notify_error () at
	    utils/cl_select.c:482
	    #5 0x4383ca0:0 in cl_select_loop () at
	    utils/cl_select.c:671
	    #6 0x60000000c00b3d20:0 in
	    __pthread_bound_body+0x170 ()
	    from /usr/lib/hpux32/libpthread.so.1

	
	26. The results of cmcheckconf -k and cmapplyconf -k
	    differ when a volume group listed in the cluster
	    config ascii file is powered off.

	27. Configuration commands such as cmgetconf fail after
	    reporting disks do not have an ID when they do:
	    Warning: The disk at /dev/dsk/c25t0d0 on node kelvin
	       does not have an ID, or a disk label.
	    Error: Unable to determine a unique identifier for
	       physical volume /dev/dsk/c25t0d0 on node kelvin. Use
	       pvcreate to give the disk an identifier.
	     The following errors are reported in syslog:
	     Feb  6 20:01:07 kelvin cmclconfd[6345]: Unable to open
	        disk
	     /dev/dsk/c25t0d0: Resource temporarily unavailable
	     Feb  6 20:02:20 kelvin cmclconfd[6345]: Physical
	     volume
	     /dev/dsk/c25t0d0 in volume group /dev/vgXX does not
	        have an ID!

	28. In certain configurations, cmquerycl could hang
	    indefinitely if any volume group is removed from the
	    system while cmquerycl is in progress.

	29. The cmviewcl(5) man page that documents cmviewcl -f
	    line is missing.

	30. As a prerequisite to support HP Integrity Virtual
	    Machine (HPVM) nodes as a member of a Serviceguard
	    cluster, the quiescence period during cluster
	    reformation for HPVM guests needs to be extended.

	31. Enhanced the Serviceguard package control script
	    templates to support integration with the EVFS
	    (Encrypted Volume File System) product.

	32. A TOC occurs and the following error shows up in
	    syslog:
	    cmlvmd: Failed to accept connections from commands:
	    No buffer space available

	33. The cmviewcl command takes about a minute to return
	    an error message when the user does not have an
	    adequate access level to view the cluster information.

	34. Certain memory within Serviceguard daemon cmcld was
	    freed twice during cluster reformation resulting in
	    memory corruption. This can result in unexpected
	    behavior ranging from no effect at all to an error
	    message or a core dump.

	35. The documentation about Serviceguard configuration
	    parameter NODE_TIMEOUT is improved. The improved
	    documentation is in cluster ASCII file, cmquerycl man
	    page and "Note" message after cmcheckconf/cmapplyconf
	    command.

	36. During heavy network traffic, cmcld may log the
	    following message to syslog:
	    cmcld: Failed to receive IP message from 192.77.1.13
	    on 5300, Resource temporarily unavailable.

	37. Log messages like the following fill up syslog.log:
	    vmunix: LLT INFO V-14-1-10023
	    lost 12 hb seq 97 from 3 link 1 (lan2)
	    vmunix: LLT INFO V-14-1-10019
	    delayed hb 3350 ticks from 2 link 1 (lan2)
	    ...
	    vmunix: LLT INFO V-14-1-10019
	    delayed hb 22100 ticks from 3 link 0 (lan1)

	    These messages appear when Serviceguard is configured
	    with Veritas Cluster File System or Veritas Cluster
	    Volume Manager and there is a standby lan configured
	    for the heartbeat interface.

	38. When using the NFS toolkit scripts in a Serviceguard
	    package control script, if the package is halting and
	    the NFS scripts are unable to cleanly shutdown NFS, the
	    Serviceguard package script logs the following message
	    in the log:
	    Node "nodename": Package start failed at Wed Dec  7
	    09:22:19 EST 2005
	    This is a misleading message, since in reality, the
	    package stop failed, not the package start.

	39. The documentation in the package control script states
	    that the HA NFS script should be named "ha_nfs.sh"
	    instead of the correct name "hanfs.sh".

	40. cmquerycl -f line option when used in conjunction with
	    -c and -C options does not display PV information of
	    a node that is being added into the cluster.

	41. cmquerycl -f line option displays nodes outside the
	    cluster, and their node id is set to the highest node
	    id of the cluster.

	42. "cmquerycl -v -fline -c cluster -n nodeA", where the
	    -n list does not include all the nodes in the existing
	    cluster, generates a command core.

	43. If cluster nodename is configured with more than
	    eight characters, Serviceguard commands and daemons
	    may fail with error indicating that the operating
	    system release string is null.

	44. Serviceguard commands fail to connect to Serviceguard
	    node. This can happen when there are high amounts of
	    Serviceguard traffic on the network, combined with
	    slow DNS servers and a configuration which does not
	    have all Serviceguard nodes on the network being
	    listed in /etc/hosts.

	    For example cmrunpkg can see the cluster as
	    "down/not running" even if the cluster is running.

	    cmrunpkg -n april -n may pkg-m-1070450443_7
	    cmrunpkg: Cluster appears to be down
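
	    One contributing factor named above is cluster nodes
	    that are not listed in /etc/hosts. The Python sketch
	    below reports which node names are absent from the
	    text of a hosts file. The helper name
	    missing_from_hosts is hypothetical, and the parsing
	    assumes the conventional layout of an address followed
	    by one or more names, with '#' starting a comment.

```python
def missing_from_hosts(hosts_text, node_names):
    """Return the node names that do not appear as a name or alias in hosts_text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and names.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; everything after it is a name or alias.
        known.update(fields[1:])
    return [name for name in node_names if name not in known]
```

	    For the nodes in the example above, the function could
	    be called with the contents of /etc/hosts and the list
	    ['april', 'may']; an empty result means both names are
	    present.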

	45. Packages start on the first node that satisfies the
	    package dependencies at cluster start up time and not
	    necessarily on the configured primary node.

	PHSS_33840:

	1. This patch provides VERITAS Cluster File System
	   (CFS 4.1) capability with appropriate Serviceguard
	   Storage Management Suite bundles such as T2775BA,
	   T2776BA, or T2777BA.  This patch also enables
	   VERITAS CVM 4.1 capability with Serviceguard A.11.17.

	   Item 1 therefore does not represent a defect, and
	   there is no external symptom for it.

	2. The boot-time cluster initialization script
	   (/sbin/init.d/cmcluster) does not retry up to
	   the full AUTO_START_TIMEOUT time in the case
	   where the cluster is not already running and
	   one or more of the configured cluster nodes
	   are not reachable via the network.

	3. After deleting a node from the cluster, the
	   configuration daemon (cmclconfd) on the deleted
	   node goes into an infinite loop.

	4. Messages such as the following appear in syslog:
	   cmfileassistd[26383]: The cluster daemon aborted our
	      connection (231).
	   cmcld[12375]: Too much pending message memory
	      (1120552 bytes, 1048576 max)
	   cmfileassistd[26383]: Lost connection with Serviceguard
	      cluster daemon (cmcld): Software caused
	      connection abort

	5. When multiple nodes are started and join an existing
	   cluster at the same time, a node may abort and the
	   message in syslog will be:
	      Assertion failed: trans_id != NULL, file:
	         cdb/cdb_utils.c, line: 263.

	   This may also appear as a segmentation violation
	   abort of the cmcld with the stack trace of the core
	   dump showing  cl_config_disconnect.

	6. Misspellings were found in the cmquerycl man page.

	7. cmcld aborts because cmapplyconf incorrectly passes
	   rather than failing as it should when 'detected a
	   partition of IP subnet' and 'minimum network
	   configuration requirement for the cluster have not
	   been met.'

	   An example of the warnings is shown below:

	   Detected a partition of IP subnet 192.76.1.0.
	      Partition 1
	         node1 lan9000
	         node2 lan9000
	         node3 lan9000
	      Partition 2
	         node4 lan9000
	   Detected a partition of IPv6 subnet fec0:0:0:4c01::.
	      Partition 1
	         node1 lan9000
	         node2 lan9000
	         node3 lan9000
	      Partition 2
	         node4 lan9000
	   Checking for inconsistencies
	      Minimum network configuration requirements for
	      the cluster have not been met. Minimum network
	      configuration requirements are:
	        - 2 or more heartbeat networks OR
	        - 1 heartbeat network with local
	                                 switch (HP-UX Only) OR
	        - 1 heartbeat network using APA with 2 trunk
	                                members (HP-UX Only) OR
	        - 1 heartbeat network with serial
	                                   line (HP-UX Only) OR
	        - 1 heartbeat network using bonding (mode 1)
	                             with 2 slaves (Linux Only)
	    ...
	    Adding configuration to node node4
	    Modifying configuration on node node1
	    Modifying configuration on node node3
	    Modifying configuration on node node2
	    Adding node node4 to cluster
	                cluster_node1_200510181839
	    Marking/unmarking volume groups for
	                        use in the cluster
	    Completed the cluster creation

	8. cmcld has not been updated to handle the drivers
	   for the newer supported cluster lock interface
	   cards, such as the Ultra160 and Ultra320 cards.
	   cmcld therefore falls back to the default (worst
	   case) cluster lock timings, leading to longer
	   failover times than expected: approximately 60
	   seconds rather than the 30 seconds seen with the
	   c720 driver for a simple 2-node cluster with a
	   2-second node timeout.

	9. There is no symptom for this item.  This is just a
	   description of the addition of a 64bit version of
	   libsgcl shared library to the Patch Depot.  The
	   file is being added now for future internal use.

	10. cmapplyconf can dump core when one of the cluster
	    members has a network interface with intermittent
	    problems, where the device's availability turns on
	    and off.  If the administrator runs cmapplyconf to
	    modify the cluster configuration in this situation,
	    cmapplyconf fails and dumps core.

	    The messages from cmapplyconf might look like:

	    Checking nodes ... Done
	    Checking existing configuration ... Done
	    Probing dst_node_id = 3 dst_net_id= 1 dst_ppa=0
	       with ZERO dst_mac_addr=0x
	    Probing dst_node_id = 3 dst_net_id= 2 dst_ppa=0
	       with ZERO dst_mac_addr=0x
	    Probing dst_node_id = 3 dst_net_id= 1 dst_ppa=0
	       with ZERO dst_mac_addr=0x
	    Probing dst_node_id = 3 dst_net_id= 2 dst_ppa=0
	       with ZERO dst_mac_addr=0x
	    Probing dst_node_id = 3 dst_net_id= 1 dst_ppa=0
	       with ZERO dst_mac_addr=0x
	    Probing dst_node_id = 3 dst_net_id= 2 dst_ppa=0
	       with ZERO dst_mac_addr=0x

	    Assertion failed: NULL != sub_net, file:
	       config/config_net_evaluate.c, line: 245

	    The command will fail.

	    Also, the syslog messages on the node where the
	    network device is having problems might look like:

	    Oct 1 05:08:45 nodename cmclconfd[5832]: Lan
	      interface 0 (PPA) on node id 3 has ZERO MAC address
	    Oct 1 05:08:45 nodename cmclconfd[5832]: DLPI error 1,
	       unix error 0 sending to aa080009167f - snap value
	    Oct 1 05:08:45 nodename cmclconfd[5832]: Problem
	       with network interface 0: Connection timed out.

	11. During a cmrunnode, "Detected Partition" error
	    messages appear even though the cmrunnode succeeds.
	    In addition, cmomd can dump core in the API code
	    path: when evaluating IP addresses, the code expects
	    at least 1 netd object to represent the node with
	    the configured subnet.

	    The stack trace would look something like this:
	Program terminated with signal 6, Aborted.
	#0  0xc0214128 in kill+0x10 () from /usr/lib/libc.2
	#0  0xc0214128 in kill+0x10 () from /usr/lib/libc.2
	#1  0xc01ab554 in raise+0x24 () from /usr/lib/libc.2
	#2  0xc01f0df0 in abort_C+0x160 () from /usr/lib/libc.2
	#3  0xc01f0e4c in abort+0x1c () from /usr/lib/libc.2
	#4  0x81d0 in crash_handler (s=11) at om/om_main.c:226
	#5
	#6  0xc019b178 in tree_concatenate+0x8 ()
	       from /usr/lib/libc.2
	#7  0xc019c4d0 in real_free+0x498 ()
	       from /usr/lib/libc.2
	#8  0xc019f698 in free+0x340 ()
	       from /usr/lib/libc.2
	#9  0xc96dccc4 in
	       cf_private_evaluate_ip6_partition
	       (cl=0x401cabb0, scope=25,
	       ret=0x7bff5388, logh=0x7bff5174, flags=336)
	       at config/config_net_evaluate.c:1616
	#10 0xc96dd224 in cf_private_evaluate_network_probing
	       (cl=0x401cabb0, scope=25, flags=336,
	       logh=0x7bff5174)
	       at config/config_net_evaluate.c:1718
	#11 0xc9710ef8 in cf_private_find_config (cl=0x401cabb0,
	       scope=25, flags=336, make_copy=1, logh=0x7bff5174)
	       at config/config_query.c:889
	#12 0xc97113a8 in cf_find_config (cl=0x401cabb0, scope=25,
	       flags=336, logh=0x7bff5174)
	       at config/config_query.c:981
	#13 0xc9712000 in cf_validate_network (cl=0x401cabb0,
	       flags=336, logh=0x7bff5174)
	       at config/config_query.c:1183
	#14 0xc94345ac in cmp_validate_network_connections
	       (cl=0x401b28f0, vflag=0, log=0x40011480 "OMOB")
	       at providers/cmprovider/cmp_utils.c:2316
	#15 0xc9454478 in exec_method_op
	       (context=0x400388d0 "OMOB",
	       providerOp=0x40167560 "OMOB") at
	       providers/cmprovider/cluster/cmp_cluster_node.c:1665
	#16 0xc9454e1c in cmp_op_SGClusterNodeContainment
	       (context=0x400388d0 "OMOB",
	       providerOp=0x40167560 "OMOB") at
	       providers/cmprovider/cluster/cmp_cluster_node.c:1772
	#17 0xc9428944 in CMProviderOperation
	       (providerOp=0x40167560 "OMOB")
	       at providers/cmprovider/cmp_provider.c:1346
	#18 0xc93c81c4 in _OMProviderOperation
	       (provider=0x40021958 "OMOB",
	       providerOp=0x40167560 "OMOB") at
	       om/cm_provider.c:487
	#19 0xc93dc508 in CMProviderOperation
	      (providerOp=0x40167560 "OMOB")
	       at om/cm_provider_linkage.c:439
	#20 0xc9370c58 in _OMExecMethodOp (
	    class_name=0x40166910 "SGClusterNodeContainment",
	    method_name=0x40012608 "start",

	    instance_id=0x40167830
	       "SGClusterNodeContainment:SGCluster:1052344974+C
	       MNode:dad.cup.hp.com", return_value=0x7bff3888,
	       input_parameters=0x40166888 "OMOB",
	       output_parameters=0x7bff3878,
	       clientContext=0x400125a0 "OMOB",
	    log=0x40011480 "OMOB") at om/cm_ops.c:742
	#21 0x1bffc in parse_exec_method (cl=0x400111e0)
	       at om/om_network.c:3069
	#22 0x289d4 in connection_handler (fd=5, key=0x400111e0 "")
	       at om/om_network.c:5035
	#23 0xa2b4 in OMSelectLoop (doneOMSelectLoop=0x40002774)
	       at om/om_select.c:185
	#24 0x9704 in main (argc=5, argv=0x7bff0054)
	       at om/om_main.c:573

	12. When cmcld is starting up, ClusterUp and NodeUp
	    SNMP traps are missed.

	13. "cmviewcl -f line" is not fully documented in
	    man pages.

Defect Description: 
	PHSS_36636:

	1. Defect: JAGag34293  SR: 8606480162
	   Serviceguard used to exit displaying an error message
	   whenever an invalid message was received.

	   Resolution:
	   The code has been modified to ignore invalid messages.

	2. Defect: JAGag34599  SR: 8606480516
	   A variable is shared between different bridged
	   networks. When both subnets recover at about the same
	   time, the shared variable is left in an inconsistent
	   state. This causes the messages to appear for one
	   of the bridged networks.

	   Resolution:
	   The code has been modified to not use a shared
	   variable between different bridged networks.

	3. Defect: JAGag37580  SR: 8606484458
	   The code that handles the cmsrvassistd exit status
	   was removed in A.11.17 for all cases.

	   Resolution:
	   Code has been added to handle the cmsrvassistd
	   exit status.

	4. Defect: JAGag41341  SR: 8606488693
	   A case was missed: if HEARTBEAT_INTERVAL was not set
	   in the cluster configuration file, its default value
	   was set to zero.

	   Resolution:
	   The code has been modified to handle this case.

	5. Defect: JAGag42544  SR: 8606490065
	   The syslog messages were relevant only on HP-UX 11.23
	   and not on HP-UX 11.31 systems. There was no check
	   before logging to see whether the machine was running
	   HP-UX 11.23.

	   Resolution:
	   The code has been modified not to display the false
	   syslog message on HP-UX 11.31 systems.

	6. Defect: JAGag36170  SR: 8606482272
	   Serviceguard locks a mutex before updating the callback
	   structure, but it unlocks the mutex before removing
	   the callback from the list.

	   Resolution:
	   The code has been modified to unlock the mutex after
	   the callback is removed from the list.
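
	   The corrected locking order can be sketched as follows.
	   This is a minimal Python illustration, not Serviceguard
	   code; the CallbackRegistry class and its method names
	   are hypothetical.

```python
import threading


class CallbackRegistry:
    """Hypothetical registry; the names are illustrative only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []

    def register(self, callback):
        with self._lock:
            self._callbacks.append(callback)

    def remove(self, callback):
        # The lock is held across both the lookup and the removal, so no
        # other thread can observe a half-removed callback. The "with"
        # block releases the lock only after the list has been updated.
        with self._lock:
            if callback in self._callbacks:
                self._callbacks.remove(callback)
                return True
            return False

    def snapshot(self):
        # Copy under the lock so callers iterate over a stable list.
        with self._lock:
            return list(self._callbacks)
```

	   Releasing the lock before the removal would let another
	   thread find the callback in the list while it is being
	   torn down.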

	7. Defect: JAGag37994  SR: 8606484939
	   The status of system multinode package is shown as
	   "starting" during node/cluster halting.

	   Resolution:
	   The code has been modified to show the status of
	   system multinode package as "changing" during
	   cmhaltnode/cmhaltcl execution.

	8. Defect: JAGag42785  SR: 8606490333
	   The hostent structure could be modified by a gethost*()
	   call while it was still in use.

	   Resolution:
	   A copy of the hostent structure is now used instead of
	   the actual structure, which protects it against
	   subsequent gethost*() calls.

	9. Defect: JAGag43673  SR: 8606491388
	   During certain disk operations that refresh data from
	   the remote side (R2) to the primary side (R1), disks
	   are rendered "Not Ready" (not visible to the host)
	   for a short period of time. If, during this period,
	   the primary node on the R1 side is rebooting and VxVM
	   is starting up as part of the boot sequence, VxVM marks
	   these "Not Ready" disks offline since it cannot access
	   them. Later, this results in a package startup failure
	   during package failback, because the VxVM disk groups
	   cannot be imported while the disks are offline.

	   Resolution:
	   The Serviceguard package control script has been
	   enhanced with a new parameter, "VXVM_DG_RETRY", which
	   can be set to either "YES" or "NO". Setting this
	   parameter to "YES" runs the command "vxdisk scandisks"
	   on the disks that belong to the failed disk group.

	10. Defect: JAGag42796  SR: 8606490345
	    Executing cmhaltcl or cmhaltnode on a cluster
	    configured with packages that use relocatable IP
	    addresses on subnets that are not in the cluster
	    configuration causes cmcld to grow in size
	    and dump core. The elements of the array were not
	    advanced properly during the removal of IP addresses.

	    Resolution:
	    Code has been modified to make sure that the elements
	    of the array are advanced properly.
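
	    The class of bug described here, and a safe way to
	    avoid it, can be sketched in Python. The function
	    remove_addresses is hypothetical and is not taken from
	    cmcld; it only illustrates advancing array positions
	    correctly while elements are being removed.

```python
def remove_addresses(addresses, to_remove):
    """Compact `addresses` in place, dropping entries in `to_remove`.

    Hypothetical helper: a read index visits every element exactly once,
    and a write index advances only when an element is kept.
    """
    write = 0
    for read in range(len(addresses)):
        if addresses[read] not in to_remove:
            addresses[write] = addresses[read]
            write += 1
    # Drop the stale tail left behind by the compaction.
    del addresses[write:]
    return addresses
```

	    Separate read and write positions mean that removing
	    two adjacent entries neither skips an element nor
	    visits the same element twice.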

	11. Defect: JAGag44706  SR: 8606492535
	    A message was logged at the default log level when
	    the default buffer size of the UNIX domain socket was
	    insufficient, even though cl_msg's flow control takes
	    care of adjusting the size and sending the message.

	    Resolution:
	    The message log level has been raised and the log
	    category changed, so that the message is not logged
	    at default log levels.

	12. Defect: JAGag43289  SR: 8606490922
	    cmviewcl cannot obtain the os_status value unless
	    the node is probed. This is only done when the verbose
	    option is used. cmviewcl should not display the
	    os_status unless the verbose option is used.

	    Resolution:
	    os_status is only displayed when the verbose option is
	    specified with cmviewcl.

	13. Defect: JAGag45533  SR: 8606493360
	    While displaying the status of the service fail fast
	    flag, the cmviewconf command compares values in
	    different byte orders, which results in an incorrect
	    status being displayed for the flag.

	    Resolution:
	    The comparison is now performed in the proper byte
	    order to get the correct status of the service fail
	    fast flag.
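
	    A byte order mismatch of this kind can be reproduced
	    in a few lines of Python. The flag value, its 16-bit
	    width and the function name below are assumptions made
	    for illustration; the actual cmviewconf data layout is
	    not described in this patch text.

```python
import struct

FAIL_FAST_ENABLED = 1  # hypothetical flag value, for illustration only


def flag_is_set(raw, byte_order):
    """Decode a 16-bit flag from `raw` using an explicit byte order.

    `byte_order` is a struct prefix: ">" (big-endian/network) or
    "<" (little-endian).
    """
    (value,) = struct.unpack(byte_order + "H", raw)
    return value == FAIL_FAST_ENABLED


# The flag as stored in network (big-endian) byte order.
stored = struct.pack(">H", FAIL_FAST_ENABLED)

correct = flag_is_set(stored, ">")   # same byte order on both sides
mismatch = flag_is_set(stored, "<")  # 0x0001 is misread as 0x0100
```

	    Decoding with the byte order used for storage yields
	    the expected flag; decoding with the other order
	    silently yields a different number.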

	14. Defect: JAGag45718  SR: 8606493785
	    In the cluster ASCII configuration file, if the
	    QS_HOST value contains an invalid backslash character
	    ('\'), cmapplyconf dumps core.

	    Resolution:
	    The values specified in the cluster configuration file
	    for QS_HOST and QS_ADDR are checked for invalid
	    backslash characters. An error is reported if a value
	    contains one.
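
	    A check of this kind might look like the following
	    Python sketch. The function name and the error text
	    are hypothetical; the message that cmapplyconf actually
	    prints is not given in this patch text.

```python
def check_quorum_value(keyword, value):
    """Reject a QS_HOST or QS_ADDR value that contains a backslash.

    Hypothetical validator; the error text is illustrative and is not
    the message cmapplyconf prints.
    """
    if "\\" in value:
        raise ValueError(
            "%s value %r contains an invalid backslash character"
            % (keyword, value)
        )
    return value
```

	    Rejecting the value up front turns a crash into a
	    clear configuration error.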

	15. Defect: JAGag41937  SR: 8606489376
	    The node experiencing hangs dies because it is unable
	    to update its safety timer. The candidate for
	    coordinator dies because it is waiting for a heartbeat
	    from the node experiencing hangs in order to update
	    its safety timer.

	    Resolution:
	    A node experiencing hangs is now failed before it
	    causes problems for other nodes in the cluster.

	16. Defect: JAGag46086  SR: 8606494153
	    cmapplyconf displays an inappropriate error message
	    when multiple QS_HOST entries are specified.

	    When the -q option is read from the cmquerycl command
	    line, the array index is not incremented properly.

	    In 'cmviewcl -v -f line' output, the quorum server IP
	    addresses are displayed in an invalid format.

	    Resolution:
	    The error message has been updated to display the
	    appropriate, valid message.

	    The way cmquerycl reads the command line arguments
	    for the -q option has been changed so that they are
	    read properly.

	    The 'cmviewcl -v -f line' output format string has
	    been changed to display the ip_addresses properly.

	17. Defect: JAGag46092  SR: 8606494159
	    cmapplyconf does not check for online changes in
	    QS_POLLING_INTERVAL and QS_TIMEOUT_EXTENSION values.

	    Resolution:
	    cmapplyconf is modified so it disallows changes to
	    QS_POLLING_INTERVAL and QS_TIMEOUT_EXTENSION online
	    and fails with an error if this is attempted.

	18. Defect: JAGag27798  SR: 8606473093
	    By default, the status and state of all unowned
	    packages were displayed as "down" and "halted"
	    respectively, without checking whether the cluster
	    was reachable.

	    Resolution:
	    Package status is now displayed as "unknown" when
	    the nodes of the cluster are not reachable.

	19. Defect: JAGag38581  SR: 8606485608
	    Commands that show cluster information will display all
	    configured access control policies to a non-root user
	    that has privilege to run the command. This itself is
	    not a problem but it has been decided to show policies
	    only to those with the same or higher level of access.

	    Resolution:
	    The roles displayed will match the privilege level of
	    the user running the command based on the user name and
	    host from where command is run.

	PHSS_35427:

	1.  Defect: JAGag21443 SR: 8606465899
	    The select() system call was not retried when it
	    failed because of an interrupted system call.

	    Resolution:
	    The select() system call is now retried up to ten
	    times when it fails because of an interrupted system
	    call.
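
	    The bounded retry can be sketched in Python as shown
	    below. The helper name call_with_retry is hypothetical,
	    and it is an assumption that "ten times" means ten
	    retries after the first attempt. Python's own
	    select.select() has retried automatically on EINTR
	    since Python 3.5, so the helper takes any callable
	    that may raise InterruptedError.

```python
MAX_RETRIES = 10  # "a maximum of ten times", per the resolution above


def call_with_retry(func, *args):
    """Call func(*args), retrying when it is interrupted by a signal.

    The call is attempted once and then retried up to MAX_RETRIES times;
    the final InterruptedError is re-raised if every attempt fails.
    """
    for attempt in range(MAX_RETRIES + 1):
        try:
            return func(*args)
        except InterruptedError:
            if attempt == MAX_RETRIES:
                raise
```

	    Capping the number of retries keeps a persistently
	    interrupted call from looping forever.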

	2.  Defect: JAGag20225 SR: 8606464542
	    Defect: JAGag29645 SR: 8606475212
	    TEAC CD/DVD devices were not being excluded from
	    probing because their unique peripheral descriptions
	    prevented them from being detected as CD/DVDs. Some
	    CD/DVD devices from other manufacturers were also
	    being probed because their descriptions were not
	    present in cmclconfd.

	    Resolution:
	    cmclconfd has been modified to exclude more CD and DVD
	    devices and specific TEAC CD/DVD devices.

	3.  Defect: JAGag06135 SR: 8606448943
	    When 3 heartbeats are exchanged between coordinator and
	    member node in the same tick, "ticks_since_boot not
	    advancing for last 4 seconds" is logged in the flight
	    recorder log. This message is misleading.

	    Resolution:
	    The code has been changed so that no warning message
	    is logged if 3 heartbeats are received in the same
	    tick; a warning is logged once 4 heartbeats are
	    received in the same tick.

	4.  Defect: JAGag14977 SR: 8606458777
	    Defect: JAGag33746 SR: 8606479578
	    cmcld and cmnetassistd were coded to ignore SIGILL.

	    Resolution:
	    The code to ignore SIGILL has been removed.

	5.  Defect: JAGag11719 SR: 8606455144
	    No core file was produced by cmgmsd because it was not
	    aborted when it failed to halt correctly within the
	    timeout.

	    Resolution:
	    cmcld now sends an abort signal to cmgmsd if the
	    daemon fails to halt within the timeout, causing
	    cmgmsd to dump a core file. This change helps in
	    troubleshooting the underlying problem in the future.

	6.  Defect: JAGag25946 SR: 8606470887
	    The maximum number of connections that can be accepted
	    by the cmlvmd daemon is 5, which is too small.

	    Resolution:
	    The number of connections that can be accepted by
	    the cmlvmd daemon has been increased to 4096.

	7.  Defect: JAGag12644 SR: 8606456223
	    There was no check to see whether the service script
	    or program exists or has execute permission before
	    attempting to run the service.

	    Resolution:
	    Only allow service restart to be attempted if service
	    script or program exists with execute permission.
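
	    A pre-flight check along these lines can be sketched
	    in Python. The function name is hypothetical, and the
	    sketch simplifies the check to the file's mode bits;
	    how Serviceguard itself performs the test is not
	    described in this patch text.

```python
import os
import stat


def service_is_runnable(path):
    """Return True if `path` is a regular file with an execute bit set.

    Simplification: this inspects the mode bits only. A check for the
    calling user specifically could use os.access(path, os.X_OK).
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing or inaccessible: do not attempt a restart
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & exec_bits)
```

	    A service restart would then be attempted only when
	    the check returns True.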

	8.  Defect: JAGag05782 SR: 8606448540
	    When a system multi-node package fails at startup,
	    it causes the node to TOC. If AUTOSTART_CMCLD is set
	    to 1, the system can repeatedly TOC and reboot if
	    there are any problems with the system multi-node
	    package after cmcld is up and running.

	    Resolution:
	    Cluster activities are not automatically started if the
	    node TOCs twice in a row due to a failure in starting
	    the system multi-node package. A message is logged
	    in syslog.log and /etc/rc.log.

	9.  Defect: JAGag27672 SR: 8606472905
	    The issue resulted from Serviceguard's use of multiple
	    thread priorities within the same process.

	    Resolution:
	    Added additional synchronization between the threads.

	10. Defect: JAGag20034 SR: 8606464337
	    Lower index interface is processed first and added to
	    the bridged net. In this scenario, the standby
	    configured on lower index interface does not find the
	    primary interface in the bridge net to figure out if
	    IPv6 is configured.

	    Resolution:
	    The plumbing routine is done only after all interface
	    entries are added to the database.

	11. Defect: JAGag25522 SR: 8606470431
	    The variable CMGMSD_LOG_FILE was included
	    in /etc/cmcluster.conf.

	    Resolution:
	    The CMGMSD_LOG_FILE variable has been removed from
	    /etc/cmcluster.conf so that the cmgmsd daemon
	    henceforth logs to syslog.

	12. Defect: JAGag13439 SR: 8606457100
	    Defect: JAGag25508 SR: 8606470417
	    The permissions for /var/adm/SGsnmpsuba.log are
	    set to 666, which allows universal write access to
	    this log file.

	    Resolution:
	    The permissions for the log file
	    /var/adm/SGsnmpsuba.log are now set to 644.

	13. Defect: JAGag18064 SR: 8606462172
	    Incorrect behavior of cmapplyconf allows the
	    first virtual machine node to be added or last
	    virtual machine node to be removed from a
	    Serviceguard CFS Cluster even when system multi
	    node package SG-CFS-pkg is up on other nodes.

	    Resolution:
	    Modified the behavior of cmapplyconf to disallow
	    this configuration.

	14. Defect: JAGag11227 SR: 8606454589
	    Changes to cmcld increased the halt sequence time,
	    where if cmcld were to die, it would not be restartable
	    on that node.

	    Resolution:
	    Modified the code to decrease the halt sequence time.

	15. Defect: JAGaf91648 SR: 8606432206
	    cmquerycl command does not recognize JFS filesystems
	    created with the default layout version 6 and does not
	    report them.

	    Resolution:
	    cmquerycl has been enhanced to identify and report
	    the logical volumes with JFS version 6 layout file
	    system.

	    In addition to this Serviceguard patch (or its
	    superseding patch), the libc patch PHCO_32488 or its
	    superseding patch should be installed.

	    If the libc patch PHCO_32488 or its superseding patch
	    is not installed, cmquerycl will not be able to
	    recognize JFS filesystems created with the default
	    layout version 6 and will not report them.

	16. Defect: JAGag30170 SR: 8606475859
	    cmlvmd was coded not to ignore SIGHUP.

	    Resolution:
	    cmlvmd has been modified to ignore SIGHUP.

	17. Defect: JAGag31538 SR: 8606477058
	    cmapplyconf or cmcheckconf did not check for string
	    boundaries while parsing string value given for
	    "resource_up_value".

	    Resolution:
	    Modified cmapplyconf and cmcheckconf to look for end
	    of string before parsing the subsequent token.

	18. Defect: JAGag34015 SR: 8606479869
	    Serviceguard should prevent clusters using APA or
	    Infiniband from being configured when SG-CFS-pkg is
	    configured.

	    Resolution:
	    Added checks so that cmquerycl, cmcheckconf and
	    cmapplyconf will fail if Serviceguard with CVM 4.1 is
	    configured with APA or infiniband heartbeat interfaces.

	19. Defect: JAGag21411 SR: 8606465861
	    Executing the cmrunnode command on a cluster before
	    an earlier cmapplyconf command has finished writing
	    the config file can cause the value of "node" to be
	    NULL, which triggers an assertion.

	    Resolution:
	    The assertion has been replaced with the following
	    message, which asks the user to try again: "Unable to
	    execute the command at this time, please try again."

	20. Defect: JAGaf82011 SR: 8606422187
	    The execution of cmapplyconf or cmdeleteconf invokes
	    scripts in a Distributed Systems Administration
	    Utilities (DSAU) environment. The command 'ps -ef'
	    shows the wrong process name for any script launched
	    from /usr/sbin/cmconfig_change_callout: it shows the
	    process name of the parent script,
	    '/usr/bin/sh usr/sbin/cmconfig_change_callout'.

	    Resolution:
	    /usr/sbin/cmconfig_change_callout has been modified to
	    use nohup to launch the script.

	21. Defect: JAGag34316 SR: 8606480189
	    cmquerycl, cmapplyconf, cmcheckconf do not enforce the
	    supported limit of 8 nodes for clusters using
	    CVM 4.1 /CFS.

	    Resolution:
	    cmquerycl, cmapplyconf and cmcheckconf now fail for a
	    CVM 4.1 /CFS cluster of more than 8 nodes.

	22. Defect: JAGag34191 SR: 8606480056
	    With CVM 4.1, when a failfast service fails while
	    the cluster is reforming, safety time will be updated
	    to a value beyond what it should be.

	    Resolution:
	    Added a check to ensure safety time is not set past
	    its current value.

	23. Defect: JAGag26716 SR: 8606471740
	    Unused functions in /etc/cmcluster/cfs/SG-CFS-util.sh
	    are obsoleted. This is not a defective behavior hence
	    there is no defect description.

	24. Enhancement: JAGag08750 SR: 8606451844
	    Serviceguard did not support APA's LACP mode and only
	    supported up to 4 ports per link aggregate for FEC and
	    MANUAL mode.

	    Resolution:
	    Serviceguard has been enhanced to support APA's LACP
	    mode and up to 8 ports per link aggregate for FEC and
	    MANUAL modes, 32 ports per link aggregate for LACP mode.
	    In addition to this Serviceguard patch or a later one,
	    the user needs to install the 11.23
	    December 2005 HP-UX 11i v2 fusion release or the APA
	    patch PHNE_34774. This enhancement is disabled if
	    either of them are not installed.

	25. Enhancement: JAGag36461 SR: 8606482593
	    Certification and software limitations were required
	    for support of the Storage Management Suite's cluster
	    filesystem component within a cluster configured for
	    faster failover.

	    Resolution:
	    Added checks during cmapplyconf to enforce a minimum
	    node timeout limit for clusters configured with both
	    CFS and faster failover.

	26. Enhancement: JAGaf87266 SR: 8606427785
	    In Serviceguard Extension for RAC (SGeRAC)
	    configuration, the SLVM subsystem is unable to support
	    more than 31 character hostnames. Hence it limits the
	    support of hostnames in Serviceguard and SGeRAC cluster
	    nodes to only 31 characters.

	    Resolution:
	    Removed the 31-character hostname restriction in cmcld
	    and enhanced cmlvmd to allow the existing vgdisplay
	    command in an SGeRAC configuration to truncate
	    39-character hostnames of the cluster nodes to 30
	    characters plus a '*' at the end in its output.

	    Note that the SLVM subsystem on HP-UX 11.23 still
	    does not support hostnames longer than 31
	    characters.

	    This enhancement allows Serviceguard to support
	    39-character hostnames and at the same time
	    integrate seamlessly with the existing SLVM subsystem.

	    In addition to this Serviceguard patch or a later one,
	    the user needs to install the following bundle:
	    NodeHostNameXpnd, available in Software pack media
	    release: SPK0505-11.23, Part Number: 5013-3681.
	    This enhancement is disabled if the bundle
	    NodeHostNameXpnd is not installed.
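
	    The truncation rule for item 26 can be illustrated
	    with a short Python sketch. The function name is
	    hypothetical, and it is an assumption that names of 31
	    characters or fewer are displayed unchanged; the patch
	    text only states that long names are cut to 30
	    characters plus a '*'.

```python
SLVM_MAX_HOSTNAME = 31  # longest hostname SLVM supports, per the note above


def display_hostname(name):
    """Return `name` as vgdisplay-style output might show it.

    Assumption: names that fit within 31 characters are shown unchanged;
    longer names are cut to 30 characters plus a trailing '*'.
    """
    if len(name) <= SLVM_MAX_HOSTNAME:
        return name
    return name[:SLVM_MAX_HOSTNAME - 1] + "*"
```

	    The truncated form is exactly 31 characters long, so it
	    still fits the SLVM limit while signalling that the
	    name was shortened.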

	27. Enhancement: JAGaf93937 SR: 8606435509
	    When the quorum server is needed during a cluster
	    reconfiguration, if the subnet on which the cluster
	    nodes communicate with the quorum server goes down,
	    then the cluster will go down. If an additional
	    subnet (a total of two subnets) can be configured
	    for communication between nodes in the cluster and
	    quorum server, this will provide additional redundancy.
	    In the Quorum Server A.03.00.00 release, it is possible
	    to specify two IP addresses for nodes to communicate
	    with the quorum server. Prior to this feature, the
	    nodes of a Serviceguard cluster could communicate with
	    the quorum server through only one subnet.

	    This is an enhancement to add the capability in
	    Serviceguard to configure an additional IP address for
	    communication between quorum server and cluster nodes.

	    Resolution:
	    This feature enables the Serviceguard cluster to be
	    configured in such a way that the nodes of the cluster
	    can communicate with the quorum server through multiple
	    subnets. It also requires a quorum server that accepts
	    connections through multiple IP addresses, which is
	    supported in Quorum Server version A.03.00.00.

	    Supported platforms and configurations
	    ======================================
	    When this feature is used, it is recommended that the
	    nodes of the Serviceguard cluster and the Quorum Server
	    be physically connected on two different subnets in
	    order to realize the redundancy that this feature
	    offers.

	    The Quorum Server multiple IP address feature is
	    supported only when configured with the new Quorum
	    server version A.03.00.00 that supports this feature.
	    This feature is supported by this patch only on
	    Serviceguard version A.11.17.00 on HP-UX 11.23. When
	    this patch is used with versions of quorum server
	    earlier than A.03.00.00, only one quorum server IP
	    address is supported.

	    Up to one additional quorum server IP address is
	    currently supported, which means a total of two
	    addresses can be configured.

	    For more information on how to install and configure
	    quorum server version A.03.00.00, please refer to the
	    release notes for Quorum Server A.03.00.00. Note that
	    this release document is expected to be released April
	    or May 2007.

	    How to configure multiple quorum server IP addresses
	    ====================================================

	    cmquerycl, the command to generate a cluster
	    configuration file has been modified to accept an
	    additional quorum server IP address as shown below.

	    Please note that for all further steps to configure a
	    second quorum server IP address to succeed, Quorum
	    Server version A.03.00.00 is needed and it must be
	    configured as described in its release notes.

	    To generate the cluster configuration with the
	    additional quorum server IP address on a second subnet,
	    qsip2, apart from the first one, qsip1, for a cluster
	    consisting of nodes node1 and node2, run the following
	    command.

	    # cmquerycl -n node1 -n node2 -q qsip1 qsip2 -C
	      cluster.conf

	    This generates a cluster configuration file that has a
	    second quorum server IP address, qsip2, specified in it.
	    This alternate IP address is specified by the new
	    keyword "QS_ADDR" in the cluster ASCII configuration
	    file.

	    Alternatively, an existing cluster configuration file
	    can be edited to add the QS_ADDR keyword with the
	    alternate quorum server IP address. This can then be
	    used with the cmcheckconf and cmapplyconf commands. The
	    QS_ADDR keyword must be specified after the QS_HOST
	    keyword.
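
	    The ordering rule can be checked before running
	    cmcheckconf with a small script such as the Python
	    sketch below. It assumes the "KEYWORD value" line
	    format and "#" comments of the cluster ASCII file; the
	    function name is hypothetical and this is not
	    Serviceguard's own parser.

```python
def check_quorum_keyword_order(config_text):
    """Return True if QS_ADDR is absent or appears after QS_HOST.

    Assumes the "KEYWORD value" line format and "#" comments of the
    cluster ASCII file; this is not Serviceguard's parser.
    """
    seen_host = False
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "QS_HOST":
            seen_host = True
        elif fields[0] == "QS_ADDR" and not seen_host:
            return False
    return True
```

	    The check only covers keyword order; cmcheckconf
	    remains the authoritative validation of the file.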

	    Use the cluster configuration file, cluster.conf,
	    generated above to configure the Serviceguard cluster.
	    Please note that a cluster configured with one quorum
	    server IP address must be halted before configuring it
	    with two quorum server IP addresses. Online addition of
	    a second quorum server IP address is not supported.

	    Run the following command to check the cluster
	    configuration file.

	    # cmcheckconf -C cluster.conf

	    Run the following command to verify and apply the
	    cluster configuration file.

	    # cmapplyconf -C cluster.conf

	    Bring the cluster up by running the following command.

	    # cmruncl

	    For cmquerycl, cmapplyconf and cmcheckconf to
	    succeed with a multiple quorum server IP address
	    configuration, the following conditions must be met:

	     1. All the nodes must be able to communicate with all
	        the quorum server subnets.
	     2. Both of the quorum server IP addresses specified
	        must belong to the same quorum server.
	     3. The quorum server must be Quorum Server A.03.00.00
	        or later.
	     4. The authorization file of the quorum server must
	        specify all the IP addresses from which each of the
	        Serviceguard nodes will communicate with it (For
	        more details refer to quorum server A.03.00.00
	        release notes above).

	    After configuring a cluster with multiple quorum server
	    IP addresses, to verify that the cluster is configured
	    correctly, run the cmviewcl command as shown below and
	    verify that it reports two IP addresses.

	    # cmviewcl -v -f line | grep quorum_server
	      quorum_server:qsip1|name=qsip1
	      quorum_server:qsip1|ip_address=15.70.191.21
	        |name=15.70.191.21
	      quorum_server:qsip1|ip_address=15.70.191.46
	        |name=15.70.191.46
	      quorum_server:qsip1|polling_interval=300000000
	      quorum_server:qsip1|node:node1|status=up
	      quorum_server:qsip1|node:node2|state=running

	    The second quorum server IP address is reported in the
	    "quorum_server" section of the cmviewcl output, as
	    shown above, by the "ip_address" field.
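
	    The verification step can be automated with a short
	    script. The Python sketch below extracts the
	    ip_address fields from 'cmviewcl -v -f line' output. It
	    assumes that each record is a single line and that the
	    indented "|name=..." lines in the sample above are
	    wrapped continuations; the function name is
	    hypothetical.

```python
def quorum_server_addresses(output):
    """Collect ip_address values from quorum_server lines.

    Expects one "quorum_server:NAME|field=value|..." record per line,
    i.e. with the wrapped continuation lines shown above rejoined.
    """
    addresses = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("quorum_server:"):
            continue
        for field in line.split("|")[1:]:
            key, sep, value = field.partition("=")
            if sep and key == "ip_address":
                addresses.append(value)
    return addresses


SAMPLE = """\
quorum_server:qsip1|name=qsip1
quorum_server:qsip1|ip_address=15.70.191.21|name=15.70.191.21
quorum_server:qsip1|ip_address=15.70.191.46|name=15.70.191.46
quorum_server:qsip1|polling_interval=300000000
"""
```

	    A cluster configured with two quorum server IP
	    addresses should yield a list of length two.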

	    cmviewcl has been modified to report the status of
	    quorum server. It will report a status of "up", if
	    quorum server is reachable via any of the quorum server
	    IP addresses.

	    cmviewconf and cmgetconf commands have also been
	    modified to report alternate quorum server IP addresses,
	    if configured. These commands also use the keyword
	    QS_ADDR to report the additional quorum server IP
	    address.

	    How does this feature work
	    ==========================
	    If a Serviceguard node is unable to communicate with
	    the quorum server because of a network failure, the
	    node connects to the quorum server via the alternate
	    quorum server IP address. At any given time, only one
	    of the quorum server subnets is in use, and only this
	    subnet is monitored periodically. This also means
	    that different nodes in the Serviceguard cluster may be
	    communicating with the quorum server via different
	    quorum server IP addresses.
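
	    The selection behavior described above can be pictured
	    with the following minimal sketch. It is an
	    illustration only, not Serviceguard code: the function
	    name and the is_reachable probe are hypothetical.

```python
def choose_quorum_address(addresses, current, is_reachable):
    """Return the address a node should use, or None if none responds.

    addresses    -- configured quorum server IP addresses, in order
    current      -- the address this node is using now
    is_reachable -- hypothetical probe: callable(address) -> bool
    """
    if is_reachable(current):
        return current                  # stay on the subnet already in use
    for candidate in addresses:
        if candidate != current and is_reachable(candidate):
            return candidate            # switch to an alternate address
    return None                         # quorum server cannot be reached


ADDRESSES = ["15.70.191.21", "15.70.191.46"]

# node1 can reach both subnets; node2 has lost the first subnet.
node1_choice = choose_quorum_address(ADDRESSES, ADDRESSES[0], lambda a: True)
node2_choice = choose_quorum_address(
    ADDRESSES, ADDRESSES[0], lambda a: a == ADDRESSES[1]
)
print(node1_choice, node2_choice)
```

	    In this sketch the two nodes end up on different
	    addresses, which matches the situation described above.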

	    IMPORTANT NOTE: When SGeFF and quorum server multiple
	    IP addresses are both configured, it is very important
	    that the QS_POLLING_INTERVAL be tuned to your network
	    environment and reduced to as low a value as possible,
	    without going so low as to generate considerable load
	    on the Quorum server or the network. The default value
	    of QS_POLLING_INTERVAL is set to 30 seconds by cmquerycl
	    when SGeFF and quorum server multiple IP addresses
	    are both configured.

	    Please note that Serviceguard Manager cannot be used to
	    configure a cluster with the additional quorum server IP
	    address. Configuration operations on a cluster already
	    configured with multiple quorum server IP addresses will
	    fail when performed from Serviceguard Manager. A change
	    request (CR: JAGag37574 / SR: 8606484450) has been
	    filed against Serviceguard Manager for this issue.

	PHSS_35371:

	1. Defect: JAGag13927  SR: 8606457625
	   Disks which identify as DGC are probed twice when
	   configuration commands such as cmapplyconf are run.
	   During the second probe, the hardware path of the device
	   is written to the wrong area of memory, which can cause
	   cmclconfd to fail and the command to fail in turn.

	   Resolution:
	   Code modified to copy the hardware path to the correct
	   location in memory.

	2. Defect: JAGag08257  SR: 8606451287
	   If cmrunnode fails to start the cluster services on a
	   node with packages which have dependent EMS resources
	   configured, the EMS resources are not deregistered
	   before cmcld exits. These packages will then fail to
	   start on this node on subsequent attempts after the
	   cluster is started.

	   Resolution:
	   EMS resource is deregistered before cmcld exits.

	3. Defect: JAGaf69163 SR: 8606409265
	   Data from the USER_NAME field is not
	   validated when cmapplyconf is run,
	   although cmcld reports an error in syslog
	   if the USER_NAME is invalid.
	   When the invalid user name is corrected
	   and cmapplyconf is re-executed, cmcld aborts
	   due to the invalid data in the CDB.

	   Resolution:
	   cmapplyconf now validates USER_NAME, consistent
	   with the existing checks for CLUSTER NAME,
	   PACKAGE NAME, etc.

	4. Defect: JAGag09971 SR: 8606453198
	   cmgmsd required that group membership transactions be
	   committed to the /etc/cmcluster/cmclconfig cluster
	   binary file on all nodes before the transaction would
	   complete. If this filesystem is full, the transaction
	   would fail, resulting in Oracle errors. However, group
	   membership information is transient and does not have
	   to be written to disk.

	   Resolution:
	   cmgmsd transactions no longer fail if there is not
	   enough disk space to write them to the binary
	   configuration file. An error is written to syslog, but
	   the transaction completes, preventing Oracle errors. The
	   transaction will be written to disk on nodes which have
	   enough space.

	5. Defect: JAGag13268 SR: 8606456893
	   The package control script was checking the exit status
	   of "cmviewcl | sed", which is the exit status of the sed
	   command, when the exit status of cmviewcl was required.
	   As a result, cmviewcl was never retried when it failed,
	   although it was supposed to be.

	   Resolution:
	   The code was modified to check the status of cmviewcl so
	   it could be retried if it fails.
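
	   The pitfall is general: the status of a pipeline is the
	   status of its last stage. The sketch below reproduces it
	   with two stand-in commands (hypothetical; not cmviewcl
	   or sed) and retries based on the status of the first
	   stage. It is an illustration, not the control script
	   code.

```python
import subprocess
import sys

# Stand-ins for the two pipeline stages (hypothetical commands).
FAILING_PRODUCER = [sys.executable, "-c", "import sys; sys.exit(1)"]
PASS_THROUGH_FILTER = [
    sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())",
]


def run_pipeline(producer_cmd, filter_cmd):
    """Run `producer | filter`; return (producer_status, filter_status)."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        filter_cmd, stdin=producer.stdout, stdout=subprocess.DEVNULL
    )
    producer.stdout.close()  # the filter sees EOF once the producer exits
    filter_status = consumer.wait()
    producer_status = producer.wait()
    return producer_status, filter_status


def run_with_retry(producer_cmd, filter_cmd, attempts=3):
    """Retry while the producer fails; return (attempts_made, producer_status)."""
    for attempt in range(1, attempts + 1):
        producer_status, _ = run_pipeline(producer_cmd, filter_cmd)
        if producer_status == 0:
            break
    return attempt, producer_status


producer_status, filter_status = run_pipeline(FAILING_PRODUCER, PASS_THROUGH_FILTER)
print("producer:", producer_status, "filter:", filter_status)
```

	   Checking only the filter status would report success
	   here, so no retry would ever be triggered.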

	6. Defect: JAGag11741 SR: 8606455170
	   cmquerycl opens the block logical volume device instead
	   of the raw logical volume device while querying logical
	   volumes. Closing the block logical volume device adds
	   overhead, because it involves holding the filesystem
	   alpha semaphore.

	   Resolution:
	   The code is modified to open the raw logical volume
	   device rather than the block logical volume device.

	7. Defect: JAGag11992 SR: 8606455475
	   After the failover of a Metrocluster/SRDF package from
	   the R1 to the R2 side, the Serviceguard boot
	   script /sbin/init.d/cmcluster potentially tries to
	   run "vxdisk clearimport <disk>" on write-disabled EMC
	   disk devices belonging to the package's SRDF device
	   group. After some timeout the vxdisk command fails
	   and VxVM marks the device "offline". This adds
	   considerable time to the overall boot process of the
	   node that previously owned the package on the R1 side.
	   If the package is later restarted on the R1 node, the
	   import of the VxVM disk group that uses the disk
	   devices that were marked offline during system boot
	   fails.

	   Resolution:
	   The Serviceguard RC script /sbin/init.d/cmcluster
	   does not try to run "vxdisk clearimport <disk>" on
	   write disabled EMC Symmetrix disks managed by VxVM.
	   For a complete fix the patch PHSS_35451 for
	   Metrocluster/SRDF A.05.00 or later is required.

	PHSS_34337:

	1.  In a reforming cluster that has the
	    NETWORK_FAILURE_DETECTION parameter set to
	    INONLY_OR_INOUT, full network polling would not be
	    performed even if the primary lan had missed the
	    maximum number of inbound polling packets, so a
	    local lan failover to the standby lan did not occur.

	    Resolution:
	    A full polling is performed even if the state of the
	    cluster is REFORMING.

	2.  The psmmon resource monitor set the SIGPIPE signal to
	    its default action. As a result, when a package with
	    an EMS resource that uses psmmon was configured in the
	    cluster, psmmon died when it tried to reconnect to
	    cmcld after cmcld went down and came back up. This
	    prevented the package from coming up.

	    Resolution:
	    Fixed the libsgcl used by the ems framework to handle
	    SIGPIPE appropriately, when connecting to cmcld.

	3.  Incorrect check allowed unclustered volume groups.

	    Resolution:
	    Fixed the logic that checks clustered and unclustered
	    volume groups.

	4.  Serviceguard automatically plumbs standby network
	    interfaces for IPv6 use, even when IPv6 is not being
	    used in the cluster configuration.

	    Resolution:
	    Serviceguard does not plumb any network interfaces for
	    inet6 (IPv6) when IPv6 is not being used within the
	    cluster.

	5.  If the revision field of a Serviceguard link level
	    message is corrupted, the message can pass through,
	    eventually causing cmcld to abort. If the revision is
	    corrupted to a value lower than 3, no checksum
	    checking is done on the message. This results in many
	    cmcld SIGSEGVs when the link level polling messages are
	    corrupted.

	    Resolution:
	    cmcld ignores corrupted link level messages.
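
	    As an illustration of this kind of guard, the sketch
	    below drops any message whose revision is unexpected
	    or whose checksum does not match. The header layout,
	    the CRC32 checksum, and the set of accepted revisions
	    are assumptions made for the example; they are not the
	    Serviceguard wire format or the actual fix.

```python
import struct
import zlib

# Hypothetical header: revision (1 byte), CRC32 of the payload (4 bytes).
HEADER = struct.Struct("!BI")
# Assumption: only revisions that carry a checksum are accepted.
SUPPORTED_REVISIONS = {3}


def pack_message(revision, payload):
    """Build a message in the hypothetical layout."""
    return HEADER.pack(revision, zlib.crc32(payload)) + payload


def accept_message(raw):
    """Return the payload, or None if the message should be ignored."""
    if len(raw) < HEADER.size:
        return None                      # too short to hold a header
    revision, checksum = HEADER.unpack_from(raw)
    if revision not in SUPPORTED_REVISIONS:
        return None                      # a corrupted revision is not trusted
    payload = raw[HEADER.size:]
    if zlib.crc32(payload) != checksum:
        return None                      # payload was damaged in transit
    return payload


good = pack_message(3, b"poll")
bad_revision = bytes([1]) + good[1:]     # revision corrupted to a lower value
bad_payload = good[:-1] + b"X"           # last payload byte corrupted
print(accept_message(good), accept_message(bad_revision), accept_message(bad_payload))
```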

	6.  A socket call failure due to insufficient available
	    memory causes cmcld to abort.

	    Resolution:
	    cmcld now retries the socket call if it fails due to
	    insufficient available memory.
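
	    A retry of this kind can be sketched as follows. The
	    retry count, the delay, and the choice of ENOMEM and
	    ENOBUFS as the retryable errors are assumptions of the
	    example, not values taken from cmcld. FlakyFactory is
	    a test double used only to exercise the loop.

```python
import errno
import socket
import time

# Errors treated as transient memory shortages (an assumption of this sketch).
RETRYABLE_ERRNOS = {errno.ENOMEM, errno.ENOBUFS}


def open_socket_with_retry(factory=socket.socket, attempts=5, delay=0.2):
    """Call factory(); retry only when it fails for lack of memory."""
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except OSError as exc:
            # Give up on any other error, or once the attempts are used up.
            if exc.errno not in RETRYABLE_ERRNOS or attempt == attempts:
                raise
            time.sleep(delay)


class FlakyFactory:
    """Test double: fails with ENOMEM a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise OSError(errno.ENOMEM, "not enough memory")
        return "socket-placeholder"


factory = FlakyFactory(failures=2)
result = open_socket_with_retry(factory, attempts=5, delay=0)
print(result, "after", factory.calls, "calls")
```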

	7.  When a transient error during a connect() call was
	    recovered after a retry, a diagnostic message was
	    still logged unnecessarily.

	    Resolution:
	    syslog will no longer show that message.

	8.  The current control script does not take shared VG
	    activation into account, and the original comments for
	    the existing VG activation examples are inaccurate.

	    Resolution:
	    Corrected the inaccurate comment field for existing VG
	    activation examples; added two new examples for shared
	    VG activations; changed the control script log message
	    to reflect the current activation mode including shared
	    mode.

	9.  Service command output of a package goes to the default
	    log file even if the SCRIPT_LOG_FILE parameter is set.

	    Resolution:
	    The log file is now set to SCRIPT_LOG_FILE if it is
	    defined, or to the default log file otherwise.

	10. Some transient errors caused cmhaltnode to fail when
	    it was run in parallel (within a short window of time)
	    on multiple nodes.

	    Resolution:
	    A retry mechanism is added in the cmcluster script for
	    cmhaltnode to handle the situation where shutdown(1m)
	    commands are executed on multiple cluster nodes.
	    This ensures that cmhaltnode succeeds on a subsequent
	    retry if it failed due to transient errors.
	    To manually halt cluster services on
	    multiple nodes at the same time, the only supported
	    command is "cmhaltnode <node1> <node2> .." which
	    serializes the actions and therefore avoids the
	    problem.

	11. Sometimes a package with a dependency does not start
	    after the package that it depends on is running.
	    This occurs when some activity is in progress on the
	    dependent package at the moment the package it depends
	    on comes up, so the dependent package cannot be
	    started at that time.

	    Resolution:
	    Modified Serviceguard to remember the event
	    and to start the dependent package later.
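
	    The idea of remembering the event can be sketched as
	    below. The class, its method names, and the package
	    name are hypothetical and do not reflect Serviceguard
	    internals.

```python
class DependencyTracker:
    """Defers the start of a dependent package that is busy."""

    def __init__(self):
        self.busy = set()      # packages with an activity in progress
        self.pending = set()   # packages whose start was deferred
        self.started = []      # packages started, in order

    def begin_activity(self, package):
        self.busy.add(package)

    def dependency_up(self, dependent):
        """The package that `dependent` depends on has come up."""
        if dependent in self.busy:
            self.pending.add(dependent)    # remember the event for later
        else:
            self.started.append(dependent)

    def end_activity(self, package):
        """The activity finished; start the package if a start was deferred."""
        self.busy.discard(package)
        if package in self.pending:
            self.pending.discard(package)
            self.started.append(package)


tracker = DependencyTracker()
tracker.begin_activity("pkg_dependent")
tracker.dependency_up("pkg_dependent")     # cannot start yet; remembered
deferred = set(tracker.pending)
tracker.end_activity("pkg_dependent")      # deferred start happens here
print(deferred, tracker.started)
```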

	12. cmcheckconf and cmapplyconf fail when -C
	    cmcluster.ascii is specified and the bridged net
	    assignment has changed. This could be due to a link
	    failure on one of the LAN cards in a bridged net,
	    since this LAN card cannot talk to any other LAN on
	    the local node. There were no internal/external
	    logging messages for this error.

	    Resolution:
	    cmapplyconf and cmcheckconf will log a specific error
	    if the standby LAN card is disconnected.

	13. Under rare circumstances when the timer loop thread is
	    stuck (not holding cm_lock) or the system clock is not
	    advancing, cmcld threads will not be scheduled. This
	    prevents cmcld timeout and prevents the safety timer
	    from being updated, resulting in all nodes being TOC'ed.

	    Resolution:
	    Enhanced the code to check the time stamps of received
	    heartbeat messages to ensure the clock is advancing,
	    rather than using the heartbeat sequence number. cmcld
	    will abort if it detects that the system clock is not
	    advancing for a set period of time, resulting in a
	    failure of the single errant node rather than the
	    entire cluster.
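
	    One possible reading of this check is sketched below:
	    time stamps carried by peer heartbeats keep moving
	    while the local clock does not. The class name, the
	    comparison rule, and the limit are assumptions made
	    for illustration, not the cmcld implementation.

```python
class ClockWatchdog:
    """Flags a stalled local clock using time stamps from peer heartbeats."""

    def __init__(self, stall_limit):
        # Seconds of peer time tolerated without any local clock progress.
        self.stall_limit = stall_limit
        self.last_local = None
        self.peer_at_last_progress = None

    def on_heartbeat(self, peer_timestamp, local_now):
        """Return True if the local clock looks stalled."""
        if self.last_local is None or local_now > self.last_local:
            # The local clock moved forward: record the new reference point.
            self.last_local = local_now
            self.peer_at_last_progress = peer_timestamp
            return False
        # The local clock did not move; measure how far the peer advanced.
        return peer_timestamp - self.peer_at_last_progress >= self.stall_limit


watchdog = ClockWatchdog(stall_limit=5)
observations = [
    watchdog.on_heartbeat(100, 50),   # first heartbeat
    watchdog.on_heartbeat(101, 51),   # both clocks advance
    watchdog.on_heartbeat(103, 51),   # local clock stuck, peer +2
    watchdog.on_heartbeat(106, 51),   # local clock stuck, peer +5
]
print(observations)
```

	    A node that sees the final condition would take itself
	    out of the cluster rather than let every node fail.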

	14. UDP messages were not marked as invalid even if there
	    were invalid values for length and offset fields in the
	    message, causing cmclconfd to exit without receiving
	    the message and/or cmviewcl to spin indefinitely. In
	    the cmclconfd case, the message remains in the inetd
	    socket buffer, causing inetd to spawn another
	    cmclconfd server. This repeats until inetd has spawned
	    40 servers in 60 seconds, at which point it terminates
	    the service and reinstates it only after 10 minutes.

	    Resolution:
	    The message is now marked as invalid if its length and
	    offset fields contain improper values.
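
	    A bounds check of this kind can be sketched as follows.
	    The two-field header is a hypothetical layout chosen
	    for the example; it is not the cmclconfd message
	    format.

```python
import struct

# Hypothetical header: body offset (2 bytes), body length (2 bytes).
HEADER = struct.Struct("!HH")


def extract_body(datagram):
    """Return the message body, or None if the header fields are invalid."""
    if len(datagram) < HEADER.size:
        return None                          # no room for a header
    offset, length = HEADER.unpack_from(datagram)
    if offset < HEADER.size:
        return None                          # body would overlap the header
    if offset + length > len(datagram):
        return None                          # body would run past the datagram
    return datagram[offset:offset + length]


valid = HEADER.pack(4, 3) + b"abc"
too_long = HEADER.pack(4, 30) + b"abc"       # length exceeds the datagram
bad_offset = HEADER.pack(1, 2) + b"abc"      # offset points into the header
print(extract_body(valid), extract_body(too_long), extract_body(bad_offset))
```

	    The check runs on a datagram that has already been
	    read, so an invalid message is consumed and discarded.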

	15. cmviewconf dumps core when it cannot get a node handle
	    back from cmclconfd or it is not able to communicate
	    with cmclconfd for any reason.

	    Resolution:
	    Instead of dumping core, the command returns an error
	    statement and exits.

	16. cmapplyconf message is unclear when run in a
	    VERITAS Cluster Volume Manager 4.1 environment.
	    Serviceguard does not provide correct information
	    about minimum LAN requirements when they are not met.
	    The message "Need not have to look for shared VGs" is
	    unclear.

	    Resolution:
	    cmapplyconf output has been changed to indicate the
	    correct information. Unclear messages related to shared
	    volume groups have been removed.

	17. cmrunnode times out after approximately 35 seconds
	    rather than waiting for