disaster test - what to check?

Viktor Balogh
Oct 17, 2009 07:23:17 GMT

Hi,

In about two weeks we will have a disaster test in our datacenter, and I need to check some things and prepare for it. The power will suddenly be turned off and turned back on after a few minutes. We use ServiceGuard and external XP storage; the outage affects only one half of each two-node cluster and only one half of the storage subsystem. I need to predict what will happen to the systems and packages, and how the LVM resynchronization will be done, i.e. how the state between the two storage boxes will be synchronized.

The SG part is clear to me, but I haven't worked with storage yet, so I am most curious about the storage part. And if you have a disaster recovery plan, it is welcome too! Points will be awarded. ;)
Viveki
Oct 17, 2009 07:59:23 GMT  3 pts

Hi

I do not know what you are looking for on the storage side if the power goes down. For XP there is usually disaster recovery software called Continuous Access. Is it implemented?
Michael Steele
Oct 17, 2009 08:17:19 GMT  5 pts

Hi

a) Have your uninterruptible power supply vendor out to check for bad batteries.

b) Verify that all boxes are on battery backup.

c) Then there is nothing more to do. You're not failing over unless the network is disrupted. So if all of your network nodes are on batteries...
Johnson Punniyalingam
Oct 17, 2009 13:25:45 GMT  6 pts

Check all power supplies and the UPS.

Log in to the MP on all the servers:

CM>PS

UPS
===
Check with the UPS vendor.

Backup
======
I would also make sure you have the latest OS backup and the latest file system backup for all servers included in the disaster test, and run the nickel script to collect all the system information details.

Rgds,
Johnson
Viktor Balogh
Oct 18, 2009 11:40:32 GMT    N/A: Question Author

You probably misunderstood me: the electricity will be completely off, including the UPSs! The test is to see what happens to the packages in this case. So one half of the clusters and storage will go offline without a clean shutdown!

My question is: how will the filesystems be synchronized? If the package fails over, I think the LVM resync will be initiated by the surviving cluster partner, where the package then runs. But what if failover isn't permitted, e.g. the package starts automatically after the powered-off node reboots? In this case the sync would be initiated by the rebooted node. I'm afraid the correct data will be overwritten by the stale data here. Can LVM auto-resync be turned off? With what command?
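
For reference on the last question, a minimal sketch rather than a verified procedure for this setup. HP-UX vgchange has an -s option that, used together with -a y or -a e, disables the automatic synchronization of stale extents at activation; vgdisplay -v shows the status of each logical volume, and vgsync starts the resync by hand. The volume group name /dev/vg_app and the run helper are assumptions made for illustration only. With the default DRY_RUN=1 the script just prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. VG is a hypothetical volume group name, not taken from the thread.
VG="${VG:-/dev/vg_app}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0 (on a real HP-UX host).
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Activate the volume group without the automatic resync of stale extents.
activate_without_sync() { run vgchange -a y -s "$VG"; }

# Show volume group details, including the status of each logical volume.
show_status() { run vgdisplay -v "$VG"; }

# Start the resynchronization of stale extents manually.
sync_now() { run vgsync "$VG"; }

activate_without_sync
show_status
sync_now
```

In a Serviceguard package the volume group is normally activated by the package itself (typically in exclusive mode, -a e), so a change like this would most likely belong in the package configuration rather than on the command line; check that against your Serviceguard version before relying on it.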
Viktor Balogh
Oct 18, 2009 11:43:56 GMT    N/A: Question Author

>Usually, for XP there is a disaster recovery software called continuous access. Is the same implemented?

No, HP XP Continuous Access isn't implemented.
Viveki
Oct 18, 2009 12:36:44 GMT  4 pts

Hi Viktor,

We are still not clear on what you are looking for. If you are going to perform a power-off test on the XP and test its disaster recovery, I would say make sure you have a good backup. Nothing else ....
Viveki
Oct 18, 2009 12:57:51 GMT  9 pts

Sorry Balogh,

I didn't see your post above.

I do not know the current setup. But normally, if you power off one node of the cluster, the package should automatically switch over to the other node. If a filesystem needs repair, fsck will be called automatically by the other node.

Just for your info, I will share one of my experiences. At one site the power failed. The storage (not XP) and one of the nodes were powered off suddenly. They came back after a while, but the cluster failed to start and fsck was taking hours to repair a file system. Finally, a complete shutdown and proper restart of the full setup solved the issue without any further delay. No fsck was done this time. So again, do not go by docs or information from other sites. The power-down test will be unpredictable and, in my opinion, there is only a small chance that you won't have a problem after it, whatever your precautions may be.

Again, please make sure you have a backup before the activity, since the machines may not be aware you are just testing ;)
Viktor Balogh
Oct 19, 2009 09:31:14 GMT    N/A: Question Author

Hi Viveki,

Thanks for your help. I know the Serviceguard part: if the package AUTO_RUN is enabled, then the package will fail over to the surviving node. (For the test packages it isn't always enabled.) We have several LVM-mirrored filesystems, mirrored with the help of PVGs. The physical volume groups consist of LUNs from separate XP boxes, so if a storage box fails only one half of the mirror will be affected.

But this part isn't clear to me: after a package switch, the package runs with half of the mirror. After the other half of the system is powered on, how will the data be synchronized? The surviving node will initiate a sync, but we must make sure that the failed XP is synchronized from the surviving one, and not the reverse.

And what will happen with the test packages? They will be started on the surviving node, after it has come back to life. Will a sync be needed here? Or only an fsck? We are using VxFS filesystems; do you think a full fsck (nolog) would be recommended?

In any case, we will create an extra backup of the OS and the data...
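
For reference on the fsck question, a minimal sketch of the two forms being compared. On HP-UX, fsck -F vxfs without options normally only replays the intent log, while -o full,nolog requests a full structural check and skips the log replay. The raw device path /dev/vg_app/rlvol1 and the run helper are assumptions made for illustration only. With the default DRY_RUN=1 the script just prints the commands.

```shell
#!/bin/sh
# Sketch only. LV_RAW is a hypothetical raw logical volume, not taken from the thread.
LV_RAW="${LV_RAW:-/dev/vg_app/rlvol1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0 (on a real HP-UX host).
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Default VxFS check: intent-log replay only, usually fast.
log_replay_fsck() { run fsck -F vxfs "$LV_RAW"; }

# Full structural check without log replay; can take much longer.
full_fsck() { run fsck -F vxfs -o full,nolog "$LV_RAW"; }

log_replay_fsck
full_fsck
```

The filesystem must be unmounted for either check, which in a package context means the package is halted.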
Tingli
Oct 19, 2009 19:26:33 GMT  5 pts

If one node of a two-node cluster is powered off, the root mirror of the surviving node won't be affected. It runs as usual; only the processes running on the failed node will be failed over to the surviving node.

When the failed node is up again, it is just a normal system boot. You can bring the failed-over processes back manually to the original node.
Viktor Balogh
Oct 30, 2009 17:09:27 GMT    N/A: Question Author

Thanks Tingli,

and could you tell me what will happen on the storage side? One side of the mirror (one storage box) will be without electricity, so my question would be: what happens after the failed storage is restarted? How will the resynchronization occur? Will it happen manually or automatically? Could we set this synchronization to manual?
Viktor Balogh
Oct 30, 2009 17:13:58 GMT    N/A: Question Author

My other question would be: will the server be powered off again after we give the electricity back? Where can I check whether it will reboot itself automatically? On the MP/GSP?
Viktor Balogh
Oct 30, 2009 17:23:01 GMT    N/A: Question Author

> will the server again powered off after we give the electricity back?

Correction: will the server be powered ON again after we give the electricity back?
Johnson Punniyalingam
Oct 31, 2009 03:12:20 GMT  5 pts

> my other question would be: will the server again powered off after we give the electricity back? where can I check it if it will reboot itself automatically? on MP/GSP?

Are you referring to "raw power" or "UPS power", or are you doing a power maintenance test?

Once you have shut down the server gracefully (shutdown -hy 0), you have to unplug the power cables from the server. Does your power source come from a UPS?
Once you have completed your power maintenance activity, connect the power cables again. Once power resumes, you need to power on manually.

> will the server again powered off after we give the electricity back?

This depends on the power source. If the server powered off, you can check the console logs (E - error logs) on the MP/GSP.
Viktor Balogh
Oct 31, 2009 12:14:00 GMT    N/A: Question Author

I am referring to raw power. And this is not a power maintenance activity; our customer plans to simply remove the power without a graceful shutdown. Switches, storage boxes, and servers will be affected by this, a whole datacenter. The other half of the infrastructure resides in another building, will be unaffected, and hopefully will take over the packages.
Tingli
Nov 3, 2009 20:58:32 GMT  5 pts

Why not connect the two raw power supplies to both servers and both storage boxes, so you don't need to worry about a sudden power-off?

If you have a UPS, it can supply power for a few minutes and you don't need to worry about it either.

But if one server and half of the storage are down, then everything related to that half of the storage will be down for sure, and the database might be corrupted. Meanwhile the processes residing on the failed system will fail over to the surviving system. Usually the failover takes several minutes, and if the original failed system comes back, the result is unpredictable.
Viktor Balogh
Nov 4, 2009 15:22:18 GMT    N/A: Question Author

Tingli: yeah, with UPSs it would be much easier. AFAIK there is an alternate power source, but it won't be an option.

>and if the original failed system is back, then the result is unpredictable.

Yes, that's my task: to predict the unpredictable. This whole exercise was organized only to check what would happen if the electricity went off all of a sudden. :(

I will leave this thread open and share the details. The test will take place on 27-29 November...
Tor-Arne Nostdal
Nov 4, 2009 15:48:23 GMT  5 pts

I assume you've tested ordinary package switching and know that it works OK ;)
And that you also know that your backups are running and can be restored ;)
...

Tip:
Ensure that they really cut the power for all components at once. We found a plausible failure case if the failures came in a specific sequence...

Tip:
Check your MC/SG setup: when your primary node comes up again, will the package switch back or not?
You might want a controlled failback to the primary node, and not want packages to switch automatically when the power is back.

/2r
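
For reference, the check in the second tip could be sketched as follows. This is a minimal sketch under assumptions: the package name pkg_app and the run helper are hypothetical. cmviewcl -v -p shows the package status together with its failover and failback policy, and cmmodpkg -d / -e disables or re-enables package switching. With the default DRY_RUN=1 the script just prints the commands.

```shell
#!/bin/sh
# Sketch only. PKG is a hypothetical package name, not taken from the thread.
PKG="${PKG:-pkg_app}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0 (on a real cluster node).
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Show package status, including failover/failback policy and switching state.
show_package() { run cmviewcl -v -p "$PKG"; }

# Disable package switching so the package does not move on its own.
disable_switching() { run cmmodpkg -d "$PKG"; }

# Re-enable package switching once a controlled failback is wanted.
enable_switching() { run cmmodpkg -e "$PKG"; }

show_package
disable_switching
enable_switching
```

Whether the package moves back by itself when the primary node returns is governed by the failback policy in the package configuration, so that setting is worth reviewing before the test as well.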
 