Failover Clustering
- High availability option, standby instance on a different node
- In case of failover: active transactions are rolled back, connections retry, reconnect to other node
- Single copy of databases in shared storage
- Built on top of Windows Server Failover Clustering – up to 8 nodes in SQL 2005, 16 in SQL 2008
- SQL Server 2005 – Requires certified hardware
- See http://support.microsoft.com/kb/309395
- SQL Server 2008 – Run Failover Clustering Validation tool
- http://support.microsoft.com/kb/943984
- Discussion – Where to keep tempdb? In a cluster disk!
- Cluster Group (a.k.a. Cluster Service or Application) – Collection of required resources
- See http://msdn.microsoft.com/en-us/library/ms189134.aspx
- See http://support.microsoft.com/kb/327518/
SQL Server 2005
- Multi-instance clusters – each instance requires its own resources
- Multi-node – more nodes per configuration, active/active, active/passive, N+1 – plann carefully
- Analysis Services clusterable
- See http://msdn.microsoft.com/en-us/library/ms143511.aspx
- Integration Services – Can be clustered. Discussion - Where to store the packages?
- Reporting services not clusterable - Consider using network load balancing instead
- Can rename a clustered instance
- Majority Node Set
- See http://suppot.microsoft.com/kb/838612
SQL Server 2008
- Cluster Validation tool
- Support for up to 16 nodes
- New quorum model: File Server Witness, Node Majority
- Edit subnet mask, DHCP, IPv6
- GPT disks, support for >2TB partitions
- Improved cluster setup, error reporting
- Rolling upgrades, one node at a time
- See http://msdn.microsoft.com/en-us/library/ms189910.aspx
Failure Detection
- Node failure
- Resource failure detected
- LooksAlive – SQL Server status – default every 5 seconds
- IsAlive – SELECT @@SERVERNAME – default every 60 seconds
- See http://blogs.technet.com/rob/archive/2008/05/07/failover-clustering.aspx
- Upon failover – new instance comes up on other node
- SQL Server 2005+ – Enterprise Edition – service available after Redo phase completes
- Failback – Available using use preferred owner
Troubleshooting
- Verify failover
- Careful – Post-install tasks like manual IP ports, additional disks – Check resources
- Logs: Windows Event Logs, Cluster log, SQL Server Setup log
- Books Online: Failover Cluster Troubleshooting
- See http://msdn.microsoft.com/en-us/library/ms189117.aspx
Clustering Details
- Client design: To the app, failover process looks like server taking a long while to respond. Consider adding retry logic.
- Planning: Verify hardware solution, run validation, plan security for service accounts, plan SQL Tools location
- Migration: Verify OS settings (CSP, Kerberos), MSDTC, pre-requisites
- Heartbeat: TCP/IP configuration, binding order, remove NETBIOS
- Virtualization: Check support policy (KB below), guest failover not supported, SVVP
- See http://support.microsoft.com/KB/956893
- See http://support.microsoft.com/KB/956893
- Adding a node: Set SQL setup, select add cluster node
- Removing a node: Set SQL setup, select remove cluster node
- SQL Server setup can be fully scripted, including cluster install, add cluster node, etc.
- Example: SETUP.EXE /q /ACTION=InstallFailoverCluster /INSTANCENAME=...
- Example: SETUP.EXE /q /ACTION=AddNode /INSTANCENAME=...
- See http://msdn.microsoft.com/en-us/library/ms144259.aspx
- Careful – SQL Server 2008 cluster setup is now run one node at a time
- Cluster log: Use CLUSTER.EXE command line to obtain text log
- See http://blogs.msdn.com/clustering/archive/2008/09/24/8962934.aspx
Combining with other features
- Clustering and File Stream - Supported
- See http://msdn.microsoft.com/en-us/library/bb895334.aspx
- Clustering and Full Text – Supported
- Clustering and Replication - Supported
- See http://msdn.microsoft.com/en-us/library/ms143786.aspx
Clustering vs. Mirroring
- Clustering – entire instance, Mirroring – one database at a time
- For 2005, avoid mirroring large number of databases
- See http://msdn.microsoft.com/en-us/library/cc917681.aspx
- For 2008, with log stream compression, can probably handle more…
- See http://sqlcat.com/technicalnotes/archive/2007/09/17/database-mirroring-log-compression-in-sql-server-2008-improves-throughput.aspx
- Mirroring with large databases: Initial backup/restore can be problematic
- See http://blogs.msdn.com/clustering/archive/2008/08/28/8904281.aspx
- See http://blogs.msdn.com/cindygross/archive/2009/02/22/how-to-configure-dtc-for-sql-server-in-a-windows-2008-cluster.aspx
- See http://technet.microsoft.com/en-us/library/cc730992.aspx