[lug] Clustering for Load-Balancing and Fault-Tolerance??

Nate Duehr nate at natetech.com
Thu Jan 31 00:22:25 MST 2002

> Not 100% true...  The load-balancer could watch the DNS requests and
> re-submit a request which has not been responded to in an appropriate
> time...  Usually, the client resolver will do that for you, though...

I don't know of any load-balancers that have this functionality (DNS proxy)
built in.  Are there any?  Most are focused on Web failover, which has its
own issues like loss of session state, etc. -- which can usually be best
fixed by proper site coding vs. hardware... but folks still buy the
hardware... each has its proper use, and both techniques get misused a lot.

> Load balancers and failover tend to work quite nicely for DNS, because
> it's not the 60 seconds of unavailability that hurts...  Most users won't
> make a call if it clears up in that period of time.  Considering the general
> reliability of DNS servers, that 60 seconds happens pretty infrequently.
> The problems that the users notice are when the name server is freaking
> out for the hour or two that it takes for someone to notice the name server
> is down (usually because of an unusually high number of calls from users
> about it), and to get it corrected...

This sounds more like a network monitoring problem than a need for a big
complex load-balanced setup though, doesn't it?  Would not a hot spare DNS
server and proper monitoring be a simpler and more effective solution to the
problem than building in complexities like IP failover and/or a
load-balancer?  (Color this with the above statement that I've not yet seen
a load-balancer that does DNS proxy properly.  But I could be wrong.)
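
For illustration, the kind of probe such monitoring could run is sketched
below in Python.  This is only a sketch under assumptions: the function
names (build_query, server_answers), the test name example.com, and the
2-second timeout are made up for the example and don't come from any
particular monitoring product.  It sends one UDP A-record query and reports
whether a matching response came back in time.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def server_answers(server_ip, name="example.com", port=53,
                   timeout=2.0, query_id=0x1234):
    """Return True if the server replies to our query within `timeout` seconds."""
    query = build_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-host errors
            return False
    if len(reply) < 12:  # shorter than a DNS header, not a usable reply
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    # The ID must match and the QR bit (top bit of flags) must mark a response
    return reply_id == query_id and bool(flags & 0x8000)
```

A cron job or NOC poller could call server_answers() against the primary
every minute or so and page someone as soon as it returns False, instead of
waiting for the user calls to pile up.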

> DNS is a great service to fail over...  It's relatively easy and you don't
> have to worry about already open TCP connections -- most requests are UDP
> and the TCP requests usually aren't urgent...  While you can easily
> load-balance them, ideally you want a shared cache between all
> load-balancers...

My concern about the load-balanced scenario is exactly that, if configured
wrong, it makes troubleshooting cache issues two orders of magnitude more
difficult.  Monitoring one solid DNS server, with the ability to replace it
rapidly via NOC monitoring and/or paging on failure, seems a tad more
stressful but much easier to implement and maintain.  Only under supremely
high load scenarios can I see where automatic failover beyond what DNS
already provides would be necessary.
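
The failover DNS already provides is, roughly, the client resolver walking
its list of configured name servers and moving on when one times out.  A
minimal Python sketch of that behaviour follows.  The names first_responding
and ask are hypothetical: ask stands for any callable that queries a single
server and raises OSError (for example a timeout) when it gets no answer.
Real stub resolvers differ in their details.

```python
def first_responding(servers, ask, attempts=2):
    """Try each server in order, going through the list `attempts` times.

    Returns (server, answer) from the first server that answers, or None
    if every try fails.
    """
    for _ in range(attempts):
        for server in servers:
            try:
                return server, ask(server)
            except OSError:
                # No answer from this one: fall through to the next server
                continue
    return None
```

With two name servers listed, a dead primary costs the client one timeout
per lookup rather than an outage, which is why the hour or two before anyone
notices matters more than the first 60 seconds.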

But maybe I'm missing something?  Good conversation!

Nate, nate at natetech.com
