Lessons From the Biggest Site on Earth
Netscape's site today could be your site tomorrow. Here's what you need to know now.
In mid-September, Netscape became the undisputed king of the hill, operating the most-visited Web site on the Internet. According to I/Pro's latest Web traffic audits, Netscape was receiving more than 100 million hits per day. By early October, Netscape had broken the 110 million hit-per-day mark; that traffic comprised more than 3 million independent sessions, 10 million page views, and 230 gigabytes of data per day.
Yikes! That level of traffic exceeds what most local-area networks (LANs) carry, let alone most Internet sites. How do you even go about putting together a Web site that can handle that much traffic? To find out, we went to Netscape's headquarters in Mountain View, Calif., to speak with Robert Andrews, the company's chief webmaster.
Andrews told us many of the management techniques Netscape uses are applicable to all Web sites. Netscape manages its infrastructure, servers, and content conservatively, minimizing the chances of connection failure while simultaneously optimizing for performance. All told, Netscape's techniques can be applied to any organization's Web site, whether small or gigantic, and yield proportional benefits.
The Infrastructure
To pump out 230-plus gigabytes of data a day, you first need lots and lots of pipe for both local and Internet traffic. Netscape achieves external connectivity through four T3 (45-Mbps) circuits. These lines are connected directly to MCI's and Sprint's Internet backbone networks. Since these carriers handle much of the world's Internet backbone traffic, connecting to them directly also gives Netscape users the shortest, quickest path to Netscape's network. Additional circuits dedicated to AT&T, UUNET, and other aggregate carriers are planned.
There are also OC3 (155-Mbps) circuits dedicated to FTP traffic (the majority of Netscape's traffic comes from downloads). One of the circuits is installed at MAE (Metropolitan Area Exchange) West in the Silicon Valley area, and another at MAE East in Reston, Va., where Netscape keeps a handful of FTP servers for East Coast users to access. "We wanted to bring the servers closer to the users," Andrews says, "and MAE East serves a large geographic community." Indeed, MAE East provides primary connectivity for most of the northeastern United States, as well as international connectivity for several European network providers.
Each of the FTP installations gets 100 Mbps of Internet connectivity, and Netscape's Web site (which runs out of Netscape's headquarters) has access to a cumulative 180 Mbps of Internet connectivity. All of this bandwidth, while not strictly required for today's traffic levels, is designed to bear the burden when traffic to Netscape's Web site reaches terabyte levels, which is expected to happen within the next few months.
Supporting all of these pipes are dedicated, top-end 7500-class Cisco Systems Inc. routers. The routers themselves are interconnected using two separate sets of LAN cabling. The primary path is a Fast Ethernet LAN that uses a Cisco Catalyst switch, providing 100 Mbps of dedicated bandwidth to each system. This is backed up by an FDDI (Fiber Distributed Data Interface) ring that provides a second 100-Mbps network. The routers are configured to use BGP (Border Gateway Protocol) for automatic failover, so if the Ethernet switch fails for any reason, the routers and devices will use the FDDI network automatically.
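The same principle applies at any scale: prefer the primary path, and roll over to the backup automatically when the primary stops responding. The Python sketch below illustrates that failover logic with hypothetical addresses; it is only an illustration of the concept, not Netscape's router configuration, which is handled entirely by BGP on the Cisco hardware.

```python
import socket

# Hypothetical addresses for illustration only; Netscape's actual failover is
# handled by BGP on its Cisco routers, not by application code like this.
PRIMARY_PATH = ("192.0.2.1", 80)    # assumed host on the Fast Ethernet segment
BACKUP_PATH = ("198.51.100.1", 80)  # assumed host on the FDDI ring

def reachable(addr, timeout=2.0):
    """Return True if a TCP connection to addr succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def pick_path():
    """Prefer the primary path; fall back to the backup if it is unreachable."""
    return PRIMARY_PATH if reachable(PRIMARY_PATH) else BACKUP_PATH

if __name__ == "__main__":
    print("Using path:", pick_path())
```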
The Servers
The Web and FTP servers used on Netscape's site are optimized for their separate functions. Each system has a full RAID (Redundant Array of Independent Disks) for local data storage, 256 MB of system RAM, and a stripped-down, highly optimized kernel that runs only what it's configured to run. The systems don't even run SNMP monitoring agents, as that would steal system and TCP/IP stack time. They're "lean and mean, for maximum performance," Andrews says. Even the log files are kept on local disks to avoid the network traffic that SYSLOG logging would generate.
Over time, Netscape has been able to work with OS vendors to improve the reliability and efficiency of the native TCP/IP stacks and kernels so they're more robust than ever. Whereas some of the systems could support only a few hundred simultaneous sessions in the past, they're now able to support several thousand. This order-of-magnitude improvement at the OS level has boosted Netscape's ability to keep up with demand. "These new advances in IP stack technology are allowing us to run more services on the systems," Andrews says.
Distributing the Load
Although Netscape advertises 20 FTP servers, in reality there are only six systems: SGI Challenge L servers and HP 9000 H-class systems, each capable of supporting 4,000 sessions.
The new systems aren't maxed out like the ones Netscape previously used, so the 4,000th user gets just as much CPU time as the first. The FTP server software used by all of these boxes is a slightly modified version of Washington University's freeware FTP server, chosen for its flexible configuration and logging options.
Of course, all of Netscape's Web servers run Netscape software. The server hardware comes from a variety of vendors, including Digital Equipment Corp., IBM, Silicon Graphics Inc., and Sun Microsystems Inc. Most of these systems run Netscape's Enterprise Server, except for those that require transaction processing, such as Netscape's General Store.
"Netscape's site is really made up of a variety of specialized systems, though this isn't readily apparent to most users," Andrews says. Just as there is a specialized system for the General Store, there are other systems for online registration and even for the general purpose sites.
The registration system, for example, is used for people who buy Navigator Personal Edition at a computer store and then need to find an ISP for their Internet access. The first time a user installs the Personal Edition, the browser locates the Netscape registration server and prompts the user through a series of locale and pricing questions. Once the back-end system has the necessary information, it locates an ISP and creates the account information on the remote system automatically. All of this is handled invisibly as far as the user is concerned.
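The matching step at the heart of that flow is simple in concept: collect the user's locale and pricing preferences, then pick an ISP that fits them. The sketch below shows one way such a lookup might work; the offer list, field names, and matching rule are assumptions for illustration, since Netscape's actual back-end and ISP interfaces aren't described here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ISP offers; a real system would pull these from a provider database.
@dataclass
class IspOffer:
    name: str
    region: str
    monthly_price: float

OFFERS = [
    IspOffer("Example Net", "US-West", 19.95),
    IspOffer("Sample ISP", "US-East", 24.95),
]

def find_isp(region: str, max_price: float) -> Optional[IspOffer]:
    """Return the cheapest offer that serves the region within the budget."""
    candidates = [o for o in OFFERS
                  if o.region == region and o.monthly_price <= max_price]
    return min(candidates, key=lambda o: o.monthly_price) if candidates else None

if __name__ == "__main__":
    match = find_isp("US-West", 25.00)
    print("Matched ISP:", match.name if match else "none")
```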
All told, there are almost 20 different systems acting as Web servers. Some handle the special-purpose functions, but most serve home.netscape.com, the default URL used by the Netscape Navigator client software.
DNS Naming Issues
Obviously, there isn't just one system that supports home.netscape.com. There couldn't be, as that name alone is responsible for 6.5 million hits in just one hour. Thus, Netscape maps the home host name to 32 separate host names through a combination of round-robin DNS and client-side spoofing. These 32 host names are served by eight individual systems.
"Originally, the site was set up using round-robin DNS lookups, meaning that clients would use whatever system was randomly returned by the DNS servers," Andrews says. "However, not all DNS clients worked well this way." Some PC stacks didn't implement support for round-robin at all; they'd simply use the same IP address all the time, which put too much of a load on any single server.
To get around these problems, Netscape embedded a bypass technology directly into the Navigator client. Now when a user enters home.netscape.com as the URL, Navigator randomly chooses a number between 1 and 32 and sends the request to the corresponding homeNN host. The result is that 32 host names (home1 through home32) act as the destinations for any Navigator client connecting to home.netscape.com. In essence, a Navigator user never actually connects to home.netscape.com; the client always intercepts the request and converts it to homeNN.netscape.com.
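For other sites, the same trick could live in any client software you control. Here is a minimal sketch of the rewrite, assuming 32 hosts named home1 through home32; the function name and pool size are illustrative and not Navigator's internals:

```python
import random
from urllib.parse import urlsplit, urlunsplit

def rewrite_home_url(url: str, pool_size: int = 32) -> str:
    """Rewrite home.netscape.com to a randomly chosen homeNN.netscape.com.

    Picking the host at random on the client spreads requests evenly across
    the pool, regardless of how the resolver handles round-robin DNS.
    """
    parts = urlsplit(url)
    if parts.hostname == "home.netscape.com":
        n = random.randint(1, pool_size)
        netloc = parts.netloc.replace("home.netscape.com", f"home{n}.netscape.com")
        parts = parts._replace(netloc=netloc)
    return urlunsplit(parts)

if __name__ == "__main__":
    print(rewrite_home_url("http://home.netscape.com/index.html"))
```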
However, there is a DNS entry for home.netscape.com, which is used exclusively by non-Navigator clients, since they won't be rewriting the host name. Although the home server carries the same content as the other hosts, it also runs extensive monitoring software that logs the browser types of visiting clients, giving Netscape's webmasters demographic data on the browsers in use.
Although Netscape's primary site is home.netscape.com, many people enter www into their Web browsers out of habit, so Netscape's webmasters have set up a system explicitly for www.netscape.com. Since this system is accessible to both Navigator and non-Navigator clients, it provides another data collection point on the distribution of browsers visiting the site.
Andrews admits it's highly unlikely that other webmasters will be able to build DNS load balancing directly into their users' Web browsers, but he doesn't apologize for what Netscape does. "We had to deal with these load issues six months before anybody else even had to think about it," he says. And even though DNS technologies have come a long way since then, Netscape will continue to use the client-based lookups for the foreseeable future. "It certainly guarantees evenly balanced servers," Andrews notes.
Content Management
Just as Netscape's servers are split up for the different types of access, so is the content managed according to function. All of the core materials are created and managed by a development team that's part of the marketing organization, which even has a handful of programmers on staff for CGI and JavaScript development.
Web pages and programs are developed using a variety of tools. Once the pages are completed and tested, they are checked into a source code management tool, just as in a traditional software development project. The source library can be used to archive files, check for inconsistencies, compare versions, and even restore past versions of pages for reuse.
Having a distributed, decentralized development environment is "really the only way to build and manage super-scalable content," Andrews claims. Otherwise the process would become too bureaucratic, with the various departments fighting for the centralized developers' resources. "The downside is that there is a decreased level of control," he says, which can result in problems when one department steps on another's online efforts by changing a URL or by removing a file.
Once the pages are checked into the archive, the webmaster-on-duty pulls the pages to a master Web server that feeds all of the other systems. Once a night, the master server sends any changed files in the directory tree to all of the other servers using the standard RDIST (Remote Distribution) protocol. At that point, the individual servers archive their daily log files and send them to the internal log server. This host expands the archives and then runs the daily analysis tools against them, generating graphs and charts showing the number of hits, amount of data transferred, and other useful statistics.
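Readers without RDIST can approximate the same nightly push with any tool that copies only the files changed since the last run. Here's a minimal sketch, assuming a local staging tree and a mounted mirror path (both paths are hypothetical); it simply compares modification times and copies anything newer:

```python
import os
import shutil

def push_changed(master_root: str, mirror_root: str) -> int:
    """Copy files from the master tree to the mirror if they are new or newer."""
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(master_root):
        rel = os.path.relpath(dirpath, master_root)
        dest_dir = os.path.join(mirror_root, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 preserves timestamps
                copied += 1
    return copied

if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    print("Files pushed:", push_changed("/staging/webtree", "/mirror/webtree"))
```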
Incidentally, the log handling is one of the current bottlenecks for the entire operation. Although the nightly submissions of new pages don't consume much bandwidth, the processing of the log files does. Since the connection between the external site and the internal network relies on 10-Mbps Ethernet, the transfer times for the log files can be huge. Simply uncompressing a single log file can take well over an hour because of the sheer volume of traffic each one records. "As the volume edges toward terabyte levels, higher-bandwidth, on-the-fly compression will be required for the daily log file analysis to continue," Andrews says.
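One way to ease that pressure is to process the compressed logs as a stream instead of expanding them to disk first. The sketch below assumes gzip-compressed logs in Common Log Format (the file name is hypothetical) and tallies hits and bytes without ever writing out an uncompressed copy:

```python
import gzip

def summarize(log_path: str):
    """Count hits and total bytes served from a gzip-compressed CLF access log."""
    hits = 0
    total_bytes = 0
    with gzip.open(log_path, "rt", errors="replace") as log:
        for line in log:
            fields = line.split()
            if not fields:
                continue
            hits += 1
            size = fields[-1]              # last CLF field is the response size
            if size.isdigit():             # size can be "-" for some requests
                total_bytes += int(size)
    return hits, total_bytes

if __name__ == "__main__":
    h, b = summarize("access_log.gz")      # hypothetical file name
    print(f"{h} hits, {b / 1e9:.1f} GB transferred")
```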
Learning Curve
While it seems unlikely that any of us will ever have a Web site that generates as much traffic as Netscape's, it's not as far-fetched as you may think. What we're seeing from Netscape's site today is probably going to be very common within the next two or three years, especially for many of the larger consumer-related Web sites that are just coming online now.
There's a lot to be learned from Netscape's experience. First of all, setting up a strong infrastructure makes a big difference in overall performance. Balance the bandwidth equally between the Internet connection and the local server pool; if the devices can't talk to each other quickly, a high-speed connection is irrelevant. Next, provide direct links to the aggregate carriers to improve performance for the access providers underneath them. Finally, set up a multilayered, fault-tolerant wiring scheme. You don't want to lose all of your servers simply because a hub gets accidentally unplugged.
Don't put too much load on any one server unless you genuinely expect only light traffic. Optimizing the platform for performance will really make a difference in user satisfaction. For example, of the 350 gigabytes of data transferred from Netscape's site, more than 70 gigabytes came from the server caches, resulting in less disk activity and higher performance. If possible, work with your vendor to optimize the system kernel so that TCP/IP traffic gets the highest priority.
Finally, push the content development and management off to the groups that stand to benefit the most, probably your marketing department. These people have a much higher level of commitment to good design than you do, so make them responsible for it. Your job as webmaster should be to make sure that everything works smoothly, not to do the layout.
Taken all together, these tips provide a strong overall success strategy that will help you maximize your company's Web presence.
So you wanna be a webmaster?
Managing the largest Internet site on the planet is not as simple as it sounds; it requires a myriad of skills. "It's not just programming HTML and CGI," says Robert Andrews, Netscape's Web site director. "You have to take a holistic approach," looking at every aspect of the process, from how applications generate data down to how the user connects to your site, and all of the layers in between.
With a formal education in physics, Andrews prepared for his position by working in a variety of systems management roles over the years. In his previous job, he worked as a network manager for a large semiconductor company, and before that he was a Unix systems manager for another technology company in Silicon Valley.
This combination of Internet infrastructure and general systems management experience has proved to be his most valuable asset in dealing with his day-to-day management issues, as well as with the strategic design efforts. Andrews points out two requirements for being a successful webmaster: "You have to know [system administration and Internet networking] from top to bottom."
And it's only going to get more complex as the Internet continues to grow. "The future of the Web is one of truly dynamic content," Andrews says. "Not movies or static files, but real-time, dynamically generated material that reflects current events and activity. Web sites are going to be capturing data from other dynamic sources, marking them up and displaying them automatically."