On Saturday, December 3rd, Bb OWL required a complete system restart to resolve performance problems. This extreme action was unavoidable and a direct result of unprecedented usage. We are exploring options to reduce the likelihood of a repeat of this problem. Most IT problems are easily solved with money. With Bb OWL scheduled for retirement at the end of the winter 2012 term, however, additional spending is not a sensible option. I prepared these notes to explain our current predicament and to underscore the urgency of moving courses to our new BbLearn environment as quickly as possible.
Our Bb OWL environment consists of multiple servers: 5 application servers and 1 database server. When you log in, an appliance known as a load balancer looks at the 5 application servers, picks the one with the fewest users, and starts your session on that server. Every action you take, such as reading a discussion posting, opening a file with assignment instructions, or checking the grade book, is a small program. Each application server is identical, so from your standpoint as a user it doesn’t matter which one is running your session.
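For readers who like to see the idea spelled out, here is a minimal sketch of the "pick the server with the fewest users" rule described above. It is only an illustration, not the actual logic inside our load balancer: the server names, the user counts, and the function `pick_server` are all made up for the example.

```python
def pick_server(active_users):
    """Return the name of the server with the fewest active users.

    `active_users` maps a (hypothetical) server name to its current
    user count. If two servers are tied, the one listed first wins.
    """
    return min(active_users, key=active_users.get)


# Hypothetical snapshot of the 5 application servers (invented numbers).
snapshot = {"app1": 512, "app2": 498, "app3": 530, "app4": 475, "app5": 501}

chosen = pick_server(snapshot)
snapshot[chosen] += 1  # the new session now counts against that server
print(chosen)
```

Because every application server is identical, the only thing the rule has to consider is how busy each one is at the moment you log in.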
The vendor, Blackboard, suggests that each of the 5 application servers can comfortably handle about 500 concurrent users. When the user count exceeds that number, performance predictably degrades. Across our 5 application servers, then, we can comfortably accommodate 2,500 users at the same time. Prior to this fall, peak daily numbers during busy academic terms topped out at around 3,700 users, which is obviously higher than the vendor’s recommendation but was generally OK, with little apparent performance degradation.
On Saturday, December 3rd, the number of concurrent users exceeded 4,700 for a short period, the highest number we’ve recorded since starting operations with our current product in 2006. Our system simply ran out of resources to handle the number of users, and the only cure was a complete restart. Sixty minutes after the restart and restoration of service, the number of concurrent users had climbed to around 3,500, and it remained at that level throughout the afternoon without additional incident.
In an ideal world, all we would need to do to solve this problem is install additional application servers to give us the horsepower to cope with the occasional burst of high traffic. If we accept the vendor’s recommendation of 500 users per application server, then to handle the Saturday numbers we would need to double our current count, from 5 servers to 10. Each server costs around $4,000, so this solution has a price tag of $20,000 in hardware before factoring in the cost of setup, backups, etc.
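The arithmetic behind that price tag can be checked in a few lines. This is a back-of-the-envelope sketch using only the figures quoted in these notes; the variable and function names are my own, and the $4,000 per server is an approximate hardware cost only.

```python
import math

USERS_PER_SERVER = 500   # vendor's comfortable load per application server
CURRENT_SERVERS = 5      # application servers we run today
PEAK_USERS = 4700        # Saturday's peak (the real figure was slightly higher)
COST_PER_SERVER = 4000   # approximate hardware cost in dollars


def servers_needed(users, per_server=USERS_PER_SERVER):
    """Round up: a partly used server still has to be bought whole."""
    return math.ceil(users / per_server)


comfortable_capacity = CURRENT_SERVERS * USERS_PER_SERVER  # 2,500 users
needed = servers_needed(PEAK_USERS)                        # 10 servers
extra = max(0, needed - CURRENT_SERVERS)                   # 5 more servers
hardware_cost = extra * COST_PER_SERVER                    # $20,000

print(comfortable_capacity, needed, extra, hardware_cost)
```

Rounding up matters here: 4,700 users divided by 500 per server is 9.4, which means 10 servers, or double what we have now.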
You may have missed above that after the restart the concurrent user count did not return to the 4,700 it had reached at the point of failure. Many of the sessions at the moment of failure were connections which had been idle for up to 120 minutes. These are sessions where users shut down their computers without logging out, or simply walked away from logged-on computers. It is important to log out of your session when finished. Today I instructed our system administrator to reduce the time-out interval to 30 minutes. This only applies when sessions are idle. As long as you continue to take some action in Bb OWL, your session will continue until you log out. Please remember to log out of your session when you are finished.
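To show why a shorter time-out frees up room, here is a small sketch of an idle time-out rule. It is an illustration only and not how Bb OWL implements it: the user names and times are invented, the helper names are hypothetical, and treating a session as expired at exactly the time-out mark is my assumption.

```python
OLD_TIMEOUT_MINUTES = 120  # previous idle time-out
NEW_TIMEOUT_MINUTES = 30   # new idle time-out


def is_expired(last_activity, now, timeout):
    """A session expires once it has been idle for `timeout` minutes."""
    return now - last_activity >= timeout


def live_sessions(last_activity_by_user, now, timeout):
    """Keep only the sessions that have not yet timed out."""
    return {
        user: last
        for user, last in last_activity_by_user.items()
        if not is_expired(last, now, timeout)
    }


# Invented example: minute of each user's most recent action.
sessions = {"ann": 100, "bob": 60, "cy": 5}
now = 110

under_old_rule = live_sessions(sessions, now, OLD_TIMEOUT_MINUTES)
under_new_rule = live_sessions(sessions, now, NEW_TIMEOUT_MINUTES)
print(len(under_old_rule), len(under_new_rule))
```

In this example all three sessions still count as connected under the 120-minute rule, while under the 30-minute rule only the user who acted 10 minutes ago remains. Anyone who keeps working is unaffected, because each action resets the idle clock.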
We are considering turning off services such as Who’s Online, which requires continuous resources. Although a nice feature, it’s not essential, and it gobbles up system resources which are better used for reading discussion postings, submitting assignments, taking quizzes, etc.
So when all is said and done, we can solve the problem of occasional traffic bursts by spending money on our current system. However, that is not a sensible action to take given that our new environment, BbLearn, is ready for use starting in January 2012. It has plenty of horsepower to cope with bursts such as the one we experienced last Saturday. Spending on our current system is comparable to putting a bigger engine in a Model T while a luxury car sits empty in the driveway.
Our best option is to get courses into our new environment as soon as possible. Reminder: please log out of your session when finished.