--------------------------------
TSG Report 2

"Rambling on..."

by 'oscar51'

March 2002

The Statistics of A "Rambling Discussion"

On February 28, 2002, at about 8:00 a.m. PST (4:00 p.m. GMT), I finished downloading 212 pages of the thread "Rambling Discussion" in the TSG "Random Discussion" forum. This thread was started on Thursday, November 8, 2001, at 7:58 p.m. GMT, and is still going strong. After concatenating these 212 pages into one 23.7 MB file (subsequently reduced to 1.2 MB by filtering, i.e., keeping only those lines that contained information relevant to this study), I did a few statistical calculations on the data. To wit:

Elapsed Time Between Replies

Table I shows the number of replies that occurred within the elapsed time indicated. For example, at the top of the table, 2321 replies (55%) were posted within 10 minutes following the previous post and, at the bottom of the table, 57 replies (1.4%) were posted after eight hours had elapsed.

Table I - Elapsed Time Between Replies

Note the change in time scale from minutes to hours between rows 6 and 7.

  Elapsed time          Replies
  10 minutes & under       2321
  11 thru 20 minutes        685
  21 thru 30 minutes        307
  31 thru 40 minutes        198
  41 thru 50 minutes        111
  51 thru 60 minutes         72
  during 2nd hour           238
  during 3rd hour            96
  during 4th hour            42
  during 5th hour            34
  during 6th hour            25
  during 7th hour            20
  during 8th hour            16
  over 8 hours               57
  Total replies:           4222
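
For readers who want to reproduce this kind of table, here is a minimal sketch of the binning step. It assumes post times are known to the nearest minute and that row boundaries fall as the labels suggest (for example, a gap of exactly 60 minutes goes in "51 thru 60 minutes"). The function name and the sample post times are made up for illustration; they are not the original analysis or data.

```python
from collections import Counter
from datetime import datetime


def bucket_label(minutes: int) -> str:
    """Map an elapsed time in whole minutes to a Table I row label."""
    if minutes <= 10:
        return "10 minutes & under"
    if minutes <= 60:
        low = (minutes - 1) // 10 * 10 + 1  # 11, 21, 31, 41 or 51
        return f"{low} thru {low + 9} minutes"
    if minutes <= 480:
        hour = (minutes - 1) // 60 + 1  # 2 through 8
        suffix = {2: "nd", 3: "rd"}.get(hour, "th")
        return f"during {hour}{suffix} hour"
    return "over 8 hours"


# Hypothetical post times (not taken from the thread), in posting order.
post_times = [
    datetime(2002, 2, 1, 19, 58),
    datetime(2002, 2, 1, 20, 3),
    datetime(2002, 2, 1, 20, 40),
    datetime(2002, 2, 1, 22, 15),
    datetime(2002, 2, 2, 9, 0),
]

# Elapsed whole minutes between each post and the one before it.
gaps = [
    int((later - earlier).total_seconds() // 60)
    for earlier, later in zip(post_times, post_times[1:])
]

# Tally how many gaps land in each Table I row.
counts = Counter(bucket_label(g) for g in gaps)
for label, n in counts.items():
    print(f"{label:<20} {n}")
```

Note that N posts yield N - 1 gaps, which is consistent with 4222 replies here against the 4223 posts counted in Tables II and III.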

Table I shows that the number of replies decreases rapidly as the length of time between replies increases. (The small but abrupt increase in number of replies from row 6 to row 7 is due to the change in time scale from minutes to hours.) This trend is characteristic of highly skewed data. For such data, "average value" and "standard deviation", terms commonly used in statistical analysis to describe "normally distributed" (i.e., "bell-curve") data, are quite meaningless -- unless there is a good reason to ignore some or all of the longer elapsed-time replies*. Even when those replies are ignored, the data remain highly skewed, and a value for "average reply time", although easily calculated, is not very meaningful. But that's life in the world of statistics.
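
One way to describe skewed data without leaning on an average is to accumulate the Table I counts and see which row contains the halfway point. The sketch below uses only the counts from Table I; the variable names are illustrative.

```python
# Row labels and reply counts copied from Table I.
table_i = [
    ("10 minutes & under", 2321),
    ("11 thru 20 minutes", 685),
    ("21 thru 30 minutes", 307),
    ("31 thru 40 minutes", 198),
    ("41 thru 50 minutes", 111),
    ("51 thru 60 minutes", 72),
    ("during 2nd hour", 238),
    ("during 3rd hour", 96),
    ("during 4th hour", 42),
    ("during 5th hour", 34),
    ("during 6th hour", 25),
    ("during 7th hour", 20),
    ("during 8th hour", 16),
    ("over 8 hours", 57),
]

total = sum(n for _, n in table_i)

running = 0
median_bucket = None
for label, n in table_i:
    running += n
    cumulative_pct = 100 * running / total
    # The median lies in the first row whose cumulative share reaches 50%.
    if median_bucket is None and cumulative_pct >= 50:
        median_bucket = label
    print(f"{label:<20} {n:>5} {cumulative_pct:6.1f}%")

print("Median falls in:", median_bucket)
```

With these counts the halfway point already falls in the first row, so the median gap is 10 minutes or less, well below the averages of about 28 and 35 minutes quoted later in this report.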

* Note: Six of the 57 over-8-hours gaps in reply time were more than 16 hours each and may have been due to server downtime. These six gaps were as follows (times are GMT):


11-12-2001  1:00 a.m. to 11-12-2001  7:07 p.m., 18.1 hours
11-24-2001 11:57 p.m. to 11-25-2001  5:52 p.m., 17.9 hours
01-07-2002  1:07 a.m. to 01-10-2002  5:38 a.m., 76.5 hours
01-12-2002  1:12 a.m. to 01-14-2002 12:45 a.m., 47.6 hours
02-24-2002  1:06 a.m. to 02-24-2002  5:22 p.m., 16.3 hours
02-26-2002  7:28 p.m. to 02-27-2002 11:40 a.m., 16.2 hours
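
The durations above can be checked with a few lines of date arithmetic. This is only a sketch: the timestamps are retyped with "AM"/"PM" in place of "a.m."/"p.m." so that the standard `%p` format code can parse them (assuming an English or C locale), and the helper name is my own.

```python
from datetime import datetime

FMT = "%m-%d-%Y %I:%M %p"

# (start, end, hours as listed in the note above), all times GMT.
listed_gaps = [
    ("11-12-2001 1:00 AM", "11-12-2001 7:07 PM", 18.1),
    ("11-24-2001 11:57 PM", "11-25-2001 5:52 PM", 17.9),
    ("01-07-2002 1:07 AM", "01-10-2002 5:38 AM", 76.5),
    ("01-12-2002 1:12 AM", "01-14-2002 12:45 AM", 47.6),
    ("02-24-2002 1:06 AM", "02-24-2002 5:22 PM", 16.3),
    ("02-26-2002 7:28 PM", "02-27-2002 11:40 AM", 16.2),
]


def gap_hours(start: str, end: str) -> float:
    """Return the elapsed time between two timestamps, in hours."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600


computed = [gap_hours(start, end) for start, end, _ in listed_gaps]
for (start, end, listed), hours in zip(listed_gaps, computed):
    print(f"{start} -> {end}: {hours:.2f} h (listed {listed})")
```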

If these six gaps are ignored, the average reply time is about 35 minutes. If, arbitrarily, all 57 over-8-hours gaps are ignored, the average reply time is about 28 minutes. That's as much as can be said under the circumstances.


Day-of-Week Post Distribution

Table II shows the number of posts made on each day of the week, with days defined in Eastern Standard Time (EST). Wednesdays, Thursdays and Fridays were the "high days", with Sunday the lowest.

Table II - Day-of-Week Post Distribution

  Day-of-Week     Posts    Help
  Sunday            333     396
  Monday            681     565
  Tuesday           651     323
  Wednesday         728     370
  Thursday          723     434
  Friday            702     366
  Saturday          405     326
  Total posts:     4223    2780
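
Because the two columns have different totals, the comparison is easier to read as percentages of each column's total. The following sketch computes those shares from the Table II counts; the names are illustrative.

```python
days = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

# Counts copied from Table II.
rambling = [333, 681, 651, 728, 723, 702, 405]
help_forums = [396, 565, 323, 370, 434, 366, 326]


def percent_shares(counts):
    """Convert raw counts to percentages of the column total."""
    total = sum(counts)
    return [100 * n / total for n in counts]


rambling_pct = percent_shares(rambling)
help_pct = percent_shares(help_forums)

print(f"{'Day':<10} {'Rambling':>9} {'Help':>7}")
for day, r, h in zip(days, rambling_pct, help_pct):
    print(f"{day:<10} {r:8.1f}% {h:6.1f}%")

# Busiest day in each column.
busiest_rambling = days[rambling.index(max(rambling))]
busiest_help = days[help_forums.index(max(help_forums))]
```

On these figures Monday accounts for about 20% of the Help posts but only about 16% of the "Rambling..." posts.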

For comparison, the column labeled 'Help' shows the number of posts per day made to the 11 help forums as reported in a study I did earlier (see TSG Success Rate). Note that in this 'Help' column, Monday was by far the "high day" for posting. Interesting...

Time-of-Day Post Distribution

Table III shows the number of posts made to "Rambling..." during each hour of a typical 24-hour day, again based on EST.

Note: By basing Tables II and III on EST, I am assuming that the center of gravity of TSG posting is located somewhere in the eastern USA. It turns out this was a bad assumption for "Rambling..."; I should have used CST or MST. However, EST as the "center of posting" is a good assumption for the Help forums.

The number of posts per hour peaks from 3:00 to 5:00 p.m. and then again from 6:00 to 8:00 p.m. The overall distribution of posts probably reflects a normal day for most TSG members: 10 hours sleeping, eating and driving, and 14 hours at the computer.

Table III - Time-of-Day Post Distribution

  EST             Posts
  0000 to 0100       94
  0100 to 0200       51
  0200 to 0300       30
  0300 to 0400       22
  0400 to 0500       38
  0500 to 0600       24
  0600 to 0700       45
  0700 to 0800       33
  0800 to 0900       99
  0900 to 1000      209
  1000 to 1100      209
  1100 to 1200      225
  1200 to 1300      307
  1300 to 1400      279
  1400 to 1500      245
  1500 to 1600      357
  1600 to 1700      369
  1700 to 1800      247
  1800 to 1900      329
  1900 to 2000      359
  2000 to 2100      263
  2100 to 2200      175
  2200 to 2300      123
  2300 to 2400       91
  Total posts:     4223
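
Since the hourly bins are whole hours, Table III can be relabeled for CST or MST simply by shifting the bins, without going back to the raw data. The sketch below does that and also picks out the four busiest hours. The function name is my own, and the same trick would not work for Table II, where re-binning by day would need the original timestamps.

```python
# Posts per hour from Table III, indexed by the EST hour in which the bin starts.
est_counts = [94, 51, 30, 22, 38, 24, 45, 33, 99, 209, 209, 225,
              307, 279, 245, 357, 369, 247, 329, 359, 263, 175, 123, 91]


def shift_hours(counts, offset):
    """Relabel 24 hourly bins for a zone `offset` hours away from the source zone."""
    shifted = [0] * 24
    for hour, n in enumerate(counts):
        shifted[(hour + offset) % 24] = n
    return shifted


cst_counts = shift_hours(est_counts, -1)  # CST is one hour behind EST
mst_counts = shift_hours(est_counts, -2)  # MST is two hours behind EST

# The four busiest hourly bins, by EST start hour.
busiest_est = sorted(range(24), key=lambda h: est_counts[h], reverse=True)[:4]

for hour in sorted(busiest_est):
    print(f"{hour:02d}00 to {hour + 1:02d}00 EST: {est_counts[hour]} posts")
```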

Posts per Member

Of the 58 members who contributed to "Rambling..." during the time interval studied, 11 made more than 100 posts. From high to low, the number of posts per member was 975, 471, 345, 266, 263, 227, 205, 152, 134, 123, and 122. Among the 47 contributors who made fewer than 100 posts each, 24 made fewer than 10 each.
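
A quick calculation shows how concentrated the posting was. This sketch assumes the 4223-post total from Tables II and III covers the same 58 members; the variable names are illustrative.

```python
# Post counts of the 11 members with more than 100 posts each.
top_posters = [975, 471, 345, 266, 263, 227, 205, 152, 134, 123, 122]

total_posts = 4223   # total from Tables II and III (assumed to cover all 58 members)
total_members = 58

top_total = sum(top_posters)
others_total = total_posts - top_total
others_members = total_members - len(top_posters)

top_share = 100 * top_total / total_posts
leader_share = 100 * top_posters[0] / total_posts

print(f"Top {len(top_posters)} members: {top_total} posts ({top_share:.1f}%)")
print(f"Other {others_members} members: {others_total} posts")
print(f"Most active member alone: {leader_share:.1f}%")
```

On these figures the 11 most active members wrote roughly three quarters of the thread, and the single most active member wrote nearly a quarter of it.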

Disclaimer

Finally, remember that these data are derived only from "Rambling..." and not from any other threads in the "Random Discussion" forum. So don't use these numbers to jump to any big conclusions one way or another about anything. They are supplied here only for your edification and enjoyment.
--------------------------------