
June 8, 2009

Fixing Light
Posted by Patrick at 07:22 AM

First, thanks to the literally dozens of people who answered our cry for help fixing and upgrading this site. Thanks to John Scalzi for sending even more kindly volunteers our way. And thanks for all the excellent and informative advice in the discussion following our appeal. We are astonished and humbled; we are not worthy. (We also want to see Michael Roberts’ basement.)

Second, yes, various fixes are underway. Thanks to Martin Sutherland—who’ll be helping migrate us to Movable Type 4—we now see how our ancient “View All By” script, dating from 2002-or-so when MT-based blogs might have 2,000 comments in their database, has been more or less beating our server to death, since it’s now 2009 and that same database now contains over 300,000 comments. To address this, Martin has implemented a bunch of sensible fixes which we’ll explain in more detail down in the comments. In addition, comments from people who don’t enter a personal URL no longer default to showing their declared email address. This is part of a larger effort to keep our commenters from being spammed by email-address-harvesters. Yes, we should have dealt with this a while ago.

The various changes rolled in this weekend are quick fixes to urgent problems. Sometime in the next little while, we’ll actually migrate the whole site to MT 4.25, and then start rethinking aspects of the way the place looks and works. Not on the agenda: threaded comments, personal avatars, or a quick sale to mobbed-up Russian “businessmen.” Of course, we would say that.

Comments on Fixing Light:
#1 ::: Dan Guy ::: (view all by) ::: June 08, 2009, 08:18 AM:

Three cheers for Martin!

#2 ::: Ginger ::: (view all by) ::: June 08, 2009, 08:22 AM:

Hip, hip, hurrah!

#3 ::: Tom Whitmore ::: (view all by) ::: June 08, 2009, 10:17 AM:

But good Russian businessmen offerink such a good deal, financed by soon-to-be-received proceeds from Nigerian financial adviser....

Seriously, good stuff and huzzah.

#4 ::: Bruce Cohen (SpeakerToManagers) ::: (view all by) ::: June 08, 2009, 10:44 AM:

And another 3 cheers for everybody involved in the upgrade!

In post-Soviet Russian mob, type moves you.

#5 ::: Terry Karney ::: (view all by) ::: June 08, 2009, 10:46 AM:

I am amused by the aggregated comment stats function.

Cool beans (and wow... I said a lot last year).

#6 ::: abi ::: (view all by) ::: June 08, 2009, 11:09 AM:

To pick up and expand on what Patrick has written, here's the visible side of what we've* done over the weekend.

  1. As several people have noticed, (view all by), henceforth to be referred to as (vab), has been re-engineered to serve up the 20 latest comments and a profile of the user's overall commenting history. You can still get to the full comment list with one more click.

    We did this because (vab) is a resource hog. We'd managed to cut it down enormously by adding some more indexing to the database, but the most prolific commenters still have quite large pages**. Yet one of the most common uses of (vab) is to check for drivebys, which doesn't need more than a general view of a user's history.

  2. While rewriting (vab), Martin made one more small† change: clicking on a comment used to bring you to the head of the thread in which it was made. Now it brings you to the comment in question within the thread.
  3. Search engines are now barred from indexing (vab) pages. The most prolific commenters (with, therefore, the biggest histories) are also the ones whose (vab) pages change the most. Therefore they were the most frequently crawled pages. Kerosene, meet fire.
  4. Comments are still searchable within their comment threads.

  5. Concurrent with the desire to stop redlining the server, we also wanted to tackle the privacy aspect of (vab). Having the URL of people's (vab) pages list their email addresses was a lovely SPAM ME flag, as many commenters have mentioned over the years. Now the URL refers to the comment number whence the page is generated. The email address is derived behind the scenes and never shows on the front end.
  6. Note that, under the covers, (vab) is still indexed by your email address.

  7. There is also a backwards compatibility accommodation that will take an old-style (vab) link in and spit you out a new-style page§.
  8. Another privacy leak was the email link on the comments from users who don't enter a URL. Again, these were a rich source of email addresses for spammers. So now if you don't enter a website when you comment, your name is not a link. (Links to specific comments remain unchanged; get them by clicking on the comment date/time.)
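The privacy scheme in items 5 to 8 can be sketched roughly as follows. This is a minimal Python/sqlite3 illustration under stated assumptions, not Martin's actual PHP: the query-parameter names `c` and `email`, the cut-down `mt_comment` table, and the helper names are all hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs


def vab_email_for(db, query_string):
    """Work out whose history a (vab) request wants. New-style URLs carry only
    a comment number ("c" is a hypothetical parameter name); the email address
    is looked up server-side and never appears on the front end."""
    params = parse_qs(query_string)
    if "c" in params:
        row = db.execute(
            "SELECT comment_email FROM mt_comment WHERE comment_id = ?",
            (int(params["c"][0]),),
        ).fetchone()
        return row[0] if row else None
    if "email" in params:
        # Old-style link (hypothetical "email" parameter): still honoured so
        # existing links keep working, but the page rendered is new-style.
        return params["email"][0]
    return None


def author_html(name, url):
    """A commenter's name is a link only if they gave a website;
    there is no mailto: fallback that would expose their email."""
    if url:
        return f'<a href="{escape(url, quote=True)}">{escape(name)}</a>'
    return escape(name)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mt_comment (comment_id INTEGER PRIMARY KEY, comment_email TEXT)")
db.execute("INSERT INTO mt_comment VALUES (1234, 'monkey@example.com')")
print(vab_email_for(db, "c=1234"))                      # monkey@example.com
print(vab_email_for(db, "email=monkey%40example.com"))  # monkey@example.com
print(author_html("abi", ""))                           # abi
```

Note that, as item 6 says, the lookup still ends at an email address; only the URL and the rendered page stop exposing it.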
- o0o -

There were more things behind the scenes and under the covers, of course: backups, security stuff and stats, mostly.

But barring explosions, alarums and excursions, that's the extent of changes for the next wee while. I've tested things as Martin has implemented them, but I'm sure there are edge cases I'm missing. Feel free to report them here, or email them to us privately if you prefer. We can't promise to patch minor things right away, but we'll get to things as we have time.

I'd like to join Patrick in thanking everyone who's offered to help (including, of course, Martin, whom I did not pressure into this at all‡). This is part of what makes us a community, along with the poetry, puns and mutual support in hard times.

-----
* for values of "we" that mostly mean Martin, in consultation with Patrick and me
** like, um, me at 3 MB
† but perfectly formed
§ Thanks to David Harmon for providing that use case.
‡ as in, I didn't know he'd offered till Patrick said he was minded to accept

#7 ::: Angiportus ::: (view all by) ::: June 08, 2009, 11:10 AM:

Glad that things are going better. Sorry I myself couldn't help.
I hope that all the changes won't make it harder for a poor person with dial-up access, and not a whole lot of tech skills, to read and comment. This is one of my favorites.

#8 ::: Patrick Nielsen Hayden ::: (view all by) ::: June 08, 2009, 11:22 AM:

"I hope that all the changes won't make it harder for a poor person with dial-up access, and not a whole lot of tech skills, to read and comment."

We hope to generally keep ML usable by people with trailing-edge tech. Of course, as in all things having to do with technology, "trailing edge" is a moving target.

#9 ::: Terry Karney ::: (view all by) ::: June 08, 2009, 11:30 AM:

Urf... I think I need to make a post to link, or at least cross-reference, posts made under my other email addresses. I think, when all is said and done, there were two addresses; I'm sure there were more than two posts from me in 2004.

#10 ::: abi ::: (view all by) ::: June 08, 2009, 11:43 AM:

Terry @9:
I've IM'd you with a note of the email address used to index this (vab). It has the following profile:

2006 252
2005 111
2004 105
2003 37

(It also has an unclosed italic tag...must go sort that out.)

Might be useful to cross-link the two.
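A per-year profile like the one abi quotes can come from a single GROUP BY query. The sketch below uses SQLite and assumes a `comment_created_on` timestamp column stored as `YYYY-MM-DD HH:MM:SS` text; the real (vab) script's query is not shown in the thread, and on MySQL `YEAR(comment_created_on)` would play the role of `strftime('%Y', ...)`.

```python
import sqlite3


def yearly_profile(db, email):
    """Comments per year for one email address, newest year first."""
    return db.execute(
        """SELECT strftime('%Y', comment_created_on) AS year, COUNT(*)
           FROM mt_comment
           WHERE comment_email = ?
           GROUP BY year
           ORDER BY year DESC""",
        (email,),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mt_comment (comment_email TEXT, comment_created_on TEXT)")
db.executemany(
    "INSERT INTO mt_comment VALUES (?, ?)",
    [("t@example.com", "2004-03-01 10:00:00"),
     ("t@example.com", "2004-07-01 10:00:00"),
     ("t@example.com", "2006-01-02 10:00:00"),
     ("x@example.com", "2005-01-01 00:00:00")],
)
print(yearly_profile(db, "t@example.com"))  # [('2006', 1), ('2004', 2)]
```

Because it filters on `comment_email`, this query benefits from the same index as the rest of the (vab) page.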

#11 ::: LMB MacAlister ::: (view all by) ::: June 08, 2009, 12:05 PM:

Although I haven't made much of a contribution with my comments here, I know I changed addresses when I came back in April after my difficulties of last year. Still, any vab page of mine would mostly say "not much."

#12 ::: Earl Cooley III ::: (view all by) ::: June 08, 2009, 12:39 PM:

One thing I've noticed is that View All By pages don't appear to have their HTML scrubbed nearly as strenuously as the main comments pages, which leaves questionable bits in place (embedded videos, for example).

#13 ::: Serge ::: (view all by) ::: June 08, 2009, 12:54 PM:

abi @ 6... the most prolific commenters still have quite large pages

It's not my fault!
("Yes it is.")
I didn't do it!
("Yes you did.")
You can't prove a thing.
("Yes we can.")
Drat.

#14 ::: Xopher ::: (view all by) ::: June 08, 2009, 12:59 PM:

All good changes; brave (two syllables)!

Good HEAVENS but I yakked a lot in 2007.

#15 ::: Erik Nelson ::: (view all by) ::: June 08, 2009, 12:59 PM:

The new feature shows us that people's commenting habits have grown a lot over the years.

If present trends continue, we will reach the commentularity.

#16 ::: Chris Sullins ::: (view all by) ::: June 08, 2009, 01:04 PM:

I'm glad to see the e-mail addresses are gone from the VAB links. I was hoping that would get fixed in the upgrade process. On those very same comment history pages, you have either the default of "view past 20" or the overkill "view all by". It would be very nice to have some form of pagination on those.

I also noticed while previewing this comment that on the preview comment pages you're using a different template than on the main post page. It's just a slight thing, but there ought to be a way to include the same template/sub-template so you don't have to change things in two places every time you update the site. (I'd know how, except I've never used MT.)

Good to hear things are getting better on the backend. Those are the changes that are least noticed by the end-users, but still extremely important. Keep it up :)

#17 ::: Mikael Vejdemo Johansson ::: (view all by) ::: June 08, 2009, 01:05 PM:

Random techy thought. The issue with searching for all comments by a specific commenter: it sounds like a good indexing design might go a long way here. Something like reindexing once a day during lull times, so the search is less insane and relies more on the database's own built-in data-handling mechanisms?

Of course, I expect your other tech rats to already be on top of this and more...

#18 ::: Liza ::: (view all by) ::: June 08, 2009, 01:38 PM:

A use question: If I change my listed URL but use the same email address, will Making Light still know it's me? (I switched my main journal site.)

#19 ::: Serge ::: (view all by) ::: June 08, 2009, 01:39 PM:

Weird. I blabbed LESS in 2008 than in 2007.

#20 ::: Daniel Klein ::: (view all by) ::: June 08, 2009, 01:44 PM:

Not on the agenda: threaded comments, personal avatars, or a quick sale to mobbed-up Russian “businessmen.”

Aww, no avatars? Can't you at least allow signatures with animated gifs? Or the blink tag?

Congrats on diving into Lake Upgrade. May it be as painless as can be (which is probably not much, but hey, as my old gym teacher used to say, pain is just your body getting better.)

#21 ::: Tech Rat ::: (view all by) ::: June 08, 2009, 01:50 PM:

The overall plan for the fixes/upgrades is roughly as follows:

  1. Pre-upgrade: fix the immediate performance problems, and stop leaking email addresses.
  2. Upgrade: migrate from MT 3.3x to MT 4.2x.
  3. Post-upgrade: improve spam handling, optimize templates, tweaks and tuning.

Phase 1 is now mostly complete. Phase 2 will probably start next week-ish.

For the technically minded, the major performance issue was caused by the View-all-by (vab) page, commentlist-oneauthor.php. Every time the page loaded, the underlying PHP code executed a SQL query along the lines of "select * from mt_comment where comment_email='monkey@example.com'". The mt_comment table did not have any indexes to help the execution of this query, so MySQL had to do a full table scan every time. The mt_comment table for Making Light has 300,000+ rows in it, and so the query was using up a lot of resources, regularly taking several seconds to deliver its results.
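The difference the missing index makes can be seen with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for MySQL (whose EXPLAIN reports a full table scan as access type ALL). This is a sketch with a cut-down, hypothetical mt_comment table and made-up rows, not the site's real schema; the exact wording of SQLite's plan output varies between versions.

```python
import sqlite3


def query_plan(db, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mt_comment (comment_id INTEGER PRIMARY KEY, "
           "comment_email TEXT, comment_text TEXT)")
db.executemany(
    "INSERT INTO mt_comment (comment_email, comment_text) VALUES (?, ?)",
    ((f"user{i % 500}@example.com", "...") for i in range(10_000)),
)

vab_sql = "SELECT * FROM mt_comment WHERE comment_email = ?"
before = query_plan(db, vab_sql, ("monkey@example.com",))  # detail mentions SCAN: every row read
db.execute("CREATE INDEX idx_comment_email ON mt_comment (comment_email)")
after = query_plan(db, vab_sql, ("monkey@example.com",))   # detail mentions SEARCH ... USING INDEX
print(before)
print(after)
```

The same `CREATE INDEX` statement is valid MySQL; the index name here is made up.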

Compounding the problem was the fact that the (vab) page showed all comments for the specified commenter. For many commenters, the page size was in the multi-megabyte range. Abi's page, for example, was over 3MB. Even over a fast broadband link, that much data takes a non-trivial time to deliver, and keeps an Apache process busy for the duration.

Now, most of the requests for the (vab) script come from automated processes: bots, indexers, crawlers, spiders. Clever indexing systems, such as googlebot, keep track of how often a page is updated, and they will visit frequently updated pages more regularly. And because there is a loose correlation between how often someone posts comments here, and how many comments they have posted, the biggest pages (which consume the most resources) were being requested the most often.

The number of page requests we're talking about is substantial: the (vab) page received 4,152,243 hits in May. That's 1.5 incoming requests per second on average, for a page that took several seconds to process. The server just could not keep up. As we say in the trade, "well there's your problem."
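The arithmetic behind "1.5 requests per second" checks out, and Little's law (average requests in flight = arrival rate × time per request) shows what it implies. "Several seconds" is not pinned down in the thread, so the 3-second figure below is an assumption; this is an average only and ignores bursts and the extra time spent shipping multi-megabyte pages.

```python
SECONDS_IN_MAY = 31 * 24 * 60 * 60   # 2,678,400
hits = 4_152_243                     # (vab) hits in May, from the comment above
rate = hits / SECONDS_IN_MAY         # requests per second
service_time = 3.0                   # ASSUMPTION: "several seconds" taken as 3 s
in_flight = rate * service_time      # Little's law: L = lambda * W

print(f"{rate:.2f} req/s, about {in_flight:.1f} (vab) requests in flight on average")
# 1.55 req/s, about 4.7 (vab) requests in flight on average
```

Roughly five full-table scans running at once, around the clock, each competing with the others for the same disk and CPU, is enough to make every one of them slower still.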

#22 ::: valkyrie ::: (view all by) ::: June 08, 2009, 01:56 PM:

so what happened?

#23 ::: Rob Rusick ::: (view all by) ::: June 08, 2009, 02:01 PM:

Erik Nelson @15: If present trends continue, we will reach the commentularity.

The same sort of AI that tries to complete your words for you now will expand to complete your sentences. Soon it will do the reading for you, and put together characteristic responses (based on your past postings — mine would italicize parenthetic comments).

#24 ::: abi ::: (view all by) ::: June 08, 2009, 02:48 PM:

Liza @18:
If I change my listed URL but use the same email address, will Making Light still know it's me?

Yes. (Vab) is indexed by email address with no reference to URL.

I've just changed my URL for this comment, but when you click on the (vab) it should show up in my history.

#25 ::: abi ::: (view all by) ::: June 08, 2009, 02:49 PM:

And, just to finish the test, I'll now post this with no URL associated with it.

#26 ::: abi ::: (view all by) ::: June 08, 2009, 02:54 PM:

And all of them appear in my (view all by).

Abi, trusting but verifying as always

#27 ::: abi ::: (view all by) ::: June 08, 2009, 03:03 PM:

Earl @12:

Do you have an example we could look at?

#28 ::: Jules ::: (view all by) ::: June 08, 2009, 03:09 PM:

A quick question for the MT gurus: would it be possible, as well as listing the most recent 20 comments from a user, to list the _first_ comment from the user? It would make things clearer for the people looking at (for example) my profile, and stop them assuming that I started commenting here in 2008 (the first post from my current e-mail address was this one, which includes a link back to my previous address's vab page -- and thanks, by the way, for making sure that link still works despite the new format). And I'm sure I'm not the only one who's used this approach... I can't remember who I shamelessly stole the idea from, though.

Oh, and thanks for making the comment list link back to the posts... that could be really helpful. :)

#29 ::: Jules ::: (view all by) ::: June 08, 2009, 03:13 PM:

Actually, thinking about it, there's no way to do what I just asked without running 2 separate SQL queries, or the same one you've just changed from, so I guess that'd probably undo a lot of the benefit of the recent changes. Forget I asked. :)

#30 ::: abi ::: (view all by) ::: June 08, 2009, 03:26 PM:

I think there is a general problem with linking commenter histories across email address changes. I don't know if there's a clean way to solve that.

I'll bat it by the Tech Rat over the next wee while, but for the moment, the only solution I can suggest is to cross-reference your (vab) streams by making a comment with each of the two email addresses linking to the other.

#31 ::: Clifton Royston ::: (view all by) ::: June 08, 2009, 03:40 PM:

Tech Rat:

Two additional suggestions to address the (vab) problem, both perhaps obvious:
1) The robots.txt file for the site should certainly request that the most CPU-intensive pages not be crawled; that's in large part what it's for. Perhaps you could allow the 20 recent postings list to be crawled, but you should probably disallow the full vab from Google-dexing etc. until you know the site can do it without breaking a sweat.
2) I assume you've added an additional index on user email address to the mt_comment table?
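A robots.txt rule of the kind Clifton suggests can be checked with Python's standard urllib.robotparser. The file contents and the script's location on the site are assumptions for illustration (only the script name, commentlist-oneauthor.php, comes from the Tech Rat's comment), and robots.txt is only a request: well-behaved crawlers honour it, others don't.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers away from the (vab) script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /commentlist-oneauthor.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/commentlist-oneauthor.php?c=12345",
            "https://example.com/archives/011234.html"):
    print(url, parser.can_fetch("Googlebot", url))
# the (vab) URL prints False; the ordinary archive page prints True
```

Disallow rules are prefix matches, so every query-string variant of the script is covered by the one line.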

#32 ::: Liza ::: (view all by) ::: June 08, 2009, 03:40 PM:

abi @ 24-26: Thank you!

#33 ::: Clifton Royston ::: (view all by) ::: June 08, 2009, 03:42 PM:

Jules: Actually, what you just suggested sounds pretty reasonable to me. Two short queries should be a lot faster than one gargantuan query.

#34 ::: Terry Karney ::: (view all by) ::: June 08, 2009, 03:46 PM:

And we are lucky to have the opportunity to benefit from the effects of the "abivield"

#35 ::: Dave Fried ::: (view all by) ::: June 08, 2009, 04:16 PM:

In my experience, the killer is not necessarily the number of SQL queries that must be done, but rather the time each query takes and the amount of data that must be transferred between the SQL server and the PHP process (and then, of course, across the interwebs to your browser).

For example, to retrieve the most recent 20 posts, you might do:

SELECT * FROM comments WHERE email = 'foo@bar.com' ORDER BY postdate DESC LIMIT 20

And to get the earliest one, you'd query:

SELECT * FROM comments WHERE email = 'foo@bar.com' ORDER BY postdate LIMIT 1

Now, if 'email' and 'postdate' are indexed together (i.e. organized into an internal quick-lookup structure, typically a B-tree, for fast retrieval and sorting; ideally one composite index on (email, postdate)), then these are both very, very fast operations even for very large databases. Effectively, the SQL server will only look at a total of at most 21 entries, because it knows where to find them. The only added overhead for including the second query is the back-and-forth time for the PHP script to talk to the SQL server, which is not very much compared to the transport time for it to send the response back to your browser.

On the other hand, if neither 'email' nor 'postdate' is indexed, the SQL server will have to scan all 300,000 entries and sort the results for each query, so the overhead for multiple queries is prohibitive.

Smart indexing is one of the most important keys to good database design.
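Dave's two queries can be tried out with SQLite standing in for MySQL. The table and column names follow his example; the composite (email, postdate) index and the generated rows are assumptions for the sketch. With that one index, both queries become an index search with no separate sort step.

```python
import sqlite3
from datetime import datetime, timedelta

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, email TEXT, postdate TEXT, body TEXT)")
start = datetime(2002, 1, 1)
db.executemany(
    "INSERT INTO comments (email, postdate, body) VALUES (?, ?, ?)",
    ((f"user{i % 50}@example.com", str(start + timedelta(hours=i)), "...")
     for i in range(5000)),
)
# One composite index serves both the WHERE filter and the ORDER BY.
db.execute("CREATE INDEX idx_email_postdate ON comments (email, postdate)")

LATEST_20 = "SELECT * FROM comments WHERE email = ? ORDER BY postdate DESC LIMIT 20"
FIRST_ONE = "SELECT * FROM comments WHERE email = ? ORDER BY postdate LIMIT 1"


def plan(sql, params):
    """SQLite's EXPLAIN QUERY PLAN detail strings for one query."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


who = ("user7@example.com",)
latest = db.execute(LATEST_20, who).fetchall()
first = db.execute(FIRST_ONE, who).fetchone()
for sql in (LATEST_20, FIRST_ONE):
    # Expect an index SEARCH and no "USE TEMP B-TREE FOR ORDER BY" step.
    print(plan(sql, who))
```

Without the postdate column in the index, SQLite would still find the rows quickly but would have to sort them afterwards, which shows up as a temporary B-tree in the plan.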

#36 ::: Dave Fried ::: (view all by) ::: June 08, 2009, 04:18 PM:

Sorry - that was @Jules (#29)

#37 ::: Michael Roberts ::: (view all by) ::: June 08, 2009, 04:45 PM:

The problem isn't seeing the basement; the problem is keeping your experience in memory when you leave. I've really gotta do something about that.

#38 ::: David Manheim ::: (view all by) ::: June 08, 2009, 04:58 PM:

YAY! Hooray for you.

Dave @ 35:
I thought that most DBs have pretty good indexing that can be turned on easily, or is on by default. I haven't been a DB admin, so I'm not really clear on it, but I'd have assumed this was already in place.

The question I have is: what was redlining? Was the processor maxed out, or was it a memory issue? (Or both. It could just be a single badly written bit of code looping when it shouldn't.)

#39 ::: Earl Cooley III ::: (view all by) ::: June 08, 2009, 05:02 PM:

abi #27: Do you have an example we could look at?

Yes. The thread is Swine flu and information hygiene, comment #118, which has a YouTube video embedded in the poster's full "view all by" page.

#40 ::: Martin the Tech Rat ::: (view all by) ::: June 08, 2009, 05:14 PM:

@Clifton (#31) and @Dave (#35): as Abi mentioned in her explanatory post, adding optimized indexes for the relevant tables was the first thing we did. Also, an updated robots.txt file has been in place since the weekend.

Several people have noted that the current system doesn't deal with multiple email addresses very well. The (vab) page only shows comments from a single email address. Movable Type 3 does not have a built-in system for user profiles/user registration, so barring active human assistance to link addresses together, we have no way of knowing that monkey@example.com and fez@example.com are actually the same person.

MT4 does have user registration, so the option will be there when the upgrade is complete. Whether Patrick and Teresa want to make use of it is a completely different discussion...

#41 ::: NelC ::: (view all by) ::: June 08, 2009, 05:39 PM:

While you're upgrading, can I ask what the "Don't make me type all this again" tick-box does? I've a vague idea I ticked it once, a long time ago, and I don't seem to have needed to again, so presumably it works. Whatever it does.

#42 ::: KeithS ::: (view all by) ::: June 08, 2009, 05:49 PM:

Thank you very much to Martin for helping out with all this.

A note that older entries (e.g. here) still show people's email addresses.

NelC @ 41:

The tickbox saves whatever you currently have in the name, email address, and URL boxes to a cookie on your browser.

#43 ::: Martin the Tech Rat ::: (view all by) ::: June 08, 2009, 05:59 PM:

@Earl #39 - thanks for the test case - should be fixed now.

In general, there are likely to be some small differences between the way comments show up in the normal entry archives, and on the (vab) page.

The reason for this is that the (vab) page is a custom script. It bypasses Movable Type's template and text filtering system, and rips the comments directly from the database. With its bare teeth. I've tried to match the text filters that MT applies to the comments, but I'm sure to have missed a few things.
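Martin's actual fix isn't shown in the thread. As a hypothetical illustration of the kind of scrubbing a custom script has to redo by hand, here is a minimal allowlist sanitizer built on Python's standard html.parser; the allowed tags and link schemes are assumptions, not MT's actual filter rules.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag -> attributes permitted on it.
ALLOWED = {
    "a": {"href"}, "b": set(), "i": set(), "em": set(), "strong": set(),
    "blockquote": set(), "p": set(), "br": set(),
}
VOID = {"br"}  # tags that never get a closing tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes and drop everything else
    (object, embed, iframe, script, ...). Text inside dropped elements is
    kept as escaped plain text, so nothing in it can execute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED[tag]:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: links
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p>hi <object><embed src="x.swf"></embed></object>'
               '<a href="javascript:alert(1)">x</a></p>'))
# <p>hi <a>x</a></p>
```

An allowlist fails closed: any markup nobody thought about is dropped, which is safer for a page that bypasses the main filters than trying to list every dangerous tag.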

#44 ::: Xopher ::: (view all by) ::: June 08, 2009, 06:05 PM:

NelC - it stores a cookie on your computer so the information is prefilled next time you go to the Comments page.
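As a rough sketch of what KeithS and Xopher describe, here is how a server might set and read back such "remember me" cookies with Python's standard http.cookies module. The cookie names, the one-year lifetime, and reading them on the server side are all assumptions; the thread doesn't say how MT actually names or reads them.

```python
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote

# Hypothetical cookie names; MT's real names may differ.
FIELDS = ("comment_author", "comment_email", "comment_url")
ONE_YEAR = 365 * 24 * 60 * 60


def remember_me(author, email, url):
    """Build the cookies a "Don't make me type all this again" box might set."""
    jar = SimpleCookie()
    for field, value in zip(FIELDS, (author, email, url)):
        # Percent-encode so characters like '@', ':' and '/' need no quoting.
        jar[field] = quote(value, safe="")
        jar[field]["max-age"] = ONE_YEAR
        jar[field]["path"] = "/"
    return jar


def prefill(cookie_header):
    """Turn the Cookie header the browser sends back into form defaults."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {field: unquote(jar[field].value) for field in FIELDS if field in jar}


jar = remember_me("NelC", "monkey@example.com", "http://example.com/")
print(jar.output())  # the Set-Cookie lines sent with the comment response
request_header = "; ".join(f"{m.key}={m.coded_value}" for m in jar.values())
print(prefill(request_header))
```

Since the cookie lives only in that one browser, ticking the box on a different computer (or after clearing cookies) starts from blank fields again.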

#45 ::: David Harmon ::: (view all by) ::: June 08, 2009, 06:35 PM:

Thanks, Martin, and admins! Sounds like you've got a pretty good handle on things.

#46 ::: David Harmon ::: (view all by) ::: June 08, 2009, 06:43 PM:

How long will it take for Google to re-index the site by thread & comment links? So far, searches are still getting "vab" links. When clicked, these do come up as new-style "last-20" pages.

#47 ::: Dragoness Eclectic ::: (view all by) ::: June 08, 2009, 06:46 PM:

As a computer programmer who has done a lot more embedded and technical processing than database and web applications, I find this thread quite educational. Thanks for giving us the "behind the scenes" look at what you are doing to this beastie.

#48 ::: Clifton Royston ::: (view all by) ::: June 08, 2009, 08:54 PM:

Martin: Thanks, now I see where those were both mentioned. I must have skimmed that comment too quickly.

#49 ::: Clifton Royston ::: (view all by) ::: June 08, 2009, 08:57 PM:

David Harmon: Google only knows how long Google will take to reindex the vab pages, but from my past experience I'd guess sometime within the coming week.

#50 ::: John Houghton ::: (view all by) ::: June 08, 2009, 08:59 PM:

Thanks to all who are busy behind the scenes fixing and improving things!

Many hands Making Light work...


Now that users' email addresses aren't visible (and serve only as the behind-the-scenes user index), I can't think of anything that would break* if there were an admin script to do a find-and-replace updating old entries to a user's current email (other than writing the short script and the admin time to actually do the work, and I know that in our universe Free Time can only be expressed with negative values)**.

*Without having actually like looked at the code or anything. I've been busy doing nested repairs to my pickup truck: you know, the kind where you find something else critically wrong when you do the disassembly for the original straightforward fix. The driver's side upper ball-joint is going to be a real pain because the almost inaccessible (frozen) bolts have an interference fit with the fuel line.

** I push for this for the sake of others, of course. The fact that I'm going to be dropping the email address I use here soon has absolutely nothing to do with it.

#51 ::: heresiarch ::: (view all by) ::: June 08, 2009, 09:28 PM:

Thank you everyone who's helped out--kudos to Martin especially! One request, one question and one comment:

Would it be possible to make the yearly comment counts into links to a page of all the comments from that year? From my perspective, I mostly use the vab pages when I'm trying to find a comment someone (or I) made sometime, and being able to search by year would be nice!

How many different "It may take () to load." categories are there? So far I've found (nothing), "It may take some time," "It may take quite a while," and "It may take a frightfully long time to load." Fun!

Aesthetically, I don't like the differences in people's names between the blue bold of those with sites and the flat black of those without. /twocents

#52 ::: Lee ::: (view all by) ::: June 08, 2009, 10:31 PM:

John, #50: Around here, we call that "onion repairs," and what it means is that once you get to the original thing that needs to be fixed, you see something else that isn't quite broken but soon will be, and which will be MUCH easier to replace now while you've got it torn down to that point from preparing to fix the first thing; lather, rinse, repeat. It's not quite yak-shaving, but definitely a related category. It can also turn a quick, cheap repair into an all-day sucker that racks up significant bucks.

#53 ::: xeger ::: (view all by) ::: June 09, 2009, 12:23 AM:

Huh! I didn't realize that I'd been posting here as long (and as verbosely) as I seem to have...

(and, er ... no indices at all, or just not on the pertinent tables?)

#54 ::: John Houghton ::: (view all by) ::: June 09, 2009, 01:12 AM:

Onion repairs. That fits. I'm not quite to the tears stage, but it does feel like I'm rebuilding the vehicle piece by piece around the VIN tag.

#55 ::: Arachne Jericho ::: (view all by) ::: June 09, 2009, 03:32 AM:

Thinking of the future...

When looking for anti-spam, consider something that uses Javascript and cookie checks over something that analyzes email addresses and comment content.

The multiple links thing is still a good catch, but anything further in the way of smart analysis requires a lot of upkeep and sometimes results in strange things like false positives and auto-banning entire IP blocks from Ireland.

A Javascript/cookie check works extremely well, however, with far less maintenance. Only drawback is that users without Javascript or cookie support can't post.

... which I don't know how much of a drawback that is these days. Even older browsers support Javascript and cookies. Ancient is another question, but I doubt anyone is browsing ML with lynx 0.5 or Netscape 1.0 (or Mosaic...).

#56 ::: LLA ::: (view all by) ::: June 09, 2009, 03:49 AM:

Arachne Jericho @ 55:

I'm not very technically minded at all (I've read most of this discussion with eyes and mouth open wide as some of the secrets of Good Website Design have been spilled to us mere mortals) but I know that, for security reasons, I delete cookies on a regular basis and I have disabled Javascript under certain situations.

I have no idea if there's another way to maintain site security that gets around my simplistic precautions that wouldn't make me want to take further precautions.

#57 ::: Earl Cooley III ::: (view all by) ::: June 09, 2009, 04:07 AM:

Arachne Jericho #55: A Javascript/cookie check works extremely well, however, with far less maintenance. Only drawback is that users without Javascript or cookie support can't post. ... which I don't know how much of a drawback that is these days. Even older browsers support Javascript and cookies. Ancient is another question, but I doubt anyone is browsing ML with lynx 0.5 or Netscape 1.0 (or Mosaic...).

Old browsers are not so much the problem as making sure people who use screen readers and other assistive technologies can still use the web site; also, protective software, such as the popular NoScript Firefox extension, could potentially also block people from posting if it's tied down to use of JavaScript.

#58 ::: xeger ::: (view all by) ::: June 09, 2009, 07:22 AM:

Arachne Jericho @ 55 ...
A Javascript/cookie check works extremely well, however, with far less maintenance. Only drawback is that users without Javascript or cookie support can't post.

Indeed -- I certainly wouldn't be able to post.

... which I don't know how much of a drawback that is these days. Even older browsers support Javascript and cookies. Ancient is another question, but I doubt anyone is browsing ML with lynx 0.5 or Netscape 1.0 (or Mosaic...).

Supported is one thing -- a good idea is a completely different one.

#59 ::: Julia Jones ::: (view all by) ::: June 09, 2009, 07:44 AM:

Adding another anti-vote on the Javascript thing. I have never been completely unable to use a mouse, but there have been times when I have used speech recognition software because it *hurt* to use a mouse and keyboard. I am underwhelmed by suggestions that deliberately excluding non-Javascript users is a minor drawback on a website.

It is also a potential legal problem in jurisdictions where disability access is taken seriously -- and while it's rare for anyone to actually do something about this at the legal level, I would be disinclined to hand critics of the site such an easy target for "look at those hypocrites!" commentary.

#60 ::: Liza ::: (view all by) ::: June 09, 2009, 10:07 AM:

Julia @ 59: How is speech-recognition software incompatible with Javascript? (From my wording that could be read as a snarky question, but it's not meant as one. I trust there is some connection and I'm very curious what it is.)

#61 ::: Joel Polowin ::: (view all by) ::: June 09, 2009, 11:00 AM:

I remember that one of my earliest posts to ML was in response to James Macdonald writing, in a thread about intellectual property, "How is a notebook like a house?" or words to that effect. I replied that "Edgar Allan Poe wrote in both"; somewhat later, Abi quoted that back, saying that she didn't want me to think that that had gone unnoticed. I can't find that message in Google, and haven't been able to for some time. Would that thread have been lost in the Great Crash last year?

#62 ::: abi ::: (view all by) ::: June 09, 2009, 11:20 AM:

Just to be clear, we're reading this thread, but not everything that people want is (a) possible, (b) a good idea, or (c) going to happen. Things that we don't answer are things for which the answer is probably "um", or possibly "can I think about this when I've had more sleep?" (Evening collaborations with people six hours behind you make for late nights.)

A couple of specific replies:

John @50:
an admin script to do a find-and-replace to update old entries to the current email for a user

Although there may be nothing theoretically wrong with doing that (I don't know), it makes my archivist's skin itch something fierce to even contemplate. You're talking about changing original data.

A less visceral reaction: sorry, but it's not on the immediate horizon, despite the trouble that not doing it may cause you.

Arachne Jericho @55:

The server-side spam filtering that we use requires a good deal of manual intervention and human input, it's true. But I think we're doing a pretty not-sucky job of keeping the spam off of ML whilst getting in the way of the conversation as little as possible. I don't recall ever blocking half of Ireland, and I try to release those few comments that end up in moderation on a timely basis.

Certainly, under no circumstances will we be doing anything so accessibility-limiting as requiring JavaScript. I will hunt down spam by hand and kill it with stone knives and bearskins before I agree to compromise the ability of the widest possible population to post here.

#63 ::: abi ::: (view all by) ::: June 09, 2009, 11:21 AM:

(NB: Heresiarch gets the little crown we give to People Who Notice Things.)

#64 ::: Kevin Reid ::: (view all by) ::: June 09, 2009, 11:37 AM:

Other variance in the HTML processing in the VAB pages: I noticed that on my VAB page whenever I use a <blockquote> element there is extra whitespace where there isn't in the main comment threads; I haven't looked at the source but I assume there is an overeager line-breaks-to-paragraphs processor at fault.
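Kevin's guess can be illustrated with a simplified line-breaks-to-paragraphs converter. This is a hypothetical sketch, not MT's Convert Line Breaks filter or the (vab) script: it wraps blank-line-separated chunks in `<p>` and turns single newlines into `<br />`, but leaves chunks that start with a block-level tag alone, which is exactly the guard an overeager converter would lack. The list of block-level tags is an assumption.

```python
import re

# Block-level tags that should not be wrapped in <p> or have their
# internal newlines turned into <br />. This list is an assumption.
BLOCK_TAGS = ("blockquote", "p", "ul", "ol", "li", "pre", "div", "table")
BLOCK_START = re.compile(r"^\s*<(?:%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def convert_breaks(text):
    """Blank lines separate paragraphs; single newlines become <br />.
    Chunks that already start with a block-level tag are passed through."""
    out = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        if BLOCK_START.match(chunk):
            out.append(chunk)
        else:
            out.append("<p>" + chunk.replace("\n", "<br />\n") + "</p>")
    return "\n\n".join(out)


print(convert_breaks("Hello\nworld\n\n<blockquote>quoted\ntext</blockquote>\n\nBye"))
```

A blockquote that begins mid-chunk, with no blank line before it, would still be wrapped by this sketch, so a real filter needs more care than this.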

#65 ::: fidelio ::: (view all by) ::: June 09, 2009, 11:58 AM:

re 62: How does one kill spam with bearskins? I can see how the stone knives would work, but the ursine pelts throw me for a loop. Are they used to smother the stuff? Or does beating it soundly about the (putative) head and shoulders cause it to become frightened and confused so that it can be herded over a cliff and die upon the boulders below?

#66 ::: Julia Jones ::: (view all by) ::: June 09, 2009, 12:02 PM:

Liza @60: I don't know how much of an issue it is with current versions, because I haven't had to use Dragon in earnest for a while. But a common feature of accessibility software packages is that they use the standard keyboard commands built into browsers to access links in a webpage -- for example, you can TAB through the links on a page, and a lot of software will use that to move to the next link. Or you can tell the software to select a particular piece of text, and then carry out an action linked to that text.

But this requires the links to be present in the html source code for the page. The Javascript "solution" to spam works because it deliberately and with malice aforethought makes it impossible to access links using those keyboard commands. The link is generated by the Javascript, rather than being in clear in the page's source code, which makes it hard (though not necessarily impossible) for a scraperbot to get at it. With such pages, a human can see the link on the screen, and get at it with a mouse (or mouse-like pointing device) -- but a human who can't see or who can only use keyboard commands to move the cursor around the screen is left high and dry.

This tends not to be obvious to people who never have reason to use those keyboard commands (and many people don't even know the commands exist). Thus the often suggested Javascript solutions to various problems, shortly followed by a flurry of clue-by-fours from those of us who have reason to know why this is a Bad Idea.

I personally would be surprised if it were seriously considered by Those In Authority, even without the wails of protest; but the wailing does become something of a reflex action when you've seen your several dozenth "but it only affects a few die-hards who won't upgrade their browsers, why should we care about them" conversation. It often requires several posters pointing out that *they* *personally* will be affected even with the latest software to get past that mindset, hence the pile-on.

One good test of a website for this purpose is to see whether it is usable in the Lynx browser. If you can't get to parts of the site (including email links) when you view the site in Lynx, then people using accessibility software probably won't be able to get there either.
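A crude stand-in for the Lynx test, sketched in Python with the standard library's `html.parser`. It lists the links that are actually present in the page source without running any script, which approximates what keyboard navigation can reach. The two sample snippets and the `build-mailto.js` filename are made up for illustration. Lynx itself remains the better check.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Record the href of every <a> tag present in the raw HTML, without running any script."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def links_in_source(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs

# A plain link versus a placeholder that a (hypothetical) script fills in at runtime.
plain = '<p>Mail <a href="mailto:me@example.com">me</a></p>'
scripted = '<p>Mail <span id="contact"></span></p><script src="build-mailto.js"></script>'
print(links_in_source(plain))     # ['mailto:me@example.com']
print(links_in_source(scripted))  # []
```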

#67 ::: Kip W ::: (view all by) ::: June 09, 2009, 12:19 PM:

John Houghton @50 has me thinking that "Many Hands" is a collocation that should be on the front page somewhere, referring either to the good folks who read and comment here, or to the possibly even better folks who quietly lubricate squeaks and remember when the eclipses are.

#68 ::: abi ::: (view all by) ::: June 09, 2009, 12:47 PM:

fidelio @65:
How does one kill spam with bearskins?

It depends how well-cured the bearskins are. If they've been properly tanned, then yes, one smothers with them. If not, the smell alone will kill spam at ten paces.

#69 ::: Liza ::: (view all by) ::: June 09, 2009, 12:47 PM:

Julia @ 66: I see. Thanks for taking the time to explain so thoroughly, I learned something! (And wow, I haven't used Lynx in years. It'd be fun to pull it back up, I may do that!)

#70 ::: Julia Jones ::: (view all by) ::: June 09, 2009, 01:29 PM:

Liza @69: better to explain it so that people who haven't run into this before can understand why it's an issue. :-)

I'm not against Javascript as such, as it can be a useful tool -- just against people using it for essential functions without providing a keyboard alternative.

#71 ::: Ingvar M ::: (view all by) ::: June 09, 2009, 01:32 PM:

Abi @ #62:

If I were to implement "find all posts, for users who have used more than one mail address", I'd probably end up using a secondary table, to house all "extra" addresses, with "canonical" and "extra" on each row (plus, I guess, the identity transformation for each address). That way, it would be possible to get it all done using "just one more join".

However, I'd probably not implement it unless the customer (me, in my own case; Teresa and Patrick, in the case of Making Light) wanted it, and even then I'd want to run some benchmarks to see how much of a resource hog it would be.
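A minimal sketch of Ingvar's secondary-table idea, using Python's built-in `sqlite3` and an in-memory database. The table and column names (`comments`, `email_alias`, `extra`, `canonical`) are hypothetical and are not Movable Type's actual schema. Each address a commenter has used maps to one canonical address, with an identity row for the canonical address itself. Looking up all of a person's comments then takes one extra join, and the stored comments are never rewritten.

```python
import sqlite3

def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comments (id INTEGER PRIMARY KEY, email TEXT NOT NULL, body TEXT);
        -- Every address a commenter has used, mapped to one canonical address.
        -- Identity rows (canonical = extra) let the canonical address match itself.
        CREATE TABLE email_alias (extra TEXT PRIMARY KEY, canonical TEXT NOT NULL);
    """)
    conn.executemany("INSERT INTO comments (email, body) VALUES (?, ?)", [
        ("old@example.com", "first comment"),
        ("new@example.com", "second comment"),
        ("other@example.org", "unrelated comment"),
    ])
    conn.executemany("INSERT INTO email_alias (extra, canonical) VALUES (?, ?)", [
        ("new@example.com", "new@example.com"),
        ("old@example.com", "new@example.com"),
        ("other@example.org", "other@example.org"),
    ])
    return conn

def view_all_by(conn, address):
    """All comments by whoever owns `address`, under any of their addresses."""
    rows = conn.execute("""
        SELECT c.body
        FROM email_alias AS a
        JOIN comments AS c ON c.email = a.extra
        WHERE a.canonical = (SELECT canonical FROM email_alias WHERE extra = ?)
        ORDER BY c.id
    """, (address,)).fetchall()
    return [body for (body,) in rows]

conn = build_demo_db()
print(view_all_by(conn, "old@example.com"))  # ['first comment', 'second comment']
```

Whether this is cheap enough at 300,000-plus comments is exactly the benchmark question raised above. An index on `comments.email` would probably matter more than the join itself.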

#72 ::: Terry Karney ::: (view all by) ::: June 09, 2009, 03:31 PM:

fidelio: One has to use unwashed bearskins. The spam is well aware of the efforts the Ursine Race will undertake to consume it. The smell of bear causes it to freeze, like a mouse that notices the shadow of a hawk. On closer approach the spam panics, but people are able to run it down and, using the aforementioned stone knives, carve it into small cubes to fry up with eggs or potatoes.

#73 ::: Mycroft W ::: (view all by) ::: June 09, 2009, 05:01 PM:

And since JavaS...t (yes, it does have a real name, but it's just as unpronounceable as the correct name is unprintable, for those of us on the wrong end) is very, very good at making my browsing experience what Someone Else wants (as opposed to what I want) it Doesn't Get Run Here.

Not only because I like to be in control over who has access to my computer's memory, but because happy flickering graphics DIE, and popups/popunders/jumplinks-to-phishers/malware-downloaders frequently use JavaS...t (or Flash, or whatever), and my life is just better without them. Even if that means that I'm not able to buy stuff from certain companies (which I'm sure I'm more disappointed about than they are, or they'd provide a Mycroft-friendly way of seeing their site). Even if there are end-user-friendly uses for JavaS...t.

#74 ::: fidelio ::: (view all by) ::: June 09, 2009, 05:39 PM:

abi, Terry--once again you have proved how many useful and wonderful things it's possible to learn here at Making Light.

#75 ::: Terry Karney ::: (view all by) ::: June 09, 2009, 08:39 PM:

I have noticed (not that it's critical) new posts take longer to make it to the 1,000 most recent list.

Other than that, things look good from here.

#76 ::: Rob Rusick ::: (view all by) ::: June 09, 2009, 09:49 PM:

abi @62: I will hunt down spam by hand and kill it with stone knives and bearskins [..]

I take it the real problem is that you need to construct a duotronic memory unit with the primitive technology available in this era.

#77 ::: heresiarch ::: (view all by) ::: June 10, 2009, 12:49 AM:

abi @ 63: "(NB: Heresiarch gets the little crown we give to People Who Notice Things.)"

Oh it's so sparkly! *adjusts crown to a jaunty angle*

fidelio @ 65: "How does one kill spam with bearskins?"

So many ways!

One digs a hole and stretches the bearskin over it, conceals it with dirt, and then drives the spam over the pit.

Or one cuts the bear skin into thin strips and braids them together to create rope, and uses it to lasso the spam.

Or one sews a spam-costume out of bear-skin, and uses it to get near enough to employ those stone knives.

Or, finally, one goes to the resident spam-hunter and says, "I'll give you this fluffy new bearskin if you kill that spam for me!"

#78 ::: Bruce Cohen (SpeakerToManagers) ::: (view all by) ::: June 10, 2009, 02:27 AM:

heresiarch @ 77:

The simplest, albeit most dangerous, method is to leave the bear in the skin, and simply point it at the spam. Spam is much less able to protect itself than the "pig" it came from, so a bear is quite happy to run up to it, grab it, and eat it. The danger lies in not having the bear notice you instead of the spam.

#79 ::: Bruce Cohen (SpeakerToManagers) ::: (view all by) ::: June 10, 2009, 02:29 AM:

ohnosecond followup:

Of course I meant "The danger lies in not having the bear notice you instead of the spam."

#80 ::: Terry Karney ::: (view all by) ::: June 10, 2009, 11:17 AM:

Bruce (StM): I think both of those can be read both ways. English is such a clever language.

#81 ::: Dave Weingart ::: (view all by) ::: June 10, 2009, 12:09 PM:

Bruce @ 78: Running shoes. All you have to do is be faster than the spam.

#82 ::: Earl Cooley III ::: (view all by) ::: June 11, 2009, 01:29 AM:

I've noticed over the last few years that ML has the highest rate of unintentional double posting of any site I know. Is this something that can be reasonably investigated?

#83 ::: Serge ::: (view all by) ::: June 11, 2009, 01:51 AM:

@ 81... All you have to do is be faster than the spam.

Is spam's span less than a man's when he ran?

#84 ::: Dave Bell ::: (view all by) ::: June 11, 2009, 03:41 AM:

I don't think the double-posting is all that common, and I'd wonder how much is down to a failure to respond, while under heavy load.

Something to think about, I agree, but I think we can wait a while for a fix. See what happens with the rest of the work.

#85 ::: Serge ::: (view all by) ::: June 11, 2009, 11:12 AM:

I'm rather amazed that, considering Martin's nationality, nobody has yet stooped down to making jokes about Scottish engineers crawling deep inside high-tech machinery.
("Well, Serge, if someone has to stoop down, you're the best.")
I heard that.

#86 ::: Terry Karney ::: (view all by) ::: June 11, 2009, 11:18 AM:

Earl Cooley: The double posting seems to be more an artifact of us, and preview, than it is anything else.

The occasions when I have double posted are when I had the system either lock up, or fail to post, and resent without checking.

I had one happen recently because of the slower lag between a comment appearing on its page and showing up in the "last 1000" list (which is what I check these days to see whether a comment that got hung up for some reason made it through), so I thought it had failed.

I think it's more observational awareness. When one reads all the posts, all the time, one sees more. I see them on other sites too.

#87 ::: David Harmon ::: (view all by) ::: June 11, 2009, 11:38 AM:

I'm pretty sure I've had double posts where I'd clicked and switched to another tab (that is, not kept clicking the button). But hey, if the issue doesn't go away with the upgrades/repairs, then Martin can try to figure out what's up with that.

And now my preview is bouncing off OpenDNS, which claims that nielsenhayden.com is "not loading right now". It pings OK, though... Ah, here we go, working again.

#88 ::: Serge ::: (view all by) ::: June 11, 2009, 12:50 PM:

Terry Karney @ 86... I had the system either lock up, or fail to post, and resent without checking

"I resent that."

#89 ::: Earl Cooley III ::: (view all by) ::: June 11, 2009, 01:09 PM:

I think perhaps that the double posts are due to more than one cause. It would be great if some of that could be abated by user interface improvements. Other causes may prove to be more intractable.

#90 ::: Xopher ::: (view all by) ::: June 11, 2009, 01:11 PM:

That's the response to "I hope you won't mind resending that and telling me when you have...or if you do mind, please let me know."

WHICH answer cannot be determined.

#91 ::: Raphael ::: (view all by) ::: June 11, 2009, 01:28 PM:

Re the JavaScript discussion, out of curiosity, what exactly are the scripts from nielsenhayden.com doing, given that they're not necessary for basic access?

#92 ::: Martin Sutherland ::: (view all by) ::: June 11, 2009, 02:46 PM:

@Raphael #91:

  • The SetStyleExample.js is used for the style switcher, to make the text bigger or smaller. After the upgrade to MT4, I'd like to change this to a body switcher script (viz. Invasion of the Body Switchers and Look who's switching too), and optimize the CSS files.
  • feed.js is an external script from blogads.com to display adverts in the sidebar.
  • urchin.js is a Google Analytics page tracker.
  • On the individual entry archives, there is an in-line script that deals with the "Don't make me type all this again" functionality: storing your email address, name, and URL in a cookie, and reading them back out again when you next visit the page.
  • All of these scripts are progressive enhancements, so the page works just fine without them. (For example, if you have JS turned off.)
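The "Don't make me type all this again" behavior described above is an inline browser script. As a swapped-in illustration only, here is the same remember-and-prefill idea sketched server-side in Python with the standard library's `http.cookies`. The cookie names (`remember_author`, `remember_email`, `remember_url`) and the one-year lifetime are assumptions, not what the site actually uses.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie names; the thread doesn't say what the real script calls them.
FIELDS = ("author", "email", "url")

def remember_me(author, email, url, days=365):
    """Build Set-Cookie header values that store the commenter's details."""
    cookie = SimpleCookie()
    for field, value in zip(FIELDS, (author, email, url)):
        key = "remember_" + field
        cookie[key] = value
        cookie[key]["path"] = "/"
        cookie[key]["max-age"] = days * 24 * 60 * 60
    return [morsel.OutputString() for morsel in cookie.values()]

def prefill(cookie_header):
    """Read the details back out of a request's Cookie header, skipping any that are missing."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {f: cookie["remember_" + f].value for f in FIELDS if "remember_" + f in cookie}

headers = remember_me("Ann", "ann@example.com", "http://example.com/")
# On the next visit the browser sends back just the name=value parts.
request_header = "; ".join(h.split(";")[0] for h in headers)
print(prefill(request_header))
```

As with the real script, nothing breaks if the cookie is absent: `prefill` simply returns an empty dictionary and the form starts blank.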

#93 ::: Raphael ::: (view all by) ::: June 11, 2009, 05:06 PM:

Ah, thanks.

#94 ::: David Manheim ::: (view all by) ::: June 11, 2009, 05:43 PM:

On the double posting issue, I was thinking about it. You'd want the preview page to come with an identifier drawn from a space large enough to prevent almost all collisions (or an actual pre-post count of each comment previewed), and then have the posting engine check that nothing has already come in with that identifier.

Theoretically simple, but I've never worked with MT, and implementing it may be anywhere from easy to functionally impossible without re-writing the entire posting engine.
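A minimal Python sketch of David's preview-token idea. `CommentQueue` is a hypothetical stand-in, not Movable Type's posting engine. Each preview gets a random 128-bit token from the standard `secrets` module, which would travel in a hidden form field. The post step accepts only the first submission carrying a given token.

```python
import secrets

class CommentQueue:
    """Hypothetical stand-in for the posting engine, not Movable Type's actual code."""
    def __init__(self):
        self.comments = []
        self.seen_tokens = set()

    def preview(self):
        # 128 random bits per preview: accidental collisions are vanishingly unlikely.
        return secrets.token_hex(16)

    def post(self, token, author, body):
        """Accept a comment once per preview token; report duplicates instead of storing them."""
        if token in self.seen_tokens:
            return False
        self.seen_tokens.add(token)
        self.comments.append((author, body))
        return True

queue = CommentQueue()
token = queue.preview()                    # would travel in a hidden field on the preview form
print(queue.post(token, "Earl", "Hello"))  # True
print(queue.post(token, "Earl", "Hello"))  # False: the resend is dropped
```

In a real database the check and the insert would need to be atomic, for example via a UNIQUE constraint on a token column. Otherwise two near-simultaneous resends under heavy load could both slip through.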

#95 ::: DavidS ::: (view all by) ::: June 11, 2009, 07:40 PM:

A very low priority bug that I noticed a while ago: Using Safari, turn on the "Larger type" or "Larger type with serifs" option. Then go to a specific comment. (For example, paste http://nielsenhayden.com/makinglight/archives/011348.html#347140 into your address bar or click on one of the "recently commented" links.) The comment in question will appear in your browser, in a small font. A fraction of a second later, the page will render in the large font, and the comment will no longer be centered in your scroll window (and quite probably, not visible.)

In Firefox, I don't get that effect. In case it matters, this is Safari v. 3.1.2.

#96 ::: DavidS ::: (view all by) ::: June 11, 2009, 07:45 PM:

On further experimentation, this doesn't happen if you are already viewing the thread on which that comment was left. (So my example was a bad one, since you are already on the same page.) Try http://nielsenhayden.com/makinglight/archives/011336.html#347151 to get the effect.

#97 ::: Jan Vaněk jr. ::: (view all by) ::: June 12, 2009, 08:54 AM:

Martin Sutherland #92: So THAT's why "Don't make me type this all again" didn't work for me! Progressive - I guess so, but ungracefully documented.

#98 ::: Jan Vaněk jr. ::: (view all by) ::: June 12, 2009, 08:58 AM:

BTW, sorry for bad linebreaking - some weird glitch on a library computer (Gnome with FF3!) I was using. Some Unicode NBSPs, or what? They display weirdly on VAB.

#99 ::: Raphael ::: (view all by) ::: June 12, 2009, 11:54 AM:

The "Making Lighter" link is very neat, but in contexts in which it is useful, it would be even more useful if there were a copy of it at the top of the left column.

#100 ::: David Goldfarb ::: (view all by) ::: June 12, 2009, 01:37 PM:

DavidS@95-96: The same happens in Safari 4.0. (Just downloaded it recently, and I recommend it: I notice that most pages render noticeably faster.)

#101 ::: Lexica ::: (view all by) ::: June 12, 2009, 03:51 PM:

DavidS @95-96, David Goldfarb @ 100 -- It happens with Firefox, too. I see it both on the home computer (Mac) and the work computer (Windows XP Pro).

#102 ::: Arthur D. ::: (view all by) ::: June 13, 2009, 11:00 PM:

Much thanks for all the fixes and upgrades, and for documenting them here - it's always interesting to see how these things are done.

#103 ::: Patrick Nielsen Hayden ::: (view all by) ::: June 14, 2009, 12:18 PM:

DavidS #95 -- Yes, that's a drawback of our style-switcher in pretty much all browsers. It happens to me all the time, because I generally read the site in the largest-type style, but I have frequent reason to go to particular comments via their unique URLs. Like when Abi or Teresa or Jim is chatting with me in IM and wants to send me to a particular comment.

It's one of many things we'd like to improve once we get the site hauled into a more modern CMS.

#104 ::: David Goldfarb ::: (view all by) ::: June 15, 2009, 01:54 AM:

I still read with the text at smallest setting; my report was purely out of curious interest...of course, I'm only 41 and the presbyopia hasn't yet hit in earnest.


Dire legal notice
Making Light copyright 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 by Patrick & Teresa Nielsen Hayden. All rights reserved.