Making Light: "Google, the stupidity amplifier"

August 24, 2012

“Google, the stupidity amplifier”
Posted by Patrick at 06:34 AM *

SF writer Greg Egan has been trying, since August 10, to point out to Google that, alongside a biographical squib about him, they’re serving a photograph of a different Gregory Egan altogether.

As of August 14 he managed to attract the attention of an actual human at Google, who fixed the problem.

As of August 24, the problem is unfixed. See for yourself.

The fact that Google’s algorithms make mistakes doesn’t bother me. The fact that it takes those of us outside Google so long and so much effort to get something like this fixed—and that the problem can then recur at random—should concern everyone. If Google wants to be accorded special status for its supposed mission to “organize the world’s information,” it needs to start demonstrating some sense of accountability to the rest of the human race. And by “some” I mean “an amount detectable with a microscope or Geiger counter,” because even that would be an improvement on the current state of affairs.

Comments on "Google, the stupidity amplifier":

#1 ::: Patrick Nielsen Hayden ::: (view all by) ::: August 24, 2012, 06:38 AM:

(I should say, by the way, thanks to Graham Sleight on Twitter for pointing this one out.)

#2 ::: Dave Bell ::: (view all by) ::: August 24, 2012, 06:57 AM:

Getting the wrong picture, that's a pretty ordinary mistake.

Having such a struggle to get it fixed, that's bad, though that did include a weekend. It seems to be the modern custom that 24/7 internet services shut down support over weekends. (Weekdays can also be bad, if you're in the wrong timezone.)

But a system that replaces the correct photo with an incorrect one, however that happens, is not laziness or cost-cutting. It looks rather like outright incompetence.

#3 ::: Jonathan Crowe ::: (view all by) ::: August 24, 2012, 07:25 AM:

I get the impression that algorithms are to Google what procedures are to every other bureaucracy in the world: a fuckup just means that they need better algorithms/procedures -- not, you know, human discretion and judgment.

If the U.S. Navy was designed by experts to be run by idiots, as the saying goes, then Google was designed by experts to be run by computers.

#4 ::: Alan ::: (view all by) ::: August 24, 2012, 07:39 AM:

@Dave Bell: there is no "correct" photo, since Greg Egan does not wish to post one on the internet.

That could be the root of the problem. Google have a mechanism for ranking photos, but there's no way to rank the "null photo", if you will. It may have been built on an assumption, that anyone in public life will have a public photo somewhere.

Unfortunately, Google don't seem to consider Tom Siddell notable enough to see whether they would respect the most obvious solution to this assumption.

#5 ::: John Mark Ockerbloom ::: (view all by) ::: August 24, 2012, 08:06 AM:

There are a couple of interesting aspects to this, reflecting some notable biases in the worldview of Google (and indeed, of a number of other information technology projects).

The first bias is that, in the usual data conventions of the Semantic Web, it's much easier to express positive statements in widely understood ways than it is to express negative ones. Statements in RDF, the standard Semantic Web form of expression, say things like "A property B", e.g. "Greg-Egan-SF-writer has-portrait-at some-URL". Moreover, the Semantic Web has an "open world" model, so the *lack* of an assertion at any one place does mean, or even suggest, that the assertion is false, even if a human would normally expect a particular assertion to be made at a particular place (like an author photo, if the author desired one, at an author's own web site). So any suggestion raised by the lack of such assertion can be easily overridden at any other site your Semantic Web reasoner pulls information from (like some other SF fan site, in this case).

You can say "there are no photos of Greg Egan on the Web" in RDF, but it's quite a bit more roundabout. You invoke a particular first-order-logic schema in RDF, and use it to say something like "There-exists-no some-URL such that Greg-Egan-SF-writer has-portrait-at some-URL". For a semantic reasoner to draw the right conclusions from this, it has to understand and interpret this more complex statement, *and* decide that this statement overrides any contradictory statement (such as the SF-fan site saying it *does* have a photo) found at any other site it pulls from.
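As a rough illustration of the open-world point above, here is a minimal Python sketch. It is not real RDF and does not use any Semantic Web library; the triple format, the "has-no-portrait" property, and the rule that a negative statement overrides positive ones are all assumptions made up for the example.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"  # open world: silence is not a denial


def has_portrait(subject, sources):
    """Combine (subject, property, value) triples from several sites.

    Hypothetical rule for this sketch: an explicit negative statement
    overrides any positive one; absence of a statement means UNKNOWN.
    """
    positive = False
    for triples in sources:
        for s, p, _value in triples:
            if s != subject:
                continue
            if p == "has-no-portrait":
                return Answer.FALSE
            if p == "has-portrait-at":
                positive = True
    return Answer.TRUE if positive else Answer.UNKNOWN


WRITER = "Greg-Egan-SF-writer"
# The author's own site simply says nothing about a portrait.
author_site = [(WRITER, "has-homepage-at", "author-site-URL")]
# A fan site makes a (mistaken) positive assertion.
fan_site = [(WRITER, "has-portrait-at", "fan-site-URL")]
# The roundabout negative statement the author would have to publish.
author_site_with_denial = author_site + [(WRITER, "has-no-portrait", None)]

print(has_portrait(WRITER, [author_site]))
print(has_portrait(WRITER, [author_site, fan_site]))
print(has_portrait(WRITER, [fan_site, author_site_with_denial]))
```

In the sketch, the author's silence yields only "unknown", which any other site's positive claim can overwrite; only the explicit denial, plus a reasoner that chooses to honor it, produces "false".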

And here's where we get to the second bias of Google: it assumes that information it wants to know about you should exist somewhere it can get to, and it's entitled to get it, or infer it, and link it up with other information it has about you, whether you like it or not. The notion that some people might want to keep some of their information private (like what they look like in a photograph) is one that Google pushes back against whenever it can.

Lots of things that Google does reflect this mindset. Like its refusal to let you use certain services (like Google+) unless you volunteer personal information like gender and exact birthdate that it doesn't actually need to know. Or the new "unified" privacy policy that doesn't let you opt out of Google's combining your personal information across all services Google owns. Or, as I've heard from a few people, all the various algorithms Google apparently uses to try to match you to a particular identity *even if you don't sign in or explicitly link multiple identities*. (Google's public interfaces don't display all it infers about you, but I've heard more than one person report that they capture *lots* of information on users and expend significant effort trying to correlate it.)

With these two biases in mind, Google's persistent linking of an incorrect photo with someone who doesn't want a photo of themselves online isn't exactly "stupid"; it's pragmatic and willful. All else being equal, Google sees more benefit in hoovering up information wherever it can get it, and occasionally making an unwarranted, or unwanted, inference about you, than it does in being more conservative and cautious about ensuring that what it knows about you is actually true, or something you don't mind it knowing or believing.

#6 ::: John Mark Ockerbloom has been gnomed ::: (view all by) ::: August 24, 2012, 08:09 AM:

I wonder if they like semantic scones.

#7 ::: Greg Egan ::: (view all by) ::: August 24, 2012, 08:15 AM:

Thanks very much for mentioning this, Patrick! And if anyone reading this has 10 seconds to spare to vote-down this misinformation, I'd be very grateful: do a Google search for my name, click on "Feedback" (at the bottom of the biographical sidebar), then click on "Wrong?" under the photo.

Just to clarify one point: what made the misattributed photo go away (for a while) wasn't action by any human at Google, but action by a Spanish SF blogger whose Picasa album contained the image file that an Italian SF blogger (linked to by Google in their mini-bio as the "source" of the image) had incorrectly posted as an image of me. When the Spanish guy kindly deleted the file completely (which he'd previously just unlinked from his blog, at my request) it finally disappeared from the Google mash-up.

But now the image of the same Professor Gregory K Egan being served up by Google is from the Monash University web site that is its legitimate home, so it's no longer a matter of Google being misled by the errors on some fan web sites. It's now making the same mistake all by itself. I guess that's what they call "machine learning".

BTW, I do have some "photos of the SF writer Greg Egan" here that show up in Google Image Search and may have reduced the problem a bit when humans are involved ... but alas, Google's algorithm won't take the bait.

#8 ::: John Mark Ockerbloom ::: (view all by) ::: August 24, 2012, 08:26 AM:

I said:

"...the *lack* of an assertion at any one place does mean, or even suggest..."

Ack-- somehow I missed a word omission when I read this over. I meant to say "does *not* mean, or even suggest". Sorry about the confusion.

#9 ::: P J Evans ::: (view all by) ::: August 24, 2012, 08:29 AM:

Google Books has short biographies of authors. The one for my aunt who wrote a book (about fish, for divers) is for a completely different woman with the same name (writing on aromatherapy).

#10 ::: Steve ::: (view all by) ::: August 24, 2012, 08:59 AM:

I don't want to annoy the gnomes by posting a URL (I'm guessing that's how that happens), but there's also at least one interview out there that's illustrated with a picture that Google's image match reckons is Vernor Vinge....

#11 ::: Mark C. Chu-Carroll ::: (view all by) ::: August 24, 2012, 08:59 AM:

As a former Google employee, let me fill in a bit of their side.

One of the big problems at Google is scalability. That is, the number of requests that the company receives each day is simply astonishing.

Every day, there are, quite literally, tens of millions of requests being received there. Most of those are basically spam - automated requests being generated by SEO scammers. But the scammers are clever - and they're constantly changing to get past any automated tests that Google sets up.

So after the best effort to eliminate the obvious spam, they're still left with millions of requests. Each of those takes a non-trivial amount of time to check - at a minimum, it takes something like 5 minutes to go through, verify that it's wrong, and figure out what to do to fix it.
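To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 5 minutes per request comes from the paragraph above; the figure of one million requests per day (the low end of "millions") and the 8-hour shift are assumptions chosen only for illustration.

```python
def reviewers_needed(requests_per_day, minutes_per_request=5, shift_hours=8):
    """Full-time reviewers needed to clear one day's requests in one day.

    requests_per_day and shift_hours are illustrative assumptions;
    minutes_per_request defaults to the 5-minute minimum quoted above.
    """
    total_minutes = requests_per_day * minutes_per_request
    minutes_per_shift = shift_hours * 60
    # Ceiling division: a partly used shift still needs a whole person.
    return -(-total_minutes // minutes_per_shift)


print(reviewers_needed(1_000_000))  # 10417
```

Under those assumptions, one million requests a day works out to about 83,000 person-hours, or a bit over ten thousand people doing nothing else.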

In addition, a lot of the internal process around things is very ad-hoc over there. A lot of people have an image of the Google indexer as being one thing, but it isn't. It's dozens of different systems, doing overlapping analyses. So you fix something in one place, and think it's fixed... and it turns out that 7 different systems could create the same link, and you only fixed it in one. (Or in 6!)
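The "fixed in 6 out of 7" failure mode can be sketched as a toy model in Python. Nothing here reflects Google's actual architecture; the system names, the per-system blocklists, and the link label are all hypothetical.

```python
def visible_links(systems, blocked):
    """Union of the links each system produces, minus that system's own blocklist.

    systems: dict mapping a (hypothetical) system name to the set of links it emits.
    blocked: dict mapping a system name to the links suppressed in that system only.
    """
    links = set()
    for name, produced in systems.items():
        links |= produced - blocked.get(name, set())
    return links


BAD_LINK = "egan-bio -> wrong-photo"
# Seven independent systems that each derive the same bad link.
systems = {f"system-{i}": {BAD_LINK} for i in range(1, 8)}

# Fix applied in six of the seven systems: the link is still served.
fixed_in_six = {f"system-{i}": {BAD_LINK} for i in range(1, 7)}
print(BAD_LINK in visible_links(systems, fixed_in_six))

# Only a fix in every system makes it disappear.
fixed_in_all = {name: {BAD_LINK} for name in systems}
print(BAD_LINK in visible_links(systems, fixed_in_all))
```

Because the output is a union, a correction has to land in every producer before the bad link goes away, which is one plausible reading of why a fix can appear to come undone.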

#12 ::: David Harmon ::: (view all by) ::: August 24, 2012, 09:01 AM:

Jonathan Crowe #3: It's worse than that: The scale of what Google does is such that they can't manage it with human judgement -- even if they want to "override the system", they have to do that by tweaking filters, not so different from Jim with ML's spam filters. And they have no hope of getting enough staff to deal with all the things that should require human judgement.

#13 ::: Stephen ::: (view all by) ::: August 24, 2012, 10:09 AM:

@11: David, I call shenanigans. Google recently reported $2.9 BILLION in profit for a single quarter.

Are you seriously telling me that Google can't afford enough customer service agents to handle an override or filter? They seem to be able to remove infringing items from their search results OK!

#14 ::: Peter Hollo ::: (view all by) ::: August 24, 2012, 10:15 AM:

(I should say, by the way, thanks to Graham Sleight on Twitter for pointing this one out.)

Heh. I thought probably the appearance of this here, now, was due to my tweet earlier today - and it was, albeit via the redoubtable Mr Sleight.

Another place where Google falls down for the same reasons is Google Maps. It can be very, very difficult for businesses listed incorrectly on Google Maps to correct the errors, which will often be from some syndicated database with erroneous information - firstly, it's nearly impossible to talk to a real person at Google, and secondly, the wrong information just keeps on creeping back.

So Greg has my sympathies, although it must be acknowledged that his is a pretty unusual situation ;)

#15 ::: David Harmon ::: (view all by) ::: August 24, 2012, 10:21 AM:

Stephen #13: They seem to be able to remove infringing items from their search results OK!

At least for the moment... I'm reasonably sure that "customer service agents" have little power over the search engine which is Google's heart. If you convince them that you do have a plausible case, they refer the issue to a technician (by whatever name) who would do the actual blacklisting of a given URL or text. Those technicians, and the security systems needed to manage and monitor their work, don't come as cheap as "more customer service". And even billions of dollars can't buy you full control of a heavily-automated machine-learning engine which is continuously being fed by essentially the entire Internet.

#16 ::: Jim Macdonald ::: (view all by) ::: August 24, 2012, 10:23 AM:

Note: Mark's post at #11 was gnomed because it had the string S - E - O in it. Pretty much any time those three letters appear side-by-side in a comment, that comment is spam. (Many times the s-e-o monkeys helpfully label themselves.)

Note 2: Stephen at #13: That might be because of prioritization. If they don't remove infringing material they're breaking the law and face fines or jail time.

#17 ::: Greg Egan ::: (view all by) ::: August 24, 2012, 10:53 AM:

Jim at #16 wrote: That might be because of prioritization. If they don't remove infringing material they're breaking the law and face fines or jail time.

I'm looking forward to the litigation when they mash-up the biographical details of some rarely photographed businessman with an image of a convicted pedophile.

#18 ::: Patrick Nielsen Hayden ::: (view all by) ::: August 24, 2012, 10:54 AM:

All of those explanations are very interesting (by which I mean, genuinely, "very interesting") -- but none of it makes it okay to spread false information about people, particularly when you've been told it's false.

You know something, if I build a giant 150-foot-tall steel-and-titanium food-harvesting robot that periodically shits plutonium onto your house, you're very unlikely to be impressed by my explanations that it's a really tough technical challenge to make the robot stop intermittently shitting plutonium. Even if the robot's purpose is a wonderful, virtuous, world-improving thing.

#19 ::: Anon4Now ::: (view all by) ::: August 24, 2012, 12:11 PM:

I've had to change some of my web habits for years because of the way google can confuse things. I'd rather not bother to make photos of myself prominent online. I'm a photographer, so I need to keep my work prominent. However, I happen to share my name with a child rapist on one of the sex offender lists (also with an individual whose only google significance is LARPing as a dark elf in blackface, which is bad, but doesn't compare). The result for his listing comes up towards the bottom of the first page when searching for my name. Ever since I discovered that, I have made sure there is a clear, prominent picture of me on any bio page just to make it clear that that is not me and hope that google does not otherwise make the confusion. Luckily google image searches for my name are still dominated by my work. The first couple pages are portraits I have done of other people, along with a few self-portraits. It takes some time before you get to the sex offender.

I have found that google seems to disproportionately value data from google+, so I filled out one of those profiles and seeded it with portfolio work, just to be sure it stayed high on google and has only the info I want on there. If I weren't trying to defend against possible confusion, I probably wouldn't bother with it at all.

So far, I've really just been competing for notoriety with other people sharing my name, but I do not look forward to having to sort out any confusion with google if someone, or some computer, makes a mistake.

#20 ::: D. Potter ::: (view all by) ::: August 24, 2012, 12:19 PM:

1. Jim MacDonald @16: Fortunately the capital of South Korea has not been much in the news lately, eh?
2. They could hire a thousand people to do nothing but fact-check. When data is your business, bad data is your embezzler.
3. It is now making me very happy that when you google me you get pages of doctors.

#21 ::: Doctor Science ::: (view all by) ::: August 24, 2012, 12:31 PM:

I am not seeing *any* image next to the biographical stub. I just double-checked, and it's the same in all browsers, and doesn't depend on where you link from.

In which case, resolving the problem took only two weeks, which strikes me as quite reasonable and even agile on Google's part.

A few years ago, I found that a certain public park in my area was mis-labelled on Google maps -- they pointed you to the elementary school field instead. I filed a map bug report and it took more than a month for them to correct it. But they *did* correct it.

It's partly a question of how long you think such corrections *should* take. For anything less than a legal or otherwise critical, time-sensitive situation, I wouldn't expect to see results in less than a month.

#22 ::: C. Wingate ::: (view all by) ::: August 24, 2012, 12:59 PM:

The issue behind this is that Google's image search isn't all that good on anything that isn't so common that it can find a couple of hundred good hits which drive off the thousand bad hits it also generates. If you search "Bodkin Island Lighthouse" or "Bodkin Point Lighthouse", both legitimate names of a long-destroyed structure in the Chesapeake, you'll get the few genuine pictures which can be found, but you'll also get pictures of most other lights in the bay, and pictures of houses, pictures of people, and in the latter search case, several pictures of classical Indian dance. If you search my name you'll get lots of pictures of Orde and a certain rapper who had trouble with the law (and if you take the main page, the latter is whose picture will show up) as well as a bunch of other people, but you won't get any of the few actual pictures of me out there.

#23 ::: Jim Macdonald ::: (view all by) ::: August 24, 2012, 01:22 PM:

#21 ::: Doctor Science I am not seeing *any* image next to the biographical stub.

I promise you that two hours ago it was there.

I suspect that there's a threshold number of "wrong picture" clicks that removes the photo, and the Making Light commentariat's clicks are what turned the tide?

#24 ::: Dave DuPlantis ::: (view all by) ::: August 24, 2012, 01:27 PM:

Peter Hollo @14: I had a frustrating experience trying to correct a Google Maps error myself. Two employers ago, we had (I think) a caterer show up to the wrong location because they used Maps and ended up two blocks west. I checked out the map, and sure enough, our "address" was located in the wrong place. (It was a really easy mistake to spot: the building is located at an angled intersection - NW-to-SE crossing W-to-E - and the place where the address was listed was at a standard intersection. The problem seemed to be that they had somehow managed to get blocks and blocks of a particular street off; imagine if 2000 were actually listed as 2200, 2100 as 2300, etc.)

So I filed the error with Google, and waited, and waited, and waited. The error slowly made its way through the steps on Google's checklist: phase 1 (we've received it, IIRC), phase 2 (we're looking to confirm it's an error), phase 3 (we acknowledge it's an error) ... but no phase 4 (we are fixing it).

Months passed. The report bounced from phase 3 back to phase 2. Nothing changed. Eventually, the company eliminated my position, and it was no longer my problem ... but for some time after that, the error remained unfixed. (It's since been corrected.)

While I appreciate the volume of information and corrections that Google processes, it's small consolation if things that are directly relevant to me are wrong and don't get fixed. And this was something that was relatively easy to confirm for users: our building had our name on the outside, and it was very clearly identifiable. There would be no confusing it for another building. (Not a standard office building, IOW.) Confirming, say, a person's identity or location could be much more difficult.

#25 ::: Elliott Mason ::: (view all by) ::: August 24, 2012, 01:56 PM:

Almost everywhere I've looked that isn't a central business district/downtown, Google Maps has the street addresses off by between two houses and three blocks.

Even in Chicago where we're a Cartesian plane with a known zero and known calibration-type arterial straight streets ... I imagine other towns would have it worse.

#26 ::: Chris W. ::: (view all by) ::: August 24, 2012, 02:02 PM:

A colleague had the same error in reverse. Until recently, if you searched for his name the wikipedia article for an olympic short-track speed skater with the same name came up with his head shot from our web site.

Even funnier was the fact that this was a case of Google outsmarting itself. My colleague is Chinese but has been living in the States for quite some time and uses the Western name order, while the skater uses the Chinese name order, so Google correctly identified the name in question as Chinese, and figured out that the two versions were equivalent, but failed to distinguish between a male, middle-aged, policy analyst and a 20-something female athlete.

And even weirder, the skater is the only person by that name to have a Wikipedia entry, but she fails to show up in the first page of Google results, which is entirely full of the bios and personal web pages of various researchers by that name.

#27 ::: Chris W. got gnomed ::: (view all by) ::: August 24, 2012, 02:04 PM:

I would offer to share the bulgogi I had for lunch, but alas, I ate it all.

#28 ::: Avram ::: (view all by) ::: August 24, 2012, 02:14 PM:

As of yesterday, I've switched to DuckDuckGo as my primary search engine. Remember when Google used to have that refreshingly lightweight interface that just returned a useful list of results, rather than a cluster of images of whatever the database was thinking while it processed your search request? DuckDuckGo is like that. I use Google only when DDG doesn't turn up what I'm looking for.

Which, sadly, is pretty often. Google's results are just plain better if I'm searching for anything difficult. But Google's result page, when it includes all that bullshit sidebar stuff, also chokes my computer with data URI images, leaving my browser locked up for annoyingly long periods.

#29 ::: C. Wingate ::: (view all by) ::: August 24, 2012, 02:22 PM:

Avram, as of two minutes ago I've switched too. The difference is simply astounding.

#30 ::: Graham Sleight ::: (view all by) ::: August 24, 2012, 02:50 PM:

Peter @14: yes, as I said in my tweet, you are indeed the fons et origo wossname on this.

For more than one reason, this debate - human intervention in a machine-dominated information world - makes me think of David Langford's "New Hope for the Dead".

#31 ::: Torrilin ::: (view all by) ::: August 24, 2012, 03:17 PM:

Almost everywhere I've looked that isn't a central business district/downtown, Google Maps has the street addresses off by between two houses and three blocks.

Right street, wrong block errors I don't mind so much. I can deal with walking a couple extra blocks, and I'm not biking and religiously following what Google maps tells me in real time anyway.

Misplaced streets... that is a problem. Google has misplaced Madison's State Street (yes, really), a couple major bike paths, and I forget what all else I've personally found while I've lived here. Plus I found about 6 months ago that Google had decided that Waltonville, PA existed, and was placed approximately on top of the house I grew up in in PA. This is um... really good historical research, since Waltonville, PA hasn't existed since around 1970 at the very latest, and more likely since around 1930. It took them months to correct. I will grant them that the Waltonville schoolhouse pump is in fact in the backyard of that house... So their location was pretty good. But there's no zip code assigned to Waltonville, and mail addressed that way wouldn't get delivered, tho there are houses. I'm really not sure what caused their map engine to believe that Waltonville existed in the first place.

#32 ::: Mary Aileen ::: (view all by) ::: August 24, 2012, 03:55 PM:

Right street, wrong block errors drive me batty (not just Google Maps, I first encountered them on Mapquest). The worst was the time I was 45 minutes late for a work-related meeting ("fortunately" only missing the pre-meeting dinner hour) because Mapquest had the hotel just north of the highway, when it was actually just *south* of the highway. At least it was the right exit.

#33 ::: Rikibeth ::: (view all by) ::: August 24, 2012, 11:37 PM:

I would just like to know why Google Maps thinks that to get to Broadway addresses in Somerville from the Alewife Brook Parkway, one should turn left on Mass. Ave, right on Winter Street, and then right on Broadway, when that puts you on Broadway in the town of Arlington, and it only becomes Somerville on the OTHER side of Alewife Brook Parkway, and right onto Broadway is NOT a prohibited turn. WTF.

#34 ::: TexAnne ::: (view all by) ::: August 25, 2012, 12:17 PM:

So if we switch to DDG instead of Google, how do we verb it? "I don't know, I'll have to duckduckgoose it"? "I duckduckwent that and here are the top 3 hits"?

#35 ::: Xopher HalfTongue ::: (view all by) ::: August 25, 2012, 12:35 PM:

Google Maps also doesn't understand subways. I can see why it thinks each one is a point location on a street, but not why it can't display that location before telling you "go south on Main Street" or whatever. South from WHERE?

In other words, I don't expect Google to tell me what exit to use (yet), but it could easily tell me where it's assuming the station is.

#36 ::: janetl ::: (view all by) ::: August 25, 2012, 01:38 PM:

I see the occasional error on Google Maps — just recently it wanted me to turn left where that wasn't allowed — but the website is right in my experience well over 90% of the time. That includes Portland's Trimet bus system.

Thanks for the pointer to duckduckgo. It is a nice, clean interface, and I love that it's not tracking your searches, or filtering the results. It creeps me out that I don't see the same results in a google search that someone else does.

I just did a Google Image search for myself (after logging out of Gmail*), and was amused that a chocolate cake that I'd baked came up in the first page of results. It had been posted by the birthday boy.

*Why, yes. I don't like Google tracking my searches but I used Gmail. Convenience trumps privacy, once again.

#37 ::: Rob Rusick ::: (view all by) ::: August 25, 2012, 04:39 PM:

TexAnne @34: Dubledeegee? "I dubledeegeedit".

'Double' deliberately misspelled. Seems a little cumbersome still; maybe dubdeegee.

#38 ::: Kaleberg ::: (view all by) ::: August 25, 2012, 11:35 PM:

I don't understand what Google is doing wrong. It's a search engine, not a question answering site. It looks at nearby text and captions to index photos and videos. It's not trying to be Siri with a clever gag answer for every quip.

Years ago Yahoo tried to build a hierarchy of all websites, but a tree structured categorization was increasingly hard to maintain and pointless, aside from the issues of intentionally misleading categorization. When Altavista came out with text search, it was much better, though you often got serious bloopers. Google did search even better. When it gives a wrong answer, it's because of what is on the web.

For example, there's a hamlet called Sappho around here, and there used to be a wooden statue of a woman in a long white dress at the main junction. We put up a photo of her and captioned it "The Sappho Maiden". All sorts of people snarfed the photo as a picture of Sappho, the poet. Sappho lived in ancient Greece and was noted for her use of language, deep feelings and her fondness for schoolgirls. There are no photos of Sappho.

Personally, I like the fact that when I google my name I get all sorts of alternate identities. For a while I was a dentist or a hockey player, which greatly enriched my fantasy life. Even neater would have been to discover I was actually a hockey playing dentist. By not actually trying to answer questions, Google makes our lives much richer.

#39 ::: Paul A. ::: (view all by) ::: August 26, 2012, 02:13 AM:

Kaleberg @ #38: I don't understand what Google is doing wrong. It's a search engine, not a question answering site.

Then perhaps what it's doing wrong is trying to be a question answering site.

There's nothing wrong with a search engine, asked to search for Greg Egan, producing a set of results that includes a biography of one Greg Egan and a photograph of a different Greg Egan. But that's not (or not just) what Google was doing. The Google search was also producing a sidebar which combined several results into, implicitly, an answer to the question "Who is Greg Egan?" - and getting the answer wrong.

#40 ::: Greg Egan ::: (view all by) ::: August 26, 2012, 02:31 AM:

Kaleberg @38: Most people's annoyance with Google, from the evidence of this thread, is with their maps, which do claim to provide factual information, and often get it wrong.

Google is great when it doesn't try to be more than a search engine. My complaint was that they've started to kid themselves that they know more about the real world than they actually do. When they grab bits from several different web pages and collate them into a mash-up in a little sidebar, they are no longer just showing you search results. Instead of leaving you to follow up the links to those pages and make your own judgement about their reliability and relevance to your query, Google are making their own claim to have figured out the actual properties of the thing they think you're looking for.

By not actually trying to answer questions, Google makes our lives much richer.

Sure. The serendipitous results you can get from a text search are wonderful. But now Google are trying to answer questions. They should leave this to Wikipedia; they get the bulk of their information from that source anyway, and then just add their own layer of unchecked, uneditable noise.

#41 ::: David Harmon ::: (view all by) ::: August 26, 2012, 08:05 AM:

Kaleberg #38: I got kinda depressed when I googled myself and got someone working at the Smithsonian and publishing scholarly books in areas I'm actually interested in (eco-diversity). Wife and family, too.

Paul A. #39: Well put. They've long since gone beyond "search", into actively trying to answer queries -- often at the expense of a more complete search.

#42 ::: C. Wingate ::: (view all by) ::: August 26, 2012, 09:06 AM:

re 38: The problem is that Google has turned itself into a search engine that tries to give you the answers you want to hear. Therefore, if for instance you do searches on political topics, and you tend to click on one side's sites more than the others, you're going to tend to get that side's sites. Have a look at DDG's explanation to see some examples of this in action.

#43 ::: Melissa Mead ::: (view all by) ::: August 26, 2012, 11:27 AM:

Could be worse. Googling my name brings up a porn actress.
Also a gynecologist and a librarian.

#44 ::: Lee ::: (view all by) ::: August 26, 2012, 01:02 PM:

I think I'm lucky. Googling my name brings up a shitload of links for a popular science writer, who is male. The first link that's definitely me is for my artist page on Spacetaker, and the first image that's related to me is one of the jewelry photos on that page. It should be noted that I have my LJ, DW, and FB accounts all set not to be searchable.

#45 ::: Jeremy Leader ::: (view all by) ::: August 26, 2012, 01:48 PM:

Melissa Mead @41: She sounds like one hell of a woman!

Oh, three different people? Never mind.

#46 ::: Andrew Plotkin ::: (view all by) ::: August 26, 2012, 02:23 PM:

Paul A @39: "Then perhaps what it's doing wrong is trying to be a question answering site"

I'd say that's what they're doing right. They're not doing it *perfectly*, of course.

I suspect that any serious "web search engine" has to turn itself into a question-answering engine early in its career, starting with the implicit question: "what information does this user really want?" A literal reading of the input line is *not* the best way to tackle that, statistically speaking. (Google does lots of statistical testing of query-response algorithms.)

Spelling-correction of search terms is the first obvious example of the difference. Here's another (more obscure) one that I learned recently: go to Google and type "mre uptl" (without the quotes). What do you get? Why? It looks silly but if their data indicates that most people who type "mre uptl" *mean* that, then it's not surprising that Google supplies those answers.

(I note that the literal reading "mre" gets used for the display of on-page ads -- they're all about Meals Ready to Eat. I have no idea why that part of the page thinks differently from the search-result page. Different departments, different algorithms, but there's no way to tell from the outside what the choices are that result in this disparity.)

As to the question of Greg Egan's face -- as I look, there's no photo on that result page. So something has been rejiggered. But I am in no way surprised that these things can change "at random". From Google's point of view, the world is a thunderous mass of data that's changing all the time. Putting a pin in a fact must always be a conditional, temporized decision.

Or maybe somebody screwed up a data transfer.

#47 ::: praisegod barebones ::: (view all by) ::: August 26, 2012, 03:27 PM:

Jeremy Leader @ 43

Sounds a bit like Gore Vidal Sassoon: War poet, novelist, revolutionary hairdresser.

#48 ::: Dave Harmon ::: (view all by) ::: August 26, 2012, 06:35 PM:

Andrew Plotkin #44 "mre uptl"

On Google I get the NYC public library, and a page dominated by New York, which name is bolded as a keyword.

Whereas DDG gets a literal match in at least the second place: from an Italian article about steganography.

Mhich is more correct? Also: " if their data indicates that most people who type "mre uptl" *mean* that" is a trap and a lure to disaster. Every decade or so, somebody pulls the DWIM ("do what I mean") concept out of its padded cell, and it's usually a disaster, and when it isn't it's seriously flaky (q.v. AutoCowrecks).

#49 ::: Dave Harmon has been gnomed ::: (view all by) ::: August 26, 2012, 06:36 PM:

Join me at the Asian buffet?

#50 ::: Jeremy Leader ::: (view all by) ::: August 27, 2012, 12:54 AM:

Dave Harmon @47: remember that Google has a *huge* amount of feedback telling them whether they get it right or wrong. Every time someone clicks on one search result, and doesn't click on another, that tells Google a tiny bit about what they're looking for. If Google shows mostly New York results when you ask for "mre uptl", it's probably because that's the main thing people have clicked on when they searched for that string in the past.

Google's not trying to answer questions, they're not trying to perform searches. They're trying to give people what they're looking for, so they don't switch to a competitor. And, of course, hoping that sometimes people click an ad instead.

#51 ::: Andrew Plotkin ::: (view all by) ::: August 27, 2012, 02:14 AM:

Dave Harmon @ 48: "Mhich is more correct?"

(It's more artistic if you misspelled that on purpose...)

I'll lay excellent odds that "New York" is more correct, for the majority of people who have ever typed "mre uptl" at Google's prompt.

Riddle for the crowd: why? What's going on here? There's a clear answer. (I didn't figure it out as a riddle, though; someone pointed out the trick to me.)

What Jeremy Leader said. Google's entire business plan is shaving margins off the error in their "do what I mean" schema, and they've gotten good at it.

(Remember, of course, that you can direct Google to skip the spelling-and-synonym correction by using the double quotes. If you do that, Google finds many literal usages, including your example.)

#52 ::: David Goldfarb ::: (view all by) ::: August 27, 2012, 03:52 AM:

Because if you type "new york" with your hands shifted over one letter on the keyboard, "mre uptl" is what comes out. I actually spent a little time figuring that out, and then you went and asked it as a (sort of) riddle.

And actually that leaves me quite impressed with Google that it's managed to learn that.

I note that Google does have some sort of question-answering going on, since relatively recently: if you type in a query in the form of a factual question, you'll get its best guess at an answer at the top in large bold type. Frex: "when was pride and prejudice published" "January 28, 1813".
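The shifted-hands explanation is easy to check mechanically. A minimal sketch, assuming a standard US QWERTY layout and looking only at the letter keys; the `shift_keys` helper is made up for illustration and says nothing about how Google actually learned the mapping:

```python
# Letter rows of a standard US QWERTY keyboard (an assumption about the layout).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def shift_keys(text, offset):
    """Replace each letter with the key `offset` positions along its row."""
    out = []
    for ch in text:
        for row in ROWS:
            i = row.find(ch)
            if i != -1 and 0 <= i + offset < len(row):
                out.append(row[i + offset])
                break
        else:
            # Spaces, and keys with no neighbour on that side, pass through unchanged.
            out.append(ch)
    return "".join(out)


print(shift_keys("new york", 1))   # hands one key to the right: mre uptl
print(shift_keys("mre uptl", -1))  # undoing the shift: new york
```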

#53 ::: Elliott Mason ::: (view all by) ::: August 27, 2012, 05:00 AM:

What annoys me is that sometime in the past three years, Google's search box started ignoring doublequotes and explicit booleans (classic examples of the latter being 'monty -python' or 'python -monty -language') if they ruddy well feel like it.

If I put a string INSIDE QUOTES, I want it to be treated literally. Instead, Google often decides I didn't MEAN that 'the' in the middle, or whatever it is, and gives me results that are significantly nonuseful. Especially if it explicitly includes things I put with a boolean NOT in front of them (-).

I want to opt out of the cutesy smart-search stuff when I know the search I'm going to do is complicated. There is no option for that.

#54 ::: Pendrift ::: (view all by) ::: August 27, 2012, 07:13 AM:

Elliott @53: Patrick sidelighted Google search tips a few months ago; they've helped my google-fu immensely.

#55 ::: P J Evans ::: (view all by) ::: August 27, 2012, 08:32 AM:

53
Elliott, I know exactly what you mean. We learned how to make our searches work, and then they changed the rules.

#56 ::: Mary Aileen ::: (view all by) ::: August 27, 2012, 10:15 AM:

Elliott Mason (53): The 'Verbatim' option under 'Search tools' on the left of the results pages (sometimes helpfully hidden another layer down under 'More search tools') will force Google to search exactly what you typed. That's been a life-saver for me.

#57 ::: Anon4Now ::: (view all by) ::: August 28, 2012, 08:46 AM:

Note: #19 wasn't actually me. It was someone else, posting from a device that had for some reason decided to autofill the Anon4Now name and address as its default for Making Light. (I've posted under this nym from that device, but have since used it to post under my regular non-anonymous nym, so I don't know why it chose to autofill Anon4Now.) Just for disambiguation purposes.

#58 ::: Caroline ::: (view all by) ::: August 28, 2012, 09:12 AM:

Elliott Mason @ 53: It is super annoying, isn't it?

They do still have the Advanced Search page, which will take your inclusion/exclusion requests literally, but they've made it extremely hard to find. In fact, I only found it by googling for "advanced search" and finding the help page, which linked to it.

I think I'll bookmark it now, in fact.

I remember search before Google. You had to put in a lot of effort to work around the spam and useless crap to find anything useful. Google changed everything with an algorithm that let the computer actually work for you. But now it seems that using Google is heading back towards trying to work around the algorithms to get useful results.

#59 ::: lorax ::: (view all by) ::: August 28, 2012, 01:04 PM:

And actually that leaves me quite impressed with Google that it's managed to learn that.

I suspect that's not actually all that hard - if a very large percentage of searches for "mre uptl" were followed not by clicking on one of the results, but instead by a second search for "new york", that's a pretty obvious indication.

Raw search data - what people actually type - is really, really messy. I've seen it. (Yahoo's, not Google's.) Results would be far worse if Google required literal, character-for-character matches for everything typed, and couldn't learn that a good result for "new york city" was also a good result for "nwe york city" (though it should continue to provide a link to literal results for the latter for people who actually do want them.)
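To make the "second search" signal concrete, here is a rough sketch of how it could be counted. Everything in it is an assumption for illustration: the log format (per-user sessions of `(query, clicked)` pairs), the `learn_rewrites` name, and the two thresholds are invented, and none of it is taken from Google's or Yahoo's real systems.

```python
from collections import Counter, defaultdict


def learn_rewrites(sessions, min_count=3, min_share=0.5):
    """Guess query rewrites from abandoned searches.

    sessions: list of sessions, each a time-ordered list of (query, clicked)
    pairs, where clicked is True if the user clicked any result.
    Returns {query: rewrite} for queries that were usually abandoned without
    a click and immediately followed by the same different query.
    """
    seen = Counter()                  # how often each query was typed
    followups = defaultdict(Counter)  # query -> counts of the next query typed

    for events in sessions:
        for i, (query, clicked) in enumerate(events):
            seen[query] += 1
            if not clicked and i + 1 < len(events):
                next_query = events[i + 1][0]
                if next_query != query:
                    followups[query][next_query] += 1

    rewrites = {}
    for query, counts in followups.items():
        best, n = counts.most_common(1)[0]
        # Only trust the rewrite if it is both frequent and the usual outcome.
        if n >= min_count and n / seen[query] >= min_share:
            rewrites[query] = best
    return rewrites


# Hypothetical log: four users retype the query, one clicks a result anyway.
log = [[("mre uptl", False), ("new york", True)]] * 4 + [[("mre uptl", True)]]
print(learn_rewrites(log))  # {'mre uptl': 'new york'}
```

The thresholds are the interesting design choice: set them too low and one confused user rewrites the query for everybody, which is roughly the DWIM failure mode complained about upthread.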

#60 ::: Andrew Plotkin ::: (view all by) ::: August 28, 2012, 03:50 PM:

@53: "What annoys me is that sometime in the past three years, Google's search box started ignoring doublequotes and explicit booleans (classic examples of the latter being 'monty -python' or 'python -monty -language') if they ruddy well feel like it."

Apparently they never feel like it with me.

Not trying to be snarky; I bet there is some difference between our cases, but I have no way to diagnose it.

(I wipe my cookies very frequently, which is good for getting me consistent Google behavior -- they don't have any search preferences or personalization associated with my browser. This does mean, however, that I'm not using the "verbatim" preference mentioned upthread. I still get the double quotes and minus signs working for me.)

(* Ads are different. I'm not talking about ads here.)

#61 ::: Mary Buck ::: (view all by) ::: August 28, 2012, 09:31 PM:

Egan #40: "But now Google are trying to answer questions. They should leave this to Wikipedia..."

If Google and Wikipedia were so hot at finding answers, then I, as a public librarian, would be bored and lonely. Instead, I'm seeing an increase of users who need help wading through the tsunami of information provided by Google and Wikipedia.

Viva Librarians!

#62 ::: individ-ewe-al ::: (view all by) ::: August 29, 2012, 05:41 AM:

The verb for 'to use DuckDuckGo to perform a search' is quack.

If the data quality of Google maps is too poor for your purposes, the map equivalent of DuckDuckGo is OpenStreetMap. And whaddya know, it's a Wiki so if you find errors in it you can fix them yourself, no agonizing customer service process.

I don't have any connection with either of these services, I'm just sharing the fruits of my experience since the Google+ fiasco drove me away from Google. (I don't use Gmail any more either, but the best alternative I've found is less good and costs a small sum of money.)

#63 ::: Nancy Lebovitz ::: (view all by) ::: August 29, 2012, 06:27 AM:

Panix is cheaper than zoho, and I've been happy with getting my email through them, but I don't know whether they offer all the features you want.

#64 ::: Jeremy Leader ::: (view all by) ::: September 01, 2012, 02:25 AM:

individ-ewe-al @62: that's an excellent coinage! I think you've motivated me to set DDG as my default search engine, just to be able to tell people "I quacked that, and...".

#65 ::: NelC ::: (view all by) ::: September 02, 2012, 11:05 PM:

I'm amused by finding that the second picture that comes up when I google my name is of China Miéville. Justifiable I guess since it's a picture I took of him, and used under CC licence by a website for an interview they did with him. I'm enormously tempted to use it in my profile on a dating site.

#66 ::: Kevin Marks ::: (view all by) ::: September 06, 2012, 09:16 PM:

The bulk of what comes up in that right sidebox on Google is from Wikipedia; if Greg wanted a photo of him to appear, uploading one to Wikimedia Commons and adding it to the entry would be the way to convince Google. Wanting nothing to appear is indeed harder.

Thing is, the right answer for [Greg Egan] or [Melissa Mead] or [Kevin Marks] on Google does depend on who is asking. Thanks to my profligate blogging and tweeting, all the other Kevin Marks's get crowded out of the results on Google.
I was involved in starting the Google Profiles project when I was there, which was supposed to let their friends find them instead, among other issues.

There is a way to link different instances of yourself together on the web easily, which is the rel='me' microformat. Google will follow chains of rel='me' links to cluster individuals (if you make them bidirectional, it guards against impersonation to some extent, if you have a known good root). If one of these chains passes through a Google+ profile, it will show your photo next to results you authored. Frankly, this isn't good enough - it privileges their own site over others on the web (there is a widely adopted way to mark up a visiting card with name, image, contact etc in the hCard microformat, which Google indexes but doesn't display in search results).

We have had a discussion about indicating non-identity (I like rel='notme'), but that isn't really deployed yet.
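For the curious, the bidirectional rel='me' idea can be sketched as a small graph walk. This is only an illustration under stated assumptions: the `links` dictionary stands in for whatever a crawler extracted, the example.com URLs and the `confirmed_identity` name are hypothetical, and it is not a description of Google's actual clustering.

```python
def confirmed_identity(links, root):
    """Collect pages tied to `root` by reciprocated rel='me' links.

    links: {url: set of urls that page points to with rel='me'}.
    A page joins the cluster only if a page already in the cluster links
    to it AND it links back, so a stranger pointing rel='me' at you
    cannot pull themselves in on their own.
    """
    cluster = {root}
    frontier = [root]
    while frontier:
        page = frontier.pop()
        for target in links.get(page, ()):
            if target not in cluster and page in links.get(target, ()):
                cluster.add(target)
                frontier.append(target)
    return cluster


# Hypothetical crawl data: blog <-> twitter <-> photos all link both ways;
# the impostor page links to the blog, but nothing links back to it.
links = {
    "https://blog.example.com/": {"https://twitter.example.com/me"},
    "https://twitter.example.com/me": {
        "https://blog.example.com/",
        "https://photos.example.com/me",
    },
    "https://photos.example.com/me": {"https://twitter.example.com/me"},
    "https://impostor.example.com/": {"https://blog.example.com/"},
}
print(sorted(confirmed_identity(links, "https://blog.example.com/")))
```

The walk starts from the known good root, which is the caveat above: if the root itself is the impostor's page, reciprocal links prove nothing.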
