Making Light: The Implications are Staggering

August 7, 2013

The Implications are Staggering
Posted by Jim Macdonald at 01:11 AM * 63 comments

Apparently some models of Xerox photocopiers are substituting one number for another in photocopied documents. This isn’t an OCR error, and it isn’t just blurred pixels; these are whole different numbers being printed in apparently true copies. From the BBC: Confused Xerox copiers rewrite documents, expert finds

[German computer scientist David Kriesel] said the anomaly is caused by Jbig2, an image compression standard.
Image compression is typically used in scanners and copiers to make file sizes of scans smaller.
Jbig2 would substitute figures it thought were the same, meaning similar numbers were being wrongly swapped.

The results are duplicatable, and have been found in at least two models of Xerox machines, both with original and recently patched software installed. Photocopied invoices, part numbers, engineering tables, and medical information could be just plain wrong, even if the document that was being copied was 100% correct.

Mr. Kriesel presents his findings, complete with examples, here: Xerox scanners/photocopiers randomly alter numbers in scanned documents

In this article I present in which way scanners / copiers of the Xerox WorkCentre Line randomly alter written numbers in pages that are scanned. This is not an OCR problem (as we switched off OCR on purpose), it is a lot worse - patches of the pixel data are randomly replaced in a very subtle and dangerous way: The scanned images look correct at first glance, even though numbers may actually be incorrect.

Xerox, according to the BBC, is “preparing a statement.” I’m sure it will be very interesting.

Comments on The Implications are Staggering:

#1 ::: Edd Vick ::: (view all by) ::: August 07, 2013, 01:29 AM:

I for one welcome our duplicating overlords.

While fearing for my life the next time I get a prescription...

#2 ::: Miramon ::: (view all by) ::: August 07, 2013, 01:32 AM:

It's a hardware bug!
It's a software bug!
It's two, two, two bugs in one!

#3 ::: Lawrence ::: (view all by) ::: August 07, 2013, 01:44 AM:

Well, damn, that really screws up one of the tests of parallel-world travel my characters used in "The Drifter."

#4 ::: DriveBy ::: (view all by) ::: August 07, 2013, 01:58 AM:

It's an image compression error.

Data compression algorithms can work, in part, by recognizing repetition in the data being compressed and storing only one copy of the repeated data. Where the input data is grainy, as is often the case with scanned print, "repetition" becomes a matter of judgment; insisting on pixel-perfect matches as the determinant of repetition would result in no matches, and no compression (at least by that part of the algorithm). So, in practice, the machine applies fudge factors to determine whether this grainy part of the image over here is close enough to that grainy part of the image over there to be called a duplicate.

When the fudge factors are set too liberally, similar-looking grainy parts can be mistaken for duplicates. When an erroneous match is made, one or the other of the grainy parts will be assigned as the master part and will be plugged into both places. This can happen to image parts containing text, especially if there is a common visual theme decorating the text. Apparently, the Xerox machines in question are susceptible to this problem primarily when the text in question contains numbers, but not when the text is alphabetical.

There is an update on the situation which includes words like JBIG2, and gets into the specifics of settings on Xerox machines. There are fingers yet to be pointed, and still some question about who knew what when and why certain people didn't understand sooner what was happening. Lawyers may at some point become involved. But at the lowest technical level, it's an image compression issue.
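
The "fudge factor" DriveBy describes can be sketched in a few lines of Python. This is a toy illustration only, not Xerox's implementation or a real JBIG2 encoder: the 4x5 bitmaps `SIX` and `EIGHT`, the pixel-count distance, and the `max_diff` threshold are all assumptions made up for the example.

```python
from typing import List, Tuple

Patch = Tuple[str, ...]  # each string is one row of '0'/'1' pixels

# Hypothetical low-resolution glyphs that differ in exactly one pixel.
SIX: Patch = ("0110", "1000", "0110", "1001", "0110")
EIGHT: Patch = ("0110", "1001", "0110", "1001", "0110")


def hamming(a: Patch, b: Patch) -> int:
    """Count the pixels at which two equally sized patches differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(patches: List[Patch], max_diff: int) -> Tuple[List[Patch], List[int]]:
    """Store each distinct patch once; 'distinct' is judged by max_diff."""
    dictionary: List[Patch] = []
    indices: List[int] = []
    for patch in patches:
        for i, master in enumerate(dictionary):
            if hamming(patch, master) <= max_diff:
                indices.append(i)  # close enough: reuse the master patch
                break
        else:
            dictionary.append(patch)  # no match: store a new master
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary: List[Patch], indices: List[int]) -> List[Patch]:
    """Rebuild the page by plugging the master patch into every position."""
    return [dictionary[i] for i in indices]
```

With `max_diff=0` the page `[SIX, EIGHT, SIX]` survives a round trip unchanged. With `max_diff=2` the eight is judged a duplicate of the six, only one patch is stored, and the decompressed page shows three sixes, each of them perfectly crisp.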

#5 ::: Laura ::: (view all by) ::: August 07, 2013, 02:02 AM:

Add in that many Xerox machines have fax capability, and all now have a boatload of memory, and I see even more problems - industrial espionage by chipset in trusted hardware.

#6 ::: Bill Stewart ::: (view all by) ::: August 07, 2013, 02:19 AM:

I'm sure their statement will be interesting, but I'm not going to believe any of the numbers in it...

#7 ::: Bruce Cohen (Speaker to Managers) ::: (view all by) ::: August 07, 2013, 02:49 AM:

Architects, contractors, airplane designers, and artillery units had all better be checking what model copier they're using, or things may start falling down, crashing, or blowing up unexpectedly.

#8 ::: dcb ::: (view all by) ::: August 07, 2013, 04:19 AM:

My local Metro (free paper associated with trains, Tube etc.) says that Xerox's "principal engineer, Francis Tse" is saying that the problem can be combatted by copying at higher resolution - which would make sense from what DriveBy @4 says.

I note that the large copier at work which I use occasionally to scan e.g. a book chapter is set at 200 x 200 as standard and I have to keep resetting it to 300 x 300 (for each document, if I'm copying several, which is a pain, 'cos it resets to the lower res automatically at the end of each document). I'm sure the lower standard setting is to save memory, but it makes for lousy copies, particularly if, as I tend to do, you print two pages to a side* (and double sided of course) to save paper.

*Note: two pages to a side works better with UK/European paper sizes, A4 etc. than with American paper sizes.

#9 ::: Rob Rusick ::: (view all by) ::: August 07, 2013, 06:49 AM:

If I understood Mr. Kriesel's article, this is an issue when one uses a 'scan to PDF' feature of the copier, where the alterations are made in the saved PDF files.

I didn't see a claim that standard copying was affected.

#10 ::: Daniel Martin ::: (view all by) ::: August 07, 2013, 07:03 AM:

Specifically, from reading the updates it's a problem of using the quality setting "normal" when scanning to PDF. The three possible settings for quality are "normal", "higher", and "high"; the factory default for quality isn't the "normal" setting but because it doesn't warn "normal activates patch-based image compression (JBIG2), which will corrupt text" in big red letters, apparently many penny-wise users will change the default to "normal" to store their documents in less memory.

Memory/storage space is cheap. Efforts to conserve it almost always come around to bite you in the end.

#11 ::: Jim Macdonald ::: (view all by) ::: August 07, 2013, 07:31 AM:

Another problem of photocopiers which store documents to disk in the process of making copies is that, when you eventually discard/resell/whatever the machine, copies of all documents you ever copied, however confidential, may still be on it and floating around out there somewhere.

See: Copier Data Security: A Guide for Businesses

#12 ::: Nangleator ::: (view all by) ::: August 07, 2013, 09:22 AM:

I'm sure only copiers can make electronic mistakes. I'm sure any other conceivable electronic glitches are completely unimportant. Even multiplied by, say, 314 million. The population of the United States.

Insignificant. Even if then multiplied by the number of phone calls and emails U.S. citizens make every day, all year long.

We can rely on our data, even in a court of law. Even secret courts of law.

#13 ::: Remus Shepherd ::: (view all by) ::: August 07, 2013, 09:32 AM:

The big problem with this bug is that the 'scan to PDF' feature is intended for users who want a paperless office, which means they shred the original paper documents once they're done scanning them.

It would be interesting if civilization were to collapse over a software bug as trivial as this.

#14 ::: P J Evans ::: (view all by) ::: August 07, 2013, 09:53 AM:

The place I worked switched to Ricohs a few years back. In some ways, not an improvement over Xerox, and we swore a lot at them. (We were printing graphics, like PDFs reduced from 24x36 to 11x17. The old Xerox copies were readable.)

#15 ::: Fragano Ledgister ::: (view all by) ::: August 07, 2013, 11:05 AM:

Creative copying?

#16 ::: Stan ::: (view all by) ::: August 07, 2013, 12:23 PM:

A knowledgeable MeFi user commented on this problem:
http://www.metafilter.com/130641/Cat-images-reportedly-unaffected#5125209

#17 ::: Tom Whitmore ::: (view all by) ::: August 07, 2013, 12:59 PM:

The original photocopiers worked by making a direct master, reversed, of the original document on a photosensitive drum. Now, most copiers work by scanning the image and then printing it out. I wonder, what other sorts of errors are introduced by this change?

#18 ::: Bill Higgins-- Beam Jockey has been gnomed ::: (view all by) ::: August 07, 2013, 02:40 PM:

Bruce Cohen writes in #7:

Architects, contractors, airplane designers, and artillery units had all better be checking what model copier they're using, or things may start falling down, crashing, or blowing up unexpectedly.

I don't believe you'll find many artillery units using Xerox copiers. They tend to favor Canon.

#19 ::: Jim Macdonald ::: (view all by) ::: August 07, 2013, 03:02 PM:

Bill Higgins #18 --

Can't find your gnomed post. Sorry about that.

-- JDM

#20 ::: Fragano Ledgister ::: (view all by) ::: August 07, 2013, 03:04 PM:

Bill Higgins #18: That's Serge-level punditry.

#21 ::: Ken Fletcher ::: (view all by) ::: August 07, 2013, 03:45 PM:

"It's not a bug; it's a feature!"

A useful optional setting to discourage clandestine copying of documents thick with numbers. Could maybe even track the individual copy machine, and when the copies were made.

#22 ::: David Goldfarb ::: (view all by) ::: August 07, 2013, 04:33 PM:

dcb @8: Moving between any two standard paper sizes is easier with European A/B series than with the stupid American letter/ledger/architectural etc. non-system. I worked in a copy shop for two decades, and people were constantly wanting documents designed for letter size blown up to ledger or 24x36, or something larger reduced to letter, and it was always a pain because the aspect ratio was different. Copy shop clerks in Europe have it so much easier: A4 to A3 is 141%, A4 to A5 is 77%, drop it on the glass and go. Sigh.

#23 ::: albatross ::: (view all by) ::: August 07, 2013, 04:35 PM:

The reason this seems like a big deal to me is that I am extremely skeptical that Xerox is the only copier/scanner on which this is a problem. We have spent some years now trying to get rid of paper in as many places as possible, ranging from voting to e-receipts to e-prescriptions to MERS (electronic mortgage records). And in all cases, there were cost and hassle savings up front. But part of the cost is that the underlying paper documents go away, and we can easily end up with *nothing but* electronic records. If those records are wrong--via glitch or misentry or tampering--there is nothing to check them against.

Our electronic technology is not reliable or secure enough to support this. With a fallback paper record, you have something that neither a glitch nor an attacker can change, which can be used to decide what should be in the electronic records. Without that, you trust the electronic records whether they deserve it or not.

#24 ::: Nancy Lebovitz ::: (view all by) ::: August 07, 2013, 04:47 PM:

The reason this seems like a big deal to me is that even if it's "just" two Xerox models (how long has this been going on?), it's probably enough to wreak a lot of subtly caused havoc. It wouldn't surprise me if it's enough to get people killed.

#25 ::: eric ::: (view all by) ::: August 07, 2013, 05:33 PM:

I predict that, no matter the real cause, the users will be blamed.
Where really, it's negligent use of a compression algorithm.

#26 ::: P J Evans ::: (view all by) ::: August 07, 2013, 05:45 PM:

23
The company I worked at was scanning (at a high resolution) all their old maps, and at more reasonable resolutions all their other construction documents. Because they have to be able to track what was done until it's removed permanently, possibly 80 to 90 years.

#27 ::: Matthew Brown ::: (view all by) ::: August 07, 2013, 06:35 PM:

Eric@25:

Yes, I agree. This error falls outside of user expectations of a scanner or copier. Expected failure modes include unreadable scans. Readable-but-wrong scans fall outside of that.

I haven't seen anywhere mention if this is the normally configured compression mode of these devices, or whether it's an option the user has to set. Regardless, though, this is not a user error, since it falls outside of reasonable expectations. It'll just influence how many people have been bitten by this error.

Is it possible to determine if your scanned PDFs from one of these devices were scanned with the problematic compression option? If not, nobody who's got documents scanned with these can trust them.

Otherwise, it'll be hard to know if one is among the affected users and thus one will generally have to assume your documents might be wrong.
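
On the question of detection, one rough first check is possible, sketched here under stated assumptions. PDF image streams compressed with JBIG2 declare the filter name `/JBIG2Decode`, so searching the raw bytes of a file for that name shows whether JBIG2 was used at all. The function name `uses_jbig2` is made up for this example. A hit does not mean any character was actually substituted, and a miss is not proof of safety (the name can be written in an escaped form, for instance).

```python
def uses_jbig2(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes mention the JBIG2Decode filter.

    Heuristic only: it reports that JBIG2 compression appears in the file,
    not whether any glyph was replaced by a wrong one.
    """
    return b"/JBIG2Decode" in pdf_bytes
```

To check a file on disk, something like `uses_jbig2(pathlib.Path("scan.pdf").read_bytes())` would do, where `scan.pdf` is a placeholder name. Any file that turns up positive still has to be compared against the paper original, if the original still exists.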

#28 ::: Tom Whitmore ::: (view all by) ::: August 07, 2013, 06:59 PM:

It's not just scanned PDFs, Matthew Brown -- it's simple photocopies. Things that look like they're a printed picture of the original, only with these anomalies. The copier makes a digital file, then prints that file, rather than simply taking a picture and printing that -- and this error comes in when it makes the digital file.

#29 ::: Matthew Brown ::: (view all by) ::: August 07, 2013, 07:19 PM:

At least one of the linked articles I read mentioned that it's only the PDFs that are affected: is that inaccurate?

#30 ::: Heather Rose Jones ::: (view all by) ::: August 07, 2013, 09:32 PM:

In my mind I am running through all the sorts of documents scanned at my Place of Employment where the scan/copy is treated as equivalent to the original document. Documents like QC test results or Certificates of Analysis shipped with our products as proof that they meet specifications. Lots of numerals in pharmaceutical manufacturing documents. I don't believe there are any points where disposition decisions are made based on a scan/copy rather than an original document (or more often on purely electronic data). But there are certainly points where scanned/copied documents are used to document those decisions, which could present the appearance of non-conformance.

#31 ::: lorax ::: (view all by) ::: August 07, 2013, 09:37 PM:

As I understand the nature of the issue from reading the linked article, there is no a priori reason why this shouldn't affect alphabetical characters as well as numerical ones in similar circumstances (different isolated blocks of characters occurring in different locations in a primarily non-textual document). Things like names, for instance.

#32 ::: Doug Burbidge ::: (view all by) ::: August 08, 2013, 12:12 AM:

David Goldfarb @22:

> A4 to A5 is 70.7%

FTFY.

Going up a size, of course, involves multiplying by √2 (i.e. 141%); going down a size involves dividing by √2 (i.e. multiplying by 70.7%).

Another annoying property of US paper is that there are several different weight scales, all of which are abbreviated to "pounds": bond, index, cover, etc. The rest of the world has just one scale: gsm, or grams per square metre -- a calculation made simple when you know that an A0 sheet is one square metre, or 16 A4 sheets is one square metre.
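
The arithmetic above is easy to check. A minimal sketch follows; the function names are invented for the example, and the 80 gsm figure used afterwards is an assumed typical office paper weight, not something from this thread.

```python
import math


def scale_percent(steps: int) -> float:
    """Linear scale, in percent, for moving `steps` sizes up the A series.

    Positive steps enlarge (A4 to A3 is +1); negative steps reduce
    (A4 to A5 is -1). Each step multiplies lengths by the square root of 2.
    """
    return 100 * math.sqrt(2) ** steps


def sheet_mass_grams(gsm: float, a_size: int) -> float:
    """Mass of one sheet of size A<a_size>, taking A0 as one square metre.

    Each step down the series halves the area, so A4 is 1/16 of A0.
    """
    return gsm / 2 ** a_size
```

Here `scale_percent(1)` comes to about 141.4 and `scale_percent(-1)` to about 70.7, matching the figures above, and `sheet_mass_grams(80, 4)` gives 5.0 grams for a single A4 sheet of 80 gsm paper.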

#33 ::: Doug Burbidge ::: (view all by) ::: August 08, 2013, 12:15 AM:

Mr. Kriesel's sample scans for room dimensions show the error in a rectangular block, containing the dimension number inside the block. The JBIG2 encoder inside the Xerox scanner has decided that this rectangular block is a single glyph, and has (incorrectly) replaced the whole glyph.

Further down, he shows scans where '6' has been substituted with '8'.

Wikipedia says that JBIG2 can use either of two methods for encoding data that it thinks is text: pattern matching and substitution, or soft pattern matching. Re the first method, it says "substitution errors could be made during the process if the image resolution is low." The second method stores the differences, so nominally even if it guessed the wrong glyph, the JBIG2 viewer used to view the encoded file would start from the wrong glyph and apply the differences, thus producing something with the correct appearance, which is all you need.

You sometimes see this when copy-pasting text from a JBIG2-encoded PDF: the text looks roughly correct in the PDF, but when you paste into Word or whatever, you see 'I' substituted for '1', or 'O' for '0' or whatever. (This could also be a problem, but not as insidious as the one Mr. Kriesel has identified.)
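
The difference between the two methods can be shown with a toy model. Everything here is an assumption made for illustration: the bitmaps are invented, and a plain XOR residual stands in for the refinement coding a real JBIG2 codec would use, which is considerably more involved.

```python
from typing import Tuple

Glyph = Tuple[str, ...]  # each string is one row of '0'/'1' pixels

# Hypothetical low-resolution glyphs that differ in a single pixel.
SIX: Glyph = ("0110", "1000", "0110", "1001", "0110")
EIGHT: Glyph = ("0110", "1001", "0110", "1001", "0110")


def xor_rows(a: Glyph, b: Glyph) -> Glyph:
    """Pixel-wise XOR of two equally sized glyph bitmaps."""
    return tuple(
        "".join("1" if x != y else "0" for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )


def decode_substitution(master: Glyph) -> Glyph:
    """Pattern matching and substitution: only the master glyph is kept."""
    return master


def encode_soft(actual: Glyph, master: Glyph) -> Glyph:
    """Soft matching (simplified): also keep the pixels that differ."""
    return xor_rows(actual, master)


def decode_soft(master: Glyph, residual: Glyph) -> Glyph:
    """Start from the master glyph and apply the stored differences."""
    return xor_rows(master, residual)
```

If the encoder wrongly matches an 8 to a stored 6, `decode_substitution(SIX)` returns a 6 and the error is baked in. `decode_soft(SIX, encode_soft(EIGHT, SIX))` returns the original 8, because the wrong guess is corrected by the stored differences.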

#34 ::: dcb ::: (view all by) ::: August 08, 2013, 04:10 AM:

David Goldfarb @22: Yeah. I really only realised the US problem when I was visiting a specialist library and suggested to another person there that they could reduce-copy to give two pages to a side - which, as you know, is so easy with A4 etc. I was flabbergasted when I realised that two American letter size sheets didn't easily reduce-copy-70% (or 71%, since as Doug Burbidge says @32 it should be 70.7%) onto one sheet of American letter. Whoever designed the "A" system deserves a prize - and we just take the advantages for granted.

#35 ::: Lila ::: (view all by) ::: August 08, 2013, 07:47 AM:

Heather Rose Jones @#30, that was the first thing that leapt to mind about EMR (electronic medical records)--that the PDF file would make it look like someone had gotten the wrong dosage of meds, or too strenuous an exercise prescription (wrist curls with 8# 2 weeks after surgery?), and hello malpractice suit.

Scanning paper documents and storing them as PDFs is pretty common practice for EMR, at least in the small clinics I'm familiar with.

#36 ::: Dave Bell ::: (view all by) ::: August 08, 2013, 08:24 AM:

Lila @35

I read the original report, and what struck me was that the examples they showed were of small print, only just large enough to resolve some of the differences between a 6 and an 8. It was the sort of thing you could get with a 9-pin dot-matrix printer, a huge relative pixel size. It was marginally readable even without the errors.

If critical records are in print that small, it maybe isn't the Xerox machine that would be your problem.

#37 ::: oldster ::: (view all by) ::: August 08, 2013, 09:51 AM:

dcb @34:

"Whoever designed the "A" system deserves a prize - and we just take the advantages for granted."

That would be Georg Christoph Lichtenberg for the really cool part of the insight (i.e. seeing the role of root-2), and Walter Porstmann for specifying it to the metric system and making it a standard. (From the Wiki page for "paper size".)

Probably no prizes will be forthcoming, but some people refer to the aspect ratio as the "Lichtenberg ratio." Having your name used as a nickname for root-2 is pretty cool, and better than most prizes. I mean, what prize could I possibly prefer to having some fundamental constant referred to as "the oldster number"?

Pie?

#38 ::: Bob Webber ::: (view all by) ::: August 08, 2013, 10:13 AM:

Xerox WorkCentre like Bank of America: JBIG2 FAIL.

#39 ::: Clifton ::: (view all by) ::: August 08, 2013, 12:46 PM:

Forwarded the original link to my wife yesterday, because I knew that she and her workplace have spent part of the past year scanning documents on a Xerox WorkCentre for permanent storage in PDF form. Uh-oh....

The kind of work she does could be (and sometimes is) the subject of lawsuits and/or "administrative hearings" though it's not the kind that usually depends on the values of a few digits.

Meanwhile on Mefi, a user comments that they've seen this problem in the form of inappropriate keming, and had been wondering what caused that - so it's not just numbers, even if numbers seem to be particularly vulnerable.

#40 ::: SarahS ::: (view all by) ::: August 08, 2013, 02:56 PM:

Did anyone else immediately think, "Oh my lord...royalty statements!"

#41 ::: David Goldfarb ::: (view all by) ::: August 08, 2013, 05:16 PM:

Doug Burbidge @32: Thanks for the correction.

Half letter size is actually close enough in aspect ratio to letter that reducing two pages onto one isn't that hard; usually you just have to clip a little bit of whitespace around the edge. The other stuff I mentioned is much more annoying.

#42 ::: Bill Higgins-- Beam Jockey ::: (view all by) ::: August 08, 2013, 06:35 PM:

Oldster in #37:

That Lichtenberg guy must have been a million laughs. He was the Jon Singer of his day.

#43 ::: Jacque ::: (view all by) ::: August 08, 2013, 06:41 PM:

This is fascinating, as I am, at this very moment, proofing a scanned document. Maybe I'm just behind the times, but I'm still boggled that the OCR tech is as good as it is. And, to be fair, the original document has eentsy weentsy teeny tiny type, printed on that gray, speckled "recycled" paper that was all the rage back in the '90s, so that the speckle in the paper is less than an order of magnitude smaller than the features in the font. So I am seeing quite a bit of...interpretation. Unsurprisingly, commas become semicolons and so on. Interestingly, less in the numerals than in the letters.

For example:

Home > Honie
Petroleum > Petroi"Lim
PROPERTIES > Pl'loPERTIES, or l'flOPER:ms
Equipment > Equip,;ent, or Eqilipnieht (the latter is obviously the German spelling)

(I find this perversely amusing. If you cock your head and squint just so, you can kinda see how it got there.)

But I do see some numeracy fails:

222,195,990 > 222, 195,99\)_
6,775,900 > 6,775,90P
57,147,790 > 5}; 147:790
71,242 > 7i ,242

Interestingly, the errors actually look correct on the PDF, as compared to the paper. It's only when the text is copied out and pasted into another program that the parsing errors show up.

I haven't yet caught it out swapping one number for another. But I haven't actually proofed my processed content against the actual paper yet, either.

But I would never, in a million years, even think about putting out the processed text as "correct," without proofing it first.

#44 ::: Jacque ::: (view all by) ::: August 08, 2013, 07:11 PM:

Okay, then. Just finished proofing, and I did, indeed, find three instances where it changed the numbers. (One of them where it saw 13 and read it as 8.)

#45 ::: P J Evans ::: (view all by) ::: August 08, 2013, 08:13 PM:

OCR translations of text can be almost as fun as running things through Babelfish a few times.

#46 ::: Cally Soukup ::: (view all by) ::: August 08, 2013, 08:26 PM:

Proofing OCRed text for Distributed Proofreaders for Project Gutenberg has made me smile every time I see the word "arid". Because, you see, in an OCRed text, ninety-nine times or more out of a hundred the word is supposed to be "and". Also: watch out for he/be errors. That's another very, very common swap, in both directions.

#47 ::: P J Evans ::: (view all by) ::: August 08, 2013, 08:33 PM:

I spent a few months proofing OCRd text. My favorites were when the software scrambled 'District' to 'Omelet' and turned 'legal obligation' to 'lethal obligation' (well, it was about DC bonds).

#48 ::: Erik Nelson ::: (view all by) ::: August 08, 2013, 08:46 PM:

This is what I deal with every day on the job.

#49 ::: albatross ::: (view all by) ::: August 09, 2013, 11:03 AM:

eric:

The users will be blamed, but shipping a product with a normal setting that can cause this kind of damage is an amazingly bad idea. And I'll bet that there are other copiers and scanners with the same problems.

#50 ::: Alan Hamilton ::: (view all by) ::: August 09, 2013, 12:01 PM:

This is inherent to anything using JBIG2 compression, so yeah, it could affect other documents. I've seen the same issue on scanned documents converted to PDFs on a PC. For example, this NTSB report. There are a lot of typos and weird font substitutions. This was converted using PDFWriter 3 in 2000, so this is nothing new.

#51 ::: Lin Daniel ::: (view all by) ::: August 10, 2013, 07:28 PM:

Speech to text software does some amusing stuff, too. jan finder would send me email with uncorrected speech to text. There were times I'd have to read it out loud, with his inflections, to figure out what it was he was saying. And once, sent it back saying, "I love it when you talk dirty to me" because it was totally unintelligible.

#52 ::: Kevin Reid ::: (view all by) ::: August 11, 2013, 01:09 PM:

Interestingly, the errors actually look correct on the PDF, as compared to the paper. It's only when the text is copied out and pasted into another program that the parsing errors show up.

An OCR'd PDF simply has the text computed by the OCR algorithm placed behind a copy of the original image, so that it is invisible (but still possible to select). When you select some text, you're actually getting the hidden character text, but it happens to line up with the part of the original image.

(Similar scenario that was a topic a while ago: failed redactions of PDFs, where black boxes are just added on top, but the original text is unaltered.)
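
The layering described above can be modelled in a few lines. This is a toy data structure, not how any PDF library represents a page: the `Word` class, its fields, and the coordinates are all hypothetical, and the misread word is borrowed from the examples in #43.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) on the page image


@dataclass
class Word:
    box: Box     # where the word sits in the scanned image
    seen: str    # what the pixels show a human reader
    hidden: str  # what the OCR engine guessed, stored as invisible text


def inside(box: Box, selection: Box) -> bool:
    """True if `box` lies entirely within the selection rectangle."""
    x0, y0, x1, y1 = box
    sx0, sy0, sx1, sy1 = selection
    return sx0 <= x0 and sy0 <= y0 and x1 <= sx1 and y1 <= sy1


def copy_text(words: List[Word], selection: Box) -> str:
    """Copying a selection returns the hidden OCR text, never the pixels."""
    return " ".join(w.hidden for w in words if inside(w.box, selection))


# A page that looks right on screen but carries one OCR misreading.
page = [
    Word((0, 0, 40, 10), seen="Home", hidden="Honie"),
    Word((50, 0, 120, 10), seen="Petroleum", hidden="Petroleum"),
]
```

Selecting the first word with `copy_text(page, (0, 0, 45, 12))` yields "Honie", even though the image under the selection still reads "Home". That is why the errors only show up after pasting.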

#53 ::: oldster ::: (view all by) ::: August 11, 2013, 02:49 PM:

Relax, people!

It's all cool. Xerox knew about this all along. In fact, they warned you about it, in their documentation.

From a user-manual:
“Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and *character substitution errors may occur* with some originals.” [asterisks added].

From the Xerox spox:
"You are also correct that we have documented in our user guides as well as within our devices that the high compression mode may cause character substitution which means we have known about the potential for this issue. Our design philosophy was to make available a very useful mode that creates small files while at the same time providing information about its limitations."

Both of these quotes are from their blog:

http://realbusinessatxerox.blogs.xerox.com/2013/08/07/update-on-scanning-issue-software-patches-to-come/#.UgfbRmTF2RY

where they also say that they are working on a software patch for this.

I give them some credit for being very honest on the blog. Of course, Corporate Counsel's Office doesn't mind being honest, because they know that they warned about it in the original users manuals. And that warning means that they are shielded from any liability.

Don't you feel more relaxed now? Xerox does!

#54 ::: Pfusand ::: (view all by) ::: August 11, 2013, 02:50 PM:

The OCR error I was most grateful to spot was the one that changed "She licked her lips" into "She licked her hips."

#55 ::: David Weingart ::: (view all by) ::: August 13, 2013, 01:34 PM:

Pfusand @ 54: She'd have to be pretty flexible

#56 ::: Pfusand ::: (view all by) ::: August 13, 2013, 03:39 PM:

David @ 55,
Well, she was an alien, but not that sort of alien.

#57 ::: Allan Beatty ::: (view all by) ::: August 13, 2013, 06:55 PM:

Corporate counsel may think they are honest when there is a warning buried deep in a manual.

It would be more honest if the button on the screen said "Sub-Normal" instead of "Normal".

#58 ::: Lee ::: (view all by) ::: August 14, 2013, 11:48 AM:

Allan, #57: Exactly. A format subject to errors of this type should never have been set as the DEFAULT. Designating it as "smallest" or "ultra-compressed" or, yes, "sub-normal" -- combined with the manual's warning that in using this mode you risk substitution errors -- would have been the ethical approach.

#59 ::: Mary Aileen ::: (view all by) ::: August 14, 2013, 01:14 PM:

Lee (58): Daniel@10 says the overly compressed setting was not the factory default, but calling it "Normal" implies that it's perfectly okay. And yes, that's *very* problematic.

#60 ::: Renee ::: (view all by) ::: August 15, 2013, 10:27 PM:

Hmm. I've seen read errors of this sort when using a hand-held scanner to do inventories of barcoded items. Here I thought such errors were all caused by smudged barcodes. Nice to know what causes this -- in the sense that 'nice' means I can now explain why random nonsense is showing up in the data.

...And now I get to worry about non-nonsensical number swaps. Sigh.

#61 ::: paul ::: (view all by) ::: August 16, 2013, 03:21 PM:

Not that it really matters, but how many of the people who use a typical office copier even have access to the user manual, much less have read the thing cover to cover?

#62 ::: Kaleberg ::: (view all by) ::: August 18, 2013, 07:40 PM:

The real problem is that only one in a thousand font designers realizes that "6" and "8" are different characters. The rest of them seem to lack the mental capacity to understand the concept. I'm not surprised a Xerox machine couldn't tell a six from an eight. Most of the time, I can't either.

#63 ::: Daniel Martin ::: (view all by) ::: September 01, 2015, 02:03 PM:

The discoverer of the bug has recorded an hour-long presentation on the experience of finding and reporting the bug here: https://www.youtube.com/watch?v=c0O6UXrOZJo.

It includes some interesting details not available when this first broke, and among other things I need to retract my statement in comment 10: there are Xerox scanners that, in the factory default setting, mangled text including substituting some characters for others.

Patches now exist for those pieces of hardware, but many places haven't applied them. If you depend on scanned documents that would be made useless by the transposition of one or more characters, ensure that your IT people know of this problem and have applied the appropriate patches.
