RE: Update: Web browsers - a mini-farce (MSIE gives in)
> From: Valdis.Kletnieks@xxxxxx [mailto:Valdis.Kletnieks@xxxxxx] 
> Sent: Wednesday, 27 October, 2004 12:24
> 
> On Wed, 27 Oct 2004 06:32:07 PDT, Michael Wojcik said:
> 
> > > "A program designed for inputs from people is usually
> > > stressed beyond breaking point by computer-generated inputs.
> > > -- Dennis Ritchie
> > 
> > Moot.  Since HTML is frequently computer-generated, HTML 
> > renderers shouldn't be designed for human-generated input.
> 
> Not moot at all.  Remember that an array of test cases is 
> human-generated inputs - and the renderer is usually mostly
> tested against said inputs.
So what?  Ritchie wrote "a program designed for inputs from people".  My
point is that an HTML renderer should not be designed for inputs from
people.  If it is, it's misdesigned.
(And test cases can obviously include machine-generated input.  Nothing
about a test case mandates that the input be human-generated.)
> And even more to the point - automated testing isn't a panacea
> either.
I never claimed it was.  In fact, my point was precisely the opposite.
> Just because you've "fixed" the browser so it doesn't crash when
> you point it at file:///dev/urandom (or moral equivalent) doesn't
> mean that you've achieved good coverage.
More straw men.  Where did I ever argue otherwise?  Where did anyone in this
thread argue otherwise?
> > I think that's a straw man, Valdis.  HTML renderers should 
> > expect malformed HTML input, and dealing with it is not
> > difficult.
> 
> I was speaking more in general - although it's *true* that 
> there should be basic sanity checking, the *general* problem is
> that the programmer can't, in general, write code to protect
> against bugs he hasn't conceived of.
I don't believe that's correct.  In general, programmers can protect against
classes of bugs without prior understanding of those bugs.
You don't have to understand how to exploit a buffer overflow in order to
avoid overflowing buffers.  You don't have to understand SQL code-injection
attacks to restrict SQL input fields to valid characters.  You don't have to
understand cross-site scripting by embedded HTML to strip or sanitize HTML
tags from user-supplied input that shouldn't have them.  You don't need to
understand how signed-integer overflow could cause a problem to check for
it.  And so on.
*In general*, if programmers enforce contracts rather than assuming them (by
validating input, checking sizes, checking for error returns, and so forth),
they eliminate entire categories of bugs without having to know anything at
all about them.
> > Basic input validation and sanitization isn't that difficult.
> 
> Yes, *basic* validation isn't that hard.  It's the corner 
> cases that end up biting you most of the time.
I don't believe this is true, statistically.  Most of the bugs I see come
through Bugtraq aren't corner cases at all.  They're basic errors, like not
checking that input doesn't overflow a buffer.  I suspect careful analysis
of the archives would bear that out: few of the prominent bugs, or indeed
all the reported bugs, represent unusual conditions.  They just represent
conditions that the programmers were too lazy to check for.
> > I write comms code - client- and server-side middleware.  I 
> > wouldn't dream of implementing a protocol with code that didn't
> > sanity-check the data it gets off the wire.
> 
> And you've *never* shipped a release that had a bug reported 
> against it, and when you looked at it, you did a Homer Simpson
> and said "D'Oh!"?
What a large pile of straw you have at your disposal.  That's a mighty
conceptual leap, from "my protocol implementations sanity-check their input"
to "there are never dumb bugs in the products that include them".
I always put on my seat belt when I drive my car.  That doesn't mean I've
never been in an accident.
I stand by what I wrote: when I implement a protocol, the data off the wire
is sanity-checked.  It doesn't overflow input buffers, and the parser
doesn't read past the end of the input.  Parameters are examined for sane
values.  Return codes are checked.
And if I were writing an HTML renderer, the same would be true there.  It's
very little additional effort, amortized over the product lifetime.
> > I don't see any reason why browser writers shouldn't be
> > held to the same standard.  Avoiding unsafe assumptions 
> > when processing input should not add significantly to
> > develompment time; if it does, you need to retrain your
> > developers.
> 
> How much would it have added to development time to have 
> closed *all* the holes *up front* (including *thinking* of them)
"thinking of them" isn't a prerequisite.
> to stop Liu Die Yu's "Six Step IE Remote Compromise Cache
> Attack"?
Another straw man.  My argument was "avoiding unsafe assumptions when
processing input should not add significantly to development time".  (From
context it should be clear that this refers specifically to *direct*
processing of the input, such as receiving and parsing it.  Obviously, a
sufficiently broad definition of "processing input" could include everything
a program does.)  That is in no way equivalent to "clos[ing] all the holes"
in Six-Step, particularly since several of the steps have nothing to do with
unsafe assumptions in input processing; they're cases of IE (and Windows)
being broken as designed.
> Remember what David Brodbeck said, which is what I was replying
> to:
> 
> > How many times have you seen a word processor crash due to an
> > unfortunate sequence of commands or opening a corrupted
> > file, for example?
> 
> The point people are missing is that covering all (or even 
> anywhere *near* "all") the "unfortunate sequences" or "corrupted
> files" is *really really* hard,
And unnecessary, in the latter case.  What I'm talking about doesn't require
exhaustive testing.  There is no excuse - none - for a word processor
crashing when trying to open a file.  Every step in that process, from
performing I/O to parsing the file format to building the internal
representation, can be done safely for arbitrary input.
I'll grant that arbitrary command sequences are certainly more difficult,
since they exercise the entire code base.  But that's not what I'm concerned
with here.
-- 
Michael Wojcik
Principal Software Systems Developer, Micro Focus