Hi All! Gosh, it is nice to see interest in the Internet Archive.
I am in charge of the crawl and just received my clandestine check to clear out all that anti-Bush stuff (not!!!). Seriously, the archive is based on a crawl that runs on a two-month cycle to collect as much of the web as possible. Ideally a page would only show up once during a crawl; realistically, we get about a 25% duplicate rate during a two-month period.
That said, we did do a daily crawl as a commissioned work by the Library of Congress for September 11. That is why you see the sites outlined as being covered better after September 11th.
Now, actually changing data in the archive after we have collected it is very difficult: not impossible, but not practical. A lot of compression of data goes on so that we can serve up 120 terabytes, and changing compressed data is kinda like brain surgery. The archive would have to be re-taught where everything is!
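To give a feel for why editing compressed data means re-teaching the archive where everything is, here is a rough sketch. It assumes, purely for illustration, that records are gzip-compressed one by one, glued into one big blob, and found again through an index of byte offsets; the names `pack` and `read` are made up and this is not a description of the archive's real storage code.

```python
import gzip

def pack(records):
    """Compress each record separately and remember where it landed.

    Hypothetical layout: one gzip member per record, concatenated,
    with an index of (offset, length) pairs.
    """
    blob = b""
    index = []
    for rec in records:
        member = gzip.compress(rec)
        index.append((len(blob), len(member)))
        blob += member
    return blob, index

def read(blob, index, i):
    """Fetch record i by slicing out its compressed bytes."""
    offset, length = index[i]
    return gzip.decompress(blob[offset:offset + length])

original = [b"a" * 100, b"hello world", b"c" * 50]
blob, index = pack(original)

# Change only the first record; its compressed size changes too,
# so every later record moves and the whole index has to be rebuilt.
edited = [bytes(range(256)), b"hello world", b"c" * 50]
blob2, index2 = pack(edited)
```

In this toy layout the second record is untouched, yet its offset in `index2` differs from the one in `index`, which is the "brain surgery" problem in miniature.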
Since the archive is material generated by others, any site owner can request that the archive not serve the information to the public net. In fact, because the archive electronically checks with the site before serving archived information, the site owner has control over what the archive will serve from the past. So, in that sense the archive can have a touch of amnesia through the public interface even though the bits remain in the archive for future historians.
Anyway, I'm going out now, in my new Lamborghini I bought with that clandestine check!
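For the curious, the "check with the site before serving" step might look something like the sketch below. It assumes the check is done against the site's current robots.txt and that the archive identifies itself as `ia_archiver`; both are assumptions for illustration, and `may_serve` is a made-up name, not the archive's actual code.

```python
from urllib.robotparser import RobotFileParser

def may_serve(robots_txt, archived_url, agent="ia_archiver"):
    """Return True if the site's current robots.txt lets `agent` at the URL.

    robots_txt is the text of the live site's robots.txt (fetching it is
    left out here); the agent name is an assumption.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, archived_url)

# A site owner who wants the archive to keep quiet about their pages:
blocking_rules = "User-agent: ia_archiver\nDisallow: /\n"
blocked = may_serve(blocking_rules, "http://example.com/old-page.html")

# A site with no rules at all:
open_site = may_serve("", "http://example.com/old-page.html")
```

Note that nothing is deleted in this sketch: a blocked answer only decides whether the page is shown, which matches the "amnesia through the public interface" idea.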
P