I’ve been putting some thought into the source of a problem that has plagued bandwidth-consuming internet customers over the past few years: ISPs, faced with saturated network connections, have resorted to charging customers extra for heavy downloading.
The Problem
In 1996, the fastest reasonable residential connection you could get was a 28.8kbit modem, and the fastest reasonable ISP network connection was 100mbit ethernet. It was overkill, but you could easily sustain about 3,472 customers simultaneously, even if they all saturated their bandwidth completely.
Today, the fastest reasonable residential speed in North America is between 5mbit and 20mbit, and in Japan and Europe speeds easily approach 50mbit in some places. The fastest reasonable ISP network connection nowadays is 1gbit ethernet. Now you can only service 200 customers saturating their download links at 5mbit (just about the slowest high speed you can find), and if your customers run at 50mbit then you’re down to 20 customers. That’s all it takes to bring a gigabit uplink to its knees. 10gbit connections are becoming more and more reasonable, and will become more so as time goes on, but that’s hardly keeping pace with the advances in residential internet speed, and the gap between residential and ISP interconnect speeds keeps getting smaller and smaller.
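For anyone who wants to check the arithmetic, here’s a quick sketch in Python. The function name is my own invention, and it assumes every customer saturates their link at the same time and ignores protocol overhead.

```python
KBIT = 1_000
MBIT = 1_000_000
GBIT = 1_000_000_000


def max_saturating_customers(uplink_bps: int, customer_bps: int) -> int:
    """How many customers can run flat out before the uplink is full."""
    return uplink_bps // customer_bps


# (label, ISP uplink in bits/s, customer link in bits/s)
scenarios = [
    ("1996: 28.8kbit modems on 100mbit", 100 * MBIT, 28_800),
    ("today: 5mbit customers on 1gbit", 1 * GBIT, 5 * MBIT),
    ("today: 50mbit customers on 1gbit", 1 * GBIT, 50 * MBIT),
    ("upgrade: 50mbit customers on 10gbit", 10 * GBIT, 50 * MBIT),
]

for label, uplink, customer in scenarios:
    print(f"{label}: {max_saturating_customers(uplink, customer)} customers")
```

Even the 10gbit upgrade only buys back 200 customers at 50mbit each, which is the same spot the gigabit link was in with 5mbit customers.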
In addition to that, the idea that 3,472 customers in the same area would all be simultaneously downloading at full speed in 1996 was a ludicrous hypothetical. Today, it’s getting closer and closer to reality. A solid percentage of customers are exactly the type who will turn their modems on full blast day and night, and if you’re reading my blog then you probably know at least one of them ;)
On top of that, your data doesn’t just make a direct line between your connection and the destination. It gets aggregated into a web of super fast connections like blood flowing from capillaries to veins and arteries, and back to capillaries again. These super links are called backbones, in which the many thousands of connections to your local area become many millions of connections (Some of which are business or university or military customers that use up gigabits just by themselves), and when you add it all up it’s not pretty. In my experience, there’s never been a time that the backbone providers didn’t charge the ISPs for bytes transferred.
So what’s an ISP to do? The answer, thus far, seems to be a combination of limiting download speeds during peak hours and charging overage fees for heavy consumers, because there simply isn’t enough speed to go around.
The Idea
My idea is deceptively simple. Right now, if a thousand people in the same area download the same YouTube clip (as an example) then every one of those thousand downloads goes through the same links. In other words, one video is transmitted down the same path a thousand times, meaning ISPs need a thousand times more speed in their interconnects, and pay a thousand times more in backbone fees, than is actually required to provide their customers with that video clip.
The solution: cache it. And cache it in such a way that any application, from web browsers to torrent clients, can interact with the same cache. It could be as simple as a server that accepts a hash, and replies with the data.
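To make “accepts a hash, and replies with the data” concrete, here’s a minimal in-memory sketch. The class name HashCache, the put/get methods, and the choice of SHA-256 are all my assumptions; a real server would sit behind some network protocol and store things on disk.

```python
import hashlib
from typing import Dict, Optional


class HashCache:
    """In-memory stand-in for the cache server: hash in, data out."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, claimed_hash: str, data: bytes) -> bool:
        # Only accept data that really matches the hash it is filed under.
        if hashlib.sha256(data).hexdigest() != claimed_hash:
            return False
        self._store[claimed_hash] = data
        return True

    def get(self, wanted_hash: str) -> Optional[bytes]:
        # Returns None on a cache miss.
        return self._store.get(wanted_hash)


cache = HashCache()
clip = b"pretend this is a popular video clip"
clip_hash = hashlib.sha256(clip).hexdigest()

cache.put(clip_hash, clip)            # the first downloader contributes it
print(cache.get(clip_hash) == clip)   # everyone after that gets a local copy
```

Because the key is the hash of the content, it doesn’t matter which application put the data there or which URL it originally came from.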
Example Applications
A user’s web browser visits http://www.example.com/bigpicture.png. The browser retrieves http://www.example.com/CacheInfo?bigpicture.png and gets a hash description, then checks the cache server for it. If it’s not present, it downloads the image as normal, and then supplies a copy of it to the cache server. This can probably be done with a Firefox plugin or some other simple implementation.
A torrent client (Which already has access to hashes in order to do its job) does a query for each piece of the file from the cache server when it first starts downloading, and retrieves those before it even bothers to access the peer-to-peer network. This just needs a proof of concept and then chances are most torrent clients will incorporate it.
Windows Update and other automatic downloads could be modified to take advantage of these servers, saving bandwidth for Microsoft, you, and your ISP.
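The browser and torrent cases above boil down to the same client-side logic, which could be sketched roughly like this. Everything here is hypothetical: the function names are mine, the cache is a plain dict standing in for the cache server, and the hash is assumed to have already been obtained (from the CacheInfo URL or from the torrent file). The torrent helper checks pieces with SHA-1, since that’s what classic torrent files use for piece hashes.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def digest(data: bytes, algo: str = "sha256") -> str:
    return hashlib.new(algo, data).hexdigest()


def fetch_via_cache(content_hash: str,
                    cache: Dict[str, bytes],
                    fetch_origin: Callable[[], bytes]) -> Tuple[bytes, str]:
    """Browser-style flow: try the cache, else download and contribute."""
    data = cache.get(content_hash)
    if data is not None and digest(data) == content_hash:
        return data, "cache"
    data = fetch_origin()                 # download as normal
    if digest(data) == content_hash:      # only contribute verified data
        cache[content_hash] = data
    return data, "origin"


def prefetch_pieces(piece_hashes: List[str],
                    cache: Dict[str, bytes]) -> Tuple[Dict[int, bytes], List[int]]:
    """Torrent-style flow: grab whatever pieces the cache already has."""
    found: Dict[int, bytes] = {}
    missing: List[int] = []
    for index, piece_hash in enumerate(piece_hashes):
        piece = cache.get(piece_hash)
        if piece is not None and digest(piece, "sha1") == piece_hash:
            found[index] = piece
        else:
            missing.append(index)         # fetch these from peers as usual
    return found, missing
```

The client re-checks the hash on everything it pulls from the cache, so a bad cache entry costs a wasted lookup rather than a corrupt download.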
Problems and Solutions
If you see any problems with this idea, you’re welcome to add a comment to this post. Here are a few I’ve thought up myself:
Q: Wouldn’t this require some huge effort to get it standardized?
A: No. With browser plugins and open source torrent clients, it can be implemented now, and deployed on private networks such as offices, apartment buildings and hotels. Someone just needs to make the cache server and some proof of concept implementations such as special browser/torrent client plugins. Standardization would help it, and it will be a lot easier if there are some implementations floating around out there first.
Q: You’ve explained why ISPs would want this, but why would end users want this?
A: Caching makes downloads faster. You want this, especially if you’re a prolific downloader, and you’ll be losing out if it’s offered to you and you pass. On top of that, ISPs could set up a system where they charge you for everything except cache downloads, so that you have to use it or else face possible overages.
Q: What if someone downloads kiddie porn? Wouldn’t the ISPs be liable if it’s found on their cache server? / I don’t want people knowing that I download unauthorized copies and porn, but I still want to benefit from caching.
A: It’s possible to obfuscate the data on the cache server in such a way that the server operator has no idea what’s on it. Consider a protocol like this. You, as a cache contributor, know the hash and the data it describes. So you could encrypt the data using the hash as the key (with AES or something similar), and then send the encrypted data to the server filed under a hash-of-the-hash. The server would have no way of finding out the hash, or the data. But as a cache consumer who knows the hash, you can provide the cache server with that same hash-of-the-hash, get the encrypted data, and then decrypt it with the hash. Even if they sniff this interaction, they will still be unable to find out what’s in there, unless they already know the hash of the file in question. Hopefully, there’s enough sanity in the justice system that people won’t be sued successfully for serving up data that is totally unknown to them, based on what you get when you decrypt it.
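Here’s a rough sketch of that protocol, under some loud assumptions. The function names are made up, the server is just a dict, and the hash is SHA-256. Most importantly, the cipher below is NOT AES: to keep the example runnable with nothing but the standard library, it XORs the data with a keystream built from SHA-256 in counter mode. Treat it as a placeholder for a real cipher from a vetted crypto library, not as something to deploy.

```python
import hashlib
from typing import Dict, Optional


def content_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def storage_key(h: bytes) -> str:
    # The hash-of-the-hash: all the server ever gets to see.
    return hashlib.sha256(h).hexdigest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (NOT AES). Applying it twice restores the data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def contribute(server: Dict[str, bytes], data: bytes) -> bytes:
    h = content_hash(data)
    server[storage_key(h)] = keystream_xor(h, data)   # encrypted with the hash
    return h


def consume(server: Dict[str, bytes], h: bytes) -> Optional[bytes]:
    blob = server.get(storage_key(h))
    if blob is None:
        return None
    data = keystream_xor(h, blob)                     # decrypted with the hash
    # The server cannot check its contents, so the client has to.
    return data if content_hash(data) == h else None
```

Note that the verification moves to the consumer: the server only holds opaque blobs keyed by the hash-of-the-hash, so only someone who already knows the real hash can tell whether a blob is genuine.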
Q: What about abusers of the cache service?
A: The cache server would expire unused hashes, so if you upload fluff then it’ll promptly disappear from the cache server anyway. If you maliciously upload corrupt data then the server can check and verify it, unless you’re using a privacy scheme like the one described above, in which case it will have to be policed the same way as e-mail. Complaints would be generated, log files would show suspicious activity, and the credentials used to access the cache server by the malicious user would be revoked or turned read-only. Size caps can also be implemented, so there will have to be some sort of protocol for breaking up huge files into bite-sized chunks, preferably implemented in the client.
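Expiry and chunking are both easy to sketch. The class and function names here are hypothetical, and “unused” is interpreted as “not requested for some number of seconds”, which is only one of several reasonable policies. The clock is passed in so the behaviour can be tested without waiting around.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExpiringCache:
    """Drops entries that nobody has asked for within max_idle_seconds."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # key -> (data, last used)

    def put(self, key: str, data: bytes) -> None:
        self._entries[key] = (data, self._clock())

    def get(self, key: str) -> Optional[bytes]:
        self.expire()
        entry = self._entries.get(key)
        if entry is None:
            return None
        data, _ = entry
        self._entries[key] = (data, self._clock())  # a hit keeps it alive
        return data

    def expire(self) -> int:
        now = self._clock()
        stale = [k for k, (_, last_used) in self._entries.items()
                 if now - last_used > self._max_idle]
        for k in stale:
            del self._entries[k]
        return len(stale)


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Client-side splitting so no single upload exceeds the size cap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

Popular content keeps refreshing its own timestamp every time it’s requested, while fluff that nobody asks for ages out on its own.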
Q: How will I know what cache to use?
A: A search progression could be used. For example, if it doesn’t find it in one, it can check the next. If this gets implemented on your local network, your LAN administrator will tell you which cache to use. If it gets implemented by your ISP, they’d have to set up a special address such as ucache.isp.com much like how they set up their e-mail servers, only it would have to be dynamic depending on who asks so it points at the cache server closest to you in their network.
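The search progression might look something like the sketch below. The function name and the cache labels are made up for illustration, and each cache is represented by a simple lookup function that returns None on a miss.

```python
from typing import Callable, List, Optional, Tuple

Lookup = Callable[[str], Optional[bytes]]


def find_in_caches(content_hash: str,
                   caches: List[Tuple[str, Lookup]]) -> Optional[Tuple[str, bytes]]:
    """Try each cache in order, nearest first, and stop at the first hit."""
    for name, lookup in caches:
        data = lookup(content_hash)
        if data is not None:
            return name, data
    return None  # not cached anywhere: download it the normal way


# Hypothetical setup: an office LAN cache first, then the ISP's cache.
lan_cache = {"abc123": b"already on the LAN"}
isp_cache = {"abc123": b"also at the ISP", "def456": b"only at the ISP"}
progression = [("lan", lan_cache.get), ("ucache.isp.com", isp_cache.get)]

print(find_in_caches("abc123", progression))
print(find_in_caches("def456", progression))
print(find_in_caches("zzz999", progression))
```

Ordering the list from nearest to farthest means the shortest path always gets tried first, which is the whole point of the exercise.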