If you have more than one computer running Ubuntu (or Debian), or you are experimenting with different installations of Ubuntu using something like VirtualBox, you might find yourself using a lot of bandwidth and time downloading packages from the Internet to update or add capabilities to your machines. By default each installation of Ubuntu goes directly to the Ubuntu download servers to get packages, so you end up downloading the same file multiple times through your connection to your ISP. There is an alternative, though: you can download the packages through a host on your own network that acts as a cache. The next time any machine requires that file, the cache serves its own copy instead of downloading it again. This is a lot quicker, as the speed of your internal network will be much higher than the speed of the connection to your ISP. It is also a great bonus if your Internet connectivity package has a maximum download allowance. Setting this up is not too difficult, thanks to a program called Apt-cacher.
This is worth doing even if you only run two or three virtual machines on your laptop, as it will save time when downloading updates on the second machine (if it is the same architecture) and also make updates available locally should you want to set up another virtual machine. If you are running only one machine it is not worth doing, as you will use up a bit of disk space holding the cache of packages.
I followed the instructions on the Ubuntu Wiki at https://help.ubuntu.com/community/Apt-Cacher-Server to set this up. Firstly, I installed the apt-cacher package from the repositories on the machine I was using to host the cache (don't confuse this with apt-cache, which is a command used to search the package directory). I then edited /etc/default/apt-cacher to change AUTOSTART to 1, which causes the apt-cacher daemon to start automatically when the cache host machine starts. There is a configuration file at /etc/apt-cacher/apt-cacher.conf in which I only changed the admin email, but there are many other options if you want more control over how the system works. Now you can start up your new caching server with:
sudo /etc/init.d/apt-cacher start
You only need to do this when you first install it, as it will be started automatically when you boot up your host. You should be able to see signs of life from your apt-cacher server if you go to http://localhost:3142/ in a browser. If you are setting it up on a remote machine, replace 'localhost' with the hostname of that machine.
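If you would rather script the AUTOSTART change than edit the file by hand, here is a minimal sketch. The function name enable_autostart is my own invention, and it assumes GNU sed (as shipped with Ubuntu) and that the setting appears at the start of a line as AUTOSTART=..., which may not hold for every version of the package.

```shell
#!/bin/sh
# Set AUTOSTART=1 in an apt-cacher defaults file.
# $1: path to the defaults file (on the cache host: /etc/default/apt-cacher)
# enable_autostart is a hypothetical helper, not part of apt-cacher.
enable_autostart() {
    conf="$1"
    if grep -q '^AUTOSTART=' "$conf"; then
        # A setting already exists, so rewrite it in place.
        sed -i 's/^AUTOSTART=.*/AUTOSTART=1/' "$conf"
    else
        # No setting found, so append one.
        echo 'AUTOSTART=1' >> "$conf"
    fi
}
```

Called as root with /etc/default/apt-cacher as its argument, this should leave the file with a single AUTOSTART=1 line. It is worth trying it on a copy of the file first.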
Now you can configure your client (or guest) machines to use your new cache. There are two ways to do this. If you use Synaptic for package management, go to Settings -> Preferences -> Network, select "Manual Proxy Configuration", enter the IP address or hostname of your caching host under "HTTP Proxy" and enter 3142 as the port number. If your router allocates IP addresses dynamically then it is probably worth setting up a fixed IP address for your caching host to avoid having to repeat this step. If the client machine is one that you might take off your own network (e.g. a laptop), you can switch this setting back to "Direct Connection to the Internet" while you are away.
If you are using the command-line utility apt, create a file named 01Proxy under /etc/apt/apt.conf.d and enter into it:
Acquire::http::Proxy "http://[IP address or hostname of the caching server]:3142";
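The same file can be created from a script, which is handy if you have several clients to set up. This is only a sketch: write_proxy_conf is a name I made up, the port 3142 is the one used above, and the optional second argument exists purely so you can try it against a scratch directory before touching /etc.

```shell
#!/bin/sh
# Write an apt proxy snippet that points at an apt-cacher host.
# $1: IP address or hostname of the caching server
# $2: target directory (defaults to /etc/apt/apt.conf.d, which needs root)
# write_proxy_conf is a hypothetical helper, not part of apt.
write_proxy_conf() {
    cache_host="$1"
    conf_dir="${2:-/etc/apt/apt.conf.d}"
    # Produces exactly the one-line setting shown above.
    printf 'Acquire::http::Proxy "http://%s:3142";\n' "$cache_host" \
        > "$conf_dir/01Proxy"
}
```

For example, running write_proxy_conf 192.168.1.10 as root would create /etc/apt/apt.conf.d/01Proxy (the address is just a placeholder for your own caching host). Afterwards, apt-config dump | grep -i proxy should list the Acquire::http::Proxy value if apt has picked the file up.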
Not a lot of configuration for a real benefit. You don't have to be running the same version of Ubuntu on your host as on your guests either. I've been using this on an Ubuntu 8.04 host with some virtual machine images of Ubuntu 8.10 and it works very well.
Re: Saving bandwidth when using multiple Ubuntu machines ...
Your document is clear; it's well-written. However I'd appreciate something about how one goes about testing this. I believe I followed your instructions to the letter. To test it: I did a sudo apt-get update at the client. What appeared is no different from the normal sudo apt-get update before apt-cacher. All the "Get:nn http://" are against the repos from the sources.list. It's not clear whether the client's apt is even going to the apt-cacher server.
I have made no changes to my server and client sources.list.
I can connect to the server with a http://[192.168.1.1]:3142 from the client
In the client:
o I set up a "Manual Proxy Config" via synaptic.
o I also set up the 01proxy in apt.conf.d
How do I know this is working or not?
Thanks
MP
Re: Saving bandwidth when using multiple Ubuntu machines ...
Hi Max,
This is a good question. As apt-cacher is acting as a proxy, you won't see any difference in the output of apt-get update. Fortunately, apt-cacher provides logging functionality so you can see exactly what it is up to. The log files apt-cacher creates can be found under /var/log/apt-cacher/ on the machine where apt-cacher is installed. A good way to test it is to use this command:
sudo tail -f /var/log/apt-cacher/access.log
This will print what apt-cacher is doing to a terminal window and show you whether it already had the requested resource or had to fetch it. If you try to update a client machine now, you should see new lines appear here. Also available is a summary report of the traffic apt-cacher has handled. To generate this you need to edit /etc/apt-cacher/apt-cacher.conf and make sure the reporting option is switched on, so you need a line that says:
generate_reports=1
After these reports have run (they are generated once a day), you will find a file named report.html under /var/log/apt-cacher/ which will show you some statistics about how your instance of apt-cacher is performing.
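If you don't want to wait for the daily report, a rough tally can be pulled straight out of the access log. The sketch below rests on an assumption of mine: that each line of access.log is separated by '|' characters and that one of the fields is exactly HIT, MISS or EXPIRED. Check a few lines of your own log first, as the layout may differ between apt-cacher versions. The function name summarise_cache_log is also made up for this example.

```shell
#!/bin/sh
# Count cache hits and misses in an apt-cacher access log.
# $1: path to the log (assumed: /var/log/apt-cacher/access.log)
# Assumes '|'-separated lines with a field that is exactly
# HIT, MISS or EXPIRED; lines without such a field are ignored.
summarise_cache_log() {
    awk -F'|' '
        {
            # Look for the status field wherever it sits in the line.
            for (i = 1; i <= NF; i++) {
                if ($i == "HIT" || $i == "MISS" || $i == "EXPIRED") {
                    count[$i]++
                    break
                }
            }
        }
        END {
            printf "HIT=%d MISS=%d EXPIRED=%d\n",
                count["HIT"], count["MISS"], count["EXPIRED"]
        }
    ' "$1"
}
```

Running summarise_cache_log /var/log/apt-cacher/access.log (with sudo if the log is not world-readable) prints a single line of counts. If the HIT count goes up after a second client has updated, the cache is doing its job.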
Hope this helps!
Re: Saving bandwidth when using multiple Ubuntu machines ...
Hi,
Your post is indeed useful. I have a question.
Do you know if there's a way to set up any of the clients to act as servers? Say I have 6 clients. I do not want any particular client or the server to act as the host. If client A downloads a package (client A could be any of the 6), the next time some client B needs the same package, it should be downloaded from client A.
Thanks!