Currently no one has backed up github.com (aside from Github). This webpage is about progress toward that. If you have 150-200TB of disk space and really good internet, please contact me about getting a copy of github.
Finally, full repository metadata is available in JSON format. The format is explained in the github API documentation.
The files are available in batches of 10,000 at
    http://za3k.com/github/repos-<X>0000-<X>9999.json
    http://za3k.com/github/repos-<X>0000-<X>9999.json.gz
To download all files, run
    for x in {0..5000}; do
      echo "https://za3k.com/github/repos-${x}0000-${x}9999.json.gz"
    done | wget -N -i -
The combined size of these files is 9.7G compressed, 115G uncompressed. Files are grouped by github's internal id; since some repositories have been deleted or made private, each file contains fewer than 10,000 repositories.
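The download loop above hard-codes the range 0..5000. As a rough sketch, the same URL list can be generated from an upper bound on the repository id instead. The function name batch_urls and its max_id argument are hypothetical and not part of the original page; the URL pattern is copied from the loop above.

```shell
#!/usr/bin/env bash
# Sketch only: max_id is a hypothetical upper bound on github's internal
# repository id, chosen by whoever runs this.

# Print one .json.gz URL per batch of 10,000 ids, covering ids 0..max_id.
batch_urls() {
  local max_id="$1"
  local last_batch=$(( max_id / 10000 ))
  local x
  for (( x = 0; x <= last_batch; x++ )); do
    echo "https://za3k.com/github/repos-${x}0000-${x}9999.json.gz"
  done
}

# Example (not run here): the same range as the original loop.
#   batch_urls 50009999 | wget -N -i -
```

Passing a smaller bound simply yields fewer URLs, which avoids requesting batches that are known not to exist yet.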
Metadata for gists is currently unavailable from github, but I'm working with them to make it public.
The Events Timeline is ephemeral, and is being successfully recorded by githubarchive.org. A second person running the same program in case of downtime would be a plus.
(New!) http://ghtorrent.org/ is downloading the same timeline, and also fetching fuller historical data.
I selected 1000 random repositories from the above list and removed the 427 forks, leaving 573. I then checked out all of the remaining repositories. The total size was 4.3G, with or without compression. It was around 3G for a shallow checkout. If we assume forks take no space, this means an average github repository takes up 4.3M (4.3G spread over the full sample of 1000). Omitting the largest repositories may improve this estimate, but I didn't run further tests.
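A measurement like this could be repeated with something along the following lines. This is only a sketch and not necessarily how the numbers above were produced: repos.txt is a hypothetical file with one clone URL per line, the function names are made up for illustration, and shuf is assumed to be the GNU coreutils version.

```shell
#!/usr/bin/env bash
# Sketch only: repos.txt (one clone URL per line) is a hypothetical input.

# Print COUNT random lines from LIST, without repeats (GNU shuf).
sample_repos() {
  local count="$1" list="$2"
  shuf -n "$count" "$list"
}

# Clone every URL read from stdin into DEST, then print the total size.
# Extra arguments after DEST are passed to git clone (e.g. --depth 1).
clone_and_measure() {
  local dest="$1"; shift
  local i=0 url
  mkdir -p "$dest"
  while read -r url; do
    i=$((i + 1))
    # Redirect stdin so git cannot consume the URL list.
    git clone --quiet "$@" "$url" "$dest/$i" < /dev/null \
      || echo "failed: $url" >&2
  done
  du -sh "$dest"
}

# Example (not run here):
#   sample_repos 1000 repos.txt > sample.txt
#   clone_and_measure full < sample.txt
#   clone_and_measure shallow --depth 1 < sample.txt
```

Saving the sample to a file first means the full and shallow checkouts are measured on the same set of repositories.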
If there are 28,000,000 repositories on github at an average size of 4.3M each, that multiplies out to around 120TB of data in total for the git repositories.
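The multiplication can be checked with a few lines of shell. This is just a back-of-the-envelope sketch using the figures from the text. The function name estimate_total_tb is made up for illustration, and decimal units (1TB = 1,000,000M) are an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: decimal units are assumed (1 TB = 1,000,000 MB).
# The average size is given in tenths of a MB so that integer
# arithmetic is enough (43 means 4.3M).

estimate_total_tb() {
  local repo_count="$1"      # number of repositories
  local avg_tenths_mb="$2"   # average repository size, in tenths of a MB
  # repos * tenths-of-MB / 10 gives MB; dividing by 1,000,000 gives TB.
  echo $(( repo_count * avg_tenths_mb / 10 / 1000000 ))
}

estimate_total_tb 28000000 43   # prints 120
```

The result is rounded down to whole terabytes, and it inherits all the uncertainty of the 1000-repository sample.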