Google Cloud Storage
M-Lab publishes all the data it collects in raw form as archives on Google Cloud Storage (GCS), in the m-lab bucket (gs://m-lab/).
File Layout
All M-Lab files are packaged up in compressed tarballs. They are placed in folders and named according to the following schema:
[tool]/[YYYY]/[MM]/[DD]/[YYYYMMDD]T[HHMMSS]-[server]-[tool]-[file index].tgz
tool
: The measurement tool that generated the data.
YYYYMMDDTHHMMSS
: Start of the time window in which the data was collected.
server
: M-Lab server that collected the data.
file index
: Index of the file.
This means that each tarball contains all the data collected during a single day, by a single tool running on a single M-Lab server.
If the data collected during one day by one tool on one server exceeds 1 GB (uncompressed), it is split into multiple tarballs of up to 1 GB each.
For example, the tarball 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1 GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on February 18, 2009.
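The naming schema can be taken apart mechanically. The following is a minimal sketch, not part of any M-Lab tooling: `parse_mlab_tarball` is a hypothetical helper name, and it assumes that server names always consist of two hyphen-separated parts (as in mlab1-lga01), which this document shows only by example. It performs a rough shape check rather than full validation.

```shell
# Hypothetical helper, not part of any M-Lab tooling.
# Assumes server names have two hyphen-separated parts (e.g. mlab1-lga01).
parse_mlab_tarball() (
  name=${1##*/}                      # drop any directory or URL prefix
  case $name in
    [0-9]*T[0-9]*-*-*-*-*.tgz) ;;    # rough shape check only
    *) echo "unrecognized name: $name" >&2; exit 1 ;;
  esac
  base=${name%.tgz}
  index=${base##*-}                  # text after the last hyphen
  rest=${base%-*}
  stamp=${rest%%-*}                  # text before the first hyphen
  rest=${rest#*-}                    # now "[server]-[tool]"
  host=${rest%%-*}; rest=${rest#*-}
  site=${rest%%-*}; tool=${rest#*-}
  printf 'time=%s server=%s-%s tool=%s index=%s\n' \
    "$stamp" "$host" "$site" "$tool" "$index"
)

parse_mlab_tarball 20090218T000000Z-mlab1-lga01-ndt-0000.tgz
# prints: time=20090218T000000Z server=mlab1-lga01 tool=ndt index=0000
```

Because the function body runs in a subshell, its variables do not leak into the calling shell.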
Project Data
Direct links to each M-Lab project’s raw data are available below:
- Glasnost
- NDT
- Neubot
  - Neubot measures the Internet in order to gather data useful to study broadband performance, network neutrality, and Internet censorship.
  - More information is available at Nexa Center and GitHub.
- NPAD
- OONI
  - OONI measures censorship, surveillance and traffic manipulation on the Internet.
  - More information is available at OONI.
- Paris Traceroute
  - Paris Traceroute maps network topology between two points on the Internet.
  - More information is available at Paris Traceroute.
- pathload2 (deprecated)
  - M-Lab no longer supports this tool, but its archived data is available on GCS. For similar measurements with a current and supported tool, see NDT.
  - Pathload2 measures the available bandwidth of an Internet connection.
  - More information is available at https://code.google.com/p/pathload2-gatech/.
- Shaperprobe (deprecated)
  - M-Lab no longer supports this tool, but its archived data is available on GCS.
  - Shaperprobe detects prioritization of network traffic.
  - More information is available at ShaperProbe.
- SideStream
  - SideStream collects TCP state information about completed TCP connections on a system.
  - More information is available on GitHub.
- mlab-collectd
  - mlab-collectd is a monitoring tool for M-Lab slices that collects resource utilization information about all M-Lab servers.
  - More information is available on GitHub.
Accessing Data Programmatically
Accessing Data with gsutil
The easiest way to access M-Lab data on GCS programmatically is by using the gsutil command line utility.
# List the top-level contents of the M-Lab bucket in GCS.
$ gsutil ls -l gs://m-lab/
# Copy a file from GCS locally.
$ gsutil cp gs://m-lab/ndt/2009/02/18/20090218T000000Z-mlab1-lga01-ndt-0000.tgz .
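When fetching many files, it can help to build the gs:// URL from its parts instead of typing it out. The sketch below is illustrative only: `mlab_gs_url` is a hypothetical helper, the YYYY-MM-DD argument format is a choice made here, and the fixed `T000000Z` window start is an assumption taken from the example file names in this document rather than a documented guarantee.

```shell
# Hypothetical helper: build the gs:// URL for one tarball from its parts.
# Assumes the window always starts at midnight and carries a "Z" suffix,
# as in the example names in this document.
mlab_gs_url() (
  tool=$1; day=$2; server=$3; index=${4:-0000}
  # Split a YYYY-MM-DD date into year, month, and day.
  y=${day%%-*}; rest=${day#*-}; m=${rest%%-*}; d=${rest#*-}
  printf 'gs://m-lab/%s/%s/%s/%s/%s%s%sT000000Z-%s-%s-%s.tgz\n' \
    "$tool" "$y" "$m" "$d" "$y" "$m" "$d" "$server" "$tool" "$index"
)

mlab_gs_url ndt 2009-02-18 mlab1-lga01
# prints: gs://m-lab/ndt/2009/02/18/20090218T000000Z-mlab1-lga01-ndt-0000.tgz

# Example download (requires gsutil and network access, so left commented out):
# gsutil cp "$(mlab_gs_url ndt 2009-02-18 mlab1-lga01)" .
```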
Accessing Data with Common HTTP Tools
The URLs shown in M-Lab’s GCS web interface require the user to be logged in, which can present challenges when attempting to access the data with common HTTP utilities like curl or wget.
You can access M-Lab files programmatically by replacing:
storage.cloud.google.com/m/cloudstorage/b
with
storage.googleapis.com
in any GCS URL.
For example, applying this substitution to the URL of a raw NDT archive in the GCS web application yields a URL that can be accessed without authentication:
https://storage.googleapis.com/m-lab/ndt/2015/12/28/20151228T000000Z-mlab1-lga04-ndt-0001.tgz
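The substitution described above can be scripted. This is a minimal sketch using `sed`; `mlab_public_url` is a hypothetical helper name, and the function does nothing beyond the literal string replacement given in this section.

```shell
# Hypothetical helper: rewrite a GCS web-interface URL into a direct one
# by applying the substitution described above.
mlab_public_url() {
  printf '%s\n' "$1" |
    sed 's#storage\.cloud\.google\.com/m/cloudstorage/b#storage.googleapis.com#'
}
```

A URL that already points at storage.googleapis.com passes through unchanged, so the helper is safe to apply to a mixed list of URLs.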
GCS File Index
A list of all M-Lab files in GCS is available at:
https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz
This file provides gs:// URLs to M-Lab data. To convert them to https:// URLs (compatible with common HTTP tools), you can use the following bash script:
$ curl https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz | gunzip | \
while read -r; do echo "${REPLY/gs:\/\//https://storage.googleapis.com/}"; done
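Since the index covers every tool and date, it is often useful to narrow it before downloading anything. The sketch below is one possible approach, assuming each line of the index is a single gs://m-lab/[tool]/[YYYY]/[MM]/... URL as described in the File Layout section; `mlab_filter_index` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only index entries for one tool and month,
# and print them as https:// URLs. Reads gs:// URLs from standard input.
mlab_filter_index() {
  # $1 = tool (e.g. ndt), $2 = year/month prefix (e.g. 2009/02)
  grep "^gs://m-lab/$1/$2/" |
    sed 's#^gs://#https://storage.googleapis.com/#'
}

# Usage (downloads the full index, so left commented out):
# curl -s https://storage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz |
#   gunzip | mlab_filter_index ndt 2009/02
```

Filtering before converting keeps the pipeline cheap, because `sed` only processes the lines that `grep` lets through.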