PyPI provides download records as a publicly available dataset on Google's BigQuery. You can access the data with a Google Cloud account.
The data update begins at 01:00:00 UTC and should take about 10 minutes.
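As a rough illustration of what querying the public dataset might look like, the sketch below only builds a SQL string; it does not contact BigQuery. The table name `bigquery-public-data.pypi.file_downloads` and the column names `file.project`, `details.installer.name`, and `timestamp` are assumptions not stated on this page, and `build_daily_downloads_query` is a hypothetical helper.

```python
import re
from datetime import date

# Assumption: this table name is not given in the text above.
PYPI_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_daily_downloads_query(project: str, day: date) -> str:
    """Return SQL that counts one project's downloads per installer on one day."""
    # The project name is interpolated into the SQL text, so reject anything
    # that does not look like a plain package name.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9._-]*", project):
        raise ValueError(f"unexpected project name: {project!r}")
    return (
        # Assumed column names: details.installer.name, file.project, timestamp.
        "SELECT details.installer.name AS installer, COUNT(*) AS downloads\n"
        f"FROM `{PYPI_TABLE}`\n"
        f"WHERE file.project = '{project}'\n"
        f"  AND DATE(timestamp) = '{day.isoformat()}'\n"
        "GROUP BY installer\n"
        "ORDER BY downloads DESC"
    )


query = build_daily_downloads_query("requests", date(2020, 1, 2))
print(query)
```

The resulting string could then be submitted with a BigQuery client of your choice, for example through the `Client.query` method of the `google-cloud-bigquery` package, using your own Google Cloud credentials.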
PyPI download records are generated by a service known as linehaul. The previous iteration of the service had an issue which caused it to restart regularly due to running out of memory, resulting in a large quantity of dropped download records. On July 26, a newer version of the service was deployed, which is much more robust and reliable.
The cumulative download counts consider only the download records which are not from a known set of PyPI mirror applications, namely bandersnatch, z3c.pypimirror, Artifactory, and devpi. In other words, the cumulative download counts take the sum of the downloads from the Without_Mirrors dataset from the chart.
The With_Mirrors and Without_Mirrors downloads are not mutually exclusive sets of download counts like the other segmentations provided. In fact, the Without_Mirrors downloads are a subset of the downloads in With_Mirrors.
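The relationship between the two series and the cumulative count can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the records are hypothetical, and matching mirrors by lowercased installer name is an assumption of the sketch.

```python
from collections import Counter

# Mirror applications named in the text above. Comparing lowercased
# installer names against this set is an assumption of this sketch.
KNOWN_MIRRORS = {"bandersnatch", "z3c.pypimirror", "artifactory", "devpi"}


def is_mirror(installer_name):
    """Return True if the installer name matches a known mirror application."""
    return (installer_name or "").lower() in KNOWN_MIRRORS


def daily_counts(records):
    """Count downloads per day, with and without mirror downloads.

    records: iterable of (day, installer_name) tuples, one per download.
    """
    with_mirrors = Counter()
    without_mirrors = Counter()
    for day, installer in records:
        with_mirrors[day] += 1          # every download counts here
        if not is_mirror(installer):
            without_mirrors[day] += 1   # only non-mirror downloads count here
    return with_mirrors, without_mirrors


# Hypothetical download records, for illustration only.
records = [
    ("2020-01-01", "pip"),
    ("2020-01-01", "bandersnatch"),
    ("2020-01-02", "pip"),
    ("2020-01-02", "Artifactory"),
    ("2020-01-02", None),
]

with_mirrors, without_mirrors = daily_counts(records)

# Without_Mirrors is a subset of With_Mirrors on every day.
assert all(without_mirrors[d] <= with_mirrors[d] for d in with_mirrors)

# The cumulative count sums only the Without_Mirrors series.
cumulative = sum(without_mirrors.values())
print(cumulative)
```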
Some entities will create a mirror, or clone, of the PyPI repository using a tool like bandersnatch for the sake of security or availability. This means that their mirror repository regularly syncs with PyPI by downloading all of the Python packages available (and versions thereof) that it does not already have. Those downloads are recorded by PyPI with bandersnatch as the user-agent. You will also see that on days when you release a new version of your package there are many more downloads from mirrors, as active mirrors sync with PyPI by downloading the new release.
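Because Without_Mirrors is a subset of With_Mirrors, the downloads attributable to mirrors on a given day can be estimated as the difference between the two series. The sketch below uses hypothetical daily totals to show how a release day might stand out; `mirror_downloads` is an illustrative helper, not part of any published API.

```python
def mirror_downloads(with_mirrors, without_mirrors):
    """Per-day mirror downloads: With_Mirrors minus Without_Mirrors."""
    return {
        day: total - without_mirrors.get(day, 0)
        for day, total in with_mirrors.items()
    }


# Hypothetical daily totals, for illustration only.
with_mirrors = {"2020-03-01": 120, "2020-03-02": 410, "2020-03-03": 130}
without_mirrors = {"2020-03-01": 100, "2020-03-02": 110, "2020-03-03": 105}

by_day = mirror_downloads(with_mirrors, without_mirrors)

# The day with the most mirror downloads is a candidate release day.
peak_day = max(by_day, key=by_day.get)
print(peak_day, by_day[peak_day])
```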
pypistats.org filters out downloads by known mirrors from the version and system segmentations on the website. Downloads by mirrors are intentionally excluded from these breakdowns because they do not represent end-users of the software. Instead, mirrors serve as an alternative provider to other end-users on a separate (sometimes private) network.
The existence of mirrors means that the downloads provided by PyPI and BigQuery come with some uncertainty with respect to the actual aggregate usage of Python packages. One might expect that mirrors will mask end-user downloads for more commonly used packages while simultaneously inflating the download counts of less common ones. This uncertainty is difficult to quantify because the mirrors don't report subsequent downloads back to PyPI.
One can, however, assume that PyPI serves a significant proportion of the Python community's package downloads, hopefully significant enough that the quantities provided here are representative of package users and relevant to package maintainers. There are other distributors, like Conda, which also serve Python packages, but their download data is currently not publicly available at the event level like PyPI's, and thus is not incorporated into the metrics on this website.
The intent of disregarding mirrors is to provide metrics that reflect end-user download aggregation.
Downloads from CI/CD tools are included in all metrics. There is currently no easy way to attribute downloads to build/deployment tools.