FAQs

What is the source of the download data?

PyPI publishes its download records as a public dataset on Google BigQuery. You can query the data with a Google Cloud account.
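As a rough illustration of querying that dataset, the sketch below only builds a SQL string and does not contact BigQuery. The table name `bigquery-public-data.pypi.file_downloads` and the columns `timestamp` and `file.project` are assumptions about the current public schema and should be checked before use. The function name and the `@package` query parameter are hypothetical.

```python
# Assumed table name; verify it against the dataset you can see in BigQuery.
TABLE = "bigquery-public-data.pypi.file_downloads"


def build_daily_downloads_query(days: int = 30) -> str:
    """Return SQL that counts downloads per day for the package bound to @package."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return f"""
SELECT
  DATE(timestamp) AS download_date,
  COUNT(*) AS downloads
FROM `{TABLE}`
WHERE file.project = @package
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)
GROUP BY download_date
ORDER BY download_date
""".strip()


# Print the query so it can be pasted into the BigQuery console.
print(build_daily_downloads_query(7))
```

Restricting the date range matters in practice, because a query over the whole table scans a very large amount of data.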

When is the website data updated?

The data update begins at 01:00:00 UTC and should take about 10 minutes.

Why are there so many more downloads after July 26, 2018?

PyPI download records are generated by a service known as linehaul. The previous iteration of the service regularly ran out of memory and restarted, which caused a large number of download records to be dropped. On July 26, 2018, a newer version of the service was deployed, which is much more robust and reliable.

Why are the cumulative download counts different from the sum of the downloads from the overall chart?

The cumulative download counts consider only the download records that are not from a known set of PyPI mirror applications, namely bandersnatch, z3c.pypimirror, Artifactory, and devpi. In other words, the cumulative download counts are the sum of the downloads in the Without_Mirrors series of the overall chart.
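The difference can be seen in a small sketch. The rows below are made-up numbers shaped like the two series of the overall chart. The field names and both function names are hypothetical, not the website's actual code.

```python
# Hypothetical rows: one entry per day and per series of the overall chart.
overall_rows = [
    {"date": "2018-08-01", "category": "with_mirrors", "downloads": 150},
    {"date": "2018-08-01", "category": "without_mirrors", "downloads": 100},
    {"date": "2018-08-02", "category": "with_mirrors", "downloads": 130},
    {"date": "2018-08-02", "category": "without_mirrors", "downloads": 90},
]


def cumulative_downloads(rows):
    """Sum only the without_mirrors series, as the cumulative counts do."""
    return sum(r["downloads"] for r in rows if r["category"] == "without_mirrors")


def naive_chart_sum(rows):
    """Sum every row in the chart, which counts non-mirror downloads twice."""
    return sum(r["downloads"] for r in rows)


print(cumulative_downloads(overall_rows))  # 190
print(naive_chart_sum(overall_rows))       # 470
```

The naive sum is larger because each non-mirror download appears in both series.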

What is the difference between Without_Mirrors and With_Mirrors downloads?

The With_Mirrors and Without_Mirrors downloads are not mutually exclusive sets of download counts like the other segmentations provided. In fact, the Without_Mirrors downloads are a subset of the downloads in With_Mirrors.

Some entities create a mirror, or clone, of the PyPI repository using a tool like bandersnatch for the sake of security or availability. Their mirror repository regularly syncs with PyPI by downloading every Python package (and every version of it) that it does not already have. Those downloads are recorded by PyPI with bandersnatch as the user-agent. You will also see many more downloads from mirrors on days when you release a new version of your package, as active mirrors sync with PyPI by downloading the new release.

pypistats.org filters out downloads from known mirrors in the version and system segmentations on the website. Downloads by mirrors are intentionally excluded from these breakdowns because they do not represent end-users of the software. Instead, mirrors serve as an alternative provider to other end-users on a separate (sometimes private) network.
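The filtering described above can be sketched as follows. The mirror names come from the list given earlier in this FAQ. The case-insensitive exact match on an installer name is an assumption made for illustration, and the function names are hypothetical.

```python
# Known mirror applications named in this FAQ, lowercased for matching.
KNOWN_MIRRORS = {"bandersnatch", "z3c.pypimirror", "artifactory", "devpi"}


def is_mirror(installer_name):
    """Return True if the recorded installer name is a known mirror tool."""
    return installer_name is not None and installer_name.lower() in KNOWN_MIRRORS


def count_downloads(installer_names):
    """Count all downloads and the subset that did not come from mirrors."""
    names = list(installer_names)
    without = sum(1 for name in names if not is_mirror(name))
    return {"with_mirrors": len(names), "without_mirrors": without}


# Hypothetical installer names, one per download record (None = unknown).
sample = ["pip", "bandersnatch", "pip", "devpi", None, "Artifactory"]
counts = count_downloads(sample)
print(counts)  # {'with_mirrors': 6, 'without_mirrors': 3}
```

Because every record counts toward with_mirrors, the without_mirrors total can never exceed it, which is the subset relationship described earlier.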

The existence of mirrors means that the downloads provided by PyPI and BigQuery come with some uncertainty with respect to the actual aggregate usage of Python packages. One might expect that mirrors will mask end-user downloads for more commonly used packages while simultaneously inflating the download counts of less common ones. This uncertainty is difficult to quantify because the mirrors don't report subsequent downloads back to PyPI.
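A toy model makes the two effects concrete. All numbers below are invented, and the model assumes that each syncing mirror downloads a release exactly once and that installs served by a mirror never reach PyPI. Neither assumption is measured data.

```python
def recorded_downloads(end_user_installs, installs_via_mirrors, syncing_mirrors):
    """Return what PyPI would record under the toy model described above."""
    direct = end_user_installs - installs_via_mirrors  # installs PyPI sees
    return {
        "actual_end_users": end_user_installs,
        "without_mirrors": direct,
        "with_mirrors": direct + syncing_mirrors,  # one sync per mirror
    }


# A popular package: many installs are hidden behind mirrors.
popular = recorded_downloads(10_000, 3_000, 50)
# A rarely used package: mirror syncs dominate the recorded total.
niche = recorded_downloads(10, 3, 50)

print(popular)
print(niche)
```

In this sketch the popular package shows 7,050 recorded downloads against 10,000 real installs, while the niche package shows 57 recorded downloads against only 10.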

One can, however, assume that PyPI serves a significant proportion of the Python community's package downloads, hopefully significant enough that the quantities provided here are representative of each package's users and relevant to package maintainers. There are other distributors, like Conda, which also serve Python packages, but their download data is currently not publicly available at the event level like PyPI's, and thus is not incorporated into the metrics on this website.

Why disregard mirrors from aggregate data?

The intent of disregarding mirrors is to provide metrics that reflect aggregate end-user downloads.

What about downloads due to CI/CD tools?

Downloads from CI/CD tools are included in all metrics. There is currently no easy way to attribute downloads to build/deployment tools.