# FAQs

PyPI provides download records as a publicly available dataset on Google's BigQuery. You can access the data with a Google Cloud account.
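As a rough sketch of what querying the dataset can look like, the example below counts daily downloads for a single project. It assumes the `google-cloud-bigquery` client library is installed, that Google Cloud credentials are configured, and that the records live in the `bigquery-public-data.pypi.file_downloads` table. The table name and the helper names `build_daily_downloads_sql` and `fetch_daily_downloads` are assumptions made for illustration and are not taken from this page.

```python
def build_daily_downloads_sql(days: int = 30) -> str:
    """Build a BigQuery Standard SQL query that counts daily downloads.

    The table and column names are assumptions about the public dataset.
    The project name is passed separately as the @project query parameter.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return f"""
        SELECT DATE(timestamp) AS day, COUNT(*) AS downloads
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE file.project = @project
          AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)
        GROUP BY day
        ORDER BY day
    """


def fetch_daily_downloads(project: str, days: int = 30):
    """Run the query and return a list of (day, downloads) tuples."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on application default credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project", "STRING", project)
        ]
    )
    rows = client.query(build_daily_downloads_sql(days), job_config=job_config).result()
    return [(row.day, row.downloads) for row in rows]
```

Calling `fetch_daily_downloads("requests", days=7)` would then return one row per day. Restricting the date range keeps the amount of data scanned small.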

### When is the website data updated?

The data update begins at 01:00:00 UTC and should take about 10 minutes.

### Why are there so many more downloads after July 26, 2018?

PyPI download records are generated by a service known as linehaul. The previous iteration of the service regularly ran out of memory and restarted, which dropped a large number of download records. On July 26, 2018, a newer version of the service was deployed, which is much more robust and reliable.

The cumulative download counts include only the download records that do not come from a known set of PyPI mirror applications, namely bandersnatch, z3c.pypimirror, Artifactory, and devpi. In other words, the cumulative download counts are the sum of the downloads in the Without_Mirrors dataset shown in the chart.
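The filtering described above can be sketched in a few lines. This is a minimal illustration and not the site's actual implementation: the record layout (a dict with `installer` and `downloads` keys), the case-insensitive match, and the function names are assumptions. Only the list of mirror applications comes from the text.

```python
from typing import Dict, Iterable, Mapping, Optional

# Mirror applications named in this FAQ, lowercased for comparison.
KNOWN_MIRRORS = frozenset({"bandersnatch", "z3c.pypimirror", "artifactory", "devpi"})


def is_mirror(installer: Optional[str]) -> bool:
    """Return True if the installer name matches a known mirror application."""
    return installer is not None and installer.lower() in KNOWN_MIRRORS


def cumulative_downloads(records: Iterable[Mapping]) -> Dict[str, int]:
    """Sum downloads with and without records from known mirrors."""
    with_mirrors = 0
    without_mirrors = 0
    for record in records:
        count = record["downloads"]
        with_mirrors += count
        if not is_mirror(record.get("installer")):
            without_mirrors += count
    return {"with_mirrors": with_mirrors, "without_mirrors": without_mirrors}


# Hypothetical sample data, invented for this example.
sample = [
    {"installer": "pip", "downloads": 120},
    {"installer": "bandersnatch", "downloads": 40},
    {"installer": "devpi", "downloads": 5},
    {"installer": None, "downloads": 10},
]
totals = cumulative_downloads(sample)
print(totals)  # {'with_mirrors': 175, 'without_mirrors': 130}
```

In this sketch, records with an unknown installer are counted as non-mirror downloads, since only known mirrors are excluded.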

Some entities create a mirror, or clone, of the PyPI repository using a tool like bandersnatch for the sake of security or availability. Such a mirror regularly syncs with PyPI by downloading every available Python package (and every version thereof) that it does not already have. PyPI records those downloads with bandersnatch as the user-agent. You will also see many more downloads from mirrors on days when you release a new version of your package, as active mirrors sync with PyPI by downloading the new release.

pypistats.org excludes downloads by known mirrors from the version and system breakdowns on the website. The exclusion is intentional, because mirrors do not represent end-users of the software. Instead, they serve as an alternative provider to other end-users on a separate (sometimes private) network.

The existence of mirrors means that the downloads provided by PyPI and BigQuery come with some uncertainty with respect to the actual aggregate usage of Python packages. One might expect that mirrors will mask end-user downloads for more commonly used packages while simultaneously inflating the download counts of less common ones. This uncertainty is difficult to quantify because the mirrors don't report subsequent downloads back to PyPI.

One can, however, assume that PyPI serves a significant proportion of the Python community's package downloads, hopefully significant enough that the quantities provided here are representative of package users and relevant to package maintainers. Other distributors, like Conda, also serve Python packages, but their download data is currently not publicly available at the event level like PyPI's, and thus is not incorporated into the metrics on this website.

### Why disregard mirrors from aggregate data?

The intent of disregarding mirrors is to provide metrics that reflect aggregate end-user downloads.