mirror of
https://gitlab.archlinux.org/archlinux/aurweb.git
synced 2025-02-03 10:43:03 +01:00
feat: archive git repository (experimental)
See doc/git-archive.md for general Git archive specifications.
See doc/repos/metadata-repo.md for info and direction related to the new Git metadata archive.
parent
ec3152014b
commit
30e72d2db5
34 changed files with 1104 additions and 50 deletions
75
doc/git-archive.md
Normal file
|
@ -0,0 +1,75 @@
|
|||
# aurweb Git Archive Specification
|
||||
|
||||
<span style="color: red">
|
||||
WARNING: This aurweb Git Archive implementation is
|
||||
experimental and may be changed.
|
||||
</span>
|
||||
|
||||
## Overview
|
||||
|
||||
This git archive specification refers to the archive git repositories
|
||||
created by [aurweb/scripts/git_archive.py](aurweb/scripts/git_archive.py)
|
||||
using [spec modules](#spec-modules).
|
||||
|
||||
## Configuration
|
||||
|
||||
- `[git-archive]`
|
||||
- `author`
|
||||
- Git commit author
|
||||
- `author-email`
|
||||
- Git commit author email
|
||||
|
||||
See an [official spec](#official-specs)'s documentation for spec-specific
|
||||
configurations.
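
As a minimal sketch of how these options might be read, the following assumes the aurweb configuration is an INI-style file that Python's `configparser` can parse. The sample values and the `read_git_archive_config` helper are hypothetical and only illustrate the `[git-archive]` section described above.

```python
import configparser

# Hypothetical sample configuration; the values are placeholders.
SAMPLE_CONFIG = """
[git-archive]
author = git_archive.py
author-email = no-reply@example.org
"""


def read_git_archive_config(text: str) -> dict:
    """Return the options of the [git-archive] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["git-archive"])


options = read_git_archive_config(SAMPLE_CONFIG)
print(options["author"], options["author-email"])
```

Spec-specific options, such as a repository path, would live in the same section under the key names given in each spec's documentation.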
|
||||
|
||||
## Fetch/Update Archives
|
||||
|
||||
When a client has not yet fetched any initial archives, they should clone
|
||||
the repository:
|
||||
|
||||
$ git clone https://aur.archlinux.org/archive.git aurweb-archive
|
||||
|
||||
When updating, the repository is already cloned and changes need to be pulled
|
||||
from remote:
|
||||
|
||||
# To update:
|
||||
$ cd aurweb-archive && git pull
|
||||
|
||||
For end-user production applications, see
|
||||
[Minimize Disk Space](#minimize-disk-space).
|
||||
|
||||
## Minimize Disk Space
|
||||
|
||||
Running `git gc` on the repository compresses revisions and removes
unreachable objects, which otherwise grow the repository considerably
with each commit. It is recommended that the following command is run
after cloning the archive or pulling updates:
|
||||
|
||||
$ cd aurweb-archive && git gc --aggressive
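
For clients that automate these steps, the clone, pull and `git gc` commands above could be wrapped roughly as follows. This is only a sketch: the `sync_commands` and `sync_archive` helpers are hypothetical names, and the check for an existing `.git` directory is an assumption about how a client might detect a previous clone.

```python
import subprocess
from pathlib import Path

ARCHIVE_URL = "https://aur.archlinux.org/archive.git"


def sync_commands(dest: Path, url: str = ARCHIVE_URL) -> list:
    """Build the git commands needed to fetch or update the archive."""
    if (dest / ".git").is_dir():
        # Already cloned: pull changes from the remote.
        fetch = ["git", "-C", str(dest), "pull"]
    else:
        # First run: clone the archive repository.
        fetch = ["git", "clone", url, str(dest)]
    # Compress revisions and drop unreachable objects afterwards.
    return [fetch, ["git", "-C", str(dest), "gc", "--aggressive"]]


def sync_archive(dest: str, url: str = ARCHIVE_URL) -> None:
    """Run the commands, raising CalledProcessError if git fails."""
    for cmd in sync_commands(Path(dest), url):
        subprocess.run(cmd, check=True)
```

Calling `sync_archive("aurweb-archive")` would then mirror the manual commands shown in this document.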
|
||||
|
||||
## Spec Modules
|
||||
|
||||
Each aurweb spec module belongs to the `aurweb.archives.spec` package. For
|
||||
example: a spec named "example" would be located at
|
||||
`aurweb.archives.spec.example`.
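
The naming rule can be illustrated with a short sketch. The `spec_module_name` and `load_spec` helpers below are hypothetical and not part of aurweb; `load_spec` only works where the `aurweb` package is importable.

```python
import importlib

SPEC_PACKAGE = "aurweb.archives.spec"


def spec_module_name(spec: str) -> str:
    """Return the dotted module path for a spec name."""
    return f"{SPEC_PACKAGE}.{spec}"


def load_spec(spec: str):
    """Import the spec module; raises ImportError if it is unavailable."""
    return importlib.import_module(spec_module_name(spec))


print(spec_module_name("example"))
```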
|
||||
|
||||
[Official spec listings](#official-specs) use the following format:
|
||||
|
||||
- `spec_name`
|
||||
- Spec description; what this spec produces
|
||||
- `<link to repo documentation>`
|
||||
|
||||
### Official Specs
|
||||
|
||||
- [metadata](doc/specs/metadata.md)
|
||||
- Package RPC `type=info` metadata
|
||||
- [metadata-repo](repos/metadata-repo.md)
|
||||
- [users](doc/specs/users.md)
|
||||
- List of users found in the database
|
||||
- [users-repo](repos/users-repo.md)
|
||||
- [pkgbases](doc/specs/pkgbases.md)
|
||||
- List of package bases found in the database
|
||||
- [pkgbases-repo](repos/pkgbases-repo.md)
|
||||
- [pkgnames](doc/specs/pkgnames.md)
|
||||
- List of package names found in the database
|
||||
- [pkgnames-repo](repos/pkgnames-repo.md)
|
|
@ -70,20 +70,48 @@ computations and clean up the database:
|
|||
* aurweb-pkgmaint automatically removes empty repositories that were created
|
||||
within the last 24 hours but never populated.
|
||||
|
||||
* aurweb-mkpkglists generates the package list files; it takes an optional
|
||||
--extended flag, which additionally produces multiinfo metadata. It also
|
||||
generates {archive.gz}.sha256 files that should be located within
|
||||
* [Deprecated] aurweb-mkpkglists generates the package list files; it takes
|
||||
an optional --extended flag, which additionally produces multiinfo metadata.
|
||||
It also generates {archive.gz}.sha256 files that should be located within
|
||||
mkpkglists.archivedir which contain a SHA-256 hash of their matching
|
||||
.gz counterpart.
|
||||
|
||||
* aurweb-usermaint removes the last login IP address of all users that did not
log in within the past seven days.
|
||||
|
||||
* aurweb-git-archive generates Git repository archives based on a --spec.
|
||||
This script is a new generation of aurweb-mkpkglists, which creates and
|
||||
maintains Git repository versions of the archives produced by
|
||||
aurweb-mkpkglists. See doc/git-archive.md for detailed documentation.
|
||||
|
||||
These scripts can be installed by running `poetry install` and are
|
||||
usually scheduled using Cron. The current setup is:
|
||||
|
||||
----
|
||||
*/5 * * * * poetry run aurweb-mkpkglists [--extended]
|
||||
# Run aurweb-git-archive --spec metadata directly after
|
||||
# aurweb-mkpkglists so that they are executed sequentially, since
|
||||
# both scripts are quite heavy. `aurweb-mkpkglists` should be removed
|
||||
# from here once its deprecation period has ended.
|
||||
*/5 * * * * poetry run aurweb-mkpkglists [--extended] && poetry run aurweb-git-archive --spec metadata
|
||||
|
||||
# Update popularity once an hour. This is done to reduce the amount
|
||||
# of changes caused by popularity data. Even if a package is otherwise
|
||||
# unchanged, popularity is recalculated every 5 minutes via aurweb-popupdate,
|
||||
# which causes changes for a large chunk of packages.
|
||||
#
|
||||
# At this interval, clients can still take advantage of popularity
|
||||
# data, but its updates are guarded behind hour-long intervals.
|
||||
*/60 * * * * poetry run aurweb-git-archive --spec popularity
|
||||
|
||||
# Usernames
|
||||
*/5 * * * * poetry run aurweb-git-archive --spec users
|
||||
|
||||
# Package base names
|
||||
*/5 * * * * poetry run aurweb-git-archive --spec pkgbases
|
||||
|
||||
# Package names
|
||||
*/5 * * * * poetry run aurweb-git-archive --spec pkgnames
|
||||
|
||||
1 */2 * * * poetry run aurweb-popupdate
|
||||
2 */2 * * * poetry run aurweb-aurblup
|
||||
3 */2 * * * poetry run aurweb-pkgmaint
|
||||
|
|
121
doc/repos/metadata-repo.md
Normal file
|
@ -0,0 +1,121 @@
|
|||
# Repository: metadata-repo
|
||||
|
||||
## Overview
|
||||
|
||||
The resulting repository contains RPC `type=info` JSON data for packages,
|
||||
split into two different files:
|
||||
|
||||
- `pkgbase.json` contains details about each package base in the AUR
|
||||
- `pkgname.json` contains details about each package in the AUR
|
||||
|
||||
See [Data](#data) for a breakdown of how data is presented in this
repository, based on an RPC `type=info` response.
|
||||
|
||||
See [File Layout](#file-layout) for a detailed summary of the layout
|
||||
of these files and the data contained within.
|
||||
|
||||
**NOTE: `Popularity` now requires a client-side calculation; see [Popularity Calculation](#popularity-calculation).**
|
||||
|
||||
## Data
|
||||
|
||||
This repository contains RPC `type=info` data for all packages found
|
||||
in AUR's database, reorganized to be suitable for Git repository
|
||||
changes.
|
||||
|
||||
- `pkgname.json` holds Package-specific metadata
|
||||
- Some fields have been removed from `pkgname.json` objects
|
||||
- `ID`
|
||||
- `PackageBaseID -> ID` (moved to `pkgbase.json`)
|
||||
- `NumVotes` (moved to `pkgbase.json`)
|
||||
- `Popularity` (moved to `pkgbase.json`)
|
||||
- `pkgbase.json` holds PackageBase-specific metadata
|
||||
- Package Base fields from `pkgname.json` have been moved over to
|
||||
`pkgbase.json`
|
||||
- `ID`
|
||||
- `Keywords`
|
||||
- `FirstSubmitted`
|
||||
- `LastModified`
|
||||
- `OutOfDate`
|
||||
- `Maintainer`
|
||||
- `URLPath`
|
||||
- `NumVotes`
|
||||
- `Popularity`
|
||||
- `PopularityUpdated`
|
||||
|
||||
## Popularity Calculation
|
||||
|
||||
Clients intending to use popularity data from this archive **must**
|
||||
perform a decay calculation on their end to reflect a close approximation
|
||||
of up-to-date popularity.
|
||||
|
||||
Putting this step onto the client allows the server to perform
fewer popularity record updates, dramatically improving archiving
of popularity data. The same calculation is done on the server side
when producing outputs for RPC `type=info` and package pages.
|
||||
|
||||
```
|
||||
Let T = Current UTC timestamp in seconds
|
||||
Let PU = PopularityUpdated timestamp in seconds
|
||||
|
||||
# The delta between now and PU in days
|
||||
Let D = (T - PU) / 86400
|
||||
|
||||
# Calculate up-to-date popularity:
|
||||
P = Popularity * (0.98^D)
|
||||
```
|
||||
|
||||
We can see that the resulting up-to-date popularity value decays as
|
||||
the exponent is increased:
|
||||
- `1.0 * (0.98^1) = 0.98`
|
||||
- `1.0 * (0.98^2) = 0.9604`
|
||||
- ...
|
||||
|
||||
This decay calculation effectively ages each vote by the number of days
in the exponent, which accounts for the time factor. However, because
the calculation relies on decimal values and exponents, it eventually
becomes imprecise. To keep this imprecision from becoming an issue for
clients, the AUR updates these records on a forced interval and whenever
a vote is added to or removed from a particular package.
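
The formula above translates directly into code. The following Python sketch is one possible client-side implementation; the `current_popularity` function and its parameter names are hypothetical, while the `0.98` decay factor and the 86400-second day come from the calculation described in this section.

```python
import time

DECAY_PER_DAY = 0.98
SECONDS_PER_DAY = 86400


def current_popularity(popularity, popularity_updated, now=None):
    """Decay a stored Popularity value to the given time.

    popularity_updated and now are UTC timestamps in seconds;
    now defaults to the current time.
    """
    if now is None:
        now = time.time()
    # D: the delta between now and PopularityUpdated, in days
    days = (now - popularity_updated) / SECONDS_PER_DAY
    return popularity * (DECAY_PER_DAY ** days)
```

A client would pass the `Popularity` and `PopularityUpdated` fields of a `pkgbase.json` entry. The sketch does not guard against a `PopularityUpdated` value in the future, which would yield a result larger than the stored value.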
|
||||
|
||||
## File Layout
|
||||
|
||||
#### pkgbase.json:
|
||||
|
||||
{
|
||||
"pkgbase1": {
|
||||
"FirstSubmitted": 123456,
|
||||
"ID": 1,
|
||||
"LastModified": 123456,
|
||||
"Maintainer": "kevr",
|
||||
"OutOfDate": null,
|
||||
"URLPath": "/cgit/aur.git/snapshot/pkgbase1.tar.gz",
|
||||
"NumVotes": 1,
|
||||
"Popularity": 1.0,
|
||||
"PopularityUpdated": 12345567753.0
|
||||
},
|
||||
...
|
||||
}
|
||||
|
||||
#### pkgname.json:
|
||||
|
||||
{
|
||||
"pkg1": {
|
||||
"CheckDepends": [], # Only included if a check dependency exists
|
||||
"Conflicts": [], # Only included if a conflict exists
|
||||
"Depends": [], # Only included if a dependency exists
|
||||
"Description": "some description",
|
||||
"Groups": [], # Only included if a group exists
|
||||
"ID": 1,
|
||||
"Keywords": [],
|
||||
"License": [],
|
||||
"MakeDepends": [], # Only included if a make dependency exists
|
||||
"Name": "pkg1",
|
||||
"OptDepends": [], # Only included if an opt dependency exists
|
||||
"PackageBase": "pkgbase1",
|
||||
"Provides": [], # Only included if `provides` is defined
|
||||
"Replaces": [], # Only included if `replaces` is defined
|
||||
"URL": "https://some_url.com",
|
||||
"Version": "1.0-1"
|
||||
},
|
||||
...
|
||||
}
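
As a rough illustration of how the two files relate, the following Python sketch rebuilds a combined record for one package by looking up its `PackageBase` key in `pkgbase.json`. The `load_json` and `merge_package_info` helpers are hypothetical, and exposing the package base `ID` as `PackageBaseID` is an assumption based on the mapping listed under [Data](#data).

```python
import json


def load_json(path):
    """Load one of the archive's JSON files."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def merge_package_info(pkgname_entry, pkgbases):
    """Combine a pkgname.json entry with its pkgbase.json entry."""
    base = pkgbases[pkgname_entry["PackageBase"]]
    merged = dict(pkgname_entry)  # leave the input untouched
    for field, value in base.items():
        if field == "ID":
            # Assumption: the package base ID maps back to PackageBaseID.
            merged["PackageBaseID"] = value
        else:
            # Assumption: fields already on the package entry win.
            merged.setdefault(field, value)
    return merged
```

With the files loaded via `load_json("pkgbase.json")` and `load_json("pkgname.json")`, calling `merge_package_info(pkgnames["pkg1"], pkgbases)` would return a single dictionary for `pkg1`.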
|
15
doc/repos/pkgbases-repo.md
Normal file
|
@ -0,0 +1,15 @@
|
|||
# Repository: pkgbases-repo
|
||||
|
||||
## Overview
|
||||
|
||||
- `pkgbase.json` contains a list of package base names
|
||||
|
||||
## File Layout
|
||||
|
||||
### pkgbase.json:
|
||||
|
||||
[
|
||||
"pkgbase1",
|
||||
"pkgbase2",
|
||||
...
|
||||
]
|
15
doc/repos/pkgnames-repo.md
Normal file
|
@ -0,0 +1,15 @@
|
|||
# Repository: pkgnames-repo
|
||||
|
||||
## Overview
|
||||
|
||||
- `pkgname.json` contains a list of package names
|
||||
|
||||
## File Layout
|
||||
|
||||
### pkgname.json:
|
||||
|
||||
[
|
||||
"pkgname1",
|
||||
"pkgname2",
|
||||
...
|
||||
]
|
15
doc/repos/users-repo.md
Normal file
|
@ -0,0 +1,15 @@
|
|||
# Repository: users-repo
|
||||
|
||||
## Overview
|
||||
|
||||
- `users.json` contains a list of usernames
|
||||
|
||||
## File Layout
|
||||
|
||||
### users.json:
|
||||
|
||||
[
|
||||
"user1",
|
||||
"user2",
|
||||
...
|
||||
]
|
14
doc/specs/metadata.md
Normal file
|
@ -0,0 +1,14 @@
|
|||
# Git Archive Spec: metadata
|
||||
|
||||
## Configuration
|
||||
|
||||
- `[git-archive]`
|
||||
- `metadata-repo`
|
||||
- Path to package metadata git repository location
|
||||
|
||||
## Repositories
|
||||
|
||||
For documentation on each of these repositories, follow its link,
which leads to a Markdown document dedicated to that repository.
|
||||
|
||||
- [metadata-repo](doc/repos/metadata-repo.md)
|
14
doc/specs/pkgbases.md
Normal file
|
@ -0,0 +1,14 @@
|
|||
# Git Archive Spec: pkgbases
|
||||
|
||||
## Configuration
|
||||
|
||||
- `[git-archive]`
|
||||
- `pkgbases-repo`
|
||||
- Path to pkgbases git repository location
|
||||
|
||||
## Repositories
|
||||
|
||||
For documentation on each of these repositories, follow its link,
which leads to a Markdown document dedicated to that repository.
|
||||
|
||||
- [pkgbases-repo](doc/repos/pkgbases-repo.md)
|
14
doc/specs/pkgnames.md
Normal file
|
@ -0,0 +1,14 @@
|
|||
# Git Archive Spec: pkgnames
|
||||
|
||||
## Configuration
|
||||
|
||||
- `[git-archive]`
|
||||
- `pkgnames-repo`
|
||||
- Path to pkgnames git repository location
|
||||
|
||||
## Repositories
|
||||
|
||||
For documentation on each of these repositories, follow its link,
which leads to a Markdown document dedicated to that repository.
|
||||
|
||||
- [pkgnames-repo](doc/repos/pkgnames-repo.md)
|
14
doc/specs/popularity.md
Normal file
|
@ -0,0 +1,14 @@
|
|||
# Git Archive Spec: popularity
|
||||
|
||||
## Configuration
|
||||
|
||||
- `[git-archive]`
|
||||
- `popularity-repo`
|
||||
- Path to popularity git repository location
|
||||
|
||||
## Repositories
|
||||
|
||||
For documentation on each of these repositories, follow its link,
which leads to a Markdown document dedicated to that repository.
|
||||
|
||||
- [popularity-repo](doc/repos/popularity-repo.md)
|
14
doc/specs/users.md
Normal file
|
@ -0,0 +1,14 @@
|
|||
# Git Archive Spec: users
|
||||
|
||||
## Configuration
|
||||
|
||||
- `[git-archive]`
|
||||
- `users-repo`
|
||||
- Path to users git repository location
|
||||
|
||||
## Repositories
|
||||
|
||||
For documentation on each of these repositories, follow its link,
which leads to a Markdown document dedicated to that repository.
|
||||
|
||||
- [users-repo](doc/repos/users-repo.md)
|