Commit graph

61 commits

Author SHA1 Message Date
morganamilo
c0026caab0 libalpm: Give -U downloads a random .part name if needed
archweb's download links all ended in /download. This cause all the temp
files to be named download.part. With parallel downloads this results in
multiple downloads to go to the same temp file and breaks the transaction.

Assign random temporary filenames to downloads from URLs that are either
missing a filename, or if the filename does not contain at least three
hyphens (as a well formed package filename does).

While this approach to determining when to use a temporary filename is
not 100% foolproof, it does keep nice looking download progress bar names
when a proper package filename is given. The only downside of not using
temporary files when provided with a filename  with three or more hyphens
is URLs created specifically to bypass temporary filename usage can not
be downloaded in parallel. We probably do not want to download packages
from such URLs anyway.

Fixes FS#71464

Modified-by: Allan McRae (do not use temporary files for realish URLs)
Signed-off-by: Allan McRae <allan@archlinux.org>
2021-09-04 10:33:51 +10:00
Allan McRae
17f9911ffc Update copyright year
Signed-off-by: Allan McRae <allan@archlinux.org>
2021-03-01 12:22:20 +10:00
Anatol Pomozov
f078c2d3bc Move signature payload creation to download engine
Until now callee of ALPM download functionality has been in charge of
payload creation both for the main file (e.g. *.pkg) and for the accompanied
*.sig file. One advantage of such solution is that all payloads are
independent and can be fetched in parallel thus exploiting the maximum
level of download parallelism.

To build *.sig file url we've been using a simple string concatenation:
$requested_url + ".sig". Unfortunately there are cases when it does not
work. For example an archlinux.org "Download From Mirror" link looks like
this https://www.archlinux.org/packages/core/x86_64/bash/download/ and
it gets redirected to some mirror. But if we append ".sig" to the end of
the link url and try to download it then archlinux.org returns 404 error.

To overcome this issue we need to follow redirects for the main payload
first, find the final url and only then append '.sig' suffix.
This implies 2 things:
 - the signature payload initialization need to be moved to dload.c
 as it is the place where we have access to the resolved url
 - *.sig is downloaded serially with the main payload and this reduces
 level of parallelism

Move *.sig payload creation to dload.c. Once the main payload is fetched
successfully we check if the callee asked to download the accompanied
signature. If yes - create a new payload and add it to mcurl.

*.sig payload does not use server list of the main payload and thus does
not support mirror failover. *.sig file comes from the same server as
the main payload.

Refactor event loop in curl_multi_download_internal() a bit. Instead of
relying on curl_multi_check_finished_download() to return number of new
payloads we simply rerun the loop iteration one more time to check if
there are any active downloads left.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2020-07-07 21:35:35 +10:00
Anatol Pomozov
84723cab5d Cleanup the old sequential download code
All users of _alpm_download() have been refactored to the new API.
It is time to remove the old _alpm_download() functionality now.

This change also removes obsolete SIGPIPE signal handler functionality
(this is a leftover from libfetch days).

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
2020-06-26 15:59:16 +10:00
Anatol Pomozov
16d98d6577 Convert '-U pkg1 pkg2' codepath to parallel download
Installing remote packages using its URL is an interesting case for ALPM
API. Unlike package sync ('pacman -S pkg1 pkg2') '-U' does not deal with
server mirror list. Thus _alpm_multi_download() should be able to
handle file download for payloads that either have 'fileurl' field
or pair of fields ('servers' and 'filepath') set.

Signature for alpm_fetch_pkgurl() has changed and it accepts an
output list that is populated with filepaths to fetched packages.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
2020-06-26 15:59:08 +10:00
Anatol Pomozov
b96e0df4dc Implement multibar UI
Multiplexed download requires ability to draw UI for multiple active progress
bars. To implement it we use ANSI codes to move cursor up/down and then
redraw the required progress bar.
`pacman_multibar_ui.active_downloads` field represents the list of active
downloads that correspond to progress bars.
`struct pacman_progress_bar` is a data structure for a progress bar.

In some cases (e.g. database downloads) we want to keep progress bars in order.
In some other cases (package downloads) we want to move completed items to the
top of the screen. Function `multibar_move_completed_up` allows to configure
such behavior.

Per discussion in the maillist we do not want to show download progress for
signature files.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2020-05-09 11:58:39 +10:00
Anatol Pomozov
6a331af27f Implement multiplexed download using mCURL
curl_multi_download_internal() is the main loop that creates up to
'ParallelDownloads' easy curl handles, adds them to mcurl and then
performs curl execution. This is when the paralled downloads happens.
Once any of the downloads complete the function checks its result.
In case if the download fails it initiates retry with the next server
from payload->servers list. At the download completion all the payload
resources are cleaned up.

curl_multi_check_finished_download() is essentially refactored version of
curl_download_internal() adopted for multi_curl. Once mcurl porting is
complete curl_download_internal() will be removed.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2020-05-09 11:58:21 +10:00
Anatol Pomozov
fa68c33fa8 Inline dload_payload->curlerr field into a local variable
dload_payload->curlerr is a field that is used inside
curl_download_internal() function only. It can be converted to a local
variable.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2020-05-09 11:58:21 +10:00
Anatol Pomozov
dc98d0ea09 Add multi_curl handle to ALPM global context
To be able to run multiple download in parallel efficiently we need to
use curl_multi interface [1]. It introduces a set of APIs over new type
of handler 'CURLM'.

Create CURLM object at the application start and set it to global ALPM
context.

The 'single-download' CURL handle moves to payload struct. A new CURL
handle is created for each payload with intention to be processed by CURLM.

Note that curl_download_internal() is not ported to CURLM interface due
to the fact that the function will go away soon.

[1] https://curl.haxx.se/libcurl/c/libcurl-multi.html

Signed-off-by: Allan McRae <allan@archlinux.org>
2020-05-09 11:58:21 +10:00
Anatol Pomozov
a8a1a1bb3e Introduce alpm_dbs_update() function for parallel db updates
This is an equivalent of alpm_db_update but for multiplexed (parallel)
download. The difference is that this function accepts list of
databases to update. And then ALPM internals download it in parallel if
possible.

Add a stub for _alpm_multi_download the function that will do parallel
payloads downloads in the future.

Introduce dload_payload->filepath field that contains url path to the
file we download. It is like fileurl field but does not contain
protocol/server part. The rationale for having this field is that with
the curl multidownload the server retry logic is going to move to a curl
callback. And the callback needs to be able to reconstruct the 'next'
fileurl. One will be able to do it by getting the next server url from
'servers' list and then concat with filepath. Once the 'parallel download'
refactoring is over 'fileurl' field will go away.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2020-05-09 11:58:21 +10:00
Allan McRae
e76ec94083 build-aux/update-copyright 2019 2020
Signed-off-by: Allan McRae <allan@archlinux.org>
2020-02-10 10:46:03 +10:00
Allan McRae
f37a3752b3 Update copyright years
make update-copyright OLD=2018 NEW=2019

Signed-off-by: Allan McRae <allan@archlinux.org>
2019-10-23 22:06:54 +10:00
Eli Schwartz
860e4c4943 Remove all modelines from the project
Many of these are pointless (e.g. there is no need to explicitly turn on
spellchecking and language dictionaries for the manpages by default).

The only useful modelines are the ones enforcing the project coding
standards for indentation style (and "maybe" filetype/syntax, but
everything except the asciidoc manpages and makepkg.conf is already
autodetected), and indent style can be applied more easily with
.editorconfig

Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
2018-05-14 09:59:15 +10:00
Allan McRae
b6bb8cb7dc Update coyrights for 2018
make update-copyright OLD=2017 NEW=201

Signed-off-by: Allan McRae <allan@archlinux.org>
2018-03-14 13:31:31 +10:00
Andrew Gregory
59bb21fce3 dload: ensure callback is always initialized once
Frontends rely on an initialization call for setup between downloads.
Checking for intialization after checking for a completed download can
skip initialization in cases where files are small enough to be
downloaded all at once (FS#56408).  Relying on previous download size
can result in multiple initializations if there are multiple
non-transfer events prior to the download starting (fS#56468).

Introduce a new cb_initialized variable to the payload struct and use it
to ensure that the callback is initialized exactly once prior to any
actual events.

Fixes FS#56408, FS#56468

Signed-off-by: Andrew Gregory <andrew.gregory.8@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2018-01-06 12:59:32 +10:00
Allan McRae
1a2d5bee3b Update copyright years
Signed-off-by: Allan McRae <allan@archlinux.org>
2017-01-04 13:59:14 +10:00
Martin Kühne
e83e868a77 Parametrise the different ways in which the payload is reset
In FS#43434, Downloads which fail and are restarted on a different server
will resume and may display a negative download speed. The payload's progress
in libalpm was not properly reset which ultimately caused terminal noise
because the line width calculation assumes positive download speeds.

This patch fixes the incomplete reset of the payload by mimicing what
be_sync.c:alpm_db_update() does over in sync.c:download_single_file().
The new dload.c:_alpm_dload_payload_reset_for_retry() extends beyond the
current behavior by updating initial_size and prevprogress for this case.
This makes pacman reset the progress properly in the next invocation of the
callback and display positive download speeds.

Fixes FS#43434.

Signed-off-by: Martin Kühne <mysatyre@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2016-12-05 15:20:08 +10:00
Ivy Foster
0d2ba870c9 Do not #define _RESERVED_IDENTIFIERS
Signed-off-by: Ivy Foster <ivy.foster@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2016-09-25 18:04:57 +10:00
Allan McRae
4742f5929d Update copyright years for 2016
make update-copyright OLD=2015 NEW=2016

Signed-off-by: Allan McRae <allan@archlinux.org>
2016-01-04 13:27:08 +10:00
Allan McRae
2e48101999 Update copyright notices for 2015
Signed-off-by: Allan McRae <allan@archlinux.org>
2015-02-01 21:19:04 +10:00
Christian Hesse
d649dc669d dload: mark final_url as const
Signed-off-by: Allan McRae <allan@archlinux.org>
2014-10-19 20:48:40 +10:00
Florian Pritz
cd2370754a Remove ts and sw from vim modeline when noet is set
Forcing vim users to view files with a tabstop of 2 seems really
unnecessary when noet is set. I find it much easier to read code with
ts=4 and I dislike having to override the modeline by hand.

Command run:
find . -type f -exec sed -i '/vim.* noet/s# ts=2 sw=2##' {} +

Signed-off-by: Florian Pritz <bluewind@xinu.at>
Signed-off-by: Allan McRae <allan@archlinux.org>
2014-01-28 20:19:25 +10:00
Dan McGee
30740d9d2f Minor struct member reordering for packing concerns
Noticed using clang and `-Wpadded`.

Signed-off-by: Dan McGee <dan@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
2014-01-06 14:38:50 +10:00
Allan McRae
3bb3b1555a Update copyright years for 2014
Signed-off-by: Allan McRae <allan@archlinux.org>
2014-01-06 14:38:50 +10:00
Christian Hesse
3b3152fc50 dload: avoid renaming files downloaded via sync operations
If the server redirects from ${repo}.db to ${repo}.db.tar.gz pacman gets
this wrong: It saves to new filename and fails when accessing
${repo}.db.

We need the remote filename only when downloading remote files with
pacman's -U operation. This introduces a new field 'trust_remote_name'
to payload. If set pacman downloads to the filename given by the server.

The field trust_remote_name is set in alpm_fetch_pkgurl().

Fixes FS#36791 ([pacman] downloads to wrong filename with redirect).

[dave: remove redundant assignment leading to memory leak]

Signed-off-by: Allan McRae <allan@archlinux.org>
2013-09-18 14:28:03 +10:00
Dave Reisner
27067b1372 dload: pass back the effective URL to callers of _alpm_download
I suspect that eventually we're going to end up returning a pointer to
an allocated struct to describe the download result, but that's for
another patch when the need arises...

Fixes FS#33508.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
2013-01-29 13:36:58 +10:00
Dave Reisner
132e1ac10c dload: avoid showing progress bars on some redirects
RFC 2616 doesn't forbid a 301 or 302 repsonse from having a body, and
servers exist in the wild that show this behavior. In order to prevent
pacman from showing a progress bar when we aren't actually downloading a
package (and merely following one of these pain in the butt redirects),
capture the server response code in the response header, rather than
waiting to peel it off the handle after the download has finished.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Reported-by: Alexandre Filgueira <alexfilgueira@cinnarch.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
2013-01-17 22:32:54 +10:00
Allan McRae
1dd3405813 Update copyright year for 2013
Signed-off-by: Allan McRae <allan@archlinux.org>
2013-01-03 12:03:09 +10:00
Allan McRae
326c6a8eed Update copyright years
Add 2012 to the copyright range for all libalpm and pacman source files.

Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-02-20 16:54:34 -06:00
Dan McGee
0b155677cf sync: extract build_payload() method from find_dl_candidates
This is done by both the delta and regular file code, so we can extract
a little helper method. Done mostly to satisfy my "why are we repeating
code here" itch.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-10-21 19:29:31 -05:00
Dave Reisner
b633985e60 dload: add pointer to server list for each payload
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-10-17 08:40:20 -05:00
Dan McGee
a33424f879 Merge branch 'maint' 2011-10-14 08:16:18 -05:00
Dan McGee
185cbb8a44 Add missing #ifdef around cURL error code in download struct
Thanks to Eduardo Tongson on the mailing list.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-10-14 07:38:58 -05:00
Dan McGee
5f3629bea0 Introduce alpm_time_t type
This will always be a 64-bit signed integer rather than the variable length
time_t type. Dates beyond 2038 should be fully supported in the library; the
frontend still lags behind because 32-bit platforms provide no localtime64()
or equivalent function to convert from an epoch value to a broken down time
structure.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-10-12 14:01:25 -05:00
Dave Reisner
ad8d3ceb89 move prevprogress onto payload handle
This is a poor place for it, and it will likely move again in the
future, but it's better to have it here than as a static variable.

Initialization of this variable is now no longer necessary as its
zeroed on creation of the payload struct.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-29 12:58:37 -05:00
Dan McGee
e0acf2f144 Refactor download payload reset and free
This was done to squash a memory leak in the sync database download
code. When we downloaded a database and then reused the payload struct,
we could find ourselves calling get_fullpath() for the signatures and
overwriting non-freed values we had left over from the database
download.

Refactor the payload_free function into a payload_reset function that we
can call that does NOT free the payload itself, so we can reuse payload
structs. This also allows us to move the payload to the stack in some
call paths, relieving us of the need to alloc space.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-28 13:01:03 -05:00
Dan McGee
9a58d5c6c5 Initialize cURL library on first use
Rather than always initializing it on any handle creation. There are
several frontend operations (search, info, etc.) that never need the
download code, so spending time initializing this every single time is a
bit silly. This makes it a bit more like the GPGME code init path.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-28 13:01:03 -05:00
Dan McGee
f66f9f11cd Fix memory leak in download payload->remote_name
In the sync code, we explicitly allocated a string for this field, while
in the dload code itself it was filled in with a pointer to another
string. This led to a memory leak in the sync download case.

Make remote_name non-const and always explicitly allocate it. This patch
ensures this as well as uses malloc + snprintf (rather than calloc) in
several codepaths, and eliminates the only use of PATH_MAX in the
download code.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-28 04:48:33 -05:00
Dan McGee
2e7d002315 Use off_t rather than double where possible
Beautiful of libcurl to use floating point types for what are never
fractional values. We can do better, and we usually want these values in
their integer form anyway.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-25 16:09:52 -05:00
Dave Reisner
d64c409913 dload: add open_mode to payload struct
This is a precursor to a following patch which will move the setting of
options to a separate function. With the open mode as part of the
struct, we can avoid modifying stack allocated variables.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-22 09:18:00 -05:00
Dave Reisner
592ed13bce dload: rename cd_filename to content_disp_name
This is more in line with the menagerie of file name members that we now
have on the payload struct.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-22 09:15:11 -05:00
Dave Reisner
329a7b7e24 dload: move tempfile and destfile to payload struct
These are private to the download operation already, so glob them onto
the struct. This is an ugly rename patch, with the only logical change
being that destfile and tempfile are now freed by the payload_free
function.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-22 09:14:45 -05:00
Dave Reisner
43940f591e dload: rename payload->filename to payload->remote_name
This is a far more accurate description of what this is, since it's more
than likely not really a filename at all, but the name after a final
slash on a URL.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-19 19:00:40 -05:00
Dave Reisner
24824b54ce dload: add 'unlink_on_fail' to payload struct
Let callers of _alpm_download state whether we should delete on fail,
rather than inferring it from context. We still override this decision
and always unlink when a temp file is used.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-19 09:43:19 -05:00
Dave Reisner
57eac093c4 absorb fileinfo struct into dload_payload
This transitional struct becomes delicious noms for dload_payload.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
2011-07-05 23:00:03 -04:00
Dave Reisner
3eec745910 absorb some _alpm_download params into payload struct
Restore some sanity to the number of arguments passed to _alpm_download
and curl_download_internal.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
2011-07-05 23:00:02 -04:00
Dave Reisner
6dc71926f9 lib/dload: prevent large file attacks
This means creating a new struct which can pass more descriptive data
from the back end sync functions to the downloader. In particular, we're
interested in the download size read from the sync DB. When the remote
server reports a size larger than this (via a content-length header),
abort the transfer.

In cases where the size is unknown, we set a hard upper limit of:

* 25MiB for a sync DB
* 16KiB for a signature

For reference, 25MiB is more than twice the size of all of the current
binary repos (with files) combined, and 16KiB is a truly gargantuan
signature.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
2011-07-05 22:58:55 -04:00
Dave Reisner
6c9b82e72a dload: handle irregular URLs
URLs might end with a slash and follow redirects, or could be a
generated by a script such as /getpkg.php?id=12345. In both cases, we
may have a better filename that we can write to, taken from either
content-disposition header, or the effective URL.

Specific to the first case, we write to a temporary file of the format
'alpmtmp.XXXXXX', where XXXXXX is randomized by mkstemp(3). Since this
is a randomly generated file, we cannot support resuming and the file is
unlinked in the event of an interrupt.

We also run into the possibility of changing out the filename from under
alpm on a -U operation, so callers of _alpm_download can optionally pass
a pointer to a *char to be filled in by curl_download_internal with the
actual filename we wrote to. Any sync operation will pass a NULL pointer
here, as we rely on specific names for packages from a mirror.

Fixes FS#22645.

Signed-off-by: Dave Reisner <d@falconindy.com>
2011-07-05 22:58:27 -04:00
Allan McRae
64c1cf7921 Rename pmhandle_t to alpm_handle_t
Signed-off-by: Allan McRae <allan@archlinux.org>
2011-06-28 14:04:00 +10:00
Dan McGee
e2aa952689 Move pm_errno onto the handle
This involves some serious changes and a very messy diff, unfortunately.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-13 19:38:38 -05:00