Filling in more of the payload fields before passing to the downloader ensures
that the these fields do not get lost during sandboxed operations.
It also fixes the use of -U with XferCommand, but testsuite still fails due to
"404" page being downloaded for the signature. Given we can not identify this
as being a non-signature download with the XferCommand, we can just turn off
signature checking in this test.
Signed-off-by: Allan McRae <allan@archlinux.org>
Cache servers differ from regular servers in that they do not produce
warnings and are not removed from the server pool for "soft errors"
(i.e. the server was reachable, but the download failed) and they are
not used for databases. If a host is used for both a cache server and a
regular server, it may still be removed from the server pool for soft
errors that occur when used as cache server and removal from the server
pool for soft errors will not affect future attempted use as a cache
server.
Signed-off-by: Andrew Gregory <andrew.gregory.8@gmail.com>
archweb's download links all ended in /download. This cause all the temp
files to be named download.part. With parallel downloads this results in
multiple downloads to go to the same temp file and breaks the transaction.
Assign random temporary filenames to downloads from URLs that are either
missing a filename, or if the filename does not contain at least three
hyphens (as a well formed package filename does).
While this approach to determining when to use a temporary filename is
not 100% foolproof, it does keep nice looking download progress bar names
when a proper package filename is given. The only downside of not using
temporary files when provided with a filename with three or more hyphens
is URLs created specifically to bypass temporary filename usage can not
be downloaded in parallel. We probably do not want to download packages
from such URLs anyway.
Fixes FS#71464
Modified-by: Allan McRae (do not use temporary files for realish URLs)
Signed-off-by: Allan McRae <allan@archlinux.org>
Until now callee of ALPM download functionality has been in charge of
payload creation both for the main file (e.g. *.pkg) and for the accompanied
*.sig file. One advantage of such solution is that all payloads are
independent and can be fetched in parallel thus exploiting the maximum
level of download parallelism.
To build *.sig file url we've been using a simple string concatenation:
$requested_url + ".sig". Unfortunately there are cases when it does not
work. For example an archlinux.org "Download From Mirror" link looks like
this https://www.archlinux.org/packages/core/x86_64/bash/download/ and
it gets redirected to some mirror. But if we append ".sig" to the end of
the link url and try to download it then archlinux.org returns 404 error.
To overcome this issue we need to follow redirects for the main payload
first, find the final url and only then append '.sig' suffix.
This implies 2 things:
- the signature payload initialization need to be moved to dload.c
as it is the place where we have access to the resolved url
- *.sig is downloaded serially with the main payload and this reduces
level of parallelism
Move *.sig payload creation to dload.c. Once the main payload is fetched
successfully we check if the callee asked to download the accompanied
signature. If yes - create a new payload and add it to mcurl.
*.sig payload does not use server list of the main payload and thus does
not support mirror failover. *.sig file comes from the same server as
the main payload.
Refactor event loop in curl_multi_download_internal() a bit. Instead of
relying on curl_multi_check_finished_download() to return number of new
payloads we simply rerun the loop iteration one more time to check if
there are any active downloads left.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
All users of _alpm_download() have been refactored to the new API.
It is time to remove the old _alpm_download() functionality now.
This change also removes obsolete SIGPIPE signal handler functionality
(this is a leftover from libfetch days).
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Installing remote packages using its URL is an interesting case for ALPM
API. Unlike package sync ('pacman -S pkg1 pkg2') '-U' does not deal with
server mirror list. Thus _alpm_multi_download() should be able to
handle file download for payloads that either have 'fileurl' field
or pair of fields ('servers' and 'filepath') set.
Signature for alpm_fetch_pkgurl() has changed and it accepts an
output list that is populated with filepaths to fetched packages.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Multiplexed download requires ability to draw UI for multiple active progress
bars. To implement it we use ANSI codes to move cursor up/down and then
redraw the required progress bar.
`pacman_multibar_ui.active_downloads` field represents the list of active
downloads that correspond to progress bars.
`struct pacman_progress_bar` is a data structure for a progress bar.
In some cases (e.g. database downloads) we want to keep progress bars in order.
In some other cases (package downloads) we want to move completed items to the
top of the screen. Function `multibar_move_completed_up` allows to configure
such behavior.
Per discussion in the maillist we do not want to show download progress for
signature files.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
curl_multi_download_internal() is the main loop that creates up to
'ParallelDownloads' easy curl handles, adds them to mcurl and then
performs curl execution. This is when the paralled downloads happens.
Once any of the downloads complete the function checks its result.
In case if the download fails it initiates retry with the next server
from payload->servers list. At the download completion all the payload
resources are cleaned up.
curl_multi_check_finished_download() is essentially refactored version of
curl_download_internal() adopted for multi_curl. Once mcurl porting is
complete curl_download_internal() will be removed.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
dload_payload->curlerr is a field that is used inside
curl_download_internal() function only. It can be converted to a local
variable.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
To be able to run multiple download in parallel efficiently we need to
use curl_multi interface [1]. It introduces a set of APIs over new type
of handler 'CURLM'.
Create CURLM object at the application start and set it to global ALPM
context.
The 'single-download' CURL handle moves to payload struct. A new CURL
handle is created for each payload with intention to be processed by CURLM.
Note that curl_download_internal() is not ported to CURLM interface due
to the fact that the function will go away soon.
[1] https://curl.haxx.se/libcurl/c/libcurl-multi.html
Signed-off-by: Allan McRae <allan@archlinux.org>
This is an equivalent of alpm_db_update but for multiplexed (parallel)
download. The difference is that this function accepts list of
databases to update. And then ALPM internals download it in parallel if
possible.
Add a stub for _alpm_multi_download the function that will do parallel
payloads downloads in the future.
Introduce dload_payload->filepath field that contains url path to the
file we download. It is like fileurl field but does not contain
protocol/server part. The rationale for having this field is that with
the curl multidownload the server retry logic is going to move to a curl
callback. And the callback needs to be able to reconstruct the 'next'
fileurl. One will be able to do it by getting the next server url from
'servers' list and then concat with filepath. Once the 'parallel download'
refactoring is over 'fileurl' field will go away.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
Many of these are pointless (e.g. there is no need to explicitly turn on
spellchecking and language dictionaries for the manpages by default).
The only useful modelines are the ones enforcing the project coding
standards for indentation style (and "maybe" filetype/syntax, but
everything except the asciidoc manpages and makepkg.conf is already
autodetected), and indent style can be applied more easily with
.editorconfig
Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
Frontends rely on an initialization call for setup between downloads.
Checking for intialization after checking for a completed download can
skip initialization in cases where files are small enough to be
downloaded all at once (FS#56408). Relying on previous download size
can result in multiple initializations if there are multiple
non-transfer events prior to the download starting (fS#56468).
Introduce a new cb_initialized variable to the payload struct and use it
to ensure that the callback is initialized exactly once prior to any
actual events.
Fixes FS#56408, FS#56468
Signed-off-by: Andrew Gregory <andrew.gregory.8@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
In FS#43434, Downloads which fail and are restarted on a different server
will resume and may display a negative download speed. The payload's progress
in libalpm was not properly reset which ultimately caused terminal noise
because the line width calculation assumes positive download speeds.
This patch fixes the incomplete reset of the payload by mimicing what
be_sync.c:alpm_db_update() does over in sync.c:download_single_file().
The new dload.c:_alpm_dload_payload_reset_for_retry() extends beyond the
current behavior by updating initial_size and prevprogress for this case.
This makes pacman reset the progress properly in the next invocation of the
callback and display positive download speeds.
Fixes FS#43434.
Signed-off-by: Martin Kühne <mysatyre@gmail.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
Forcing vim users to view files with a tabstop of 2 seems really
unnecessary when noet is set. I find it much easier to read code with
ts=4 and I dislike having to override the modeline by hand.
Command run:
find . -type f -exec sed -i '/vim.* noet/s# ts=2 sw=2##' {} +
Signed-off-by: Florian Pritz <bluewind@xinu.at>
Signed-off-by: Allan McRae <allan@archlinux.org>
If the server redirects from ${repo}.db to ${repo}.db.tar.gz pacman gets
this wrong: It saves to new filename and fails when accessing
${repo}.db.
We need the remote filename only when downloading remote files with
pacman's -U operation. This introduces a new field 'trust_remote_name'
to payload. If set pacman downloads to the filename given by the server.
The field trust_remote_name is set in alpm_fetch_pkgurl().
Fixes FS#36791 ([pacman] downloads to wrong filename with redirect).
[dave: remove redundant assignment leading to memory leak]
Signed-off-by: Allan McRae <allan@archlinux.org>
I suspect that eventually we're going to end up returning a pointer to
an allocated struct to describe the download result, but that's for
another patch when the need arises...
Fixes FS#33508.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
RFC 2616 doesn't forbid a 301 or 302 repsonse from having a body, and
servers exist in the wild that show this behavior. In order to prevent
pacman from showing a progress bar when we aren't actually downloading a
package (and merely following one of these pain in the butt redirects),
capture the server response code in the response header, rather than
waiting to peel it off the handle after the download has finished.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Reported-by: Alexandre Filgueira <alexfilgueira@cinnarch.com>
Signed-off-by: Allan McRae <allan@archlinux.org>
Add 2012 to the copyright range for all libalpm and pacman source files.
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
This is done by both the delta and regular file code, so we can extract
a little helper method. Done mostly to satisfy my "why are we repeating
code here" itch.
Signed-off-by: Dan McGee <dan@archlinux.org>
This will always be a 64-bit signed integer rather than the variable length
time_t type. Dates beyond 2038 should be fully supported in the library; the
frontend still lags behind because 32-bit platforms provide no localtime64()
or equivalent function to convert from an epoch value to a broken down time
structure.
Signed-off-by: Dan McGee <dan@archlinux.org>
This is a poor place for it, and it will likely move again in the
future, but it's better to have it here than as a static variable.
Initialization of this variable is now no longer necessary as its
zeroed on creation of the payload struct.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
This was done to squash a memory leak in the sync database download
code. When we downloaded a database and then reused the payload struct,
we could find ourselves calling get_fullpath() for the signatures and
overwriting non-freed values we had left over from the database
download.
Refactor the payload_free function into a payload_reset function that we
can call that does NOT free the payload itself, so we can reuse payload
structs. This also allows us to move the payload to the stack in some
call paths, relieving us of the need to alloc space.
Signed-off-by: Dan McGee <dan@archlinux.org>
Rather than always initializing it on any handle creation. There are
several frontend operations (search, info, etc.) that never need the
download code, so spending time initializing this every single time is a
bit silly. This makes it a bit more like the GPGME code init path.
Signed-off-by: Dan McGee <dan@archlinux.org>
In the sync code, we explicitly allocated a string for this field, while
in the dload code itself it was filled in with a pointer to another
string. This led to a memory leak in the sync download case.
Make remote_name non-const and always explicitly allocate it. This patch
ensures this as well as uses malloc + snprintf (rather than calloc) in
several codepaths, and eliminates the only use of PATH_MAX in the
download code.
Signed-off-by: Dan McGee <dan@archlinux.org>
Beautiful of libcurl to use floating point types for what are never
fractional values. We can do better, and we usually want these values in
their integer form anyway.
Signed-off-by: Dan McGee <dan@archlinux.org>
This is a precursor to a following patch which will move the setting of
options to a separate function. With the open mode as part of the
struct, we can avoid modifying stack allocated variables.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
This is more in line with the menagerie of file name members that we now
have on the payload struct.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
These are private to the download operation already, so glob them onto
the struct. This is an ugly rename patch, with the only logical change
being that destfile and tempfile are now freed by the payload_free
function.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
This is a far more accurate description of what this is, since it's more
than likely not really a filename at all, but the name after a final
slash on a URL.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
Let callers of _alpm_download state whether we should delete on fail,
rather than inferring it from context. We still override this decision
and always unlink when a temp file is used.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>