CMD > Bash > DL biggest file (size in URL)

· EOG, like EOF


² Find biggest number in URL to DL only the big version of pics

³ Possibility #1 : Well, the usual suspects

urls="
https://cosplayrule34.com/images/a/1280/-213443757/301492376/457348840.webp
https://cosplayrule34.com/images/a/1280/-213443757/301492376/457348841.webp
https://cosplayrule34.com/images/a/604/-213443757/301492376/457348840.webp
https://cosplayrule34.com/images/a/604/-213443757/301492376/457348841.webp
"

# Step 1: Extract the size number (the field after /a/) from every URL
numbers=$(echo "$urls" | sed -E 's|.*/a/([0-9]+)/.*|\1|' | sort -n)

# Step 2: Get the biggest number
max=$(echo "$numbers" | tail -n1)

# Step 3: Filter URLs that have the biggest number
echo "$urls" | grep "/$max/"

³ Possibility #2 : Thought using tail in #1 doesn't always give the biggest ...

... so I requested this new solution.
Sorry, of course tail -n1 does return the biggest value, since sort -n already put the numbers in numerical order.
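
A quick shell check of the point: plain sort is lexicographic, so '1280' sorts before '604' (because '1' < '6'), while sort -n compares the values:

printf '1280\n604\n' | sort | tail -n1     #=> 604   (lexicographic)
printf '1280\n604\n' | sort -n | tail -n1  #=> 1280  (numeric)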

# Extract the size numbers and find the maximum numerically
max=0
for url in $urls; do
    size=$(echo "$url" | cut -d/ -f6)   # 6th /-field is the size (1280 or 604)
    if (( size > max )); then
        max=$size
    fi
done

# Filter URLs that have the biggest size
for url in $urls; do
    [[ $(echo "$url" | cut -d/ -f6) -eq $max ]] && echo "$url"
done
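
Side note on the loop above: every cut call forks a pipeline per URL. A pure-bash sketch of the same max-finding step, assuming the size is always the 6th slash-separated field:

max=0
for url in $urls; do
    IFS=/ read -ra f <<<"$url"        # split on '/'; f[5] is the 6th field
    (( f[5] > max )) && max=${f[5]}   # arithmetic test, no subshell
done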

³ Possibility #3 : Using some fat awk construction

1. Group URLs by final file name (the foo.webp part, last field).
2. Within each group, pick the URL with the largest 6th field (the dimension, like 1280 vs 604).

# Pick the URL with the biggest 6th field per file name
awk -F/ '{
    file=$NF           # group by filename
    dim=$6             # dimension
    if (dim > max[file]) {
        max[file]=dim
        url[file]=$0
    }
}
END {
    for (f in url) print url[f]
}' <<<"$urls"
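
One caveat: awk's "for (f in url)" walks the array in an unspecified order, so append "| sort" after the closing quote if stable output matters.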

³ Possibility #4 : Bash / coreutils solution using only cut, sort, uniq, grep (no awk)

# the filename is the 9th /-field in these URLs (cut has no NF shorthand)
for file in $(cut -d/ -f9 <<<"$urls" | sort -u); do
    # filter urls for this filename, sort by dimension, take biggest
    grep "/$file$" <<<"$urls" \
        | sort -t/ -k6,6n \
        | tail -1
done

#=> https://cosplayrule34.com/images/a/1280/-213443757/301492376/457348840.webp
#=> https://cosplayrule34.com/images/a/1280/-213443757/301492376/457348841.webp
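
cut cannot count fields from the end, so the -f9 above is tied to this exact URL depth. A depth-independent sketch, still awk-free, assuming the URLs never end in a slash (grep -o '[^/]*$' grabs the last path component):

for file in $(grep -o '[^/]*$' <<<"$urls" | sort -u); do
    grep "/$file$" <<<"$urls" | sort -t/ -k6,6n | tail -1
done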

³ Possibility Bonus : Ready to download

# DL directly: inside the #4 loop, extend the pipeline with xargs
| tail -1 | xargs wget

# Make url list: replace the loop's closing done with
done | tee selected_urls.txt
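
Putting it together, a minimal end-to-end sketch combining the #4 loop with both bonus fragments (tee keeps the URL list, xargs hands the selected URLs to wget):

for file in $(cut -d/ -f9 <<<"$urls" | sort -u); do
    grep "/$file$" <<<"$urls" | sort -t/ -k6,6n | tail -1
done | tee selected_urls.txt | xargs wget
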
last updated: