download_url()

A utility function that downloads a file from the internet.

download_url(
    url: str, 
    root: str, 
    filename: Optional[str] = None, 
    md5: Optional[str] = None, 
    max_redirect_hops: int = 3) -> None

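Parameter	Description
url	The full URL of the file to download, given as a string. It can be a direct link to a file, a link to a Google Drive file, or a link that requires redirecting.
root	The directory in which to store the downloaded file. It is created if it does not exist.
filename	The name under which to save the downloaded file. If omitted, the name is inferred from the last segment of the URL.
md5	The expected MD5 hash, used to verify the integrity of the downloaded file. After downloading, the function compares the file's MD5 hash with the provided value to confirm that the file was not corrupted or altered in transit.
max_redirect_hops	The maximum number of redirect hops allowed when the URL involves redirection. This is useful for more complex download scenarios in which the URL redirects through several locations before reaching the final download link.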

Example 1: Basic Usage
This downloads the file and saves it in the ./data directory under the name mydownload.txt.

from torchvision.datasets.utils import download_url

# Download a sample file
download_url("https://example.com/samplefile.txt", "./data", filename="mydownload.txt")

Example 2: Verifying the Download with an MD5 Checksum

from torchvision.datasets.utils import download_url

# URL of the file to be downloaded
url = "https://example.com/sampledata.zip"

# Local directory to store the downloaded file
root = "./datasets"

# Optional custom filename
filename = "dataset.zip"

# Optional MD5 checksum for file verification
md5 = "e99a18c428cb38d5f260853678922e03"

# Download the file
download_url(url, root, filename, md5)
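
The md5 argument is a hexadecimal digest string. One way to obtain it for a file you already trust is Python's hashlib module. The following is a minimal sketch, not part of torchvision: the helper name file_md5, the chunk size, and the sample file contents are all hypothetical choices made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello world")
    checksum = file_md5(sample)
    print(checksum)  # a string of 32 hexadecimal characters
```

The resulting string can then be passed as the md5 argument when the same file is downloaded elsewhere.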

Example 3: Downloading a Dataset Archive with pathlib

from pathlib import Path
from torchvision import datasets

# Download URL for the Dataset
url = "https://storage.googleapis.com/kaggle-data-sets/4046394/7034146/bundle/archive.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20240118%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240118T141037Z&X-Goog-Expires=259200&X-Goog-SignedHeaders=host&X-Goog-Signature=3e9e70633bdd572747aa6dd9b0019ae93aec6db28d77cd499272bf6bd77aff6a7fbbc6f81cea2b46178c1531fedd9f8c815d08ab5669523b93ca78b030fb519f5ae3bbc89f61a283e6ae4771f2c89f9c331df701c7f13e2443de2df58808337674cc839b34599f40f82e0c31ffea56dc4279ef8911a6c2245faf48949ad09478c6dd8867f9cec4eccfca8c3af6dc48c3054dd968bd6ce82ae621c12b56e71f34ebc4ea65003ac7effbfad5896d56b835120b6a650b3cca53df01ceb3dc042ba744f625cfbf790254f4ea38f68929e8697a970fcf62821807bf8fca61358adbb0651044aac8e304f516a731fb55d859bd4025feb542332e1d8e6f31698e684101"

root = Path(__file__).parent
destination_dir = root / "dataset"
archive_path = destination_dir / "archive.zip"

# Skip the download if the archive is already in the destination directory
if not archive_path.exists():
    datasets.utils.download_url(url,
                                root=destination_dir,
                                filename="archive.zip")
    print("Download completed")
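
Example 3 passes filename explicitly. If the name were instead inferred from the last segment of the URL, as the parameter table describes, a signed URL like the one above would produce a name that still carries the query string. The sketch below uses only the standard library to show the difference. The shortened URL is hypothetical, and os.path.basename is assumed here to approximate the "last segment" rule.

```python
import os
from urllib.parse import urlparse

# Hypothetical, shortened signed URL with a query string
url = "https://example.com/bundle/archive.zip?X-Goog-Expires=259200&X-Goog-SignedHeaders=host"

# Taking the last segment of the raw string keeps the query string
naive_name = os.path.basename(url)

# Parsing the URL first isolates the path component
clean_name = os.path.basename(urlparse(url).path)

print(naive_name)
print(clean_name)
```

Supplying a clean filename keeps the saved file recognizable as a .zip archive for later extraction.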

extract_archive()

A utility function that extracts files from various archive formats. Both the archive type and the compression method are detected automatically from the file name.

extract_archive(
    from_path: str, 
    to_path: Optional[str] = None, 
    remove_finished: bool = False) -> str

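Parameter	Description
from_path	The path to the archive file to extract (for example: `.zip`, `.tar`, `.tar.gz`).
to_path	The target directory into which the archive contents are extracted. If omitted, the contents are extracted into the same directory as the archive file.
remove_finished	Whether to delete the archive file after its contents have been extracted. Set to `True` to delete the archive after extraction.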

Example 1: Basic Usage
This extracts the contents of dataset.zip, located in the ./data directory, into a new directory, ./data/extracted. After extraction, the dataset.zip file is deleted.

from torchvision.datasets.utils import extract_archive

# Path to the archive file
archive_file = './data/dataset.zip'

# Destination directory for extracted contents
destination_dir = './data/extracted'

# Extract the archive and then remove the archive file
extract_archive(from_path=archive_file, to_path=destination_dir, remove_finished=True)
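
To illustrate what detecting the archive type and compression method from the file name can look like, here is a minimal standard-library sketch. It is not torchvision's implementation: the helper detect_format and the suffix tables are hypothetical and cover only a few common formats.

```python
from pathlib import Path

# Hypothetical lookup tables for this illustration only
ARCHIVE_SUFFIXES = {".tar", ".zip"}
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz"}
ALIASES = {".tgz": (".tar", ".gz"), ".tbz2": (".tar", ".bz2")}


def detect_format(file_name):
    """Return (archive_type, compression) inferred from the file name's suffixes."""
    suffixes = Path(file_name).suffixes
    if not suffixes:
        raise ValueError(f"{file_name!r} has no suffix to inspect")
    last = suffixes[-1]
    if last in ALIASES:
        return ALIASES[last]
    if last in ARCHIVE_SUFFIXES:
        return last, None
    if last in COMPRESSION_SUFFIXES:
        # A compressed archive such as .tar.gz has the archive suffix just before
        if len(suffixes) > 1 and suffixes[-2] in ARCHIVE_SUFFIXES:
            return suffixes[-2], last
        return None, last
    raise ValueError(f"Unrecognized suffix {last!r}")


for name in ("dataset.zip", "dataset.tar.gz", "dataset.tgz", "labels.csv.gz"):
    print(name, detect_format(name))
```

Because detection in this sketch relies only on the name, an archive saved under a misleading or missing extension would not be recognized.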

download_and_extract_archive()

A utility function that combines downloading a file from a URL with extracting it.

download_and_extract_archive(
    url: str,
    download_root: str,
    extract_root: Optional[str] = None,
    filename: Optional[str] = None,
    md5: Optional[str] = None,
    remove_finished: bool = False) -> None

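Parameter	Description
url	The URL of the archive to download, given as a string. It can be a direct link to a file, a link to a Google Drive file, or a link that requires redirecting.
download_root	The directory in which to store the downloaded file. It is created if it does not exist.
extract_root	The target directory into which the archive contents are extracted. If omitted, the contents are extracted into the same directory as the archive file.
filename	The name under which to save the downloaded file. If omitted, the name is inferred from the last segment of the URL.
md5	The expected MD5 hash, used to verify the integrity of the downloaded file. After downloading, the function compares the file's MD5 hash with the provided value to confirm that the file was not corrupted or altered in transit.
remove_finished	Whether to delete the archive file after its contents have been extracted. Set to `True` to delete the archive after extraction.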

This function first calls download_url() to download the file from url into the download_root directory, using the given filename and md5. It then calls extract_archive() to extract the archive into the extract_root directory.
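
The two-step flow described above can be sketched with the standard library alone. This is a hypothetical stand-in, not torchvision's code: fetch_and_unpack is an invented name, MD5 verification is omitted, and the demonstration uses a local file:// URL in place of a real server so that it runs offline.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_unpack(url, download_root, extract_root=None, filename=None, remove_finished=False):
    """Hypothetical stand-in: download `url`, then unpack the archive."""
    download_root = str(download_root)
    # Default to extracting next to the downloaded archive
    extract_root = str(extract_root) if extract_root is not None else download_root
    filename = filename or os.path.basename(url)

    # Step 1: download into download_root, creating it if needed
    os.makedirs(download_root, exist_ok=True)
    archive = os.path.join(download_root, filename)
    urllib.request.urlretrieve(url, archive)

    # Step 2: unpack into extract_root, optionally deleting the archive afterwards
    os.makedirs(extract_root, exist_ok=True)
    shutil.unpack_archive(archive, extract_root)
    if remove_finished:
        os.remove(archive)
    return extract_root


# Self-contained demonstration using a local file:// URL instead of a real server
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "src").mkdir()
    (tmp / "src" / "readme.txt").write_text("hello")
    source_zip = shutil.make_archive(str(tmp / "sample_dataset"), "zip", root_dir=str(tmp / "src"))

    out = fetch_and_unpack(Path(source_zip).as_uri(), tmp / "data", tmp / "data" / "extracted")
    extracted = sorted(os.listdir(out))
    print(extracted)
```

Keeping the download and extraction directories as separate arguments lets the archive stay in one place while its contents go elsewhere.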

Example 1: Basic Usage
This downloads the archive from the given URL into the ./data directory, then extracts its contents into the ./data/extracted directory.

from torchvision.datasets.utils import download_and_extract_archive

# URL of the archive
url = "https://example.com/datasets/sample_dataset.zip"

# Path where the archive will be downloaded
download_root = "./data"

# Path where the archive will be extracted
extract_root = "./data/extracted"

# Download and extract the archive
download_and_extract_archive(url, download_root, extract_root)

Example 2: Downloading and Extracting a Dataset Archive with pathlib

from pathlib import Path
from torchvision.datasets.utils import download_and_extract_archive

# Download URL for the Dataset
url = "https://storage.googleapis.com/kaggle-data-sets/4046394/7034146/bundle/archive.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20240118%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240118T141037Z&X-Goog-Expires=259200&X-Goog-SignedHeaders=host&X-Goog-Signature=3e9e70633bdd572747aa6dd9b0019ae93aec6db28d77cd499272bf6bd77aff6a7fbbc6f81cea2b46178c1531fedd9f8c815d08ab5669523b93ca78b030fb519f5ae3bbc89f61a283e6ae4771f2c89f9c331df701c7f13e2443de2df58808337674cc839b34599f40f82e0c31ffea56dc4279ef8911a6c2245faf48949ad09478c6dd8867f9cec4eccfca8c3af6dc48c3054dd968bd6ce82ae621c12b56e71f34ebc4ea65003ac7effbfad5896d56b835120b6a650b3cca53df01ceb3dc042ba744f625cfbf790254f4ea38f68929e8697a970fcf62821807bf8fca61358adbb0651044aac8e304f516a731fb55d859bd4025feb542332e1d8e6f31698e684101"

root = Path(__file__).parent
destination_dir = root / "dataset"

# Skip the download if the archive is already in the download directory
if not (root / "archive.zip").exists():
    download_and_extract_archive(url,
                                 download_root=root,
                                 extract_root=destination_dir,
                                 filename="archive.zip")
    print("Download and extraction completed")

list_dir()

A utility function that lists the directories inside a given directory.

list_dir(root: str,
         prefix: bool = False) -> List[str]

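Parameter	Description
root	The path of the directory in which to search for subdirectories.
prefix	Determines whether directory names or full paths are returned. If `True`, the full path of each directory (`root` joined with the directory name) is returned. If `False`, only the directory names are returned.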

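list_dir() and list_files() are intended to help users browse directories and files in the file system.

These functions list only the directories/files located directly in `root`; they do not list directories/files inside subdirectories of `root`.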

Example 1: Basic Usage
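The list_dir function is used to list all subdirectories inside the data/images directory. With prefix set to True, the function returns the full paths of these subdirectories.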

from torchvision.datasets.utils import list_dir

# Specify the root directory
root_directory = 'data/images'

# Call the function to list all subdirectories with their full paths
subdirectories = list_dir(root=root_directory, prefix=True)

# Print the list of subdirectories
for subdir in subdirectories:
    print(subdir)
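
The non-recursive behavior and the effect of prefix can be tried without torchvision. The following is a minimal sketch built on os.listdir: list_subdirs is a hypothetical stand-in, and sorting the result is an assumption added here only to make the output stable.

```python
import os
import tempfile


def list_subdirs(root, prefix=False):
    """Hypothetical stand-in: directories directly inside `root`."""
    names = [n for n in os.listdir(root) if os.path.isdir(os.path.join(root, n))]
    if prefix:
        # Prepend root so each entry can be used as a path
        names = [os.path.join(root, n) for n in names]
    return sorted(names)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cats", "kittens"))  # nested directory
    os.makedirs(os.path.join(root, "dogs"))
    open(os.path.join(root, "notes.txt"), "w").close()  # a file, not a directory

    names_only = list_subdirs(root)
    with_prefix = list_subdirs(root, prefix=True)
    print(names_only)  # neither "kittens" nor "notes.txt" is listed
    print(with_prefix)
```

To walk an entire tree instead, a recursive tool such as os.walk would be needed.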

list_files()

A utility function that lists the files in a directory whose names end with a given suffix.

list_files(root: str,
           suffix: str,
           prefix: bool = False) -> List[str]

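Parameter	Description
root	The path of the directory in which to search for files.
suffix	Only files ending with this suffix are returned, which restricts the search to specific file types. It can be a single string (for example: `.png`) or a tuple of strings (for example: `('.jpg', '.jpeg', '.png')`).
prefix	Determines whether file names or full paths are returned. If `True`, the full path of each file is returned (for example: `images/photo.png`). If `False`, only the file name is returned (for example: `photo.png`).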

Example 1: Basic Usage
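In this example, the list_files function searches the dataset/images directory for image files with the extensions .jpg, .jpeg, and .png. The prefix=True argument ensures that the full path of each file is returned.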

from torchvision.datasets.utils import list_files

# Directory to search in
directory = 'dataset/images'

# Types of files to search for
file_types = ('.jpg', '.jpeg', '.png')

# Call the function with parameters
files = list_files(root=directory, suffix=file_types, prefix=True)

# Print the results
for file in files:
    print(file)
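
Suffix matching with either a single string or a tuple maps naturally onto str.endswith, which accepts both. The sketch below is a hypothetical stand-in written with the standard library, not torchvision's code. The name list_matching_files is invented, and the result is sorted only to keep the output stable.

```python
import os
import tempfile


def list_matching_files(root, suffix, prefix=False):
    """Hypothetical stand-in: files directly inside `root` ending with `suffix`."""
    names = [
        n for n in os.listdir(root)
        # str.endswith accepts a single string or a tuple of strings
        if os.path.isfile(os.path.join(root, n)) and n.endswith(suffix)
    ]
    if prefix:
        names = [os.path.join(root, n) for n in names]
    return sorted(names)


with tempfile.TemporaryDirectory() as root:
    for name in ("a.png", "b.jpg", "c.txt", "d.PNG"):
        open(os.path.join(root, name), "w").close()

    png_only = list_matching_files(root, ".png")
    images = list_matching_files(root, (".jpg", ".jpeg", ".png"))
    print(png_only)
    print(images)
```

In this sketch the comparison is case-sensitive, so d.PNG is not matched by '.png'; add upper-case variants to the tuple if such files should be included.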

Example 2: Processing the Listed Images

from torchvision.datasets.utils import list_files
from PIL import Image

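# List PNG files; prefix=True returns paths such as images/photo.png,
# whereas bare file names could only be opened from inside images/
image_files = list_files('images', '.png', prefix=True)

for file in image_files:
    with Image.open(file) as img:
        # Perform operations on the image, e.g., resizing, filtering
        img_resized = img.resize((100, 100))
        img_resized.show()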