vCenter content to S3 sync
Ansible playbook that automates the process of downloading items from a vCenter content library and subsequently uploading them to an AWS S3 bucket.
Why, though? In VMware's infinite wisdom, you can set up an S3 bucket as a content library source that is then synced to your local storage. However, this is a one-way street: you can't sync anything back up to S3. So if you have added some templates locally and need them in S3, what are your options? You have to download them from the content library, convert them, and upload them. On top of that, the download function is so bad it's not even funny: it fails a lot, and you can't run multiple downloads at once. The main issue is that vCenter needs to "prepare" the download, and the client has to keep sending "keep-alive" signals to vCenter or the download dies.
I made an Ansible playbook whose "secret sauce" is a Python script that talks to the vCenter API, requests the download, runs a separate thread for the keep-alive signal, and saves the file. It can download selected files or the whole library.
Ansible vCenter content library to AWS S3 Migrator
This repository contains an Ansible playbook that automates the process of downloading items from a vCenter content library and subsequently uploading them to an AWS S3 bucket.
Features
- Retrieves items from a specified vCenter content library.
- Downloads each item from the library to the local system.
- Uploads the downloaded items to a specified AWS S3 bucket.
- Updates the Content Library .json using a Python script.
- Cleans up after the run.
Pre-requisites
- Ansible 2.9+
- AWS CLI installed and configured (or included in the runtime environment)
- HashiCorp Vault with vCenter credentials
- Python 3.x
- Enough disk space for downloads
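How much disk space is "enough" depends on the size of the library, so a pre-flight check can save a failed run. The following is a minimal sketch, not part of the role: `has_free_space`, its `headroom` parameter, and the size estimate passed in are all hypothetical names chosen for illustration.

```python
import shutil


def has_free_space(path, required_bytes, headroom=0.10):
    """Return True if `path` has room for `required_bytes` plus some headroom.

    Hypothetical helper: the name and the 10% default headroom are
    illustrative assumptions, not something the playbook defines.
    """
    free = shutil.disk_usage(path).free  # free bytes on the filesystem holding `path`
    return free >= required_bytes * (1 + headroom)
```

For example, `has_free_space("/tmp", 50 * 1024**3)` checks whether `/tmp` can hold roughly 50 GiB of downloads with 10% to spare.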
Configuration
The AWX template accepts the following parameters (default values in parentheses):
- content_library_name (default: "Customer Content Library"): Name of the content library on the targeted vCenter.
- s3_bucket_name (default: "s3-use1-sddc-dev-bucket"): Targeted S3 bucket where the files will be synced.
- s3_bucket_folder (default: "Content-Library"): Name of the folder in s3_bucket_name where the content library will be synced to.
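To show how the bucket and folder parameters above fit together, here is a small sketch that builds an `aws s3 sync` command from them. It is an illustration under assumptions: `build_s3_sync_command` and the local directory are hypothetical, and the playbook's actual upload task may be written differently.

```python
def build_s3_sync_command(local_dir, s3_bucket_name, s3_bucket_folder):
    """Build an `aws s3 sync` argument list for uploading a local directory.

    Hypothetical helper for illustration; not taken from the role.
    """
    folder = s3_bucket_folder.strip("/")
    destination = f"s3://{s3_bucket_name}/{folder}" if folder else f"s3://{s3_bucket_name}"
    return ["aws", "s3", "sync", local_dir, destination]


command = build_s3_sync_command(
    "/tmp/content-library",  # hypothetical local download directory
    "s3-use1-sddc-dev-bucket",
    "Content-Library",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a host where the AWS CLI is installed and configured.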
The playbook also expects vCenter and AWS credentials (provided from the AWX credential store).
It does not require privilege escalation.
This might not work for you out of the box, but it might help you get it working, since there simply is no other automation that would do this.
Scripts
Look at how the playbook is executed and which role it uses. You can find two scripts in the role.
make_vcsp_2018.py
This is the official script from VMware that generates the JSON file an S3 bucket needs in order to be used as a content library source. I haven't touched this one; it just works.
download.py
This baby makes the magic happen. It creates two threads: one downloads the file, and the other constantly tells the vCenter API, "Do not fucking die!"
import os
import sys
import threading
import time

import requests

# verify=False is used below, so silence the resulting TLS warnings
requests.packages.urllib3.disable_warnings()

# shared state: set by the download thread, read by the keep-alive and main threads
download_finished = False
download_failed = False


# define the download function, which runs as a separate thread
def download_file(url, dest, timeout):
    global download_finished, download_failed
    try:
        MB = 1024 * 1024  # 1 MB in bytes
        chunk_size = 512 * MB  # change this to change the chunk size
        response = requests.get(url, stream=True, timeout=timeout, verify=False)  # ignore SSL errors
        response.raise_for_status()  # raise an exception if the GET request failed
        with open(dest, 'wb') as file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
    except Exception as e:
        print(f"An error occurred during download: {e}")
        # exit() inside a thread only ends that thread, so record the failure
        # and let the main thread set the exit status
        download_failed = True
    finally:
        download_finished = True


# define the keep-alive function
def keep_alive(vmware_host, download_session_data, login_json, client_token, keepalive_progress):
    url = f"https://{vmware_host}/api/content/library/item/download-session/{download_session_data}?action=keep-alive"
    headers = {
        'vmware-api-session-id': login_json,
        'client_token': client_token
    }
    data = {"progress": keepalive_progress}
    while not download_finished:
        requests.post(url, headers=headers, json=data, verify=False)  # ignore SSL errors
        time.sleep(10)


# define variables from Ansible
modified_download_url = os.getenv('MODIFIED_DOWNLOAD_URL')
local_temp_directory = os.getenv('LOCAL_TEMP_DIRECTORY')
folder_name = os.getenv('FOLDER_NAME')
item_name = os.getenv('ITEM_NAME')
local_download_temp_directory = os.getenv('LOCAL_DOWNLOAD_TEMP_DIRECTORY')
vmware_host = os.getenv('VMWARE_HOST')
download_session_data = os.getenv('DOWNLOAD_SESSION_DATA')
login_json = os.getenv('LOGIN_JSON')
client_token = os.getenv('CLIENT_TOKEN')
keepalive_progress = os.getenv('KEEPALIVE_PROGRESS')

# build the destination path for the downloaded file
dest = f"{local_temp_directory}/{folder_name}/{item_name}"

# start the download as a separate thread
download_thread = threading.Thread(target=download_file, args=(modified_download_url, dest, 900))
download_thread.start()

# start sending keep-alive messages
keep_alive_thread = threading.Thread(target=keep_alive, args=(vmware_host, download_session_data, login_json, client_token, keepalive_progress))
keep_alive_thread.start()

# wait for the download to finish
download_thread.join()
keep_alive_thread.join()  # ensure the keep-alive thread also finishes

# exit the script with a non-zero status if the download failed
if download_failed:
    sys.exit(1)
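The script above coordinates its two threads through a shared flag and a fixed 10-second sleep. One possible variation, sketched here as an assumption rather than as the role's actual code, uses `threading.Event` so the keep-alive loop stops as soon as the download ends. `run_with_keep_alive`, `work`, and `ping` are hypothetical names; in practice `work` would wrap the download and `ping` would wrap the keep-alive POST.

```python
import threading


def run_with_keep_alive(work, ping, interval=10.0):
    """Run `work()` in a thread while calling `ping()` every `interval` seconds.

    Returns True if `work` finished without raising, False otherwise.
    Hypothetical helper for illustration; not taken from download.py.
    """
    finished = threading.Event()
    result = {"ok": False}

    def worker():
        try:
            work()
            result["ok"] = True
        except Exception as exc:
            print(f"An error occurred during download: {exc}")
        finally:
            finished.set()  # always release the keep-alive loop

    def pinger():
        while not finished.is_set():
            ping()
            # wait() returns early once the event is set, so the loop does
            # not sleep out the rest of the interval after the work is done
            finished.wait(interval)

    threads = [threading.Thread(target=worker), threading.Thread(target=pinger)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result["ok"]
```

Because the result comes back to the main thread as a return value, the caller can decide the exit status itself, for example with `sys.exit(0 if ok else 1)`.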