Sparkplug
Sparkplug is a helper service designed to be used for image scraping. Essentially, you provide the service with some URLs that point at images. The service will download them, convert them to any formats you want, and present them for access via HTTP.
Usage
Sparkplug has two parts: the server and the worker. The server responds to API requests (enqueuing images) and serves the images that are requested. The worker downloads any enqueued images, converts them, and marks them as scraped. The worker is meant to be run from cron.
A typical image enqueue request will look like this:
POST /v1/enqueue
{
"image_url": "https://www.google.com/favicon.ico",
"key": "shh, it's a secret!"
}
The response will look like this:
{
"scraped": false,
"uuid": "3336f8d5-f1de-4986-9175-b4fbcbc2e8c2"
}
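As a rough illustration of the request above, here is a minimal Python client sketch using only the standard library. The helper names `build_enqueue_request` and `enqueue` are hypothetical, and the `Content-Type: application/json` header is an assumption about what the server expects; only the path and the two body fields come from the example.

```python
import json
import urllib.request


def build_enqueue_request(base_url: str, image_url: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/enqueue request."""
    body = json.dumps({"image_url": image_url, "key": key}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/enqueue",
        data=body,
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enqueue(base_url: str, image_url: str, key: str) -> dict:
    """Send the enqueue request and return the decoded JSON response."""
    request = build_enqueue_request(base_url, image_url, key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With a server listening locally on port 1234, a call such as `enqueue("http://localhost:1234", "https://www.google.com/favicon.ico", "shh, it's a secret!")` should return a dictionary shaped like the response above.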
If we let the worker run and then make the same request again, the response will look like this:
{
"scraped": true,
"uuid": "3336f8d5-f1de-4986-9175-b4fbcbc2e8c2",
"formats": {
"preview": "https://example.com/content/3336f8d5-f1de-4986-9175-b4fbcbc2e8c2_preview.ico",
"original": "https://example.com/content/3336f8d5-f1de-4986-9175-b4fbcbc2e8c2_original.ico"
}
}
This shows that the image has been scraped. It also provides the URLs of the available formats.
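Since the same request reports progress, a caller can simply repeat it until `scraped` becomes true. The sketch below shows one way to do that; `wait_until_scraped` is a hypothetical helper, and the retry count and delay are arbitrary choices rather than values Sparkplug prescribes. The `send` argument is any zero-argument callable that performs the enqueue request and returns the decoded JSON.

```python
import time
from typing import Callable, Dict


def wait_until_scraped(
    send: Callable[[], dict],
    attempts: int = 10,
    delay_seconds: float = 30.0,
) -> Dict[str, str]:
    """Repeat the enqueue request until the image is reported as scraped.

    Returns the "formats" mapping (format name -> URL) from the response.
    Raises TimeoutError if the image is still unscraped after all attempts.
    """
    for attempt in range(attempts):
        response = send()
        if response.get("scraped"):
            return response["formats"]
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError(f"image not scraped after {attempts} attempts")
```

Because the worker runs from cron, a delay on the order of the cron interval is probably a sensible starting point.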
Here's the configuration we'd need for the above example:
database_url = "postgres:///sparkplug" # local postgres server, database name is sparkplug
image_directory = "public" # Where we store images
key = "shh, it's a secret!" # this must match in enqueue requests, or they will be ignored
server_name = "https://example.com/content/" # used when generating the image URLs
port = 1234 # port that the server listens on
max_retries = 3 # maximum retries before the worker gives up on downloading a specific image
max_size_bytes = 10_000_000 # maximum image size
[[formats]]
name = "original" # no constraints - image will not be changed
[[formats]]
name = "preview"
width = 300 # scale the image to be 300 pixels wide, while maintaining aspect ratio
The other available constraint is height, which behaves just like width. If both width and height are defined, the image will be scaled to be as large as possible without its width or height exceeding the given constraints. The image will preserve its aspect ratio.
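The scaling rule described above can be sketched as a small calculation. This is an illustration, not Sparkplug's actual code: `scaled_size` is a hypothetical function, and both the rounding behaviour and the choice to scale smaller images up to the constraint are assumptions.

```python
from typing import Optional, Tuple


def scaled_size(
    width: int,
    height: int,
    max_width: Optional[int] = None,
    max_height: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the output size for an image under optional width/height constraints."""
    factors = []
    if max_width is not None:
        factors.append(max_width / width)
    if max_height is not None:
        factors.append(max_height / height)
    if not factors:
        # No constraints: the image is left unchanged.
        return width, height
    # The smallest factor is the largest scale that satisfies every constraint.
    scale = min(factors)
    # Assumption: round to the nearest pixel and never go below 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1200x800 source image, `scaled_size(1200, 800, max_width=300)` gives `(300, 200)`, while adding `max_height=100` gives `(150, 100)` because the height constraint becomes the limiting one.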