- Removing task introspection with `quote` to save time running your code.
- Writing task results to cloud storage such as S3 using a block to save memory.
- Saving data to disk within a flow rather than using results.
- Caching task results to save time and compute.
- Compressing results written to disk to save space.
- Using a task runner for parallelizable operations to save time.
Remove task introspection
By default, Prefect introspects each argument passed to a task that is called from a flow. To speed up your flow runs, disable this behavior for an argument by wrapping it with `quote`.
Here’s a basic example that extracts and transforms some New York taxi data:
`quote` reduces execution time at the expense of disabling task dependency tracking for the wrapped object.
Write task results to cloud storage
By default, the results of task runs are stored in memory in your execution environment. This behavior makes flow runs fast for small data, but can be problematic for large data. Save memory by writing results to disk. In production, it’s recommended to write results to cloud provider storage such as AWS S3. Prefect lets you use a storage block from a Prefect integration library such as prefect-aws to save your configuration information. Learn more about blocks. Install the relevant library, register the block type with the server, and create your block. Then reference the block in your flow.
Save data to disk within a flow
To save memory and time with big data, you don’t need to pass results between tasks. Instead, write data to disk and read it back directly in your flow code. Prefect has integration libraries for each of the major cloud providers. Each library contains blocks with methods that make it convenient to read and write data to and from cloud object storage. The moving data guide has step-by-step examples for each cloud provider.
Cache task results
Caching saves you time and compute by allowing you to avoid re-running tasks unnecessarily. Note that caching requires task result persistence. Learn more about caching.
Compress results written to disk
If you’re using Prefect’s task result persistence, save disk space by compressing the results. Specify the result serializer type with a `compressed/` prefix.