Map/Reduce Script Stages
The map/reduce script type goes through at least two of five possible stages.
The stages are processed in the following order. Note that each stage must complete before the next stage begins.
- Get Input Data – Acquires a collection of data. This stage is always processed first and is required. The input stage runs sequentially.
- Map – Parses each row of data into a key-value pair. One pair (key-value) is passed per function invocation. If this stage is skipped, the reduce stage is required. Data may be processed in parallel in this stage.
- Shuffle – Groups values based on keys. This is an automatic process that always follows completion of the map stage. There is no direct access to this stage as it is handled by the map/reduce script framework. Data is processed sequentially in this stage.
- Reduce – Evaluates the data in each group. One group (key-values) is passed per function invocation. If this stage is skipped, the map stage is required. Data is processed in parallel in this stage.
- Summarize – Summarizes the output of the previous stages. Developers can use this stage to summarize the data from the entire map/reduce process and write it to a file or send an email. This stage is optional and is not technically a part of the map/reduce process. The summarize stage runs sequentially.
You do not have to use both the map stage and the reduce stage. You may skip one of them, but not both.
The following diagram illustrates these stages, in the context of processing a set of invoices.
In this example, the stages are used as follows:
- Get Input Data – A collection of invoices that require payment is loaded.
- Map – Each invoice is paired to the customer expected to pay it. The key-value pairs are returned, where customerID is the key and the invoice is the value. For five invoices, the map is invoked five times.
- Reduce – There are three unique groups of invoices based on customerID. For three groups, reduce is invoked three times. To create a customer payment for every group, custom logic iterates over each group using customerID as the key.
- Summarize – Custom logic fetches various metrics (for example, number of invoices paid) and sends the output as an email notification.
For a code sample similar to this example, see Processing Invoices Example.
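The flow of the invoice example can also be sketched in plain JavaScript. The following is a simulation only, not SuiteScript API code: the function names (getInputData, mapStage, shuffleStage, reduceStage, summarizeStage, runStages), the invoice fields, and the sample data are all hypothetical, and everything runs sequentially even though the real map and reduce stages may run in parallel.

```javascript
// Plain-JavaScript simulation of the stage order described above.
// All names and data here are hypothetical; this is not the SuiteScript API.

// Get Input Data: return the collection to process (five sample invoices).
function getInputData() {
  return [
    { id: 'INV-1', customerId: 'C1', amount: 100 },
    { id: 'INV-2', customerId: 'C2', amount: 250 },
    { id: 'INV-3', customerId: 'C1', amount: 75 },
    { id: 'INV-4', customerId: 'C3', amount: 40 },
    { id: 'INV-5', customerId: 'C2', amount: 60 }
  ];
}

// Map: invoked once per invoice; emits one key-value pair.
function mapStage(invoice) {
  return { key: invoice.customerId, value: JSON.stringify(invoice) };
}

// Shuffle: groups values by key. The real framework does this automatically.
function shuffleStage(pairs) {
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: invoked once per group; totals the invoices for one customer.
function reduceStage(key, values) {
  const total = values
    .map((value) => JSON.parse(value))
    .reduce((sum, invoice) => sum + invoice.amount, 0);
  return { customerId: key, invoicesPaid: values.length, total: total };
}

// Summarize: collects metrics from the whole run.
function summarizeStage(payments) {
  return {
    payments: payments.length,
    invoicesPaid: payments.reduce((count, p) => count + p.invoicesPaid, 0)
  };
}

// Runs the stages in order; each stage finishes before the next begins.
function runStages() {
  const pairs = getInputData().map((invoice) => mapStage(invoice));
  const groups = shuffleStage(pairs);
  const payments = [...groups].map(([key, values]) => reduceStage(key, values));
  return { payments: payments, summary: summarizeStage(payments) };
}
```

With the sample data, mapStage runs five times and reduceStage runs three times, once for each distinct customerId.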
Passing Data to a Map or Reduce Stage
To prevent unintended alteration of data when it is passed between stages, key-value pairs are always serialized into strings. For map/reduce scripts, SuiteScript 2.x checks if the data passed to the next stage is a string, and uses JSON.stringify() to convert the key or value into a string as necessary.
Objects serialized to JSON remain in JSON format. To avoid possible errors, SuiteScript does not automatically deserialize the data. For example, an error might result from an attempt to convert structured data types (such as CSV or XML) that are not valid JSON. At your discretion, you can use JSON.parse() to convert the JSON string back into a native JS object.
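The serialization behavior described above can be illustrated with a small plain-JavaScript sketch. The helper serializeForNextStage is hypothetical and only mimics the described rule (strings pass through, other data goes through JSON.stringify()); it is not the framework's own code, and the invoice object and CSV text are made-up sample data.

```javascript
// Hypothetical helper that mimics the rule described above: a string is
// passed along unchanged, anything else is converted with JSON.stringify().
function serializeForNextStage(data) {
  return typeof data === 'string' ? data : JSON.stringify(data);
}

// An object written by one stage arrives at the next stage as a JSON string.
const written = serializeForNextStage({ invoiceId: 101, amount: 49.5 });

// The receiving stage must parse the string itself to get an object back.
const invoice = JSON.parse(written);

// Text that is not valid JSON (CSV here) passes through as a plain string.
const csvText = serializeForNextStage('id,amount\n101,49.5');

// Calling JSON.parse() on that text throws, which is why deserialization
// is left to the script author rather than done automatically.
let csvParsed;
try {
  csvParsed = JSON.parse(csvText);
} catch (e) {
  csvParsed = null; // not JSON; handle it with a CSV parser instead
}
```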
Keys in map/reduce scripts (specifically, in mapContext or reduceContext objects) are limited to 3,000 characters, and values are limited to 10 MB. A key longer than 3,000 characters returns the error KEY_LENGTH_IS_OVER_3000_BYTES. A value larger than 10 MB returns the error VALUE_LENGTH_IS_OVER_10_MB.
If you have map/reduce scripts that use the mapContext.write(options) or reduceContext.write(options) methods, make sure that key strings are shorter than 3,000 characters and value strings are smaller than 10 MB. Take into account the potential length of any dynamically generated strings, which may exceed these limits. Also avoid passing your data in keys; pass it in values instead.
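One way to guard against these limits is to check each pair before writing it. The sketch below is an illustration under stated assumptions: checkPair and the two constants are hypothetical names, 10 MB is taken to mean 10 * 1024 * 1024, and string length (UTF-16 code units) is used as a rough proxy for size, since the text does not say how the framework measures it. Strings with multi-byte characters may be larger than this check suggests.

```javascript
// Hypothetical pre-write check for the documented key and value limits.
const MAX_KEY_CHARS = 3000;
// Assumption: 10 MB is treated as 10 * 1024 * 1024 units of string length.
const MAX_VALUE_SIZE = 10 * 1024 * 1024;

// Returns the names of the errors the pair would be expected to trigger,
// or an empty array if both the key and the value are within the limits.
function checkPair(key, value) {
  const problems = [];
  if (key.length > MAX_KEY_CHARS) {
    problems.push('KEY_LENGTH_IS_OVER_3000_BYTES');
  }
  // Rough proxy only: length counts UTF-16 code units, not encoded bytes.
  if (value.length > MAX_VALUE_SIZE) {
    problems.push('VALUE_LENGTH_IS_OVER_10_MB');
  }
  return problems;
}
```

A script could call checkPair(key, value) on dynamically generated strings before passing them to a write method, and log or shorten any pair that returns a non-empty array.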