
Use cases for the readUrl function in DataWeave

Posted on: August 28, 2018
Author: James
Note: This blog is concerned with DataWeave version 1.0. DataWeave 2.0 is not backwards compatible and thus material from this blog might not apply to 2.0.

readUrl is a very useful function which has only recently been documented. It allows you to reuse DataWeave functions and scripts within one application, across applications and even across deployments. It can also read any file type that DataWeave supports, such as JSON, XML or CSV.

Important Note: After adding the file to be read by readUrl, build the project so that readUrl works in Preview in Anypoint Studio. Moreover, any changes made to the file at runtime will not be reflected in subsequent readUrl calls. After testing readUrl on large files, we concluded that readUrl loads the same file from disk on every call, without any in-memory caching. This makes readUrl an expensive operation which should not be used on large files.

In this blog, I describe three use cases where readUrl can be used in DataWeave to create elegant solutions for common business problems.

A library of DataWeave functions

One clear use case for readUrl is creating a library of DataWeave functions which can be reused throughout applications. This widens the scope of what can be achieved using DataWeave. With libraries of DataWeave functions, it is much easier to implement, organize, maintain and test complex transformations.

A library in DataWeave consists of a number of functions in the header section, along with a mapping (in the form of an object) from the exposed function names to the functions themselves in the body section. library.dwl is an example DataWeave library containing 5 functions in the header section, with only 4 being exposed publicly in the body section.
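As a minimal sketch of what such a library.dwl might look like: the names substringFunc, getUniqueKeys, getKeys and subArray are taken from this post, but the function bodies and the internal helper keysOf are assumptions made purely for illustration.

```dataweave
%dw 1.0
// Internal helper (hypothetical): not listed in the body, so it stays private.
%function keysOf(obj) obj pluck $$
// Characters of str from index start to index end (range selector).
%function substringFunc(str, start, end) str[start..end]
%function getKeys(obj) keysOf(obj)
// Keys of obj with repeated keys removed.
%function getUniqueKeys(obj) keysOf(obj) distinctBy $
%function subArray(arr, start, end) arr[start..end]
---
// Only the four functions mapped here are visible to callers.
{
  substring: substringFunc,
  getKeys: getKeys,
  getUniqueKeys: getUniqueKeys,
  subArray: subArray
}
```

Five functions are declared in the header, but keysOf is left out of the body object and so cannot be reached from outside the library.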

The result returned from readUrl on a .dwl script is the body of the script (i.e. the evaluation of the part under ---). This is why the mapping to the functions is needed in the body to expose the functions described in the header of the script. The following is how the library library.dwl is ‘imported’ using readUrl and its functions used in a simple transformation:
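A minimal sketch of such an import, assuming library.dwl sits at the root of the classpath (for example in src/main/resources) and that the payload carries the fields myString and myObject; the output field names are hypothetical:

```dataweave
%dw 1.0
%output application/json
// Evaluate library.dwl and keep its body (the object of exposed functions).
%var myLibrary = readUrl("classpath://library.dwl", "application/dw")
---
{
  // Each selector returns a function, which is then applied to its arguments.
  substring: myLibrary.substring(payload.myString, 3, 6),
  uniqueKeys: myLibrary.getUniqueKeys(payload.myObject)
}
```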

So, myLibrary.substring returns the function substringFunc, to which we apply the parameters (payload.myString,3,6). Similarly, myLibrary.getUniqueKeys returns the function getUniqueKeys, to which the parameter payload.myObject is applied.

Furthermore, for larger libraries, we can categorize the exposed functions using a hierarchy of objects. For example:
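A sketch of a library body organized this way, assuming the category names functionsOnArrays and functionsOnObjects and illustrative function bodies:

```dataweave
%dw 1.0
// Function bodies are illustrative assumptions.
%function subArray(arr, start, end) arr[start..end]
%function getKeys(obj) obj pluck $$
---
// Nested objects group the exposed functions by the type they operate on.
{
  functionsOnArrays: {
    subArray: subArray
  },
  functionsOnObjects: {
    getKeys: getKeys
  }
}
```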

These can be accessed as follows:

myLibrary.functionsOnArrays.subArray(payload,1,3)
myLibrary.functionsOnObjects.getKeys(payload)

Calling DataWeave scripts dynamically

readUrl can also be used to call transformation scripts dynamically. A typical use case is an ETL process in which multiple entities need to be transformed differently. For example, let’s say we need to simultaneously read three different types of files describing the entities Person, Location and Product, each of which needs its own transformation. One way to handle this is by using Transform Message Processors in a Choice Router as follows:

In this way, we need one Transform Message Processor per entity and thus, one route per entity in the Choice router. This does not scale well if we have to eventually deal with a significant number of entities.

We can instead do away with the Choice Router and use one Transform Message Processor which uses the readUrl function to call .dwl scripts dynamically based on the entity’s name. This is done as follows:
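One way this could be sketched, under several assumptions: the entity name is held in a flow variable called entityName, each entity has a script named after it on the classpath (Person.dwl, Location.dwl, Product.dwl), and each script exposes its mapping as a function under the key transform, following the library pattern described earlier. The source does not say whether payload is visible inside a script loaded through readUrl, so this sketch passes it in explicitly. A hypothetical Person.dwl:

```dataweave
%dw 1.0
// Hypothetical mapping for the Person entity; the field names are assumptions.
%function transformPerson(person) {
  fullName: person.firstName ++ " " ++ person.lastName
}
---
{
  transform: transformPerson
}
```

The single Transform Message Processor then picks the script by name:

```dataweave
%dw 1.0
%output application/json
// Build the script URL from the entity name, e.g. classpath://Person.dwl
%var entityScript = readUrl("classpath://" ++ flowVars.entityName ++ ".dwl", "application/dw")
---
entityScript.transform(payload)
```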

As long as the .dwl script names match entity names, we can handle any number of entities without changing this flow.

Loading the contents of a fixed file in the middle of a flow

readUrl can be used to read the contents of a fixed file in the middle of a flow, without deleting the file.

Note: This differs from the mule-module-requester, which consumes the file itself (i.e. deletes it) and, unlike readUrl, returns a file that contains any updates performed at runtime.

This may come in handy when, for example, a file containing details of a fixed list of entities is used to enrich data. Consider for example the following CSV file of all countries:

Given only a country’s name in the payload, we might want to enrich the payload with details about the given country found in this CSV file. Using readUrl, we can read this file at runtime and use its contents to enrich the payload as follows:
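A minimal sketch of such an enrichment, assuming a file countries.csv on the classpath with a header row and the hypothetical columns name, code and capital, and an object payload with a country field:

```dataweave
%dw 1.0
%output application/json
// Read the CSV as an array of objects keyed by the header row.
%var countries = readUrl("classpath://countries.csv", "application/csv")
// First row whose name matches the country in the payload.
%var match = (countries filter ($.name == payload.country))[0]
---
// Append the looked-up details to the original payload object.
payload ++ {
  countryCode: match.code,
  capital: match.capital
}
```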

Note: This might not be the most efficient way of performing this task. If processing time matters, it would be better to cache the country details in memory rather than reading them from disk each time. As noted at the beginning of this post, our experiments suggest that readUrl always reads files from disk (i.e. never caches in memory) and that changes made to the file at runtime are not reflected in the read file.

Final Note: Difference between read and readUrl

Unlike readUrl, the read function takes a string as input (rather than a URL) and evaluates it according to the given mime type. For example, the following transformation reads the Java string payload as a CSV file and returns it in DataWeave’s canonical format:
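A sketch of such a transformation, assuming the payload is a Java string holding CSV text with a header row:

```dataweave
%dw 1.0
%output application/dw
---
// Parse the string payload as CSV; the result is an array of objects.
read(payload, "application/csv")
```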

We can also use read on DataWeave code (using application/dw as the mime type). In fact, if the mime type is not specified, read uses application/dw by default. This allows us to create DataWeave code dynamically, just like what can be done using the DW function in MEL.
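As a small illustration, assuming the code to evaluate is held in a string (hard-coded here; in practice it might be assembled at runtime):

```dataweave
%dw 1.0
%output application/json
// Hypothetical DataWeave expression held as a string.
%var expression = "{ total: 1 + 2 }"
---
// Evaluate the string as DataWeave code.
read(expression, "application/dw")
```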
