Dataset API
This document assumes you are already familiar with the Dataset API Configurator. You should only consider this document if you plan to develop your own web interface or need to construct very complex data calls not supported by the web interface.
The recommended way is to configure a call in the web-interfaces and then export the generated call definition using Output format -> API query.
Most meteoblue APIs use simple URL GET parameters like http://my.meteoblue.com/packages/basic-1?lat=47.2&lon=9.6... This approach is not sufficient to query datasets dynamically. Instead, an HTTP JSON request body is used:
{
"units": {
"temperature": "C",
"velocity": "km/h",
"length": "metric",
"energy": "watts"
},
"geometry": {
"type": "MultiPoint",
"coordinates": [[7.57327,47.558399,279]], // lon, lat, asl
"locationNames": ["Basel"]
},
"format": "json",
"timeIntervals": [
"2019-01-01T+00:00/2019-12-31T+00:00"
],
"queries": [{
"domain": "NEMSGLOBAL",
"gapFillDomain": null,
"timeResolution": "hourly",
"codes": [{
"code": 157,
"level": "180-0 mb above gnd"
}]
}]
}
This call can be executed with the command line tool curl:
curl \
-L -H "Content-Type: application/json" \
-d '{"units":{"temperature":"C","velocity":"km/h","length":"metric","energy":"watts"},"geometry":{"type":"MultiPoint","coordinates":[[7.57327,47.558399,279]],"locationNames":["Basel"]},"format":"json","timeIntervals":["2019-01-01T+00:00/2019-12-31T+00:00"],"timeIntervalsAlignment":"none","queries":[{"domain":"NEMSGLOBAL","gapFillDomain":null,"timeResolution":"hourly","codes":[{"code":157,"level":"180-0 mb above gnd"}]}]}' \
"http://my.meteoblue.com/dataset/query?apikey=APIKEY"
Many web-development tools like the Insomnia REST client support JSON request bodies. Alternatively, the JSON query can be encoded into the URL, but this results in long URLs and quickly hits maximum URL length limits.
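For scripting, the same JSON body can be posted with any HTTP client. A minimal sketch using the Python requests package (APIKEY is a placeholder for a valid key):

import requests

query = {
    "units": {"temperature": "C", "velocity": "km/h", "length": "metric", "energy": "watts"},
    "geometry": {
        "type": "MultiPoint",
        "coordinates": [[7.57327, 47.558399, 279]],  # lon, lat, asl
        "locationNames": ["Basel"],
    },
    "format": "json",
    "timeIntervals": ["2019-01-01T+00:00/2019-12-31T+00:00"],
    "queries": [{
        "domain": "NEMSGLOBAL",
        "gapFillDomain": None,
        "timeResolution": "hourly",
        "codes": [{"code": 157, "level": "180-0 mb above gnd"}],
    }],
}

response = requests.post(
    "http://my.meteoblue.com/dataset/query",
    params={"apikey": "APIKEY"},  # placeholder
    json=query,
)
response.raise_for_status()
data = response.json()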
All calls to meteoblue APIs require a valid API key. Please contact [email protected] for more information.
More complex calls may be declined for direct execution and require the use of job queues. Job queues are explained in more detail in the last chapter.
To use the dataset API with Python, we recommend using the meteoblue-dataset-sdk Python module. This library simplifies access to the dataset API and transparently implements job queues and protobuf decoding of data.
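A minimal sketch following the usage pattern documented in the SDK's README (the attribute path into the protobuf result may differ between SDK versions):

import meteoblue_dataset_sdk

client = meteoblue_dataset_sdk.Client(apikey="APIKEY")  # placeholder key
result = client.query_sync(query)  # query as in the example above; job queues are handled transparently
# Data series are nested per geometry, code and time-interval in the protobuf result
data = result.geometries[0].codes[0].timeIntervals[0].data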
JSON Query Structure
The JSON body uses various structures and arrays that are nested to build complex queries with recursive transformations. All JSON attributes are case-sensitive and use camel-case names. As in the example above, the outer JSON structure contains properties like units, geometry, timeIntervals or queries.
The following tables describe all properties and how they are integrated with other structures. Some properties address special use-cases that are not available in the web-interfaces. For completeness all API properties are documented in the next chapters.
Property | Type | Description |
---|---|---|
units | Structure: Units | Option to select units like Fahrenheit |
geometry | Structure: GeoJSON | Select polygon or points |
format | String enumeration: Format | Which output format to use |
timeIntervals | Array of Structure: TimeInterval | Define time intervals to read |
timeIntervalsAlignment | String enumeration: Alignment | How multiple time-intervals are aligned in charts |
queries | Array of Structure: Query | Per dataset queries |
oneTimeIntervalPerGeometry | Boolean | See below |
checkOnly | Boolean | Only calculate amount of required datapoints |
runOnJobQueue | Boolean | Execute this job on a queue, instead directly |
By default the API returns one data-series for each time-interval times the number of geometries: 10 elements in timeIntervals and 20 coordinates in geometries return 200 data-series.
If oneTimeIntervalPerGeometry is set to true and a GeoJSON GeometryCollection is used, the first geometry will use the first time-interval, the second geometry the second time-interval, and so on. This is used to return a different time-interval for each coordinate. In the web-interfaces this is used in Coordinates and time mode. An example call is available in the GeoJSON description below.
If checkOnly is set to true, the API will only calculate how many data points must be processed and whether a job queue must be used. runOnJobQueue would then be required to submit the call to a job queue. More information can be found in the last chapter about job queues.
Units
If units are not set, the defaults are Celsius, km/h, metric and watts.
Property | Type | Description |
---|---|---|
temperature | String | celsius or fahrenheit |
velocity | String | km/h , m/s , mph , kn or bft |
length | String | metric or imperial |
energy | String | watts or joules |
GeoJSON Geometry
Please make sure to provide all input coordinates in the correct order: "lon" -> "lat" (-> "asl")
The geometry structure is based on GeoJSON, but extended to support features like geoname polygon ids, location names and additional attributes. A geometry can also be of type GeometryCollection to select multiple geometries (this can be used in conjunction with oneTimeIntervalPerGeometry).
Depending on the feature type, different geometries can be used.
Point and MultiPoint
{
"type": "Point",
"coordinates": [8.6, 47.5, 351.1] // lon, lat, asl
}
{
"type": "MultiPoint",
"coordinates": [[8.6, 47.5,351.1], [8.55, 47.37, 429]], // lon, lat, asl
"locationNames": ["Basel", "Zürich"]
}
Coordinates are defined as a tuple of longitude, latitude and elevation above sea level. Elevation is optional and will be automatically resolved from an 80 m resolution digital elevation model (DEM). locationNames can optionally be specified and will be replicated in the output.
The order of coordinates is preserved in the output.
The coordinates in the output refer to the center of the selected grid cell and therefore do not necessarily correspond to the input coordinates (unless the latter happen to be identical to the grid cell's center).
To ensure that the desired coordinates can be found (if no index table is available), the mode attribute can be used to set the preference for how the grid point is selected for Point and MultiPoint requests.
{
"type": "MultiPoint",
"coordinates": [[8.6, 47.5, 351.1], [8.55, 47.37, 429]], // lon, lat
"mode": "preferLandWithMatchingElevation" // default value
}
Four mode options can be chosen in the query:
Property | Description |
---|---|
preferLandWithMatchingElevation | Closest vertical distance |
preferSea | Considers grid points over the sea |
nearest | Closest horizontal distance |
includeNeighbours | Combination of all 4 grid points closest in horizontal distance |
The default grid selection mode is preferLandWithMatchingElevation: it evaluates the four closest grid points and selects the one that best matches the desired criteria, otherwise the nearest.
Caution: If a specific mode is selected, the output may deviate from the desired criteria. For example, if nearest is selected in a valley, the elevation of the closest eligible grid point may differ greatly from that of the input coordinates.
Polygon and MultiPolygon
{
"type": "Polygon",
"coordinates": [
[[7.5,47.5],[7.5,47.6],[7.7,47.6],[7.7,47.5],[7.5,47.5]] // lon, lat
]
}
{
"type": "MultiPolygon",
"coordinates": [
[[[8.0,47.4],[8.0,47.6],[8.2,47.6],[8.2,47.4],[8.0,47.4]]], // lon, lat
[[[7.5,47.5],[7.5,47.6],[7.7,47.6],[7.7,47.5],[7.5,47.5]]] // lon, lat
],
"excludeSeaPoints": true,
"fallbackToNearestNeighbour": true
}
The first and last coordinate must be the same. Please make sure to supply a valid polygon without self-intersections.
The optional Boolean parameter excludeSeaPoints can be set to true to ignore grid-cells that are located on the sea.
If no grid-cells are within the polygon, the result would be empty. If fallbackToNearestNeighbour is set to true, the result will instead contain the nearest neighbour grid-cell.
Geoname Polygon
Administrative areas in the web-interfaces are based on the geonames polygon database. To keep calls short and avoid including the full GeoJSON polygon for each administrative area, the API can retrieve a polygon directly from a database. Once the polygon is loaded from the database, the behavior is identical to a regular polygon API call.
{
"type": "GeonamePolygon",
"geonameid": 2345235
}
Multiple geoname polygons can also be selected in one call. Internally, the polygons are merged into a single polygon. If the transformation Aggregate all grid-cells were used, all grid-cells of both administrative areas would be aggregated into a single data-series.
{
"type": "MultiGeonamePolygon",
"geonameids": [2345235, 312453]
}
Parameters excludeSeaPoints and fallbackToNearestNeighbour are also considered, if set.
Geometry Collection
Multiple geometries can also be processed in one call instead of calling the API multiple times. If the GeoJSON type GeometryCollection is used, the API will process one geometry after another.
The previous MultiGeonamePolygon call could be split into a collection like:
{
"type": "GeometryCollection",
"geometries": [
{ "type": "GeonamePolygon", "geonameid": 2345235 },
{ "type": "GeonamePolygon", "geonameid": 312453 }
]
}
It is important to note that for a GeometryCollection all transformations are applied to each geometry individually. The transformation Aggregate all grid-cells will only aggregate the grid-cells within one geometry of a geometry collection. This can be used to select multiple administrative areas in a country, apply the transformation Aggregate all grid-cells and retrieve one index for each area individually. In the example above, two data-series would be returned.
Alternatively, a GeometryCollection with the parameter oneTimeIntervalPerGeometry allows you to select a different time-interval for each geometry. It is used in the web-interface for the coordinates and time selection mode. For the first coordinate the first time-interval is used, for the second coordinate the second time-interval, and so on.
{
"oneTimeIntervalPerGeometry": true,
"geometry": {
"type": "GeometryCollection",
"geometries": [
{ "type": "Point", "coordinates": [8.6, 47.5, 351.1] } // lon, lat, asl
{ "type": "Point", "coordinates": [8.55, 47.37, 429] } // lon, lat, asl
]
},
"timeIntervals": [
"2015-05-05T+00:00/2016-06-06T+00:00",
"2015-05-03T+00:00/2016-06-01T+00:00"
]
}
Output Format
The attribute format accepts the following values:
- json: Recommended JSON format (default, if not set)
- csv: CSV format for large amounts of locations
- csvTimeOriented: CSV format for long time-ranges
- csvIrregular: CSV format for mixed time-intervals and locations
- xlsx: XLSX format for large amounts of locations
- xlsxTimeOriented: XLSX format for long time-ranges
- xlsxIrregular: XLSX format for mixed time-intervals and locations
- highcharts: JSON output to create a highcharts graph
- highchartsHtml: HTML page that embeds the highcharts library and the chart
- geoJson: JSON output to create a map with bullet points
- geoJsonHtml: HTML page that embeds a map library and the map json
- kml: KML format that only includes the grid cell coordinates
- netCDF: Recommended binary format for further scientific data analysis
Detailed information about the structure of each format can be found in the previous format chapter.
Time Intervals
Time intervals and timezones can be specified using the ISO8601 format. The timeIntervals attribute is an array of ISO8601 strings. By default the web-interfaces generate time-intervals with a timezone offset, but without specifying the hour and minute.
{
"timeIntervals": [
"2015-05-01T+00:00/2015-05-02T+00:00",
"2016-05-01T+00:00/2016-05-02T+00:00"
]
}
In the intervals above, 2 full days are selected. For hourly data, the API would return 48 hourly values for each time interval. In the API syntax, time-intervals can also be specified to select exactly 1 hour:
{
"timeIntervals": [
"2019-01-01T00:00+00:00/2019-01-01T01:00+00:00"
]
}
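When time-intervals are generated programmatically, the strings can be assembled from datetime objects. A small sketch; the helper name iso_interval is illustrative, not part of the API:

from datetime import datetime

def iso_interval(start: datetime, end: datetime) -> str:
    # Format two datetimes as an ISO8601 interval with a +00:00 offset
    fmt = "%Y-%m-%dT%H:%M+00:00"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"

# Exactly one hour, matching the example above
intervals = [iso_interval(datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 1))]
# -> ["2019-01-01T00:00+00:00/2019-01-01T01:00+00:00"]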
Datasets and Variables
The selection of datasets and variables is specified in the attribute queries, an array that allows selecting multiple datasets. For each dataset, specified by the domain attribute, multiple weather variable codes can then be selected.
In this example, three variables are selected from NEMSGLOBAL and then transformed with two transformations. In the same call, data is selected from the dataset NEMS12 and transformed individually.
{
"queries": [
{
"domain": "NEMSGLOBAL",
"gapFillDomain": null,
"timeResolution": "hourly",
"codes": [
{"code": 11, "level": "2 m above gnd"},
{"code": 52, "level": "2 m above gnd"},
{"code": 157, "level": "180-0 mb above gnd"}
],
"transformations": [
{
"type": "valueIsAbove",
"valueMin": 30,
"returnClassification": "zeroOrOne"
},
{
"type": "aggregateTimeInterval",
"aggregation": "mean"
}
]
},
{
"domain": "NEMS12",
"gapFillDomain": null,
"codes": [ ... ],
"transformations": [...]
}
]
}
Attributes for the structure query:
Property | Type | Description |
---|---|---|
domain | String | dataset name like NEMSGLOBAL or ERA5 |
gapFillDomain | Optional String | dataset to use to fill gaps |
timeResolution | String | hourly or daily |
codes | Array of Codes | Individual selection of weather variables. See next chapter. |
transformations | Optional array of transformations | |
allowForecast | Boolean, default true | Whether to allow forecast data |
allowHistory | Boolean, default true | Whether to allow history data |
Notes:
- allowHistory enables reads from the meteoblue archive storage. Forecasts are archived once a day and tend to be more consistent.
- allowForecast enables reads from up-to-date forecasts which reside on SSD storage and are updated more frequently. Data of the last days may change slightly. This applies only to datasets which offer forecasts.
- timeResolution specifies the resolution to read. It can also be set to daily although the dataset only offers hourly data, to automatically calculate daily aggregations. Aggregations like monthly must use transformations. In the future, some datasets may offer pre-computed monthly or yearly data directly.
Once the dataset has been selected, multiple variables at different levels can be encoded into the call. The web-interfaces only use one variable per dataset for simplicity. The API is capable of selecting multiple variables per dataset at once. This could improve API call performance, because expensive spatial calculations are only performed once.
Attributes for the structure code:
Property | Type | Description |
---|---|---|
code | Integer | Numeric variable code. E.g. 11 for temperature |
level | String | Level of the variable. E.g. 2 m above gnd |
aggregation | Optional String | min , max , mean , sum to be used with daily aggregations |
gddBase | Optional Float | Lower limit for the GDD calculation. Celsius unless Fahrenheit is selected |
gddLimit | Optional Float | Upper limit |
startDepth | Optional Integer | Start depth in centimeters for the soil depth aggregation |
endDepth | Optional Integer | End depth in centimeters for the soil depth aggregation |
slope | Optional Float | Inclination to calculate GTI. 0 = horizontal, 20 = typical value, 90 = vertical |
facing | Optional Float | East-West orientation for GTI. 90° = East, 180° = South, 270° = West |
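As an illustration, a hypothetical codes selection reading the daily maximum of 2 m temperature (code 11), written as a Python dictionary:

# Daily maximum temperature from NEMSGLOBAL; the aggregation attribute
# applies because timeResolution is set to daily
query_daily_tmax = {
    "domain": "NEMSGLOBAL",
    "gapFillDomain": None,
    "timeResolution": "daily",
    "codes": [
        {"code": 11, "level": "2 m above gnd", "aggregation": "max"}
    ],
}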
Variable Codes
The numeric codes to select a variable from a dataset originate from NOAA GRIB 1 codes, but have been extended to include more variables.
A list of all weather variable codes at meteoblue is available as a JSON API. Please note that any individual dataset only supports a small fraction of the available codes.
Transformations
Within the query structure, an array of transformations can be specified. All transformations are processed one after another, but some also modify the behavior of others, for example by extending time-intervals or spatial contexts.
We recommend using the web-interfaces to configure calls, but as a reference the API syntax for each transformation is documented below. For more details on each transformation, consult the web-interfaces documentation.
Temporal Transformations Syntax
Aggregations to daily, monthly and yearly values use a simple syntax. In this example, 3 transformations are applied to a 30-year temperature time-series:
- Calculate the daily minimum
- Use all daily minima and calculate the mean for each month. This is now the monthly mean of daily minimum temperatures.
- From all the monthly means, pick the coldest monthly value. The call now returns 30 values, because 30 years are used as input.
{
"transformations": [
{
"type": "aggregateDaily",
"aggregation": "min"
},
{
"type": "aggregateMonthly",
"aggregation": "mean"
},
{
"type": "aggregateYearly",
"aggregation": "min"
}
]
}
The following values are supported for the attribute aggregation:
- sum, min, max, mean, stddev
- sumIgnoreNaN, minIgnoreNaN, maxIgnoreNaN, meanIgnoreNaN
- p10, p25, p50, p75, p90
The transformations Aggregate daily by longitude and Aggregate each time-interval also just use the aggregation type:
{
"transformations": [
{
"type": "aggregateDailyByLongitude",
"aggregation": "mean"
},
{
"type": "aggregateTimeInterval",
"aggregation": "mean"
}
]
}
The transformation Aggregate by day and night additionally takes an attribute dailyNightly:
- daylightAndNighttime: Return 2 values per day, one for daytime and one for nighttime
- daylight: Only aggregate daylight hours
- nighttime: Only aggregate nighttime hours
{
"type": "aggregateHalfDaily",
"dailyNightly": "daylightAndNighttime",
"aggregation": "mean"
}
Note: To keep the documentation compact, the examples only include the minimum JSON syntax.
Aggregate over a sliding time window requires an nTimesteps attribute, an Integer specifying how many time-steps are used in the sliding window aggregation.
{
"type": "timeLaggedAggregation",
"aggregation": "mean",
"nTimesteps": 3
}
Aggregate to climate normals allows selecting daily or hourly resolution with the attribute temporalResolution:
{
"type": "aggregateNormals",
"aggregation": "mean",
"temporalResolution": "daily"
}
For temporal interpolations, the transformation Interpolate temporal expects a temporalResolution attribute with the options 15min, 10min, 5min and 1min:
{
"type": "interpolateTemporal",
"temporalResolution": "15min"
}
Value Filter Transformation Syntax
The transformations that filter values based on a threshold use a returnClassification attribute to specify the return behavior:
- zeroOrOne
- zeroOrValue
- zeroOrDelta
- zeroOrOneAccumulated
- zeroOrValueAccumulated
- zeroOrDeltaAccumulated
- zeroOrConsecutiveCount
{
"type": "valueIsAbove",
"valueMin": 30,
"returnClassification": "zeroOrOne"
},
{
"type": "valueIsBelow",
"valueMax": 10,
"returnClassification": "zeroOrOne"
},
{
"type": "valueIsBetween",
"valueMin": 10,
"valueMax": 30,
"returnClassification": "zeroOrOne"
}
The transformation Value limited to a range takes two numbers to clip values to a certain range.
{
"type": "valueLimitRange",
"valueMin": 5,
"valueMax": 10
}
Accumulate time-series to a running total takes no additional attributes.
{
"type": "accumulate"
}
Spatial Transformations Syntax
The transformation Resample to a regular grid takes a floating-point gridResolution greater than 0.001, options to control interpolation and aggregation, and the behavior for the disjoint area of the grid and polygon. Spatial transformation calls only work for polygon calls, not for calls based on single coordinates.
The attribute interpolationMethod supports:
- linear: Interpolation using triangulated irregular networks
- nearest: Nearest neighbor interpolation
Attribute spatialAggregation:
- mean, min, max: Return NaN if one input value is NaN.
- meanIgnoreNaN, minIgnoreNaN, maxIgnoreNaN: Ignore NaNs if possible.
The disjointArea of the polygon and the resampled grid can be discarded (discard) or kept (keep).
{
"type": "spatialTransform",
"gridResolution": 0.5,
"interpolationMethod": "linear",
"spatialAggregation": "mean",
"disjointArea": "discard"
}
This transformation also offers an additional attribute geometry, which can be set to a MultiPoint geometry to select individual grid-cells after a dataset has been resampled. The grid-cells are selected by a nearest neighbor search in the new regular grid. In the next example, the selected polygon is gridded to 0.1° and afterwards 2 locations are extracted.
{
"type": "spatialTransform",
"gridResolution": 0.1,
"interpolationMethod": "linear",
"spatialAggregation": "mean",
"geometry": {
"type": "MultiPoint",
"coordinates": [[7.57327,47.558399], [7.85222,47.995899]], // lon, lat
"locationNames": ["Basel","Freiburg"]
}
}
Combine Dataset Transformations Syntax
With the transformation Combine the selected data-series, the API syntax uses recursion to select another data-series. The attribute dataQuery uses the same structure as described above.
The attribute mathOperator supports the following modes:
- multiply, divide, add, substract
- maximum, minimum, mean
- equals, notEquals, greaterThanEquals, lessThanEquals
{
"type": "combineDataset",
"mathOperator": "multiply",
"dataQuery": {
"domain": "ERA5",
"gapFillDomain": null,
"timeResolution": "hourly",
"codes": [{"code": 75, "level": "high cld lay"}],
"transformations": [...]
}
}
To combine a dataset with a different resolution, resampling can also be used. The attributes accept the same values as explained above.
{
"type": "combineDatasetWithResampling",
"mathOperator": "multiply",
"interpolationMethod": "linear",
"spatialAggregation": "mean",
"dataQuery": {
"domain": "GFS05",
"gapFillDomain": null,
"timeResolution": "3hourly",
"codes": [{"code": 301, "level": "2 m above gnd"}]
}
}
Aggregate all Grid Cells Syntax
The transformation Aggregate all grid-cells aggregates all grid-cells using an aggregation function. The aggregation function uses the same syntax as for temporal transformations. This transformation works for coordinate as well as polygon calls. For polygon calls, the centroid coordinate is shown in the output.
{
"type": "spatialTotalAggregate",
"aggregation": "mean"
}
For a weighted average, the transformation spatialTotalWeighted can be used; it takes the weights from a data-series specified in dataQuery.
{
"type": "spatialTotalWeighted",
"dataQuery": {
"domain": "ERA5",
"gapFillDomain": null,
"timeResolution": "hourly",
"codes": [{"code": 301,"level": "2 m above gnd"}]
}
}
In case the weights originate from another dataset with a different grid, resampling can be used. interpolationMethod and spatialAggregation follow the same specifications as before.
{
"type": "spatialTotalWeightedWithResampling",
"interpolationMethod": "linear",
"spatialAggregation": "mean",
"dataQuery": {
"domain": "ERA5",
"gapFillDomain": null,
"timeResolution": "hourly",
"codes": [{"code": 301,"level": "2 m above gnd"}]
}
}
Mask out Grid Cells Syntax
To set values to NaN based on filter criteria, the transformation maskOut uses a floating-point threshold attribute and an aboveOrBelow setting. The filter criteria are retrieved from another data-series which can be specified with dataQuery.
The attribute aboveOrBelow supports (the naming is not consistent for historic reasons):
- above: Greater than condition (>)
- below: Less than condition (<)
- greaterThanEquals: Greater than or equals condition (>=)
- lessThanEquals: Less than or equals condition (<=)
{
"type": "maskOut",
"aboveOrBelow": "greaterThanEquals",
"threshold": 10.0,
"dataQuery": {
"domain": "NEMSGLOBAL",
"gapFillDomain": null,
"timeResolution": "hourly",
"codes": [{"code": 256, "level": "sfc"}]
}
}
With resampling to match another grid:
{
"type": "maskOutWithResampling",
"aboveOrBelow": "greaterThanEquals",
"threshold": 10.0,
"interpolationMethod": "linear",
"spatialAggregation": "mean",
"dataQuery": {
"domain": "VHP",
"gapFillDomain": null,
"timeResolution": "daily",
"codes": [{"code": 274, "level": "sfc"}]
}
}
Downscale Grid Cells Syntax
Activating this transformation for coordinate API calls enables linear downscaling using 3 neighboring grid-cells. This is not available for polygon calls.
{
"type": "downscaleGridCell"
}
Sign Mechanism
The meteoblue APIs support shared secrets to make API URLs tamper-proof or to set an expiry date. Because the query is submitted as a JSON POST body in an API call, the body content is not signed. If your API key requires a signature, you have to calculate the MD5 sum of the POST body and set the URL GET parameter &post_body_md5=. A signed URL may look like this:
https://my.meteoblue.com/dataset/query?apikey=MY_API_KEY&post_body_md5=070aff9477baf1844e37e68606483436&expire=1581606535&sig=6a5c276f186539bf1d0c57835c4fb0dd
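A sketch of the post_body_md5 calculation in Python; the expire and sig parameters come from the usual meteoblue URL signing with the shared secret and are not computed here:

import hashlib
import json
import requests

body = json.dumps(query)  # the exact string that is sent as the POST body
md5 = hashlib.md5(body.encode("utf-8")).hexdigest()

url = f"https://my.meteoblue.com/dataset/query?apikey=MY_API_KEY&post_body_md5={md5}"
response = requests.post(url, data=body, headers={"Content-Type": "application/json"})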
Metadata API
The Metadata API provides basic information about a dataset, including the time of the last update, and differentiates between the preliminary (first) and final run. In case of satellite data, the preliminary run is usually available after a couple of hours. A second or final run will be published days or even weeks later with improved data quality. Because the data may still change, meteoblue keeps track of these dates and includes them in the metadata API.
Example API call for CHIRPS: http://my.meteoblue.com/dataset/meta?dataset=CHIRPS2
{
"name": "CHIRPS2",
"temporalResolution": "daily",
"region": "50S-50N",
"spatialResolution": "5.0 km",
"historyDataStart": "19810101T0000",
"historyDataFinalRun": "20200731T2300",
"source": "USGS & CHG",
"sourceUrl": "http://chg.geog.ucsb.edu"
}
Fields:
- name: Name of the dataset, e.g. NEMSGLOBAL or CHIRPS2
- temporalResolution: Native temporal resolution, e.g. hourly or daily
- spatialResolution: Spatial resolution, e.g. "5 km", but could also be a range like "4-30 km"
- historyDataStart: The first valid timestamp for API calls using archived historical data
- historyDataFinalRun: The last timestamp that will not be modified anymore by future planned updates. E.g. for CHIRPS this date is a couple of weeks in the past.
- region: Extent of this dataset, e.g. global, central-asia or a latitude bound like 50S-50N
- source: Provider of this dataset
- sourceUrl: URL to the provider website
Job Queues
While regular API calls only take a couple of seconds, complex dataset calls can take minutes or even hours. HTTP APIs do not work well with long-running calls; this leads to timeouts on server and client side. Before executing dataset calls, the API calculates the estimated run-time. If the estimated run-time exceeds a threshold, the API returns an error and the user must submit the call to the job queue system.
After a job is completed, the result will be uploaded to an Amazon S3 web-storage and kept for 7 days. The job-queue result is identical to a regular dataset call.
Implementing the mechanics of job queues correctly needs special care. For Python, meteoblue offers a simple SDK to use the dataset API without having to care about job queues: meteoblue-dataset-sdk on GitHub.
Jobs "belong" to a queue. Queues are associated with API keys by meteoblue and provisioned according to performance requirements. Multiple API keys can share the same queue. Each queue will be processed by multiple "workers" which are exclusively dedicated to one queue. Workers run on multiple physical servers and synchronize with a central job queue dispatcher.
Each API key can submit up to 10 jobs in parallel to the job queue system. Additional jobs will be declined until the previous jobs are completed. This prevents a single API user from over-utilizing the queue system and starving resources from other applications.
Current job queue limits are (only one needs to be fulfilled):
- Data-points > 876'000: This is one year of hourly data for 100 locations.
- Number of grid-cells > 5000: This uses an approximated number of potentially affected grid-cells, derived from the polygon area and grid-resolution. This approximation is necessary to quickly calculate the number of grid-cells without performing expensive polygon/grid operations.
- Spatial resampling transformation is used: This is independent of polygon size or resolution.
Running Jobs on the Queue
There are 2 options to determine whether a job must be executed on a job queue (see the sketch after this list):
- The API returns an error: {"error_message": "This job must be executed on a job-queue"}
- The dataset query JSON syntax accepts a checkOnly parameter: {"checkOnly": true, "geometry": ..., "queries": ...}. The response JSON contains {"requiresJobQueue": true}
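A minimal sketch of this check in Python, reusing the query dictionary from the requests example in the first chapter (APIKEY is a placeholder):

import requests

# Ask the API to only check the query instead of executing it
check = requests.post(
    "http://my.meteoblue.com/dataset/query",
    params={"apikey": "APIKEY"},
    json={"checkOnly": True, **query},
).json()

if check.get("requiresJobQueue"):
    print("Submit this call with runOnJobQueue set to true")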
To start a job on the queue, the parameter runOnJobQueue must be set to true in the POST JSON request. An API key is only necessary to start a job; status and result can be retrieved without an API key.
curl "http://my.meteoblue.com/dataset/query/?apikey=123456789"
-H "Content-Type: application/json"
-d '{"runOnJobQueue":true,"geometry":{"type":"MultiPoint","coordinates":[[7.57327,47.558399,279]],"locationNames":["Basel"]},"format":"csv","timeIntervals":["2017-01-01T+00:00/2017-12-31T+00:00"],"queries":[{"domain":"ERA5","gapFillDomain":null,"timeResolution":"hourly","codes":[{"code":75,"level":"high cld lay"}]}]}'
If the API call is valid, the API responds with a JSON containing the UUID of the newly submitted job. The status shows waiting. If a worker is available, the job will be started immediately.
{
"id": "6768BAC9-2446-4A9F-A2CD-A8CCAE070919",
"queueTime": 1524063300,
"apikey": "123456",
"status": "waiting"
}
Calls to http://my.meteoblue.com/queue/status/API-ID show running after a couple of seconds and finally finished:
{
"id": "6768BAC9-2446-4A9F-A2CD-A8CCAE070919",
"queueTime": 1524063300,
"apikey": "123456",
"status": "finished"
}
The result is uploaded to a central storage system and can be retrieved with:
curl "http://queueresults.meteoblue.com/6768BAC9-2446-4A9F-A2CD-A8CCAE070919"
Job States
Jobs can have the following states:
- waiting: The job is queued but has not yet been started. It should start within a couple of seconds, unless the queues are highly utilized.
- running: The job is currently running. Each job is checked every couple of seconds whether it is actually running.
- finished: The job completed successfully. The result can now be retrieved at http://queueresults.meteoblue.com/<id>.
- deleted: The job has been cancelled manually by the user
- error: An error occurred
To cancel a waiting or running job, send an HTTP DELETE request to /queue/delete/6768BAC9-2446-4A9F-A2CD-A8CCAE070919. If the job is already finished, this call will delete the stored result on AWS S3.
To retrieve the JSON call of a queued or running job: http://my.meteoblue.com/queue/jobquery/API-ID.
Errors
If an error occurs while executing a job, the error message is set in the job status. Sometimes a job fails with the error message Job failed for unknown reasons. In this case, the application executing the job most likely ran out of memory. Generating CSV text output with large polygons or long time-intervals quickly requires many gigabytes of memory. Try to use netCDF as output format, or use smaller polygons, fewer coordinates or shorter time-intervals.
{
"status": "error",
"error_message": "Job failed for unknown reasons"
}
This error is not limited to out-of-memory issues, but could also indicate a programming error which led to a crash of the application. In case the error persists even with a smaller geographical and temporal extent, please contact us.
API Endpoints:
- Status: http://my.meteoblue.com/queue/status/API-ID
- Delete: http://my.meteoblue.com/queue/delete/API-ID (only HTTP DELETE)
- Result: http://queueresults.meteoblue.com/API-ID
- Query JSON: http://my.meteoblue.com/queue/jobquery/API-ID