Use the pycsw-admin.py utility to dump the records as XML documents to a directory:
pycsw provides a plugin framework in which you can implement a custom profile (see Profile Plugins)
Create a ‘parent’ metadata record from which all relevant metadata records (imagery, features) derive via the same
dc:source element of Dublin Core or
apiso:parentIdentifier element of ISO 19139:2007. Then, do a
GetRecords request, filtering on the identifier of the parent metadata record. Sample request:
The above query will search for all metadata records with the same
apiso:parentIdentifier (identified by
$identifier) within a given area of interest. The equivalent query can be done against
dc:source with the same design pattern.
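As a rough illustration of the pattern described above, the following Python sketch builds a CSW 2.0.2 GetRecords request that combines an equality filter on apiso:parentIdentifier with a bounding box filter. The function name, the apiso:BoundingBox property name, the sample identifier and the coordinates are assumptions chosen for illustration, not taken from the pycsw documentation; depending on the server, the query typeNames and property namespaces may need adjusting.

```python
import xml.etree.ElementTree as ET

# Namespaces used by CSW 2.0.2 and Filter Encoding 1.1.0
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ogc": "http://www.opengis.net/ogc",
    "gml": "http://www.opengis.net/gml",
}


def _q(prefix, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def build_getrecords(identifier, lower_corner, upper_corner):
    """Build a GetRecords request filtering on a parent identifier and a bbox.

    Hypothetical helper: 'identifier' plays the role of $identifier in the
    text; the corner strings are placeholder coordinates.
    """
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    root = ET.Element(_q("csw", "GetRecords"), {
        "service": "CSW", "version": "2.0.2", "resultType": "results"})
    query = ET.SubElement(root, _q("csw", "Query"), {"typeNames": "csw:Record"})
    ET.SubElement(query, _q("csw", "ElementSetName")).text = "brief"
    constraint = ET.SubElement(query, _q("csw", "Constraint"), {"version": "1.1.0"})
    flt = ET.SubElement(constraint, _q("ogc", "Filter"))
    both = ET.SubElement(flt, _q("ogc", "And"))

    # Records whose parent identifier equals the given value
    equal = ET.SubElement(both, _q("ogc", "PropertyIsEqualTo"))
    ET.SubElement(equal, _q("ogc", "PropertyName")).text = "apiso:parentIdentifier"
    ET.SubElement(equal, _q("ogc", "Literal")).text = identifier

    # ... restricted to an area of interest (property name is an assumption)
    bbox = ET.SubElement(both, _q("ogc", "BBOX"))
    ET.SubElement(bbox, _q("ogc", "PropertyName")).text = "apiso:BoundingBox"
    envelope = ET.SubElement(bbox, _q("gml", "Envelope"))
    ET.SubElement(envelope, _q("gml", "lowerCorner")).text = lower_corner
    ET.SubElement(envelope, _q("gml", "upperCorner")).text = upper_corner

    return ET.tostring(root, encoding="unicode")


request_xml = build_getrecords("parent-123", "47 -5", "55 20")
```

Swapping apiso:parentIdentifier for dc:source would give the Dublin Core variant of the same query.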
Access to transactions is controlled by an IP-based authentication list which can be set in pycsw’s configuration (in
manager.allowed_ips). Supported notations include traditional IP addresses, wildcards, and CIDR.
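To make the three notations concrete, here is a minimal Python sketch of how a client address could be checked against such a list. It is not pycsw's actual implementation; the function name ip_allowed and the sample addresses are hypothetical.

```python
import fnmatch
import ipaddress


def ip_allowed(client_ip, allowed):
    """Return True if client_ip matches any entry in the allowed list.

    Entries may be a plain IP address, a wildcard pattern such as
    '192.168.1.*', or a CIDR block such as '10.0.0.0/16'.
    """
    address = ipaddress.ip_address(client_ip)
    for entry in allowed:
        if "/" in entry:
            # CIDR notation: test network membership
            if address in ipaddress.ip_network(entry, strict=False):
                return True
        elif "*" in entry:
            # Wildcard notation: shell-style pattern match on the text form
            if fnmatch.fnmatch(client_ip, entry):
                return True
        elif client_ip == entry:
            # Traditional single IP address
            return True
    return False


allowed_ips = ["127.0.0.1", "192.168.1.*", "10.0.0.0/16"]
```

With this list, ip_allowed("192.168.1.10", allowed_ips) and ip_allowed("10.0.5.7", allowed_ips) both match, while ip_allowed("10.1.0.1", allowed_ips) does not.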
HTTP POST requests with XML are a bit different from the traditional HTTP POST approach (key=value). Instead of form-encoded pairs, an HTTP client opens a connection to the server and sends the XML document directly as the request body. CSW implements HTTP POST in this manner with XML requests.
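As one possible illustration in Python, the sketch below prepares such a request with the standard library. The endpoint http://localhost:8000/ and the helper name build_csw_post are assumptions; the request is only constructed here, not sent.

```python
import urllib.request

# A minimal CSW 2.0.2 GetCapabilities document used as the request body
GETCAPABILITIES_XML = (
    '<csw:GetCapabilities '
    'xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" service="CSW"/>'
)


def build_csw_post(url, xml_payload):
    """Prepare an HTTP POST whose body is the XML document itself."""
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),   # raw XML, not key=value pairs
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Hypothetical local endpoint; replace with your own server URL
request = build_csw_post("http://localhost:8000/", GETCAPABILITIES_XML)

# To actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```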
There are numerous ways to make this type of request, but here are a few:
Command line tools:
No. pycsw is a headless metadata catalog. Administration is via command line. For full metadata management, applications like CKAN, GeoNode and Open Data Catalog are built with pycsw inside and provide functionality to manage metadata via a GUI.
See Loading Records in the documentation.
Add metadata using the pycsw-admin.py utility and CSW’s
Harvest operation against your own server:
The Harvest operation supports asynchronous processing via the
ResponseHandler parameter. When specified, this parameter allows the server to acknowledge the request immediately while
processing continues in the background. The client is then notified of completion via the URI given as the value of the
ResponseHandler parameter.
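For illustration, a CSW 2.0.2 Harvest request carrying a ResponseHandler could be assembled as in the Python sketch below. The helper name, the source URL and the mailto address are hypothetical placeholders, and the sketch only builds the XML; it does not send it.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest(source_url, resource_type, response_handler=None):
    """Build a csw:Harvest request; adding a handler asks for async processing."""
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element("{%s}Harvest" % CSW_NS,
                      {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(root, "{%s}Source" % CSW_NS).text = source_url
    ET.SubElement(root, "{%s}ResourceType" % CSW_NS).text = resource_type
    ET.SubElement(root, "{%s}ResourceFormat" % CSW_NS).text = "application/xml"
    if response_handler:
        # URI to which the result is delivered once harvesting completes
        ET.SubElement(root, "{%s}ResponseHandler" % CSW_NS).text = response_handler
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only
harvest_xml = build_harvest(
    "http://example.org/wms",
    "http://www.opengis.net/wms",
    response_handler="mailto:user@example.org",
)
```

Omitting response_handler yields a plain synchronous Harvest request.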
pycsw supports both FTP and SMTP-based
FTP (result gets pushed to
SMTP (result gets emailed to
firstname.lastname@example.org. See the docs for more information on configuring
Yes. See our API docs for examples on deploying pycsw in a custom application/framework.
Most CSW client tools (e.g. QGIS MetaSearch, OWSLib, etc.) derive the CSW URL from the
GetCapabilities response XML, as opposed to directly using the URL you provide. Ensure that the
server.url configuration value is set to the URL that should be advertised in the CSW Capabilities XML.
The default result type of a
GetRecords response is a hit count, which does not show any records per se but provides a summary of the search result.
To return actual records, add
resulttype=results to the request.
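A small Python sketch of the difference, building key-value GetRecords URLs with and without resulttype=results. The base URL and the helper name are assumptions for illustration; the other parameters are standard CSW 2.0.2 KVP parameters.

```python
from urllib.parse import urlencode


def getrecords_url(base_url, result_type=None, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP URL.

    When result_type is None the parameter is omitted, so the server
    falls back to its default (a hit count).
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typenames": "csw:Record",
        "elementsetname": "brief",
        "maxrecords": max_records,
    }
    if result_type is not None:
        params["resulttype"] = result_type
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint
hits_only = getrecords_url("http://localhost:8000/")
with_records = getrecords_url("http://localhost:8000/", result_type="results")
```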
A key component of the pycsw configuration is the
server.url directive. When working
with CSW servers, applications such as QGIS MetaSearch
and OWSLib read a CSW’s Capabilities XML document
to be able to ‘bind’ to the appropriate URL when making subsequent CSW requests.
For example, in the pycsw case, if your
server.url directive is set to
http://localhost:8000, but your server
is deployed to
http://example.org/pycsw, QGIS MetaSearch, OWSLib and other CSW clients will initially request
http://example.org/pycsw to retrieve the Capabilities XML response, and then use
http://localhost:8000 for subsequent requests (such as
The end result is having a pycsw instance that is able to respond to a
request but nothing more, in tools like QGIS/MetaSearch, OWSLib, etc.
This is a feature, not a bug. What’s going on here?
Standards-wise, this is all valid and proper. OGC CSW uses OWS Common to be able to identify,
well, common constructs of service metadata in the CSW 2/3 Capabilities XML response. OWS
OperationsMetadata section defines various binding endpoints/URLs for each OWS
operation. This means you can, in theory, have a given URL for
and a different URL for
GetRecords requests. In the pycsw case, the
is used across the server’s Capabilities XML for simplicity. It is critical to ensure this URL
is consistent with your deployment as advertised and published.
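One way to see which URLs a server advertises is to read the operation bindings out of its Capabilities document. The Python sketch below does this for a trimmed, hand-written sample; the sample XML and the function name are illustrative assumptions, and the OWS namespace shown is the one used by CSW 2.0.2 (CSW 3 uses a different OWS namespace).

```python
import xml.etree.ElementTree as ET

NS = {
    "ows": "http://www.opengis.net/ows",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Trimmed, hand-written Capabilities sample (not real server output)
SAMPLE_CAPABILITIES = """
<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ows="http://www.opengis.net/ows"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0.2">
  <ows:OperationsMetadata>
    <ows:Operation name="GetCapabilities">
      <ows:DCP><ows:HTTP>
        <ows:Get xlink:href="http://localhost:8000"/>
        <ows:Post xlink:href="http://localhost:8000"/>
      </ows:HTTP></ows:DCP>
    </ows:Operation>
    <ows:Operation name="GetRecords">
      <ows:DCP><ows:HTTP>
        <ows:Get xlink:href="http://localhost:8000"/>
        <ows:Post xlink:href="http://localhost:8000"/>
      </ows:HTTP></ows:DCP>
    </ows:Operation>
  </ows:OperationsMetadata>
</csw:Capabilities>
"""


def advertised_urls(capabilities_xml):
    """Map each operation name to the sorted list of URLs bound to it."""
    root = ET.fromstring(capabilities_xml.strip())
    urls = {}
    for operation in root.iterfind(".//ows:OperationsMetadata/ows:Operation", NS):
        hrefs = set()
        # ows:Get and ows:Post elements carry the endpoint in xlink:href
        for binding in operation.iterfind("ows:DCP/ows:HTTP/*", NS):
            href = binding.get("{%s}href" % NS["xlink"])
            if href:
                hrefs.add(href)
        urls[operation.get("name")] = sorted(hrefs)
    return urls


bindings = advertised_urls(SAMPLE_CAPABILITIES)
```

If the values in bindings differ from the URL clients actually use to reach the server, subsequent requests will go to the wrong place.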
A common case in which this causes errors is pycsw servers deployed with HTTPS but with
server.url directives set to
http://.... Currently (2020), numerous services are migrating or have migrated from
HTTP to HTTPS, and include web server redirects to manage traffic accordingly. Given that CSW servers
include support for XML POST, a web server’s redirect may not pass along the XML payload
as part of the redirect.
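The effect can be observed on the client side as well. As one example, Python's standard library redirect handler turns a redirected POST into a GET without a body. The sketch below calls that handler directly, with no network access; the URLs and payload are placeholders, and other HTTP clients may behave differently.

```python
import urllib.request

# Original POST request carrying an XML payload (placeholder content)
original = urllib.request.Request(
    "http://example.org/pycsw",
    data=b"<GetRecords/>",
    headers={"Content-Type": "application/xml"},
)

# Ask the stdlib redirect handler what it would send after a 301 to HTTPS.
# fp is unused on this code path, so None is passed.
handler = urllib.request.HTTPRedirectHandler()
redirected = handler.redirect_request(
    original, None, 301, "Moved Permanently", {}, "https://example.org/pycsw"
)

print(original.get_method(), original.data)      # POST with the XML body
print(redirected.get_method(), redirected.data)  # GET with no body
```

The redirected request reaches the HTTPS endpoint, but the XML body and its Content-Type header are gone, so the server has no request document to act on.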
Here’s a bare-bones example of the differences between HTTP and HTTPS responses using curl with the same request (this essentially mimics QGIS MetaSearch/OWSLib behaviour):
# request example 1: works
curl -s https://raw.githubusercontent.com/geopython/pycsw/master/tests/functionaltests/suites/default/post/GetRecords-all.xml | curl -X POST -d @- https://example.org/pycsw
# request example 2: does not work
curl -s https://raw.githubusercontent.com/geopython/pycsw/master/tests/functionaltests/suites/default/post/GetRecords-all.xml | curl -X POST -d @- http://example.org/pycsw
Here’s what happens in request example 2:
It is this behaviour that results in, for example, QGIS MetaSearch and OWSLib errors.
Setting server.url accordingly will alleviate these issues and will allow pycsw to continue to
work as expected with CSW clients.
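A quick sanity check along these lines can be scripted. The Python sketch below assumes an INI-style configuration with a [server] section and a url key; the helper name and sample values are hypothetical, and newer pycsw versions may use a different configuration format.

```python
import configparser
from urllib.parse import urlsplit


def server_url_matches(config_text, deployed_url):
    """Return True if server.url agrees with the URL clients actually use.

    Compares scheme, host (with port) and path, ignoring a trailing slash.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    configured = urlsplit(parser.get("server", "url"))
    deployed = urlsplit(deployed_url)
    return (
        (configured.scheme, configured.netloc, configured.path.rstrip("/"))
        == (deployed.scheme, deployed.netloc, deployed.path.rstrip("/"))
    )


# Sample configuration fragments for illustration
STALE_CONFIG = "[server]\nurl=http://example.org/pycsw\n"
GOOD_CONFIG = "[server]\nurl=https://example.org/pycsw/\n"

stale_ok = server_url_matches(STALE_CONFIG, "https://example.org/pycsw")
good_ok = server_url_matches(GOOD_CONFIG, "https://example.org/pycsw")
```

Here stale_ok is False because the scheme differs (http versus https), which is exactly the mismatch described above.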