Can I use pycsw within my WSGI application?
How do I export my repository?
How do I add a custom metadata format?
How can I catalogue ‘sets’ of metadata?
How can I handle transactions safely?
How can I make CSW POST XML requests?
Does pycsw have a GUI/webapp/interface?
I have 2147 metadata records. How do I add them?
How do I add metadata from a WAF?
How do I harvest huge OGC services without getting HTTP timeout errors?
Is pycsw customizable or extensible?
Why am I getting a ‘Connection refused’ error when connecting to pycsw?
When doing a GetRecords why are there no results?
My pycsw install doesn’t work at all with QGIS
Yes. pycsw can be deployed via either traditional CGI or WSGI. You can also integrate pycsw via Django views, Pylons controllers or Flask routes.
Use the pycsw-admin.py utility to dump the records as XML documents to a directory:
pycsw provides a plugin framework in which you can implement a custom profile (see Profile Plugins)
Create a ‘parent’ metadata record from which all relevant metadata records (imagery, features) derive via the same dc:source element of Dublin Core or apiso:parentIdentifier element of ISO 19139:2007. Then, do a GetRecords request, filtering on the identifier of the parent metadata record. Sample request:
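A minimal sketch of such a request, built in Python so the parent identifier and area of interest can be substituted. The namespaces are the standard CSW 2.0.2, OGC Filter and GML ones; the function name `build_getrecords`, the default bounding box corners and the `maxRecords` value are illustrative assumptions, not values prescribed by pycsw.

```python
from string import Template
from xml.sax.saxutils import escape

# GetRecords request combining a parentIdentifier match with a bounding box.
# $identifier, $lower and $upper are placeholders filled in below.
GETRECORDS_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    service="CSW" version="2.0.2" resultType="results" maxRecords="10"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:And>
          <ogc:PropertyIsEqualTo>
            <ogc:PropertyName>apiso:parentIdentifier</ogc:PropertyName>
            <ogc:Literal>$identifier</ogc:Literal>
          </ogc:PropertyIsEqualTo>
          <ogc:BBOX>
            <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
            <gml:Envelope>
              <gml:lowerCorner>$lower</gml:lowerCorner>
              <gml:upperCorner>$upper</gml:upperCorner>
            </gml:Envelope>
          </ogc:BBOX>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
""")


def build_getrecords(identifier, lower="47 -5", upper="55 20"):
    """Return the request XML for one parent identifier and bounding box.

    The default corners are example values only.
    """
    return GETRECORDS_TEMPLATE.substitute(
        identifier=escape(identifier),
        lower=escape(lower),
        upper=escape(upper),
    )
```

Calling `build_getrecords("parent-001")` returns a string that can be sent to the server as an XML POST body.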
The above query will search for all metadata records of the same apiso:parentIdentifier (identified by $identifier) within a given area of interest. The equivalent query can be done against dc:source with the same design pattern.
Transactions are handled by an IP-based authentication list which can be set in pycsw’s configuration (in manager.allowed_ips). Supported notations include traditional IP address, wildcard, and CIDR.
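To make the three notations concrete, here is an illustrative Python sketch of how a client address could be checked against such a list. It is not pycsw's own implementation; the function name `ip_allowed` and the sample entries are assumptions made for the example.

```python
import fnmatch
import ipaddress


def ip_allowed(client_ip, allowed):
    """Return True if client_ip matches any entry in the allow list.

    Entries may be a plain address ("127.0.0.1"), a wildcard
    ("192.168.0.*") or a CIDR block ("10.0.0.0/8").
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in allowed:
        if "*" in entry:
            # Wildcard notation: shell-style match on the address string.
            if fnmatch.fnmatchcase(client_ip, entry):
                return True
        elif "/" in entry:
            # CIDR notation: membership test against the network.
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        elif addr == ipaddress.ip_address(entry):
            # Plain address: exact match.
            return True
    return False
```

For example, `ip_allowed("192.168.0.12", ["127.0.0.1", "192.168.0.*"])` matches the wildcard entry, while an address outside every entry is rejected.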
HTTP POST requests with XML are a bit different than the traditional HTTP POST approach (key=value). An HTTP client opens a connection to the server and sends XML directly. CSW implements HTTP POST in this manner with XML requests.
There are numerous ways to make this type of request, but here are a few:
Python
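A minimal sketch using only the Python standard library. The helper names `build_csw_post` and `post_csw`, the `application/xml` content type and the timeout are assumptions for illustration; adjust the URL to your own endpoint.

```python
import urllib.request


def build_csw_post(url, xml_payload):
    """Build an HTTP POST request whose body is the raw XML document."""
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def post_csw(url, xml_payload, timeout=30):
    """Send the XML request and return the raw response bytes."""
    request = build_csw_post(url, xml_payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (requires a running server; the URL is a placeholder):
# with open("request.xml", encoding="utf-8") as fh:
#     print(post_csw("http://localhost:8000/", fh.read()).decode("utf-8"))
```

Note that the XML is sent as the request body itself, not as a key=value form field.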
Command line tools:
No. pycsw is a headless metadata catalog. Administration is via command line. For full metadata management, applications like CKAN, GeoNode and Open Data Catalog are built with pycsw inside and provide functionality to manage metadata via a GUI.
Add metadata using pycsw-admin.py:
See Loading Records in the documentation.
Use the pycsw-admin.py utility and CSW’s Harvest operation against your own server:
harvest-waf.xml:
The CSW Harvest operation supports asynchronous processing via the ResponseHandler parameter. When specified, this parameter allows the server to return a response to the client right away while the harvest continues in the background. The client is then notified of completion via the URI given as the value of the ResponseHandler parameter.
pycsw supports both FTP and SMTP-based ResponseHandler processing:
FTP (result gets pushed to ftp://host/result.xml):
SMTP (result gets emailed to you@example.com. See the docs for more information on configuring server.smtp_host):
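As a sketch of what such Harvest requests can look like, the Python below builds the XML with the standard library. The element names follow the CSW 2.0.2 Harvest request; the function name `build_harvest`, the source URL `http://host/wms`, the WMS resource type and the `mailto:` form of the SMTP handler are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest(source, resource_type, response_handler=None):
    """Return a CSW Harvest request as an XML string.

    If response_handler is given (e.g. an ftp:// or mailto: URI), the
    request asks for asynchronous processing.
    """
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element(f"{{{CSW_NS}}}Harvest",
                      {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(root, f"{{{CSW_NS}}}Source").text = source
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceType").text = resource_type
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceFormat").text = "application/xml"
    if response_handler:
        ET.SubElement(root, f"{{{CSW_NS}}}ResponseHandler").text = response_handler
    return ET.tostring(root, encoding="unicode")


# FTP: the result is pushed to the given location.
ftp_request = build_harvest("http://host/wms", "http://www.opengis.net/wms",
                            "ftp://host/result.xml")
# SMTP: the result is emailed to the given address.
smtp_request = build_harvest("http://host/wms", "http://www.opengis.net/wms",
                             "mailto:you@example.com")
```

Omitting `response_handler` yields an ordinary synchronous Harvest request, which is the case prone to HTTP timeouts on large services.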
Yes. See our API docs for examples on deploying pycsw in a custom application/framework.
Most CSW client tools (e.g. QGIS MetaSearch, OWSLib, etc.) derive the CSW URL from the GetCapabilities response XML, rather than directly using the URL you provide. Ensure that the server.url configuration value is set to the URL that should be advertised in the CSW Capabilities XML.
The default result type of a GetRecords response is a hit count, which does not show any records per se but provides a summary of the search result.
To return actual records add resulttype=results to the GetRecords request.
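For illustration, a short Python sketch that assembles a key=value GetRecords URL with and without `resulttype=results`. The function name `getrecords_url`, the base URL and the other parameter values (`elementsetname`, `maxrecords`) are assumptions chosen for the example.

```python
from urllib.parse import urlencode


def getrecords_url(base_url, resulttype="results", maxrecords=10):
    """Build a CSW 2.0.2 GetRecords GET URL.

    With resulttype="hits" only a count is returned; "results"
    asks the server to include the records themselves.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typenames": "csw:Record",
        "elementsetname": "brief",
        "resulttype": resulttype,
        "maxrecords": maxrecords,
    }
    return f"{base_url}?{urlencode(params)}"


hits_only = getrecords_url("http://localhost:8000/", resulttype="hits")
with_records = getrecords_url("http://localhost:8000/")
```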
A key component of the pycsw configuration is the server.url directive. When working with CSW servers, applications such as QGIS MetaSearch and OWSLib read a CSW’s Capabilities XML document to be able to ‘bind’ to the appropriate URL when making subsequent CSW requests.
For example, in the pycsw case, if your server.url directive is set to http://localhost:8000, but your server is deployed to http://example.org/pycsw, QGIS MetaSearch, OWSLib and other CSW clients will initially connect to http://example.org/pycsw to retrieve the Capabilities XML response, and then use http://localhost:8000 for subsequent requests (such as GetRecords, GetDomain, GetRecordById, etc.).
The end result is a pycsw instance that is able to respond to a GetCapabilities request but nothing more, in tools like QGIS/MetaSearch, OWSLib, etc.
This is a feature, not a bug. What’s going on here?
Standards-wise, this is all valid and proper. OGC CSW uses OWS Common to be able to identify, well, common constructs of service metadata in the CSW 2/3 Capabilities XML response. OWS Common’s OperationsMetadata section defines various binding endpoints/URLs for each OWS operation. This means you can, in theory, have a given URL for GetCapabilities requests, and a different URL for GetRecords requests. In the pycsw case, the server.url directive is used across the server’s Capabilities XML for simplicity. It is critical to ensure this URL is consistent with your deployment as advertised and published.
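One way to check this is to read the endpoints a Capabilities document advertises and compare them with the URL you actually deployed to. The Python sketch below does so against a small inline sample; the function names, the trimmed sample document and the URLs are assumptions for illustration. It uses the OWS Common namespace of CSW 2.0.2 (CSW 3 uses http://www.opengis.net/ows/2.0 instead).

```python
import xml.etree.ElementTree as ET

NS = {"ows": "http://www.opengis.net/ows"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed-down Capabilities document, as a misconfigured server might return it.
SAMPLE_CAPABILITIES = """
<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ows="http://www.opengis.net/ows"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0.2">
  <ows:OperationsMetadata>
    <ows:Operation name="GetCapabilities">
      <ows:DCP><ows:HTTP>
        <ows:Get xlink:href="http://localhost:8000"/>
        <ows:Post xlink:href="http://localhost:8000"/>
      </ows:HTTP></ows:DCP>
    </ows:Operation>
    <ows:Operation name="GetRecords">
      <ows:DCP><ows:HTTP>
        <ows:Get xlink:href="http://localhost:8000"/>
        <ows:Post xlink:href="http://localhost:8000"/>
      </ows:HTTP></ows:DCP>
    </ows:Operation>
  </ows:OperationsMetadata>
</csw:Capabilities>
"""


def advertised_endpoints(capabilities_xml):
    """Map each operation name to the set of URLs advertised for it."""
    root = ET.fromstring(capabilities_xml)
    endpoints = {}
    for op in root.iterfind(".//ows:OperationsMetadata/ows:Operation", NS):
        hrefs = {b.get(XLINK_HREF)
                 for b in op.iterfind("ows:DCP/ows:HTTP/*", NS)
                 if b.get(XLINK_HREF)}
        endpoints[op.get("name")] = hrefs
    return endpoints


def mismatched_operations(capabilities_xml, deployed_url):
    """Return operations whose advertised URL differs from deployed_url."""
    expected = deployed_url.rstrip("/")
    return sorted(
        name
        for name, hrefs in advertised_endpoints(capabilities_xml).items()
        if any(href.rstrip("/") != expected for href in hrefs)
    )
```

With the sample above, `mismatched_operations(SAMPLE_CAPABILITIES, "http://example.org/pycsw")` reports both operations, which is the symptom of a server.url value that does not match the deployment.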
A common case in which this causes errors is pycsw servers deployed with HTTPS but with server.url directives set to http://.... Currently (2020), numerous services are migrating or have migrated from HTTP to HTTPS, and include web server redirects to manage traffic accordingly. Given CSW servers include support for XML POST, a web server’s redirect may not pass along the XML payload as part of the redirect.
Here’s a bare bones example of the differences between HTTP and HTTPS responses using curl with the same request (this essentially mimics QGIS MetaSearch/OWSLib behaviour):
# request example 1: works
curl -s https://raw.githubusercontent.com/geopython/pycsw/master/tests/functionaltests/suites/default/post/GetRecords-all.xml \
  | curl -X POST -d @- https://example.org/pycsw
# request example 2: does not work
curl -s https://raw.githubusercontent.com/geopython/pycsw/master/tests/functionaltests/suites/default/post/GetRecords-all.xml \
  | curl -X POST -d @- http://example.org/pycsw
Here’s what happens in request example 2:
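The web server redirects the HTTP request to HTTPS, and the XML payload (GetRecords-all.xml) is not passed along as part of the redirect.
It is this behaviour that results in, for example, QGIS MetaSearch and OWSLib errors.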
Setting server.url accordingly will alleviate these issues and will allow pycsw to continue to work as expected with CSW clients.