FAQ

Can I use pycsw within my WSGI application?

How do I export my repository?

How do I add a custom metadata format?

How can I catalogue ‘sets’ of metadata?

How can I handle transactions safely?

How can I make CSW POST XML requests?

Does pycsw have a GUI/webapp/interface?

I have 2147 metadata records. How do I add them?

How do I add metadata from a WAF?

How do I harvest huge OGC services without getting HTTP timeout errors?

Is pycsw customizable or extensible?

Why am I getting a ‘Connection refused’ error when connecting to pycsw?

Can I use pycsw within my WSGI application?

Yes. pycsw can be deployed via traditional CGI or WSGI. You can also integrate pycsw via Django views, Pylons controllers or Flask routes.
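How you obtain pycsw's WSGI callable depends on the pycsw version you run, so the sketch below does not import pycsw at all. It assumes only that pycsw is exposed as a standard WSGI callable, represented here by the hypothetical stand-in `pycsw_app`, and shows how a host WSGI application could route requests under `/csw` to it while serving everything else itself (`main_app` and `dispatch` are likewise hypothetical names).

```python
def pycsw_app(environ, start_response):
    """Hypothetical stand-in for pycsw's WSGI callable (replace with the real one)."""
    start_response('200 OK', [('Content-Type', 'application/xml')])
    return [b'<csw:Capabilities/>']


def main_app(environ, start_response):
    """The host application's own WSGI callable."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'main application']


def dispatch(environ, start_response):
    """Route /csw (and anything below it) to the catalogue, everything else to the host app."""
    path = environ.get('PATH_INFO', '')
    if path == '/csw' or path.startswith('/csw/'):
        return pycsw_app(environ, start_response)
    return main_app(environ, start_response)
```

To try it locally, pass `dispatch` to `wsgiref.simple_server.make_server('', 8000, dispatch)` and call `serve_forever()` on the result; in production you would hand `dispatch` to your WSGI server instead. Routing by path prefix keeps the catalogue isolated from the rest of the application.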

How do I export my repository?

Use the pycsw-admin.py utility to dump the records as XML documents to a directory:

pycsw-admin.py -c export_records -f default.cfg -p /path/to/output_dir

How do I add a custom metadata format?

pycsw provides a plugin framework in which you can implement a custom profile (see Profile Plugins).

How can I catalogue ‘sets’ of metadata?

Create a ‘parent’ metadata record from which all relevant metadata records (imagery, features) derive, and link each of them to the parent through the same dc:source element (Dublin Core) or apiso:parentIdentifier element (ISO 19139:2007). Then make a GetRecords request that filters on the identifier of the parent metadata record. Sample request:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" xmlns:gml="http://www.opengis.net/gml" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:ows="http://www.opengis.net/ows">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:And>
          <ogc:PropertyIsEqualTo>
            <ogc:PropertyName>apiso:parentIdentifier</ogc:PropertyName>
            <ogc:Literal>$identifier</ogc:Literal>
          </ogc:PropertyIsEqualTo>
          <ogc:BBOX>
            <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
            <gml:Envelope>
              <gml:lowerCorner>47 -5</gml:lowerCorner>
              <gml:upperCorner>55 20</gml:upperCorner>
            </gml:Envelope>
          </ogc:BBOX>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

The above query searches for all metadata records that share the same apiso:parentIdentifier (given as $identifier) within a given area of interest. An equivalent query can be made against dc:source using the same pattern.
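The `$identifier` placeholder happens to match the syntax of Python's `string.Template`, so a script can fill it in before sending the request. The sketch below uses a trimmed-down version of the request (the BBOX clause is dropped for brevity); `GETRECORDS` and `children_request` are hypothetical names, and the identifier is XML-escaped so that values containing `&` or `<` still produce a well-formed document.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Trimmed-down GetRecords request; $identifier is the parent record's identifier
GETRECORDS = Template("""<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
    service="CSW" version="2.0.2" resultType="results" outputSchema="http://www.opengis.net/cat/csw/2.0.2">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>apiso:parentIdentifier</ogc:PropertyName>
          <ogc:Literal>$identifier</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>""")


def children_request(parent_id):
    """Return a GetRecords request for all records whose parent is parent_id."""
    xml = GETRECORDS.substitute(identifier=escape(parent_id))
    ET.fromstring(xml)  # fail early if the result is not well-formed XML
    return xml
```

The resulting string can be sent with any of the methods listed under "How can I make CSW POST XML requests?".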

How can I handle transactions safely?

Transactions are handled by an IP-based authentication list which can be set in pycsw’s configuration (in manager.allowed_ips). Supported notations include traditional IP addresses, wildcards, and CIDR.
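To illustrate what the three notations mean, here is a minimal sketch of matching a client address against such a list. This is not pycsw's own matching code; the configuration excerpt is an assumption modelled on the `[manager]` section of a pycsw configuration file, and `is_allowed` is a hypothetical helper.

```python
import configparser
import fnmatch
import ipaddress

# Hypothetical configuration excerpt: one plain address, one wildcard, one CIDR range
CONFIG = """
[manager]
transactions=true
allowed_ips=127.0.0.1,192.168.0.*,10.0.0.0/24
"""


def is_allowed(client_ip, allowed):
    """Return True if client_ip matches any entry: exact address, wildcard or CIDR."""
    addr = ipaddress.ip_address(client_ip)
    for entry in (e.strip() for e in allowed.split(',')):
        if '*' in entry:
            if fnmatch.fnmatch(client_ip, entry):  # wildcard, e.g. 192.168.0.*
                return True
        elif '/' in entry:
            if addr in ipaddress.ip_network(entry, strict=False):  # CIDR, e.g. 10.0.0.0/24
                return True
        elif addr == ipaddress.ip_address(entry):  # plain address
            return True
    return False


config = configparser.ConfigParser()
config.read_string(CONFIG)
allowed_ips = config.get('manager', 'allowed_ips')
```

With this list, `192.168.0.42` and `10.0.0.200` are accepted while `10.0.1.1` is refused, which is a quick way to sanity-check an allowed_ips value before enabling transactions.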

How can I make CSW POST XML requests?

HTTP POST requests with XML differ from the traditional HTTP POST approach (key=value pairs): the HTTP client opens a connection to the server and sends the XML document directly as the request body. CSW implements HTTP POST in this manner for XML requests.

There are numerous ways to make this type of request, but here are a few:

Python

import requests

# POST the XML document as the raw request body
with open('/path/to/request.xml') as f:
    print(requests.post('http://demo.pycsw.org/cite/csw', data=f.read()).text)

from owslib.util import http_post
# see owslib.util.http_post in https://github.com/geopython/OWSLib/blob/master/owslib/util.py
with open('/path/to/request.xml') as f:
    response = http_post('http://demo.pycsw.org/cite/csw', request=f.read())

Command line tools:

# pycsw-admin.py utility
pycsw-admin.py -c post_xml -u http://demo.pycsw.org/cite/csw -x /path/to/request.xml

# curl (--data-binary sends the file as-is; -d would strip its newlines)
curl --data-binary @/path/to/request.xml http://demo.pycsw.org/cite/csw

# lwp-request
cat /path/to/request.xml | POST http://demo.pycsw.org/cite/csw

# wget
wget http://demo.pycsw.org/cite/csw --post-file=/path/to/request.xml
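The difference between the two kinds of POST lies only in the request body and its Content-Type. The sketch below builds (but does not send) one request of each kind with the Python standard library; the GetCapabilities parameters are just example values.

```python
import urllib.parse
import urllib.request

CSW_URL = 'http://demo.pycsw.org/cite/csw'

# Traditional HTTP POST: the body is a form-encoded list of key=value pairs
kvp_req = urllib.request.Request(
    CSW_URL,
    data=urllib.parse.urlencode({'service': 'CSW', 'version': '2.0.2',
                                 'request': 'GetCapabilities'}).encode('ascii'),
    headers={'Content-Type': 'application/x-www-form-urlencoded'})

# CSW XML POST: the body is the XML request document itself
xml_body = b'''<?xml version="1.0" encoding="UTF-8"?>
<csw:GetCapabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" service="CSW"/>'''
xml_req = urllib.request.Request(CSW_URL, data=xml_body,
                                 headers={'Content-Type': 'application/xml'})

# Nothing is sent here; urllib.request.urlopen(xml_req) would send the XML request
```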

Does pycsw have a GUI/webapp/interface?

No. pycsw is a headless metadata catalog; administration is done via the command line. For full metadata management, applications such as CKAN, GeoNode and Open Data Catalog embed pycsw and provide functionality to manage metadata via a GUI.

I have 2147 metadata records. How do I add them?

Add metadata using pycsw-admin.py:

# read a directory of metadata files
pycsw-admin.py -c load_records -f /path/to/default.cfg -p /path/to/records

# read a directory of metadata files, recursively
pycsw-admin.py -c load_records -f /path/to/default.cfg -p /path/to/records -r

See Loading Records in the documentation.
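Before loading a large batch, it can help to check how many files a load will pick up with and without -r. The sketch below assumes that only files ending in .xml are read; `candidate_records` is a hypothetical helper, not part of pycsw.

```python
import pathlib


def candidate_records(directory, recursive=False):
    """List the .xml files under directory (assumes only *.xml files are loaded)."""
    root = pathlib.Path(directory)
    pattern = '**/*.xml' if recursive else '*.xml'  # '**' also covers the top directory
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

For example, `len(candidate_records('/path/to/records', recursive=True))` should match the number of records you expect to load with -r.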

How do I add metadata from a WAF?

Use the pycsw-admin.py utility and CSW’s Harvest operation against your own server:

pycsw-admin.py -c post_xml -u http://localhost/csw -x /path/to/harvest-waf.xml

harvest-waf.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd" service="CSW" version="2.0.2">
  <Source>http://demo.pycsw.org/waf</Source>
  <ResourceType>urn:geoss:waf</ResourceType>
</Harvest>
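pycsw walks the WAF itself during the Harvest operation. To check beforehand which metadata files a WAF exposes, the sketch below collects the .xml links from a WAF's HTML directory listing. It assumes the WAF is a plain HTML index whose metadata files end in .xml; it is an illustration, not pycsw's harvesting code, and `waf_links` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkParser(HTMLParser):
    """Collect absolute URLs of .xml files linked from an HTML directory listing."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href and href.lower().endswith('.xml'):
                # resolve relative links against the WAF URL
                self.links.append(urljoin(self.base_url, href))


def waf_links(listing_html, base_url):
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    parser.close()
    return parser.links
```

Pass the WAF URL with a trailing slash (e.g. http://demo.pycsw.org/waf/) so relative links resolve inside the folder.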

How do I harvest huge OGC services without getting HTTP timeout errors?

The CSW Harvest operation supports asynchronous processing via the ResponseHandler parameter. When this parameter is specified, the server returns a response to the client right away while the harvest continues to run, so the client does not have to keep the connection open until the harvest finishes. Once processing completes, the result is delivered to the URI given as the value of the ResponseHandler parameter.

pycsw supports both FTP and SMTP-based ResponseHandler processing:

FTP (result gets pushed to ftp://host/result.xml):

<?xml version="1.0" encoding="UTF-8"?>
<Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd" service="CSW" version="2.0.2">
  <Source>http://demo.pycsw.org/waf</Source>
  <ResourceType>urn:geoss:waf</ResourceType>
  <ResponseHandler>ftp://host/result.xml</ResponseHandler>
</Harvest>

SMTP (result gets emailed to you@example.com; see the docs for more information on configuring server.smtp_host):

<?xml version="1.0" encoding="UTF-8"?>
<Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd" service="CSW" version="2.0.2">
  <Source>http://demo.pycsw.org/waf</Source>
  <ResourceType>urn:geoss:waf</ResourceType>
  <ResponseHandler>mailto:you@example.com</ResponseHandler>
</Harvest>
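If you generate Harvest requests from a script, the synchronous and asynchronous variants differ only by the optional ResponseHandler element. The sketch below builds either variant with ElementTree; `harvest_request` is a hypothetical helper, the xsi:schemaLocation hint is omitted, and registering the CSW namespace as the default prefix is a global ElementTree setting.

```python
import xml.etree.ElementTree as ET

CSW = 'http://www.opengis.net/cat/csw/2.0.2'
ET.register_namespace('', CSW)  # serialise CSW elements without a prefix


def harvest_request(source, resource_type='urn:geoss:waf', response_handler=None):
    """Build a CSW 2.0.2 Harvest request; add ResponseHandler for asynchronous processing."""
    root = ET.Element(f'{{{CSW}}}Harvest', {'service': 'CSW', 'version': '2.0.2'})
    ET.SubElement(root, f'{{{CSW}}}Source').text = source
    ET.SubElement(root, f'{{{CSW}}}ResourceType').text = resource_type
    if response_handler is not None:
        # e.g. ftp://host/result.xml or mailto:you@example.com
        ET.SubElement(root, f'{{{CSW}}}ResponseHandler').text = response_handler
    return ET.tostring(root, encoding='unicode')
```

For example, `harvest_request('http://demo.pycsw.org/waf', response_handler='mailto:you@example.com')` corresponds to the SMTP example above.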

Is pycsw customizable or extensible?

Yes. See our API docs for examples of deploying pycsw in a custom application/framework.

Why am I getting a ‘Connection refused’ error when connecting to pycsw?

Most CSW client tools (e.g. QGIS MetaSearch, OWSLib) derive the CSW URL from the GetCapabilities response XML rather than using the URL you provide directly. Make sure the server.url configuration value is set to the URL you want advertised in the CSW Capabilities XML.
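To see which URL your server actually advertises, you can read it back from the Capabilities document the way a client would. The sketch below extracts the HTTP POST URL advertised for an operation from a CSW 2.0.2 Capabilities document (OWS 1.0.0 namespace); `advertised_post_url` is a hypothetical helper. If it returns something like a localhost URL that your clients cannot reach, server.url needs changing.

```python
import xml.etree.ElementTree as ET

NS = {
    'ows': 'http://www.opengis.net/ows',
    'xlink': 'http://www.w3.org/1999/xlink',
}


def advertised_post_url(capabilities_xml, operation='GetRecords'):
    """Return the HTTP POST URL advertised for an operation, or None if absent."""
    root = ET.fromstring(capabilities_xml)
    for op in root.iterfind('.//ows:OperationsMetadata/ows:Operation', NS):
        if op.get('name') == operation:
            post = op.find('ows:DCP/ows:HTTP/ows:Post', NS)
            if post is not None:
                return post.get(f"{{{NS['xlink']}}}href")
    return None
```

The Capabilities XML itself can be fetched with, for example, `requests.get(url, params={'service': 'CSW', 'request': 'GetCapabilities'}).text`.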