OJS (Open Journal System) OAI export MARCXML: Removing empty lines

In my previous post I explained how to show the author's email in the OJS OAI output.

I just noticed that some records were showing empty tags, like:

<datafield tag="653" ind1=" " ind2=" " >
  <subfield code="a" ></subfield>
</datafield>

You can change OAIMetadataFormat_MARC21.inc.php to avoid this. More precisely, look for the formatElement function and change it as follows (notice the changes in lines 15 and 19, which wrap the output in a check that skips empty values):

/**
 * Format XML for single MARC21 element.
 * @param $tag string
 * @param $ind1 string
 * @param $ind2 string
 * @param $code string
 * @param $value mixed
 */
function formatElement($tag, $ind1, $ind2, $code, $value) {
    if (!is_array($value)) {
        $value = array($value);
    }
    $response = '';
    foreach ($value as $v) {
        if ($v != "") {
            $response .= "\t<datafield tag=\"$tag\" ind1=\"$ind1\" ind2=\"$ind2\">\n" .
                "\t\t<subfield code=\"$code\">" . OAIUtils::prepOutput($v) . "</subfield>\n" .
                "\t</datafield>\n";
        }
    }
    return $response;
}

If you have any other custom formatElement functions, you should change them too!
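To see the effect without touching a live OJS install, here is a minimal standalone sketch of the same check. It assumes a stand-in for OAIUtils::prepOutput (a plain htmlspecialchars call, which is not necessarily what OJS does internally), and the names prepOutputSketch and formatElementSketch are made up for this example.

```php
<?php
// Stand-in for OAIUtils::prepOutput(); an assumption so the sketch runs outside OJS.
function prepOutputSketch($value) {
    return htmlspecialchars((string) $value, ENT_COMPAT, 'UTF-8');
}

// Same shape as the modified formatElement, written as a plain function.
function formatElementSketch($tag, $ind1, $ind2, $code, $value) {
    if (!is_array($value)) {
        $value = array($value);
    }
    $response = '';
    foreach ($value as $v) {
        if ($v != "") { // skip empty strings and nulls, so no empty datafield is written
            $response .= "\t<datafield tag=\"$tag\" ind1=\"$ind1\" ind2=\"$ind2\">\n" .
                "\t\t<subfield code=\"$code\">" . prepOutputSketch($v) . "</subfield>\n" .
                "\t</datafield>\n";
        }
    }
    return $response;
}

// Three values go in, but only the non-empty one produces a datafield.
$xml = formatElementSketch('500', ' ', ' ', 'a', array('', 'Zaragoza', null));
echo $xml;
```

Passing an empty string or null on its own returns an empty string, so the concatenation in toXml simply gets nothing for that field.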

OJS (Open Journal System) OAI export marcxml hacking: adding author email

Let’s see how to change the output of the OAI marcxml plugin for Open Journal System.

OJS prerequisites and considerations

For this tutorial I’ll assume you have set up a journal called ‘tropelias’ in your OJS and that you have the oaiMetadataFormats plugin installed and running.

The default OJS OAI base URL for that journal should then be:
http://zaguan.unizar.es/ojs/index.php/tropelias/oai?verb=
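If you want to build request URLs from a script, something like the following sketch should do. The helper name oaiRequestUrl is made up for this example; the base URL is the one above without the trailing ?verb= part, and metadataPrefix=marcxml is the prefix used in this post.

```php
<?php
// Hypothetical helper: append an OAI-PMH verb and its arguments to a base URL.
function oaiRequestUrl($baseUrl, $verb, array $params = array()) {
    $query = array_merge(array('verb' => $verb), $params);
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return $baseUrl . '?' . http_build_query($query, '', '&');
}

$base = 'http://zaguan.unizar.es/ojs/index.php/tropelias/oai';

echo oaiRequestUrl($base, 'Identify'), "\n";
echo oaiRequestUrl($base, 'ListRecords', array('metadataPrefix' => 'marcxml')), "\n";
```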

You can use the usual OAI-PMH verbs. For instance, let's see the default output for http://zaguan.unizar.es/ojs/index.php/tropelias/oai?verb=ListRecords&metadataPrefix=marcxml

It should look something like this:

<record xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" >
<leader> cam 3u </leader>
<controlfield tag="008" >"110226 2011 eng "</controlfield>
<datafield tag="042" ind1=" " ind2=" " >
<subfield code="a" >dc</subfield>
</datafield>
<datafield tag="245" ind1="0" ind2="0" >
<subfield code="a" >Apolo y Dionisos en tres obras de Thomas Mann: Muerte en Venecia, La montaña mágica, Mario y el mago.</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" " >
<subfield code="a" >Alfonso Matute, Nuria; Universidad de Zaragoza</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" " >
<subfield code="a" ></subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" " >
<subfield code="a" ><p><em>Este art&iacute;culo pretende realizar un acercamiento a algunos temas omnipresentes en la literatura de Thomas Mann. Estos motivos se relacionan con la influencia de Nietzsche sobre el autor, y se apoyan en su concepci&oacute;n de lo apol&iacute;neo y lo dionisiaco, que en Thomas Mann se manifiesta en una lucha continua entre el caos y la contenci&oacute;n; lucha que se resuelve de forma inevitable con el triunfo del caos. A trav&eacute;s del estudio de elementos como el tiempo, los mundos irracionales, el h&eacute;roe decadente, la muerte, la enfermedad, las culturas opuestas, o la androginia y el homoerotismo, se ha intentado buscar esta conexi&oacute;n en tres obras de Thomas Mann: </em>Muerte en Venecia<em>, </em>La monta&ntilde;a m&aacute;gica <em>y </em>Mario y el mago<em>.</em></p> <p><em>Dieser Artikel versucht eine Ann&auml;herung an einige allgegenw&auml;rtige Aspekte in Thomas Mann&rsquo;s Literatur. Themen, die auf den sind Einflluss zur&uuml;ckgehen, den Nieztsche auf den Autor ausge&uuml;bt hat. Es wird vorwiegend der Gegensatz zwischen dem &ldquo;Apolinischen&rdquo; und dem &ldquo;Dionisischen&rdquo; behandelt, jenen Konzepten, die sich in Thomas Mann&rsquo;s Werken durch den ewigen Kampf zwischen M&auml;&szlig;igung und Chaos offenbaren. Dabei l&ouml;st sich der Kampf immer unvermeidlich mit dem Sieg des Chaos. Durch die Betrachtung dieser Elemente bzw. der Zeit, der unvern&uuml;nftigen Welten, dem dekadenten Held, der Opposition verschiedener Kulturen oder der Androgynie und homoerotischen Verwandtschaften, wird eine thematische Verbindung in den folgenden drei Werken von Thomas Mann gesucht: </em>Der Tod in Venedig<em>, </em>Der Zauberberg <em>und </em>Mario und der Zauberer<em>.</em></p></subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" " >
<subfield code="b" >Universidad de Zaragoza</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" " >
<subfield code="a" ></subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" " >
<subfield code="c" >2011-02-26 00:00:00</subfield>
</datafield>
<datafield tag="655" ind1=" " ind2="7" >
<subfield code="a" ></subfield>
</datafield>
<datafield tag="856" ind1=" " ind2=" " >
<subfield code="q" >application/pdf</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2="0" >
<subfield code="u" >http://zaguan.unizar.es/ojs/index.php/tropelias/article/view/1</subfield>
</datafield>
<datafield tag="786" ind1="0" ind2=" " >
<subfield code="n" >Tropelías : Revista de Teoría de la Literatura y Literatura Comparada; ##issue.no## 15-17 (2004)</subfield>
</datafield>
<datafield tag="546" ind1=" " ind2=" " >
<subfield code="a" >es</subfield>
</datafield>
<datafield tag="787" ind1="0" ind2=" " >
<subfield code="n" ></subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" " >
<subfield code="a" ></subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" " >
<subfield code="a" ></subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" " >
<subfield code="a" ></subfield>
</datafield>
<datafield tag="540" ind1=" " ind2=" " >
<subfield code="a" ></subfield>
</datafield>
</record>
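Notice how many datafields in that record carry no value at all. As a rough sketch, you could list them with PHP's DOM extension. This assumes the record declares the MARC21 slim namespace (the plugin writes an xmlns attribute for it) and uses a shortened sample instead of the full record; emptyDatafieldTags is a made-up helper name.

```php
<?php
// Hypothetical helper: return the tags of datafields that contain no text.
function emptyDatafieldTags($marcXml) {
    $doc = new DOMDocument();
    $doc->loadXML($marcXml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('marc', 'http://www.loc.gov/MARC21/slim');

    $tags = array();
    foreach ($xpath->query('//marc:datafield') as $field) {
        // textContent joins the text of all subfields of this datafield.
        if (trim($field->textContent) === '') {
            $tags[] = $field->getAttribute('tag');
        }
    }
    return $tags;
}

// Shortened sample with one filled and two empty datafields.
$sample = '<record xmlns="http://www.loc.gov/MARC21/slim">'
    . '<datafield tag="042" ind1=" " ind2=" "><subfield code="a">dc</subfield></datafield>'
    . '<datafield tag="653" ind1=" " ind2=" "><subfield code="a"></subfield></datafield>'
    . '<datafield tag="500" ind1=" " ind2=" "><subfield code="a"></subfield></datafield>'
    . '</record>';

print_r(emptyDatafieldTags($sample));
```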

Hacking the output: step 1

We want to change these lines:

<datafield tag="720" ind1=" " ind2=" " >
  <subfield code="a" >Alfonso Matute, Nuria; Universidad de Zaragoza</subfield>
</datafield>

To:

<datafield tag="100" ind1=" " ind2=" " >
  <subfield code="a" >Alfonso Matute, Nuria; Universidad de Zaragoza</subfield>
</datafield>

This is quite easy. Follow these steps:

cd $OJS_HOME;
vi ./plugins/oaiMetadataFormats/marcxml/OAIMetadataFormat_MARC21.inc.php

Change this line:

$this->formatElement('720', ' ', ' ', 'a', $creators) .

To:

$this->formatElement('100', ' ', ' ', 'a', $creators) .

Save and exit. Should be working 🙂

Hacking the output: step 2

Now that we know which file to edit and have done the previous test, let's imagine we want to change the output from

<datafield tag="720" ind1=" " ind2=" " >
  <subfield code="a" >Alfonso Matute, Nuria; Universidad de Zaragoza</subfield>
</datafield>

To:

<datafield tag="100" ind1=" " ind2=" " >
  <subfield code="a" >Alfonso Matute, Nuria; Universidad de Zaragoza</subfield>
  <subfield code="a" >tropelias@unizar.es</subfield>
</datafield>

This way the output also shows the author’s email.

Follow these steps:

(1) Edit OAIMetadataFormat_MARC21.inc.php:

cd $OJS_HOME;
vi ./plugins/oaiMetadataFormats/marcxml/OAIMetadataFormat_MARC21.inc.php

(2) Delete that file’s contents and paste the following code:

<?php
 
/**
 * @file plugins/oaiMetadataFormats/marcxml/OAIMetadataFormat_MARC21.inc.php
 *
 * Copyright (c) 2003-2010 John Willinsky
 ****** Modified by Miguel Martín González (miguelm[at]unizar[dot]es)
 ****** to add authors email to output
 * Distributed under the GNU GPL v2. For full terms see the file docs/COPYING.
 *
 * @class OAIMetadataFormat_MARC21
 * @ingroup oai_format
 * @see OAI
 *
 * @brief OAI metadata format class -- MARC21 (MARCXML).
 */
 
// $Id$
 
class OAIMetadataFormat_MARC21 extends OAIMetadataFormat {
        /**
         * @see OAIMetadataFormat#toXml
         */
        function toXml(&$record, $format = null) {
 
                // Changed! Debugging aid: comment out the next line to avoid displaying errors on the web
                ini_set('display_errors',true);
                // ---------------------------------------------------------------------------------------------------------
 
                $article =& $record->getData('article');
                $issue =& $record->getData('issue');
                $journal =& $record->getData('journal');
                $section =& $record->getData('section');
                $galleys =& $record->getData('galleys');
 
                // Format creators
                $creators = array();
                // Changed! Let's make an array to store the emails ---------------------------
                $emails = array();
                // --------------------------------------------------------------
                $authors = $article->getAuthors();
                for ($i = 0, $num = count($authors); $i < $num; $i++) {
                        $authorName = $authors[$i]->getFullName(true);
                        $affiliation = $authors[$i]->getLocalizedAffiliation();
                        // Changed! Let's fetch the author email and store it in our emails array ------------------------
                        $emails[] = $authors[$i]->getEmail();
                        // ------------------------------------------------------------------------------------------------------
                        if (!empty($affiliation)) {
                                $authorName .= '; ' . $affiliation;
                        }
                        $creators[] = $authorName;
                }
 
                $subjects = array_merge_recursive(
                        $this->stripAssocArray((array) $article->getDiscipline(null)),
                        $this->stripAssocArray((array) $article->getSubject(null)),
                        $this->stripAssocArray((array) $article->getSubjectClass(null))
                );
                $subject = isset($subjects[$journal->getPrimaryLocale()])?$subjects[$journal->getPrimaryLocale()]:'';
                $publisher = $journal->getLocalizedTitle(); // Default
                $publisherInstitution = $journal->getSetting('publisherInstitution');
                if (!empty($publisherInstitution)) {
                        $publisher = $publisherInstitution;
                }
 
                // Format
                $format = array();
                foreach ($galleys as $galley) {
                        $format[] = $galley->getFileType();
                }
 
                // Sources contains journal title, issue ID, and pages
                $source = $journal->getLocalizedTitle() . '; ' . $issue->getIssueIdentification();
                $pages = $article->getPages();
 
                // Relation
                $relation = array();
                foreach ($article->getSuppFiles() as $suppFile) {
                        $record->relation[] = Request::url($journal->getPath(), 'article', 'download', array($article->getId(), $suppFile->getFileId()));
                }
 
                // Coverage
                $coverage = array(
                        $article->getLocalizedCoverageGeo(),
                        $article->getLocalizedCoverageChron(),
                        $article->getLocalizedCoverageSample()
                );
               $response = "<record\n" .
                        "\txmlns=\"http://www.loc.gov/MARC21/slim\"\n" .
                        "\txmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" .
                        "\txsi:schemaLocation=\"http://www.loc.gov/MARC21/slim\n" .
                        "\thttp://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd\">\n" .
                        "\t<leader>     cam         3u     </leader>\n" .
                        "\t<controlfield tag=\"008\">\"" . date('ymd Y', strtotime($issue->getDatePublished())) . "                        eng  \"</controlfield>\n" .
                        $this->formatElement('042', ' ', ' ', 'a', 'dc') .
                        $this->formatElement('245', '0', '0', 'a', $article->getTitle($journal->getPrimaryLocale())) .
                        // Changed! Let's call a new function to format this complex output ----------------------------------------------------------
                        $this->formatElementMiguel('100', ' ', ' ', 'a', 'b', $creators, $emails) .
                        //  -----------------------------------------------------------------------------------------------------------------------------
                        $this->formatElement('653', ' ', ' ', 'a', $subject) .
                        $this->formatElement('520', ' ', ' ', 'a', $article->getLocalizedAbstract()) .
                        $this->formatElement('260', ' ', ' ', 'b', $publisher) .
                        $this->formatElement('720', ' ', ' ', 'a', strip_tags($article->getLocalizedSponsor())) .
                        $this->formatElement('260', ' ', ' ', 'c', $issue->getDatePublished()) .
                        $this->formatElement('655', ' ', '7', 'a', $section->getLocalizedIdentifyType()) .
                        $this->formatElement('856', ' ', ' ', 'q', $format) .
                        $this->formatElement('856', '4', '0', 'u', Request::url($journal->getPath(), 'article', 'view', array($article->getBestArticleId()))) .
                        $this->formatElement('786', '0', ' ', 'n', $source) .
 
                        $this->formatElement('546', ' ', ' ', 'a', $article->getLanguage()) .
                        $this->formatElement('787', '0', ' ', 'n', $record->relation) .
                        $this->formatElement('500', ' ', ' ', 'a', $coverage) .
                        $this->formatElement('540', ' ', ' ', 'a', strip_tags($journal->getLocalizedSetting('copyrightNotice'))) .
                        "</record>\n";
 
                return $response;
        }
        /**
         * Format XML for single MARC21 element.
         * @param $tag string
         * @param $ind1 string
         * @param $ind2 string
         * @param $code string
         * @param $value mixed
         */
        function formatElement($tag, $ind1, $ind2, $code, $value) {
                if (!is_array($value)) {
                        $value = array($value);
                }
                $response = '';
                foreach ($value as $v) {
                        $response .= "\t<datafield tag=\"$tag\" ind1=\"$ind1\" ind2=\"$ind2\">\n" .
                                "\t\t<subfield code=\"$code\">" . OAIUtils::prepOutput($v) . "</subfield>\n" .
                                "\t</datafield>\n";
                }
                return $response;
        }
 
         // Changed! This function is new! 
        /**
         * Format XML for complex MARC21 element (by Miguel Martin)
         * @param $tag string
         * @param $ind1 string
         * @param $ind2 string
         * @param $code string
         * @param $code2 string
         * @param $value mixed
         * @param $value2 mixed
         */
        function formatElementMiguel($tag, $ind1, $ind2, $code, $code2, $value, $value2) {
                if (!is_array($value)) {
                        $value = array($value);
                }
                if (!is_array($value2)){
                        $value2 = array($value2);
                }
 
                // Check that both arrays have the same length; if not, fall back to the plain formatter
                if ( (count($value)) != (count($value2)) ){
                    return $this->formatElement($tag, $ind1, $ind2, $code, $value);
                }
 
                // both arrays have the same number of elements, so we can safely proceed
                $response = '';
                $i = 0;
                foreach ($value as $v) {
                        $response .= "\t<datafield tag=\"$tag\" ind1=\"$ind1\" ind2=\"$ind2\">\n" .
                                "\t\t<subfield code=\"$code\">" . OAIUtils::prepOutput($v) . "</subfield>\n" .
                                // Note: the email is written under $code as well; $code2 is accepted but not used here
                                "\t\t<subfield code=\"$code\">" . OAIUtils::prepOutput($value2[$i]) . "</subfield>\n" .
                                "\t</datafield>\n";
                        $i++;
                }
                return $response;
        }
 
}
 
?>

Save and exit. Should be working like a charm 🙂
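If you want to play with the name/email pairing outside OJS first, here is a minimal standalone sketch. It is not the plugin code: escapeSketch is a stand-in for OAIUtils::prepOutput, formatAuthorsSketch is a made-up name, 'Second Author' is placeholder data, and writing the email under subfield code 'b' is an assumption based on the 'a', 'b' arguments passed in toXml. Unlike the function above, it skips the email subfield when an author has no email.

```php
<?php
// Stand-in for OAIUtils::prepOutput(); an assumption so the sketch runs outside OJS.
function escapeSketch($value) {
    return htmlspecialchars((string) $value, ENT_COMPAT, 'UTF-8');
}

// Hypothetical variant: one datafield per author, name under $nameCode and
// email (only when present) under $emailCode.
function formatAuthorsSketch($tag, $ind1, $ind2, $nameCode, $emailCode, array $names, array $emails) {
    $names = array_values($names);
    $emails = array_values($emails);
    $response = '';
    foreach ($names as $i => $name) {
        if ((string) $name === '') {
            continue; // nothing to write for an author without a name
        }
        $response .= "\t<datafield tag=\"$tag\" ind1=\"$ind1\" ind2=\"$ind2\">\n" .
            "\t\t<subfield code=\"$nameCode\">" . escapeSketch($name) . "</subfield>\n";
        if (isset($emails[$i]) && $emails[$i] !== '') {
            $response .= "\t\t<subfield code=\"$emailCode\">" . escapeSketch($emails[$i]) . "</subfield>\n";
        }
        $response .= "\t</datafield>\n";
    }
    return $response;
}

$xml = formatAuthorsSketch(
    '100', ' ', ' ', 'a', 'b',
    array('Alfonso Matute, Nuria; Universidad de Zaragoza', 'Second Author'), // second entry is placeholder data
    array('tropelias@unizar.es', '')
);
echo $xml;
```

Matching names and emails by position means both arrays must be filled in the same loop, as toXml does with $creators and $emails.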