CDS INVENIO: Check if record belongs to any collection

A record can be in “Processing” state (and show an orange warning) like this one:

cds invenio "processing" record

This “RESTRICTED: Processing record” warning shows up in two scenarios:
1) When the record belongs to restricted collections (the get_restricted_collections_for_recid function returns some collections).
2) When the record is NOT in any collection (the is_record_in_any_collection function returns False).

Here is a quick script that checks which records do not belong to any collection.

from invenio.search_engine import is_record_in_any_collection


# check records from 40000 to 44999 (range() excludes the upper bound)
for recid in range(40000, 45000):
    if not is_record_in_any_collection(recid):
        print "Record %s does not belong to any collection" % recid
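The two scenarios above can also be told apart in one pass. The following is a minimal sketch, not Invenio code: `classify_record` is a hypothetical helper, and the two Invenio functions are passed in as arguments so the logic can be tried without an Invenio installation.

```python
def classify_record(recid, restricted_colls_for, in_any_collection):
    """Return why a record would show the "Processing" warning, or None.

    restricted_colls_for and in_any_collection stand in for Invenio's
    get_restricted_collections_for_recid and is_record_in_any_collection.
    """
    if restricted_colls_for(recid):
        return "restricted"   # scenario 1: belongs to restricted collections
    if not in_any_collection(recid):
        return "orphan"       # scenario 2: not in any collection
    return None               # neither scenario applies


# Fake data, for illustration only.
fake_restricted = {40001: ["Restricted theses"]}
fake_orphans = set([40002])

for recid in (40000, 40001, 40002):
    reason = classify_record(
        recid,
        lambda r: fake_restricted.get(r, []),
        lambda r: r not in fake_orphans,
    )
    print("%s -> %s" % (recid, reason))
```

Inside an Invenio shell the two lambdas would be replaced by the real functions, assuming both can be called with just a recid as in the script above.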

Invenio: print records in marcxml format using invenio API

Sometimes it is useful to get the MARCXML output of some records.

This example shows how to print the records with recids 18007 to 18199 in MARCXML format using the Invenio API:

from invenio.search_engine import print_record
 
output = ''

# range() excludes the upper bound, so this covers recids 18007 to 18199
for recid in range(18007, 18200):
    #print "Record %s" % recid
    output += print_record(recid, format='xm')

print output
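If the output is meant to be saved and loaded again later, it can help to wrap the records in a single root element. This is a minimal sketch, assuming each call returns one `<record>…</record>` string; `build_collection` is a hypothetical helper and `fake_fetch` is a stand-in for `print_record(recid, format='xm')` so the snippet runs without Invenio.

```python
MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARC21 slim namespace


def build_collection(recids, fetch_marcxml):
    """Join per-record MARCXML strings inside one <collection> element."""
    records = [fetch_marcxml(recid) for recid in recids]
    # join() avoids the repeated string copies of `+=` in a loop
    body = "\n".join(r.strip() for r in records if r and r.strip())
    return '<collection xmlns="%s">\n%s\n</collection>' % (MARC_NS, body)


def fake_fetch(recid):
    # Stand-in for print_record(recid, format='xm'); not real Invenio output.
    return '<record><controlfield tag="001">%d</controlfield></record>' % recid


xml = build_collection(range(18007, 18010), fake_fetch)
print(xml)
```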

Introducing MARCXML manipulation tool

If you have to import or export MARCXML records in Invenio, tind.io offers a great online utility for manipulating MARCXML: https://tools.tind.io/xml/xml-manipulation/

Marcxml manipulation tool tind.io

Exporting MARC is useful not only in migration processes, but also when you have to change a lot of records in your Invenio system. It is much better than making those changes at the database level.

You can export records using the web interface or the command line. I prefer the second method, using BibExport:

First change this config file: /opt/invenio/etc/bibexport/marcxml.cfg

The MARCXML exporting method exports all the records matching a particular search query, zips them and moves them to the requested folder. The output of this exporting method is similar to what one would get by listing the records in MARCXML from the web search interface.

The default configuration is given below. With it, the job exports all records from the Book collection into one XML file and all articles with the author “Polyakov, A M” into another.

[export_job]
export_method = marcxml
[export_criterias]
books = 980__a:BOOK
polyakov_articles = 980__a:ARTICLE and author:"Polyakov, A M"
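Before scheduling the job, it can be handy to list which queries a configuration file will export. This is a minimal sketch using Python 3's standard `configparser` module (on the Python 2 used by Invenio 1 the module is named `ConfigParser`). The sample text mirrors the default configuration above, and `list_export_criteria` is a hypothetical helper, not part of BibExport.

```python
import configparser

SAMPLE_CFG = """\
[export_job]
export_method = marcxml
[export_criterias]
books = 980__a:BOOK
polyakov_articles = 980__a:ARTICLE and author:"Polyakov, A M"
"""


def list_export_criteria(cfg_text):
    """Return {output name: search query} from a bibexport-style config."""
    # Only '=' separates keys from values, so ':' inside queries is kept.
    # Interpolation is off so a literal '%' in a query would not raise.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(cfg_text)
    return dict(parser.items("export_criterias"))


for name, query in sorted(list_export_criteria(SAMPLE_CFG).items()):
    print("%s: %s" % (name, query))
```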

The job is run with this command:

/opt/invenio/bin/bibexport -u admin -wmarcxml

Default folder for storing is:

/opt/invenio/var/www/export/marcxml

Invenio 1: BibRank exception

I was getting several exceptions in BibRank:

* 2015-01-15 08:32:17 -> NoOptionError: No option 'citation_loss_limit' in section: 'citation' (ConfigParser.py:618:get)

** User details
No client information available

** Traceback details 

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/invenio/bibtask.py", line 606, in task_init
    ret = _task_run(task_run_fnc)
  File "/usr/lib64/python2.7/site-packages/invenio/bibtask.py", line 1146, in _task_run
    if callable(task_run_fnc) and task_run_fnc():
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank.py", line 159, in task_run_core
    func_object(key)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_tag_based_indexer.py", line 443, in citation
    return bibrank_engine(run)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_tag_based_indexer.py", line 356, in bibrank_engine
    func_object(rank_method_code, cfg_name, config)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_tag_based_indexer.py", line 68, in citation_exec
    dic, index_update_time = get_citation_weight(rank_method_code, config)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_citation_indexer.py", line 141, in get_citation_weight
    weights = process_and_store(updated_recids, config, chunk_size)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_citation_indexer.py", line 157, in process_and_store
    citation_loss_limit = int(config.get(function, "citation_loss_limit"))
  File "/usr/lib64/python2.7/ConfigParser.py", line 618, in get
    raise NoOptionError(option, section)
NoOptionError: No option 'citation_loss_limit' in section: 'citation'

** Stack frame details

This was solved by updating /opt/invenio/etc/bibrank/citation.cfg with citation_loss_limit = 50 and I also included some more options:

[...]
reference_via_doi= 999C5a
reference_via_record_id= 990C50
reference_via_isbn= 999C5i
[...]
citation_loss_limit = 50
collections =

Then it was solved 🙂
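A quick check like the one below can catch a missing option before BibRank runs. It is only a sketch: `REQUIRED_OPTIONS` lists just the option named in the traceback, `missing_options` is a hypothetical helper, and it uses Python 3's `configparser` (`ConfigParser` on Python 2).

```python
import configparser

# Only the option from the traceback above; extend as needed.
REQUIRED_OPTIONS = ("citation_loss_limit",)


def missing_options(cfg_text, section="citation", required=REQUIRED_OPTIONS):
    """Return the required options that the given section does not define."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        return list(required)
    return [opt for opt in required if not parser.has_option(section, opt)]


old_cfg = "[citation]\nreference_via_doi= 999C5a\n"
new_cfg = old_cfg + "citation_loss_limit = 50\ncollections =\n"

print(missing_options(old_cfg))  # the option is absent before the fix
print(missing_options(new_cfg))  # nothing is missing after the fix
```

To check the real file, its contents would be read from /opt/invenio/etc/bibrank/citation.cfg and passed in as `cfg_text`.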

Install DISTCACHE on RHEL6 [SOLVED]

Installing distcache on RHEL6 can be tricky. Let’s see how to do it…

cd /home/miguelm/invenio113/prerequisites/distcache
wget -O distcache-1.5.1.tar.gz "http://sourceforge.net/projects/distcache/files/latest/download?source=files"
tar -xzvf distcache-1.5.1.tar.gz
cd distcache-1.5.1

If you try to compile as is, you will get errors:

proto_fd.c: In function 'addr_parse':
proto_fd.c:177: error: 'LONG_MIN' undeclared (first use in this function)
proto_fd.c:177: error: (Each undeclared identifier is reported only once
proto_fd.c:177: error: for each function it appears in.)
proto_fd.c:177: error: 'LONG_MAX' undeclared (first use in this function)

To get rid of them, you just have to add #include <limits.h> to distcache-1.5.1/libnal/proto_fd.c, as in:

#define SYS_GENERATING_LIB
 
#include <libsys/pre.h>
#include <libnal/nal.h>
#include "nal_internal.h" 
#include "ctrl_fd.h" 
#include <libsys/post.h>
#include <limits.h>
 
/**************************/
/* predeclare our vtables */
/**************************/
...
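The same edit can be scripted, which is useful when rebuilding on several machines. This is a minimal sketch under the assumption that the file contains the `#include <libsys/post.h>` line shown above; `add_limits_include` is a hypothetical helper that works on the file contents as a string.

```python
ANCHOR = "#include <libsys/post.h>"
NEW_INCLUDE = "#include <limits.h>"


def add_limits_include(source):
    """Insert '#include <limits.h>' after the libsys/post.h include.

    Returns the source unchanged if limits.h is already included or
    if the anchor line is not found.
    """
    lines = source.splitlines(True)  # keep line endings
    if any(line.strip() == NEW_INCLUDE for line in lines):
        return source
    for i, line in enumerate(lines):
        if line.strip() == ANCHOR:
            if not line.endswith("\n"):
                lines[i] = line + "\n"  # anchor was the last, unterminated line
            lines.insert(i + 1, NEW_INCLUDE + "\n")
            return "".join(lines)
    return source


sample = '#include "ctrl_fd.h"\n#include <libsys/post.h>\n\nint x;\n'
patched = add_limits_include(sample)
print(patched)
```

Running the function twice leaves the file unchanged the second time, so it is safe to call from a build script.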

Now you can compile with no errors…

./configure --prefix=/usr
make
make install

CDS Invenio 1: The new file uploader explained

Those who have used Invenio since the 0.9x versions will notice a useful new WebSubmit element in Invenio 1.

This new useful element is Upload_Files. Older versions just had a file input element (for instance, DEMOTHE_FILE) which allowed just one file to be uploaded.

This is the code of the Upload_Files element:

"""
This is an example of element that creates a file upload interface.
Clone it, customize it and integrate it into your submission. Then add function 
'Move_Uploaded_Files_to_Storage' to your submission functions list, in order for files 
uploaded with this interface to be attached to the record. More information in 
the WebSubmit admin guide.
"""
import os
from invenio.websubmit_managedocfiles import create_file_upload_interface
from invenio.websubmit_functions.Shared_Functions import ParamFromFile
 
indir = ParamFromFile(os.path.join(curdir, 'indir'))
doctype = ParamFromFile(os.path.join(curdir, 'doctype'))
access = ParamFromFile(os.path.join(curdir, 'access'))
try:
    sysno = int(ParamFromFile(os.path.join(curdir, 'SN')).strip())
except:
    sysno = -1
ln = ParamFromFile(os.path.join(curdir, 'ln'))
 
"""
Run the following to get the list of parameters of function 'create_file_upload_interface':
echo -e 'from invenio.websubmit_managedocfiles import create_file_upload_interface as f\nprint f.__doc__' | python
"""
text = create_file_upload_interface(recid=sysno,
                                 print_outside_form_tag=False,
                                 include_headers=True,
                                 ln=ln,
                                 doctypes_and_desc=[('main','Normativa/Modificacion'),
                                                    ('additional','Texto integrado')],
                                 can_revise_doctypes=['*'],
                                 can_describe_doctypes=['main'],
                                 can_delete_doctypes=['additional'],
                                 can_rename_doctypes=['main'],
                                 sbm_indir=indir, sbm_doctype=doctype, sbm_access=access)[1]

Using this new file uploader you can upload several files at a time.

This is how the element is rendered by default:
Screenshot 2014-05-22 at 13.30.43

When you add this element to your forms, you will also have to add the Move_Uploaded_Files_to_Storage function in the submission. As in:
Invenio websubmit functions

By default the file path will be stored in the 8564_ MARCXML field, subfield u.
The information supplied in the “Description” field will be mapped to the same 8564_ field, subfield y.
The data entered in the “Name” field will be used to rename the fulltext file.

Here is an example of how Invenio builds an 8564_ field using this new file uploader:
Invenio websubmit functions

I wanted to add a “comment” input field to this File Uploader form. I took a look at the code and… well, Invenio guys have already thought about this!! Notice the “commentBox” div in the code:

<div id="reviseControl">
    <table class="reviseControlBrowser"></table><input type="button" onclick="display_revise_panel(this, 'add', '', true, false, true, true, true, '', '', '', true, '', '<select id=&quot;fileDoctype&quot; name=&quot;fileDoctype&quot; onchange=&quot;var idx=this.selectedIndex;var doctype=this.options[idx].value;updateForm(doctype,{\'main\': \'\'},{},{});&quot;><option value=&quot;main&quot;>Main document</option>\n<option value=&quot;additional&quot;>Figure, schema, etc.</option></select>');updateForm('main', {'main': ''}, {}, {});return false;" value="Añadir otro fichero"/></div></div>
<div id="balloon" style="display:none;">
<input type="hidden" name="fileAction" value="" />
<input type="hidden" name="fileTarget" value="" />
  <table>
    <tr>
      <td class="topleft">&nbsp;</td>
      <td class="top">&nbsp;</td>
      <td class="topright">&nbsp;</td>
    </tr>
    <tr>
      <td class="left" vertical-align="center" width="24"><img alt=" " src="../img/balloon_arrow_left_shadow.png" /></td>
      <td class="center">
        <table id="balloonReviseFile">
          <tr>
            <td><label for="balloonReviseFileInput">Escoja el fichero:</label><br/>
              <div style="display:none" id="fileDoctypesRow"></div>
              <div id="balloonReviseFileInputBlock"><input type="file" name="myfile" id="balloonReviseFileInput" size="20" /></div>
                          <!--  <input type="file" name="myfile" id="balloonReviseFileInput" size="20" onchange="var name=getElementById('rename');var filename=this.value.split('/').pop().split('.')[0];name.value=filename;"/> -->
              <div id="renameBox" style=""><label for="rename">Nombre:</label><br/><input type="text" name="rename" id="rename" size="20" autocomplete="off"/></div>
              <div id="descriptionBox" style=""><label for="description">Descripción:</label><br/><input type="text" name="description" id="description" size="20" autocomplete="off"/></div>
              <div id="commentBox" style=""><label for="comment">Comentario:</label><br/><textarea name="comment" id="comment" rows="3"/></textarea></div>
              <div id="restrictionBox" style="display:none"><select style="display:none" id="fileRestriction" name="fileRestriction"></select></div>
              <div id="keepPreviousVersions" style="display:none"><input type="checkbox" id="balloonReviseFileKeep" name="keepPreviousFiles" checked="checked" /><label for="balloonReviseFileKeep">Guardar las versiones anteriores</label>&nbsp;<small>[<a href="" onclick="alert('Puede decidir esconder o no versiones anteriores de este archivo.');return false;">?</a>]</small></div>
              <p id="warningFormats" style="display:none"><img src="http://155.210.11.63/img/warning.png" alt="Warning"/> Los formatos alternativos para la versión actual de este archivo serán borrados&nbsp;[<a href="" onclick="alert('Cuando revise un archivo, los formatos adicionales que haya subido previamente serán borrados, ya que no estarían sincronizados con el nuevo archivo.');return false;">?</a>]</p>
              <div style="text-align:right;margin-top:5px"><input type="button" value="Cancelar" onclick="javascript:hide_revise_panel();"/> <input type="submit" value="Subir"/></div>
            </td>
          </tr>
        </table>
      </td>
      <td class="right">&nbsp;</td>
    </tr>
    <tr>
      <td class="bottomleft">&nbsp;</td>
      <td class="bottom">&nbsp;</td>
      <td class="bottomright">&nbsp;</td>
    </tr>
  </table>
</div>

However, by default this box is hidden with display:none. So, how can we make it visible?

Easy.

If you look at the Upload_Files element code, you will notice that the function producing that output is create_file_upload_interface. That function has a lot of parameters that can be listed using:

echo -e 'from invenio.websubmit_managedocfiles import create_file_upload_interface as f\nprint f.__doc__' | python

If you want the “Comment” box to be displayed, just add the parameter can_comment_doctypes=['*'] to the code, as in:

"""
This is an example of element that creates a file upload interface.
Clone it, customize it and integrate it into your submission. Then add function 
'Move_Uploaded_Files_to_Storage' to your submission functions list, in order for files 
uploaded with this interface to be attached to the record. More information in 
the WebSubmit admin guide.
"""
import os
from invenio.websubmit_managedocfiles import create_file_upload_interface
from invenio.websubmit_functions.Shared_Functions import ParamFromFile
 
indir = ParamFromFile(os.path.join(curdir, 'indir'))
doctype = ParamFromFile(os.path.join(curdir, 'doctype'))
access = ParamFromFile(os.path.join(curdir, 'access'))
try:
    sysno = int(ParamFromFile(os.path.join(curdir, 'SN')).strip())
except:
    sysno = -1
ln = ParamFromFile(os.path.join(curdir, 'ln'))
 
"""
Run the following to get the list of parameters of function 'create_file_upload_interface':
echo -e 'from invenio.websubmit_managedocfiles import create_file_upload_interface as f\nprint f.__doc__' | python
"""
text = create_file_upload_interface(recid=sysno,
                                 print_outside_form_tag=False,
                                 include_headers=True,
                                 ln=ln,
                                 doctypes_and_desc=[('main','Normativa/Modificacion'),
                                                    ('additional','Texto integrado')],
                                 can_revise_doctypes=['*'],
                                 can_comment_doctypes=['*'],
                                 can_describe_doctypes=['main'],
                                 can_delete_doctypes=['additional'],
                                 can_rename_doctypes=['main'],
                                 sbm_indir=indir, sbm_doctype=doctype, sbm_access=access)[1]

And now the commentBox is visible 🙂

invenio submit fulltext websubmit element file uploader

This new “comment” field is mapped to the 8564_ field, subfield z. Example:

invenio several files upload, file uploader

invenio file uploader resulting marcxml
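To make the mapping concrete, here is a sketch that builds such a field with Python's standard `xml.etree.ElementTree`. The URL, description and comment values are made up, `build_8564` is a hypothetical helper, and the subfield codes follow the mapping described in this post (u for the file location, y for the description, z for the comment).

```python
import xml.etree.ElementTree as ET


def build_8564(url, description=None, comment=None):
    """Build a MARCXML 856 4_ datafield with subfields u, y and z."""
    field = ET.Element("datafield", {"tag": "856", "ind1": "4", "ind2": " "})
    for code, value in (("u", url), ("y", description), ("z", comment)):
        if value:  # skip subfields the submitter left empty
            sub = ET.SubElement(field, "subfield", {"code": code})
            sub.text = value
    return field


field = build_8564(
    "http://example.org/record/1/files/main.pdf",  # made-up URL
    description="Full text",
    comment="Scanned copy",
)
print(ET.tostring(field).decode("utf-8"))
```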

There are several other parameters of function create_file_upload_interface still to be explored…

root@invenio1:/opt/invenio/lib/python/invenio/websubmit_functions# echo -e 'from invenio.websubmit_managedocfiles import create_file_upload_interface as f\nprint f.__doc__' | python
/usr/local/lib/python2.6/dist-packages/pkg_resources.py:2535: DeprecationWarning: functions overriding warnings.showwarning() must support the 'line' argument
  warn(stacklevel = level+1, *args, **kw)
/usr/lib/python2.6/dist-packages/Ft/Lib/ImportUtil.py:362: UserWarning: Module _hashlib was already imported from None, but /usr/local/lib/python2.6/dist-packages is being added to sys.path
  from pkg_resources import get_provider, resource_filename
 
    Returns the HTML for the file upload interface.
 
    @param recid: the id of the record to edit files
    @type recid: int or None
 
    @param form: the form sent by the user's browser in response to a
                 user action. This is used to read and record user's
                 actions.
    @param form: as returned by the interface handler.
 
    @param print_outside_form_tag: display encapsulating <form> tag or
                                   not
    @type print_outside_form_tag: boolean
 
    @param print_envelope: (internal parameter) if True, return the
                           encapsulating initial markup, otherwise
                           skip it.
    @type print_envelope: boolean
 
    @param include_headers: include javascript and css headers in the
                            body of the page. If you set this to
                            False, you must take care of including
                            these headers in your page header. Setting
                            this parameter to True is useful if you
                            cannot change the page header.
    @type include_headers: boolean
 
    @param ln: language
    @type ln: string
 
    @param minsize: the minimum size (in bytes) allowed for the
                    uploaded files. Files not big enough are
                    discarded.
    @type minsize: int
 
    @param maxsize: the maximum size (in bytes) allowed for the
                    uploaded files. Files too big are discarded.
    @type maxsize: int
 
    @param doctypes_and_desc: the list of doctypes (like 'Main' or
                              'Additional') and their description that users
                              can choose from when adding new files.
                                - When no value is provided, users cannot add new
                                  file (they can only revise/delete/add format)
                                - When a single value is given, it is used as
                                  default doctype for all new documents
 
                              Order is relevant
                              Eg:
                              [('main', 'Main document'), ('additional', 'Figure, schema. etc')]
    @type doctypes_and_desc: list(tuple(string, string))
 
    @param restrictions_and_desc: the list of restrictions (like 'Restricted' or
                         'No Restriction') and their description that
                         users can choose from when adding or revising
                         files. Restrictions can then be configured at
                         the level of WebAccess.
                           - When no value is provided, no restriction is
                             applied
                           - When a single value is given, it is used as
                             default resctriction for all documents.
                           - The first value of the list is used as default
                             restriction if the user if not given the
                             choice of the restriction. Order is relevant
 
                         Eg:
                         [('', 'No restriction'), ('restr', 'Restricted')]
    @type restrictions_and_desc: list(tuple(string, string))
 
    @param can_delete_doctypes: the list of doctypes that users are
                                allowed to delete.
                                Eg: ['main', 'additional']
                                Use ['*'] for "all doctypes"
    @type can_delete_doctypes: list(string)
 
    @param can_revise_doctypes: the list of doctypes that users are
                                allowed to revise
                                Eg: ['main', 'additional']
                                Use ['*'] for "all doctypes"
    @type can_revise_doctypes: list(string)
 
    @param can_describe_doctypes: the list of doctypes that users are
                                  allowed to describe
                                  Eg: ['main', 'additional']
                                  Use ['*'] for "all doctypes"
    @type can_describe_doctypes: list(string)
 
    @param can_comment_doctypes: the list of doctypes that users are
                                 allowed to comment
                                 Eg: ['main', 'additional']
                                 Use ['*'] for "all doctypes"
    @type can_comment_doctypes: list(string)
 
    @param can_keep_doctypes: the list of doctypes for which users can
                         choose to keep previous versions visible when
                         revising a file (i.e. 'Keep previous version'
                         checkbox). See also parameter 'keepDefault'.
                         Note that this parameter is ~ignored when
                         revising the attributes of a file (comment,
                         description) without uploading a new
                         file. See also parameter
                         Move_Uploaded_Files_to_Storage.force_file_revision
                         Eg: ['main', 'additional']
                         Use ['*'] for "all doctypes"
    @type can_keep_doctypes: list(string)
 
 
    @param can_add_format_to_doctypes: the list of doctypes for which users can
                              add new formats. If there is no value,
                              then no 'add format' link nor warning
                              about losing old formats are displayed.
                              Eg: ['main', 'additional']
                              Use ['*'] for "all doctypes"
    @type can_add_format_to_doctypes: list(string)
 
    @param can_restrict_doctypes: the list of doctypes for which users can
                             choose the access restrictions when adding or
                             revising a file. If no value is given:
                               - no restriction is applied if none is defined
                                 in the 'restrictions' parameter.
                               - else the *first* value of the 'restrictions'
                                 parameter is used as default restriction.
 
                             Eg: ['main', 'additional']
                             Use ['*'] for "all doctypes"
    @type can_restrict_doctypes : list(string)
 
    @param can_rename_doctypes: the list of doctypes that users are allowed
                           to rename (when revising)
                           Eg: ['main', 'additional']
                           Use ['*'] for "all doctypes"
    @type can_rename_doctypes: list(string)
 
    @param can_name_new_files: if user can choose the name of the files they
                         upload or not
    @type can_name_new_files: boolean
 
    @param doctypes_to_default_filename: Rename uploaded files to admin-chosen
                                 values. List here the the files in
                                 current submission directory that
                                 contain the names to use for each doctype.
                                 Eg:
                                 {'main': RN', 'additional': 'additional_filename'}
 
                                 If the same doctype is submitted
                                 several times, a"-%i" suffix is added
                                 to the name defined in the file.
 
                                 The default filenames are overriden
                                 by user-chosen names if you allow
                                 'can_name_new_files' or
                                 'can_rename_doctypes'.
    @type doctypes_to_default_filename: dict
 
    @param max_files_for_doctype: the maximum number of files that users can
                          upload for each doctype.
                          Eg: {'main': 1, 'additional': 2}
 
                          Do not specify the doctype here to have an
                          unlimited number of files for a given
                          doctype.
    @type max_files_for_doctype: dict
 
    @param create_related_formats: if uploaded files get converted to
                                     whatever format we can or not
    @type create_related_formats: boolean
 
    @param keep_default: the default behaviour for keeping or not previous
                     version of files when users cannot choose (no
                     value in can_keep_doctypes).
                     Note that this parameter is ignored when revising
                     the attributes of a file (comment, description)
                     without uploading a new file. See also parameter
                     Move_Uploaded_Files_to_Storage.force_file_revision
    @type keep_default: boolean
 
    @param show_links: if we display links to files when possible or
                         not
    @type show_links: boolean
 
    @param file_label: the label for the file field
    @type file_label: string
 
    @param filename_label: the label for the file name field
    @type filename_label: string
 
    @param description_label: the label for the description field
    @type description_label: string
 
    @param comment_label: the label for the comments field
    @type comment_label: string
 
    @param restriction_label: the label in front of the restrictions list
    @type restriction_label: string
 
    @param sbm_indir: the submission indir parameter, in case the
                      function is used in a WebSubmit submission
                      context.
                      This value will be used to retrieve where to
                      read the current state of the interface and
                      store uploaded files
    @type sbm_indir : string
 
    @param sbm_doctype: the submission doctype parameter, in case the
                        function is used in a WebSubmit submission
                        context.
                        This value will be used to retrieve where to
                        read the current state of the interface and
                        store uploaded files
    @type sbm_doctype: string
 
    @param sbm_access: the submission access parameter. Must be
                       specified in the context of WebSubmit
                       submission, as well when used in the
                       WebSubmit Admin file management interface.
 
                       This value will be used to retrieve where to
                       read the current state of the interface and
                       store uploaded files
    @type sbm_access: string
 
    @param sbm_curdir: the submission curdir parameter. Must be
                       specified in the context of WebSubmit
                       function Create_Upload_File_Interface.
 
                       This value will be used to retrieve where to
                       read the current state of the interface and
                       store uploaded files.
    @type sbm_curdir: string
 
    @param uid: the user id
    @type uid: int
 
    @return Tuple (errorcode, html)
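
Most of the `can_*_doctypes` parameters share the same convention: a list of doctypes, with `['*']` meaning all of them. The sketch below only illustrates that convention as described in the docstring. `doctype_allowed` is a hypothetical helper, not a function of `websubmit_managedocfiles`, and treating an empty or missing list as "nothing allowed" is an assumption.

```python
def doctype_allowed(doctype, allowed_doctypes):
    """Return True if `doctype` is covered by a can_*_doctypes-style list."""
    # Assumption: None or [] means that nothing is allowed.
    allowed = allowed_doctypes or []
    return "*" in allowed or doctype in allowed


# Values taken from the element shown earlier in this post.
can_revise_doctypes = ['*']
can_delete_doctypes = ['additional']

print(doctype_allowed('main', can_revise_doctypes))
print(doctype_allowed('main', can_delete_doctypes))
```

The docstring also documents the return value as a tuple (errorcode, html), which is why the element code indexes the result with [1].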

CDS Invenio: format templates box in advanced search and search results screen [SOLVED]

Invenio 0.99x offers the option to control which output formats are enabled in searches and collections. For instance, if you take a look at the advanced search screen of one particular collection of my Invenio instance (‘Trabajos Académicos’), you’ll see this:

invenio_websearch

Why are these output formats being shown, and why in that order? Because in the WebSearch module I made those formats (and just those) available in my ‘Trabajos Académicos’ collection and ordered them in that way:

invenio_websearch_webcoll

To this point, everything works as expected.

But let’s go further and perform a search using that advanced search screen in the ‘Trabajos Académicos’ collection. Let’s search for any term. For demonstration purposes, I’ll search ‘water’. This is the screen I get. Note the output formats available now…

invenio_search_engine

Every available format is being shown, not just the output formats we defined in WebSearch. Why is that?

If you take a look at the code, you’ll notice that the first output (advanced search within a particular collection) is produced by the create_formatoptions function defined in websearch_webcoll.py. This create_formatoptions function is called from create_searchfor_advanced (also defined in websearch_webcoll.py). This is the snippet of code that produces that output (first image in this post):

569     def create_formatoptions(self, ln=CFG_SITE_LANG):
570         "Produces 'Output format options' portal box."
571 
572         # load the right message language
573         _ = gettext_set_language(ln)
574 
575         box = ""
576         values = []
577         query = """SELECT f.code,f.name FROM format AS f, collection_format AS cf
578                    WHERE cf.id_collection=%d AND cf.id_format=f.id AND f.visibility='1'
579                    ORDER BY cf.score DESC, f.name ASC"""  % self.id
580         res = run_sql(query)
581         if res:
582             for row in res:
583                 values.append({'value' : row[0], 'text': row[1]})
584         else:
585             values.append({'value' : 'hb', 'text' : "HTML %s" % _("brief")})
586         box = websearch_templates.tmpl_select(
587                    fieldname = 'of',
588                    css_class = 'address',
589                    values = values
590                   )
591         return box

The second output, the one corresponding to search results, is produced by create_search_box in search_engine.py, which passes the list of formats on to tmpl_search_box (defined in websearch_templates.py). Here is the snippet of code from search_engine.py that produces that output (third image in this post), with its line numbers in that file:

 742 def create_search_box(cc, colls, p, f, rg, sf, so, sp, rm, of, ot, as,
 743                       ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3,
 744                       m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec,
 745                       action=""):
 746 
 747     """Create search box for 'search again in the results page' functionality."""
 748 
 
...
 
 806     formats = []
 807     query = """SELECT code,name FROM format WHERE visibility='1' ORDER BY name ASC"""
 808     res = run_sql(query)
 809     if res:
 810         # propose found formats:
 811         for code, name in res:
 812             formats.append({ 'value' : code,
 813                              'text' : name
 814                            })
 815     else:
 816         formats.append({'value' : 'hb',
 817                         'text' : _("HTML brief")
 818                        })

So now we know why the outputs differ.
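
The difference is easy to reproduce outside Invenio. The sketch below uses an in-memory SQLite database with a cut-down version of the two tables (only the columns the queries touch, with made-up rows) and runs both queries. `formats_for_collection` and `formats_for_results` are hypothetical names, and the real tables live in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE format (id INTEGER PRIMARY KEY, code TEXT, name TEXT,
                     visibility INTEGER DEFAULT 1);
CREATE TABLE collection_format (id_collection INTEGER, id_format INTEGER,
                                score INTEGER);
INSERT INTO format (id, code, name, visibility) VALUES
    (1, 'hb', 'HTML brief', 1), (2, 'hd', 'HTML detailed', 1),
    (3, 'hx', 'BibTeX', 1), (4, 'xr', 'RSS', 0);
-- collection 7 (made up) offers only hd and hb, in that order
INSERT INTO collection_format VALUES (7, 2, 20), (7, 1, 10);
""")


def formats_for_collection(coll_id):
    """Same shape as the query in create_formatoptions (per collection)."""
    return conn.execute(
        """SELECT f.code, f.name FROM format AS f, collection_format AS cf
           WHERE cf.id_collection = ? AND cf.id_format = f.id
             AND f.visibility = 1
           ORDER BY cf.score DESC, f.name ASC""", (coll_id,)).fetchall()


def formats_for_results():
    """Same shape as the query in create_search_box (all visible formats)."""
    return conn.execute(
        "SELECT code, name FROM format WHERE visibility = 1 "
        "ORDER BY name ASC").fetchall()


print(formats_for_collection(7))
print(formats_for_results())
```

The first query joins against collection_format and so honours the per-collection selection and ordering, while the second only filters on visibility.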

Every output format is stored in the ‘format’ table in your database:

mysql> SELECT code,name FROM format WHERE visibility='1' ORDER BY name ASC;
+--------+------------------------------------+
| code   | name                               |
+--------+------------------------------------+
| hx     | BibTeX                             | 
| xd     | Dublin Core                        | 
| xe     | EndNote                            | 
| hb     | HTML brief                         | 
| hbgeol | HTML Brief Geol                    | 
| hcs    | HTML citesummary                   | 
| hd     | HTML detailed                      | 
| hlight | HTML Light (invocaciones externas) | 
| hm     | MARC                               | 
| xm     | MARCXML                            | 
| mets   | METS                               | 
| xn     | NLM                                | 
| hc     | photo captions ONLY                | 
| hp     | portfolio                          | 
| premis | PREMIS                             | 
| xw     | RefWorks                           | 
| untld  | Untitled                           | 
| untld2 | Untitled                           | 
+--------+------------------------------------+
18 rows in set (0.00 sec)

This table is quite simple, and it has an interesting column called ‘visibility’:

mysql> DESC format;
+--------------+-----------------------+------+-----+---------+----------------+
| Field        | Type                  | Null | Key | Default | Extra          |
+--------------+-----------------------+------+-----+---------+----------------+
| id           | mediumint(9) unsigned | NO   | PRI | NULL    | auto_increment |
| name         | varchar(255)          | NO   |     | NULL    |                |
| code         | varchar(6)            | NO   | UNI | NULL    |                |
| description  | varchar(255)          | YES  |     |         |                |
| content_type | varchar(255)          | YES  |     |         |                |
| visibility   | tinyint(4)            | NO   |     | 1       |                |
+--------------+-----------------------+------+-----+---------+----------------+
6 rows in set (0.01 sec)

If you set a format’s visibility to ‘0’, that format won’t be offered in the output format list. I’ll set some of them to ‘0’ to exclude those formats from the list, just for demonstration purposes:

mysql> SELECT * FROM format WHERE visibility = 0;
+----+--------------------+--------+--------------------------------------------------------------------------+----------------------+------------+
| id | name               | code   | description                                                              | content_type         | visibility |
+----+--------------------+--------+--------------------------------------------------------------------------+----------------------+------------+
| 11 | Excel              | excel  | Excel csv output                                                         | application/ms-excel |          0 | 
| 12 | HTML similarity    | hs     | Very short HTML output for similarity box (<i>people also viewed..</i>). | text/html            |          0 | 
| 13 | RSS                | xr     | RSS.                                                                     | text/xml             |          0 | 
| 14 | OAI DC             | xoaidc | OAI DC.                                                                  | text/xml             |          0 | 
| 15 | File mini-panel    | hdfile | Used to show fulltext files in mini-panel of detailed record pages.      | text/html            |          0 | 
| 16 | Actions mini-panel | hdact  | Used to display actions in mini-panel of detailed record pages.          | text/html            |          0 | 
| 17 | References tab     | hdref  | Display record references in References tab.                             | text/html            |          0 | 
+----+--------------------+--------+--------------------------------------------------------------------------+----------------------+------------+
7 rows in set (0.01 sec)
 
mysql> UPDATE format SET visibility=0 WHERE code='hbgeol';

After setting the visibility of some output formats to 0, these are the ones that still have visibility=1:

mysql> SELECT * FROM format WHERE visibility=1 ORDER BY name;
+----+------------------+--------+--------------------------------------------------------------+--------------+------------+
| id | name             | code   | description                                                  | content_type | visibility |
+----+------------------+--------+--------------------------------------------------------------+--------------+------------+
|  8 | BibTeX           | hx     | BibTeX.                                                      | text/html    |          1 | 
|  4 | Dublin Core      | xd     | XML Dublin Core.                                             | text/xml     |          1 | 
|  9 | EndNote          | xe     | XML EndNote.                                                 | text/xml     |          1 | 
|  1 | HTML brief       | hb     | HTML brief output format, used for search results pages.     | text/html    |          1 | 
| 18 | HTML citesummary | hcs    | HTML cite summary format, used for search results pages.     | text/html    |          1 | 
|  2 | HTML detailed    | hd     | HTML detailed output format, used for Detailed record pages. | text/html    |          1 | 
|  3 | MARC             | hm     | HTML MARC.                                                   | text/html    |          1 | 
|  5 | MARCXML          | xm     | XML MARC.                                                    | text/xml     |          1 | 
| 22 | METS             | mets   | Formato METS.                                                | text/xml     |          1 | 
| 10 | NLM              | xn     | XML NLM.                                                     | text/xml     |          1 | 
| 24 | PREMIS           | premis | PREMIS                                                       | text/xml     |          1 | 
| 19 | RefWorks         | xw     | RefWorks.                                                    | text/xml     |          1 | 
+----+------------------+--------+--------------------------------------------------------------+--------------+------------+
12 rows in set (0.00 sec)
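
The effect of the visibility flag can be reproduced outside Invenio with a small sketch. It uses an in-memory sqlite3 database as a stand-in for the MySQL ‘format’ table, models only the three columns the query touches, and the helper name visible_formats is made up for this example:

```python
import sqlite3

# In-memory stand-in for Invenio's MySQL `format` table (assumption: only
# the columns used by the drop-down query are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE format (code TEXT UNIQUE, name TEXT, visibility INTEGER DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO format (code, name) VALUES (?, ?)",
    [("hb", "HTML brief"), ("hbgeol", "HTML Brief Geol"), ("xm", "MARCXML")],
)


def visible_formats(conn):
    """Return the (code, name) pairs that the format drop-down would offer."""
    query = "SELECT code, name FROM format WHERE visibility = 1 ORDER BY name ASC"
    return conn.execute(query).fetchall()


before = visible_formats(conn)

# Hide one format, matching on its short code.
conn.execute("UPDATE format SET visibility = 0 WHERE code = ?", ("hbgeol",))
after = visible_formats(conn)

print(before)
print(after)
```

Only the rows whose visibility is still 1 come back from the second call, which is the same filter the search interface applies.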

And the corresponding output for search results (as a result of performing a search):
CDS Invenio output formats at search results

So, what should you do if you ALWAYS want just the output formats defined for a given collection, no matter whether it is the “advanced search screen” or the “search results screen”? Just edit search_engine.py, changing this:

    query = """SELECT code,name FROM format WHERE visibility='1' ORDER BY name ASC"""
    res = run_sql(query)
    if res:
        # propose found formats:
        for code, name in res:
            formats.append({ 'value' : code,
                             'text' : name
                           })
    else:
        formats.append({'value' : 'hb',
                        'text' : _("HTML brief")
                       })

To this:

    # query that returns only the output formats assigned to the collection (miguel)
    query = """SELECT f.code,f.name FROM format AS f, collection_format AS cf 
                WHERE cf.id_collection=%d AND cf.id_format=f.id AND f.visibility='1'
                ORDER BY cf.score DESC, f.name ASC""" % cc_colID
    # query that returns every output format, whether or not it belongs to the collection (Invenio's original)
    #query = """SELECT code,name FROM format WHERE visibility='1' ORDER BY name ASC"""
    res = run_sql(query)
    if res:
        # propose found formats:
        for code, name in res:
            formats.append({ 'value' : code,
                             'text' : name
                           })
    else:
        formats.append({'value' : 'hb',
                        'text' : _("HTML brief")
                       })
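
To see what the per-collection query returns, here is a minimal sketch that runs the same join against an in-memory sqlite3 database. The table contents, the collection ID 7 and the function name formats_for_collection are all hypothetical, and the collection ID is passed as a bound parameter rather than interpolated with %d:

```python
import sqlite3

# Stand-in schema with only the columns the join needs (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE format (id INTEGER PRIMARY KEY, code TEXT, name TEXT, visibility INTEGER)"
)
conn.execute(
    "CREATE TABLE collection_format (id_collection INTEGER, id_format INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO format VALUES (?, ?, ?, ?)",
    [(1, "hb", "HTML brief", 1), (2, "hd", "HTML detailed", 1),
     (5, "xm", "MARCXML", 1), (11, "excel", "Excel", 0)],
)
# Hypothetical collection 7: MARCXML first, then HTML brief, plus a hidden format.
conn.executemany(
    "INSERT INTO collection_format VALUES (?, ?, ?)",
    [(7, 5, 20), (7, 1, 10), (7, 11, 5)],
)


def formats_for_collection(conn, coll_id):
    """Return the drop-down entries for one collection, falling back to 'hb'."""
    query = """SELECT f.code, f.name
               FROM format AS f, collection_format AS cf
               WHERE cf.id_collection = ? AND cf.id_format = f.id
                     AND f.visibility = 1
               ORDER BY cf.score DESC, f.name ASC"""
    rows = conn.execute(query, (coll_id,)).fetchall()
    formats = [{"value": code, "text": name} for code, name in rows]
    if not formats:
        # Same fallback as the original code: offer HTML brief only.
        formats.append({"value": "hb", "text": "HTML brief"})
    return formats


print(formats_for_collection(conn, 7))
print(formats_for_collection(conn, 99))
```

A collection with no formats assigned (99 here) falls through to the ‘hb’ default, just as the else branch does in search_engine.py.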

And remember to run inveniocfg --update-all and then restart your Apache server:

[root@zaguan invenio]# sudo -u apache inveniocfg --update-all; /etc/init.d/httpd restart

And let’s see the changes produced at the search results screen…
invenio output format list at search results screen

Hope this was useful!!

[SOLVED] Fix apple-touch-icon 404 errors

Some days ago I was checking the AWStats reports for my Invenio site and noticed some (unexpected) 404 errors. Visitors were trying to load URLs like /iphone or /m, plus some images that are not linked anywhere on my site (‘apple-touch-icon.png’ and similar filenames).

apple-touch-icon-precomposed.png 404 fix

This has to do with some bots coming along, assuming that my site includes a mobile version, and then trying their hand at guessing its location. The bot looks first for an “apple-touch icon,” and then for mobile content in various directories.
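
If you don’t have AWStats at hand, a rough sketch like the following can pull the same information straight from an Apache access log. It assumes the common/combined log format; the sample lines are made up for illustration and the function name count_404_paths is hypothetical:

```python
import re
from collections import Counter

# Picks the request path and the status code out of a common/combined log line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404_paths(lines):
    """Count how often each requested path ended in a 404."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample lines, not taken from a real log.
sample = [
    '203.0.113.5 - - [05/Jul/2013:12:00:01 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 210',
    '203.0.113.5 - - [05/Jul/2013:12:00:02 +0200] "GET /m HTTP/1.1" 404 198',
    '203.0.113.9 - - [05/Jul/2013:12:00:03 +0200] "GET /record/42 HTTP/1.1" 200 5120',
    '203.0.113.7 - - [05/Jul/2013:12:00:04 +0200] "GET /apple-touch-icon.png HTTP/1.1" 404 210',
]

counts = count_404_paths(sample)
for path, n in counts.most_common():
    print("%d %s" % (n, path))
```

To run it on a real log, pass an open file object instead of the sample list.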

But what about those images? Have a read of: http://www.computerhope.com/jargon/a/appletou.htm

Similar to the Favicon, the apple-touch-icon.png is a file used for a web page icon on the Apple iPhone, iPod Touch, and iPad. When someone bookmarks your web page or adds your web page to their home screen this icon is used. If this file is not found these Apple products will use the screen shot of the web page, which often looks like no more than a white square.

This file should be saved as a .png, have dimensions of 57 x 57, and be stored in the root directory of your site, unless the path is specified in the HTML with a <link rel="apple-touch-icon"> tag.

When this file is used, by default, the Apple product will automatically give the icon rounded edges and a button-like appearance.

I wanted to fix this, so I began by testing if mod_rewrite was enabled…

[root@aneto www]# grep -R "mod_rewrite" /etc/httpd/conf/
/etc/httpd/conf/httpd.conf:LoadModule rewrite_module modules/mod_rewrite.so

The LoadModule line is uncommented, so it is enabled.

Next step would be to try a basic redirection to test mod_rewrite.

Edit $PATH_TO_INVENIO/etc/apache/invenio-apache-vhost.conf and add these lines inside the VirtualHost section:

<IfModule mod_rewrite.c>
           RewriteEngine  On
           RewriteLog "/home/apache/rewrite.log"
           RewriteLogLevel 9
           RewriteRule old.html bar.html [R]
</IfModule>

Then restart apache…

[root@aneto ~]# /etc/init.d/httpd  restart

Then open your browser and test the redirection. You should be redirected and the /home/apache/rewrite.log should log that redirection…

[root@aneto ~]# tail -n20 -f /home/apache/rewrite.log
 
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b689442f0c8][rid#2b68947a7380/initial] (2) init rewrite engine with requested uri /old.html
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b689442f0c8][rid#2b68947a7380/initial] (3) applying pattern 'old.html' to uri '/old.html'
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b689442f0c8][rid#2b68947a7380/initial] (2) rewrite '/old.html' -> 'bar.html'
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b689442f0c8][rid#2b68947a7380/initial] (2) explicitly forcing redirect with http://155.210.47.102/bar.html
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b689442f0c8][rid#2b68947a7380/initial] (1) escaping http://155.210.47.102/bar.html for redirect
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b689442f0c8][rid#2b68947a7380/initial] (1) redirect to http://155.210.47.102/bar.html [REDIRECT/302]
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b46e1df80d8][rid#2b46eac962e0/initial] (2) init rewrite engine with requested uri /bar.html
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b46e1df80d8][rid#2b46eac962e0/initial] (3) applying pattern 'old.html' to uri '/bar.html'
155.210.47.93 - - [05/Jul/2013:12:21:06 +0200] [155.210.47.102/sid#2b46e1df80d8][rid#2b46eac962e0/initial] (1) pass through /bar.html

Now that we know that mod_rewrite is working properly, let’s add some rules to forbid some URL patterns:

<IfModule mod_rewrite.c>
           RewriteEngine  On
           RewriteLog "/home/apache/rewrite.log"
           RewriteLogLevel 9
           #RewriteRule old.html bar.html [R]
           RewriteCond %{REQUEST_URI} /iphone/?$ [NC,OR]
           RewriteCond %{REQUEST_URI} /mobile/?$ [NC,OR]
           RewriteCond %{REQUEST_URI} /mobi/?$ [NC,OR]
           RewriteCond %{REQUEST_URI} /m/?$ [NC]
           RewriteRule (.*) - [F,L]
</IfModule>
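
Before touching the live config, the four conditions can be tried out against sample URIs with a short sketch. It uses Python’s re module as an approximation of Apache’s PCRE matching (for patterns this simple the two should agree), and the function name is_forbidden and the test URIs are made up:

```python
import re

# The four RewriteCond patterns; the [NC] flag corresponds to re.IGNORECASE.
PATTERNS = (r"/iphone/?$", r"/mobile/?$", r"/mobi/?$", r"/m/?$")
BLOCKED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def is_forbidden(request_uri):
    """True if any condition matches; they are chained with [OR]."""
    # RewriteCond patterns are not anchored at the start, hence search().
    return any(p.search(request_uri) for p in BLOCKED)


for uri in ("/iphone", "/Mobile/", "/m", "/record/123", "/collection/m"):
    print("%-15s %s" % (uri, "403" if is_forbidden(uri) else "pass"))
```

Note that the patterns are only anchored at the end, so a URI such as /collection/m is blocked too. If your site has legitimate paths ending in /m or /mobile, consider anchoring the patterns with a leading ^.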

This technique is useful for saving bandwidth and server resources, not just for non-existent mobile-ish requests, but also for any resource that you would like to block – just add a RewriteCond with the target character string of your choice. Hopefully this technique will help you run a cleaner, safer, and more secure website.

Now, what to do with those apple-touch-icon-precomposed.png and similar images which are ending in 404 errors?

First, read the full Apple documentation about this issue.

Then you can fix it in several ways:

1) Search for those images and download them to /soft/cds-invenio/var/www/

cd /soft/cds-invenio/var/www/
wget http://gwt-touch.googlecode.com/svn-history/r86/trunk/demo-ipad-settings/war/apple-touch-icon-precomposed.png

Or you can create some personalized images using online services like http://iconifier.net/


And you’re ready to go! Logs won’t show those ugly 404 errors from now on and visitors using iPhones will be happier 🙂

CDS Invenio 0.99.X: inveniogc ERROR [SOLVED]

Some days ago I noticed there was something wrong with inveniogc. Every time I ran inveniogc -a I was getting errors like:

2013-04-17 08:31:30 --> 2013-04-17 08:31:30 --> Updating task status to ERROR.
2013-04-17 08:31:30 --> Task #21731 finished. [ERROR]

Calling inveniogc with verbose level = 9 I got some more information (var/log/bibsched_task_XXXX.log and .err files):

2013-04-17 08:29:51 --> - deleting queries not attached to any user
 
2013-04-17 08:29:51 -->   SELECT DISTINCT q.id
  FROM query AS q LEFT JOIN user_query AS uq
  ON uq.id_query = q.id
  WHERE uq.id_query IS NULL AND
  q.type <> 'p' 
 
2013-04-17 08:31:30 --> 2013-04-17 08:31:30 --> Updating task status to ERROR.
2013-04-17 08:31:30 --> Task #21731 finished. [ERROR]

The issue arose when inveniogc tried to delete user queries not attached to any user. I edited lib/python/invenio/inveniogc.py and noticed that the error was triggered when the result of a query was printed. I just commented that line out and inveniogc works again:

write_message("""  SELECT DISTINCT q.id\n  FROM query AS q LEFT JOIN user_query AS uq\n  ON uq.id_query = q.id\n  WHERE uq.id_query IS NULL AND\n  q.type <> 'p' """, verbose=9)
result = run_sql("""SELECT DISTINCT q.id
                    FROM query AS q LEFT JOIN user_query AS uq
                    ON uq.id_query = q.id
                    WHERE uq.id_query IS NULL AND
                          q.type <> 'p'""")
 
# write_message(result, verbose=9)

Why is this? It seems that the output buffer that write_message is using is too small to store the result of the previous query, so it fails…
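
Another possible explanation, which I have not verified against the 0.99 source, is that write_message builds its output with something like "%s\n" % msg. If that assumption holds, passing the raw query result (a tuple of rows) would fail as soon as there is more than one row, because Python unpacks a tuple on the right-hand side of % as separate format arguments. The sketch below only demonstrates that Python behaviour; the row values are made up:

```python
# Rows shaped like a query result: a tuple of one-element tuples (made-up IDs).
result = ((101,), (102,), (103,))

error = None
try:
    # A tuple on the right of % is unpacked into separate format arguments,
    # so three rows are offered to a single %s placeholder.
    line = "%s\n" % result
except TypeError as exc:
    error = str(exc)

# Wrapping the value in a one-element tuple (or calling str() on it first)
# formats the whole result as one argument.
safe_line = "%s\n" % (result,)

print(error)
print(safe_line)
```

If this is indeed the cause, write_message(str(result), verbose=9) would be a gentler fix than commenting the line out.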

CDS Invenio: batch delete records or interval of records (from python interpreter)

Some time ago I came up with this little hack to give Invenio the ability to delete a record from the command line.

If you need to delete a lot of records (i.e. in your testing/development server), you can add this other hack to bibeditcli.py:

Delete several records from invenio: the dirty way

This works, but it is not necessarily the way to go. There is another way to achieve the same result (records deleted) that does not overload Bibsched with a task for each record. We’ll go over that one later, though.

First things first: let’s go the dirrrrrty way:

def cli_delete_interval(recid_inicio, recid_fin):
    """
    Delete records from recid_inicio to recid_fin, both included
    You'd better make sure...
    """
    try:
        recid_inicio = int(recid_inicio)
    except ValueError:
        print "ERROR: First Record ID must be integer, not %s:" %recid_inicio
        sys.exit(1)
    try:
        recid_fin = int(recid_fin)
    except ValueError:
        print "ERROR: End record ID must be integer, not %s." %recid_fin
        sys.exit(1)
 
    if recid_inicio > recid_fin:
        print "ERROR: First record ID must be less than last record ID."
        sys.exit(1)
 
    for recid in range(recid_inicio, recid_fin + 1):  # +1 so recid_fin is included
        (record, junk) = get_record(CFG_SITE_LANG, recid, 0, "false")
        add_field(recid, 0, record, "980", "", "", "c", "DELETED")
        save_temp_record(record, 0, "%s.tmp" % get_file_path(recid))
        save_xml_record(recid)

This is how you call this new function from python.
First, navigate to $PATH_TO_INVENIO/lib/python and run your python interpreter

[miguel@mydevinvenioinstance ~]# cd /soft/cds-invenio/lib/
[miguel@mydevinvenioinstance lib]# python
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

And then, just…

>>> import invenio
>>> from invenio.bibeditcli import cli_delete_interval
>>> # the following line will delete records from ID=5125 to ID=7899 .... 
>>> # BE CAREFUL! GREAT POWER COMES WITH GREAT RESPONSIBILITY
>>> 
>>> cli_delete_interval(5125,7899)

Delete several records from Invenio: the not-so-dirty way

If you take a look at the cli_delete_interval we just came up with, or run it over a big interval, a whole lot of new tmp files will be generated and a lot of tasks will be sent to bibsched (one for every record). Not efficient. Not nice.

This code is better: just one tmp file (which can be removed once the task has finished) and one single task sent to bibsched.
Please notice the # EDIT HERE! part near the top of the function.

def cli_delete_interval(recid_inicio, recid_fin):
    """
    By: Miguel Martin 20120130 
    Goal:
      Delete records from recid_inicio to recid_fin, both included
      Creates just a tmp file and a task (just one) is sent to bibsched
    """
 
    from invenio.bibrecord import record_xml_output
    from invenio.bibtask import task_low_level_submission
 
    # EDIT HERE! FILEPATH MUST BE READABLE/WRITABLE! ######
    tmpfile = "/home/miguelm/tmp/borrado.xml" 
    # #####################################################
 
    try:
        recid_inicio = int(recid_inicio)
    except ValueError:
        print "ERROR: First Record ID must be integer, not %s:" %recid_inicio
        sys.exit(1)
    try:
        recid_fin = int(recid_fin)
    except ValueError:
        print "ERROR: End record ID must be integer, not %s." %recid_fin
        sys.exit(1)
 
    if recid_inicio > recid_fin:
        print "ERROR: First record ID must be less than last record ID."
        sys.exit(1)
 
    fd = open(tmpfile, "w")
    for recid in range(recid_inicio, recid_fin + 1):  # +1 so recid_fin is included
        (record, junk) = get_record(CFG_SITE_LANG, recid, 0, "false")
        add_field(recid, 0, record, "980", "", "", "c", "DELETED")
        fd.write(record_xml_output(record))
 
    fd.close()
    task_low_level_submission('bibupload', 'bibedit', '-P', '5', '-r', '%s' % tmpfile)
    #os.system("rm %s" % tmpfile)
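
To get a feel for what ends up in that tmp file, here is a standalone sketch that builds a MARCXML collection with one stripped-down record per ID, each carrying 980 $c DELETED. It does not use the Invenio API: the function name deletion_marcxml is made up, and I am assuming the record ID lives in controlfield 001. Unlike the function above, it does not keep the rest of each record, so treat it as an illustration of the file’s shape rather than something to upload as is:

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def deletion_marcxml(first_recid, last_recid):
    """Build one MARCXML collection flagging first..last (inclusive) as DELETED."""
    if first_recid > last_recid:
        raise ValueError("first record ID must not exceed last record ID")
    collection = ET.Element("collection", {"xmlns": MARC_NS})
    for recid in range(first_recid, last_recid + 1):  # +1: include the last ID
        record = ET.SubElement(collection, "record")
        # Assumption: controlfield 001 carries the record ID.
        ET.SubElement(record, "controlfield", {"tag": "001"}).text = str(recid)
        field = ET.SubElement(
            record, "datafield", {"tag": "980", "ind1": " ", "ind2": " "}
        )
        ET.SubElement(field, "subfield", {"code": "c"}).text = "DELETED"
    return ET.tostring(collection).decode("utf-8")


xml_text = deletion_marcxml(5125, 5127)
print(xml_text)
```

The whole interval goes into a single file, which is why one bibupload task is enough.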

Cheers!

— update 20130628 —

Delete all the records

If you want to wipe out all the existing bibliographic content of your site, for example to start uploading the documents from scratch again, you can launch:

       $ /opt/invenio/bin/dbexec < /opt/invenio/src/invenio-0.90/modules/miscutil/sql/tabbibclean.sql
       $ rm -rf /opt/invenio/var/data/files/*
       $ /opt/invenio/bin/webcoll
       $ /opt/invenio/bin/bibindex --reindex