Archive for the ‘CDS Invenio’ Category

CDS Invenio: bibrank exceptions [SOLVED]

The problem

Our Invenio was spitting exceptions like the following:

Forced traceback (most recent call last)
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank.py", line 150, in task_run_core
    func_object(key)
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank_word_indexer.py", line 1210, in word_similarity
    return word_index(run)
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank_word_indexer.py", line 839, in word_index
    update_rnkWORD(options["table"], options["modified_words"])
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank_word_indexer.py", line 1114, in update_rnkWORD
    Nj[j] = Nj.get(j, 0) + math.pow(Gi[t] * (1 + math.log(tf[0])), 2)
OverflowError: math range error

More precisely, this one is related to record 4573:

Error when analysing the record 4573 (((3350, 'x\x9cu\x93\xdbr\xdc
\x0c\x867\xa7\xa69\xf59z\xd9W\xd2b\xd9V\x02\x88\n\xd8\xcd\xb4/\x1f\xe4MvfA\xb90c\xe6\x1f\x1d\xf8\xf4\xeb\x7f~\xdc\xedvB\x18
F\x82\xf8\xfb\xba]\xe9J\x8f\xf6\xed\xf2C;V\x94\x08q\xc2\x7f\'\xf1\xfa,\xfeh\x87\xe3\x90j\xe9\xc3T\x99pf\t\x96\x128\x92\x03+[\xc0"\xe4\xfa\x98{m\x11\x17\xcaEN\xd2\xcbY\xbaU\t\nZe\x04s\xf5\xa5/\xf3S\xcb@\xa4\x99\xfdd\x15ZZ6\xa8\xefVB\x90B\xce\xf7\xca\x9d\x1e+\x1f\x07v\xf7\x1b\x84\xec0\x9a|\x94\x1c\x88\x99-\x81\xfb|\xd1\xcdE\xb6\x84b1\xd5\'\x81sU\xc0\x91\x95/\xd6\x80v\xa1b\xa0S\xa4\x1ed\xb1\xac\x00~a\xa1\xb2\x9ac\xc5\xf7\xd6\xdf\xd0\xc0\x06.Ba\xdb\nXV\x9e\xfa\xb7n\xf6\xa1yF\xb6\x9e:\xd7\xe8\n\xf1\xc0\xfbf\x9bl\xb2\xcaD<\x96\xaf\x80_\x17\x08\xa6\xd6\xf2\xc1b\xc3\xa9P\xe8#\x94\r\x05\x18\xd8h[\x8b\xc0D\xc6\xac\xef6S\xd5\x9c\xfbd\nt\x16\x08xdy\xb3\xccS
.FB\x95\xf6\xaf\xd8\x18\x0c\x8d\xebu?\x8cZ\xbb\x8e\xf4\xb7\x9aK\xc2\xfb\x

But a lot of records (4573, 4336, 4487, 4337, … ) were producing similar errors every time my scheduled bibrank task was run…
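To see where "math range error" comes from, here is a minimal sketch that has the same arithmetic shape as the failing line. The names gi and tf are hypothetical stand-ins for Gi[t] and tf[0], and the huge value is made up just to trigger the error; I do not know which values Invenio actually had at that point.

```python
import math

def squared_weight(gi, tf):
    """Same shape as the failing line: (gi * (1 + ln(tf))) squared."""
    return math.pow(gi * (1 + math.log(tf)), 2)

def squared_weight_or_inf(gi, tf):
    """Like squared_weight, but report an overflow as infinity instead of raising."""
    try:
        return squared_weight(gi, tf)
    except OverflowError:
        # math.pow raises OverflowError when the result does not fit in a float
        return float("inf")
```

With small inputs the function behaves normally; once the accumulated weight gets large enough, math.pow cannot represent the square and raises the exception seen in the traceback.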

The fix

[Thank you so much, Samuele Kaplun!]

This is normally due to word similarity indexes that have accumulated too many
errors and need to be rebalanced. You can safely discard these exceptions
(though it’s true they might quickly fill your mailbox :-) ).

Usually bibrank word similarity indexes are built in a fast way that works
most of the time, but it is approximate and leads to some approximation errors
after a certain period of use.

To solve this, you usually just need to schedule (e.g. weekly) a bibrank -R to
rebalance these indexes.

Try e.g. with:

$ sudo -u apache /opt/cds-invenio/bin/bibrank -R -wwrd -s7d -uadmin

and see if this solves your problem.

(See also this link, where this is also briefly explained.)

CDS-Invenio: remove Fulltext tab from HTML detailed view

If you are using the default CDS Invenio template, the HTML detailed view is organized in tabs (Information, References, Citations, Discussion, Usage Statistics, Fulltext).

The last tab shows the fulltexts attached to a record only if they have been loaded via Bibrecdoc. If you click this tab link you will be redirected to something like http://zaguan.unizar.es/record/4343/files/ (notice the trailing /files part).

[Image: cds invenio fulltext tab]

If your fulltext consists of an external URL, or even a directly referenced local file (usually files that have not been loaded through a websubmit form), this tab won’t be clickable (the tab will be disabled)…

[Image: cds invenio fulltext tab]

… but typing the blablabla/files URL by hand will lead to an empty fulltext page. Not cool.

[Image: cds invenio fulltext tab]

I do not want my users to get used to this /files link structure, so I want to delete this tab and show fulltext links just in the main (Information) tab.

[Image: cds invenio delete fulltext tab]

How do you do this? Easy! Open your webstyle_templates.py file:

vi /soft/cds-invenio/lib/python/invenio/webstyle_templates.py

Search for these lines (usually around line 685):

                if not enabled:
                    out_tabs += '<li%(class)s><a>%(label)s</a></li>' % \
                                {'class':css_class,
                                 'label':label}
                else:
                    out_tabs += '<li%(class)s><a href="%(url)s">%(label)s</a></li>' % \
                                {'class':css_class,
                                 'url':url,
                                 'label':label}

And change them to:

                #if not enabled:
                if not enabled and label != _('Fulltext'):
                    out_tabs += '<li%(class)s><a>%(label)s</a></li>' % \
                                {'class':css_class,
                                 'label':label}
                #else:
                elif label != _('Fulltext'):
                    out_tabs += '<li%(class)s><a href="%(url)s">%(label)s</a></li>' % \
                                {'class':css_class,
                                 'url':url,
                                 'label':label}
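If you want to try the idea outside Invenio first, here is a minimal standalone sketch of the same filter. The render_tabs function, its tuple layout and the hidden_labels parameter are all hypothetical (they are not part of webstyle_templates.py), and it compares against the plain label, whereas the real template compares against the translated one, _('Fulltext').

```python
def render_tabs(tabs, hidden_labels=("Fulltext",)):
    """Build the <li> items for a tab bar, skipping the hidden labels.

    tabs is a list of (label, url, enabled, css_class) tuples.
    """
    out_tabs = ""
    for label, url, enabled, css_class in tabs:
        if label in hidden_labels:
            # Never render this tab, whether it is enabled or not
            continue
        if not enabled:
            out_tabs += '<li%(class)s><a>%(label)s</a></li>' % \
                        {'class': css_class, 'label': label}
        else:
            out_tabs += '<li%(class)s><a href="%(url)s">%(label)s</a></li>' % \
                        {'class': css_class, 'url': url, 'label': label}
    return out_tabs
```

Skipping the label once at the top of the loop is equivalent to adding the label test to both branches, as the patch above does.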

Save your changes, close the editor and run:

sudo -u apache /soft/cds-invenio/bin/inveniocfg --update-all; /etc/init.d/httpd restart

Now that tab will not be visible, whether or not the record has a bibrecdoc fulltext:

[Image: cds invenio delete fulltext tab]

The /files URL will still work, though.

CDS Invenio: Internet Explorer 8, https, css and images not loading [SOLVED]

I have to admit I hate IE8. I do not use it except for testing that pages display correctly. Some days ago a user wrote to complain about how the Your options – Your submissions page was displayed: the CSS and images were not being loaded.

Internet Explorer 8 assumes that, in an SSL (https) connection, every element has to be loaded via HTTPS. If this does not happen, IE8 displays a security alert so that the user can decide whether or not to load the “insecure” elements.

The problem

I used the free basic version of HttpWatch while loading the page and noticed that the CSS URL started with “http” and not “https”.

Although Mozilla does not complain about https pages loading http content, IE8 does.

This is what happened when loading the yoursubmissions.py page (the warning text is in Spanish, sorry):

[Image: yoursubmissions.py issues with IE8]

If you click “yes”, neither the CSS nor the images are loaded and everything looks awful:

[Image: CDS Invenio, css and https]

On the other hand, if you hit “no” the page looks fine.

The fix

Samuele, from the CDS Support Team, gave me a great piece of advice: edit /var/www/yoursubmissions.py, look for the "return page(..." line and add secure_page_p=1. This forces the yoursubmissions function (defined in webstyle_templates.py) to load the CSS via https.

Then run:

sudo -u apache /soft/cds-invenio/bin/inveniocfg --update-all; /etc/init.d/httpd restart

Cool! Now the CSS is working… but the GIFs won’t load. Shit!

Digging into the webstyle_templates.py code, I noticed that images were being loaded using CFG_SITE_URL instead of CFG_SITE_SECURE_URL in the yoursubmissions function. So I added the CFG_SITE_SECURE_URL variable to the imports:

from invenio.config import \
     CFG_SITE_URL, \
     CFG_VERSION, \
     CFG_SITE_LANG, \
     CFG_SITE_SECURE_URL

and changed some CFG_SITE_URL to CFG_SITE_SECURE_URL in the image variable assignments. Then you must run:

sudo -u apache /soft/cds-invenio/bin/inveniocfg --update-all; /etc/init.d/httpd restart
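The kind of change I mean can be sketched like this. It is only an illustration: the two URL values, the /img/ path and the helper names site_url and image_url are assumptions of mine, not code from webstyle_templates.py.

```python
# Stand-ins for the values that invenio.config would provide
CFG_SITE_URL = "http://example.org"
CFG_SITE_SECURE_URL = "https://example.org"

def site_url(secure_page_p=0):
    """Pick the https base URL for secure pages, the plain one otherwise."""
    if secure_page_p:
        return CFG_SITE_SECURE_URL
    return CFG_SITE_URL

def image_url(name, secure_page_p=0):
    """Build an image URL that matches the scheme of the page it is shown on."""
    return "%s/img/%s" % (site_url(secure_page_p), name)
```

The point is simply that every asset on a page served over https gets its base URL from CFG_SITE_SECURE_URL, so IE8 has nothing "insecure" to complain about.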

I checked that everything was working fine after these changes. It was, so I did something similar with:
yourapprovals.py
publiline.py
websession_webinterface.py
webbasket_templates.py
webalert_webinterface.py
webbasket_webinterface.py
webmessage_webinterface.py

I see the page correctly, but the IE8 warning is still showing up

Culprits? One or more elements are still referenced via HTTP and not HTTPS. You should use a sniffer (for instance, HttpWatch) to check which elements these are.

One of the most common is the Google Analytics urchin file, which is usually loaded like:

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>
<script type="text/javascript">
try {
  _uacct = "UA-6988718-1";
  urchinTracker();
} catch(err) {}</script>

And should be changed to:

<script src="https://ssl.google-analytics.com/urchin.js" type="text/javascript"></script>
<script type="text/javascript">
try {
  _uacct = "UA-6988718-1";
  urchinTracker();
} catch(err) {}</script>

(These lines are usually in your webstyle_templates_yoursitename.py).
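If you do not have a sniffer at hand, a rough alternative is to save the page source and scan it for embedded http:// references. This is a minimal sketch using Python 3's standard html.parser; the class and function names are mine, and it only looks at src and href attributes, so it will miss things such as url(...) references inside CSS files.

```python
from html.parser import HTMLParser

class InsecureRefFinder(HTMLParser):
    """Collect embedded resources (scripts, images, stylesheets) loaded over http."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Plain links are not loaded with the page, so they do not trigger the warning
            return
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith("http://"):
                self.insecure.append((tag, value))

def find_insecure_refs(html):
    """Return a list of (tag, url) pairs for every http:// resource found in html."""
    finder = InsecureRefFinder()
    finder.feed(html)
    return finder.insecure
```

Feed it the HTML of the page that triggers the warning; every pair it returns is a candidate for switching to https.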

*** 2010-03-12 UPDATE ***
The new version of the Google Analytics code (which calls ga.js) handles this http/https thing on its own. Here is an example of the call to ga.js (just like the one Google provides). Thanks Samuele for pointing this out.

<script type="text/javascript">
 var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
 document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
 try {
  var pageTracker = _gat._getTracker("UA-xxxxxx-x");
  pageTracker._trackPageview();
 } catch(err) {}
</script>

After every change you must remember to run:

sudo -u apache /soft/cds-invenio/bin/inveniocfg --update-all; /etc/init.d/httpd restart

Hope it helps someone ;)

CDS Invenio: Webaccess admin lib bug-fix

Some days ago I found a bug in webaccess, more precisely when dealing with roles and authorizations which involve special characters.

The bug

Let’s suppose I want to restrict the execution of bibedit to a role named ‘EditoresTAZ’ (an existing role in my repository).

I go to http://my.repository.url/admin/webaccess/webaccessadmin.py/addauthorization
Then I select role=’EditoresTAZ’, action=’runbibedit’ and check “connect editoresTAZ to runbibedit for only these argument cases: “.

When I enter a value with accents (e.g. “Trabajos académicos”, the ‘primary collection’ of a record) and confirm the creation of the new authorization, the accented character gets lost. If I later check the newly created authorization, the stored value for the collection parameter is “Trabajos acadmicos” instead of “Trabajos académicos”. I guess this is due to some input character escaping.
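Just to illustrate how an accent can vanish like that, here is one way it can happen. This is purely an assumption for the sake of the example; I have not checked that this is what webaccess was actually doing.

```python
def lossy_ascii(text):
    """Force text into ASCII, silently dropping anything that does not fit."""
    return text.encode("ascii", "ignore").decode("ascii")

def round_trip_utf8(text):
    """Encode and decode as UTF-8, which keeps accented characters intact."""
    return text.encode("utf-8").decode("utf-8")

# "\u00e9" is the accented e in "académicos"
collection = "Trabajos acad\u00e9micos"
```

Calling lossy_ascii(collection) drops the accented letter entirely, which is exactly the kind of damaged value I found stored, while round_trip_utf8(collection) returns the string unchanged.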

The dirty fix

I checked the accARGUMENT table in the database and changed the value to ‘Trabajos académicos’ by hand, and the authorizations work fine.

The cool patch

The guys from the CDS Support Team have already fixed things and the patch is in their Git repository!

Take a look: cds-invenio GIT.

Working like a charm now :)

CDS Invenio: Get Record number (recid, sysno) from reference number

The sbmSUBMISSIONS table holds a lot of information related to the actions performed by websubmit (new record creation, modifications, etc.). It stores the id of the user that performed the websubmit action, the action type (SBI, MBI, APP), the reference or report number (e.g. BOOK–2010-005), and the date, time and status of the action.

This is a great piece of information. But, surprisingly (at least to me), the recid (also referred to as sysno) is NOT stored in that table. I guess there is a (good?) reason for this, but I am not aware of it.

Well, what is the quickest way to get the recid (sysno) from a report number? This is it:

>>> from invenio.websubmit_functions.Get_Recid import get_existing_records_for_reportnumber
>>> recid = get_existing_records_for_reportnumber('TAZ-2010-078')
>>> print recid
[4404]

CDS Invenio: query database to know a tag value from a record

What is the quickest way to get the value of a tag for a record using CDS Invenio functions? Imagine you want to know the value of the 8564_u tag of the record with sysno=4403.

Easy! First, run python. Then:

>>> from invenio.dbquery import run_sql
>>> from invenio.websubmit_functions.Create_Modify_Interface_TAZ import Create_Modify_Interface_getfieldval_fromDBrec
>>> value =  Create_Modify_Interface_getfieldval_fromDBrec('8564_u',4403)
>>> print value
http://aneto.unizar.es/TAZ/CPS/2010/4403/TAZ-2010-077.pdf

There is only a small snag… the database is not always updated, so this information is not absolutely consistent. If the record has been modified recently the value might not be up to date.

CDS-Invenio: Change SBI process – not referred records, restricted fulltext access

These past days I have been talking a lot about CDS-Invenio, websubmit module and Apache’s .htaccess files (part one and two).

In this blog you can also find some posts that show how to grant access to fulltexts using CDS-Invenio (refer to part one and part two). In those posts a new websubmit function was created in the approval pipeline, and new roles and permissions were defined in webaccess so that fulltext access would be allowed only from a given IP range.

The goal

Well, now we have a different need: we want a non-refereed submit process (that is, no need to approve records) so that the record and its metadata can be read no matter what the requesting IP is. But (and here comes the fun part) we want the fulltext file to be accessible only by:
- the submitter
- some privileged users (a subset of ldap-authenticated users and some file-authenticated users). Refer to this post for further details.

We would also like to be able to “edit” (modify) those records so that access to the fulltext is opened to everyone. This modification will be done, if needed, some time after the record is submitted.

Define new doctype, create form page and SBI functions

First of all, let’s define a new doctype called TAZ to which the mods will be applied.

Now we will make a new submit form (refer to your CDS Invenio manual) and a new submit (SBI) pipeline. Here are the functions I’ve used:

  • Create_Recid
  • Report_Number_Generation
  • Make_Dummy_MARC_XML_Record
  • Move_Files_To_Storage_TAZ
  • Make_Record
  • Insert_Record
  • Print_Success
  • Mail_Submitter

If you are familiar with CDS-Invenio you will notice that there is just one new function involved: Move_Files_To_Storage_TAZ. It is pretty similar to the default Move_Files_To_Storage function.

First of all, let’s remember what the Move_Files_To_Storage function does:
When the record is created its metadata is stored in a running directory like /soft/cds-invenio/var/data/submit/storage/running/TAZ/1263304104_5564/. Let’s take a look at the contents of that directory. Remember that Move_Files_To_Storage has not been executed yet.
[root@aneto]# ls -l /soft/cds-invenio/var/data/submit/storage/running/TAZ/1263304104_5564/
total 176
-rw-r--r-- 1 apache apache   15 Jan 12 14:48 access
-rw-r--r-- 1 apache apache    3 Jan 12 14:48 act
-rw-r--r-- 1 apache apache    1 Jan 12 14:48 curpage
-rw-r--r-- 1 apache apache    4 Jan 12 14:48 DEMOTHE_TITLE
-rw-r--r-- 1 apache apache    3 Jan 12 14:48 doctype
-rw-r--r-- 1 apache apache  679 Jan 12 14:48 dummy_marcxml_rec
drwxr-xr-x 4 apache apache 4096 Jan 12 14:48 files
-rw-r--r-- 1 apache apache  532 Jan 12 14:48 function_log
-rw-r--r-- 1 apache apache    7 Jan 12 14:48 indir
-rw-r--r-- 1 apache apache   15 Jan 12 14:48 lastuploadedfile
-rw-r--r-- 1 apache apache    2 Jan 12 14:48 ln
-rw-r--r-- 1 apache apache   25 Jan 12 14:48 mainmenu
-rw-r--r-- 1 apache apache    1 Jan 12 14:48 mode
-rw-r--r-- 1 apache apache   15 Jan 12 14:48 PFC_MEM
-rw-r--r-- 1 apache apache   16 Jan 12 14:48 PFC_MEM_RENAMED
-rw-r--r-- 1 apache apache  695 Jan 12 14:48 recmysql
-rw-r--r-- 1 apache apache  205 Jan 12 14:48 rename_cmd
-rw-r--r-- 1 apache apache    4 Jan 12 14:48 SN
-rw-r--r-- 1 apache apache    1 Jan 12 14:48 startPg
-rw-r--r-- 1 apache apache    1 Jan 12 14:48 step
-rw-r--r-- 1 apache apache   17 Jan 12 14:48 SuE
-rw-r--r-- 1 apache apache   12 Jan 12 14:48 TAZ_RN

It contains all the submit-form values, as well as the fulltext files attached to the record (/soft/cds-invenio/var/data/submit/storage/running/TAZ/1263304104_5564/files directory).

Move_Files_To_Storage moves files to a directory like /soft/cds-invenio/var/data/files/g0/409/TAZ-2010-009.pdf;1. How is this done?

Move_Files_To_Storage creates a BibRecDocs object (this class is defined in bibdocfile.py). BibRecDocs objects have BibDoc objects inside of them. As a result of this (more precisely, as a result of the execution of _make_base_dir defined in bibdocfile.py) a new directory is created to store the fulltexts. The path is built using the CFG_WEBSUBMIT_FILEDIR variable (defined in config.py) as a basis and appending a group (g0) and a docid (409).
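The shape of that path can be sketched as follows. This is not the code of _make_base_dir: the function below is a hypothetical stand-in, and in particular the rule used to pick the group (integer division of the docid by a group_limit of 5000) is an assumption of mine that merely happens to give g0 for docid 409.

```python
import posixpath

def make_base_dir(filedir, docid, group_limit=5000):
    """Compose <filedir>/<group>/<docid>, e.g. .../files/g0/409.

    group_limit is an assumed value; check bibdocfile.py for the real rule.
    """
    group = "g%d" % (int(docid) // group_limit)
    return posixpath.join(filedir, group, str(docid))
```

For the example above, make_base_dir("/soft/cds-invenio/var/data/files", 409) gives the directory where TAZ-2010-009.pdf;1 ended up.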

This BibRecDocs object has consequences for how URLs to fulltexts are built and managed (hard to explain in a few lines, so I’ll skip this).

Well, in my case I decided to change this function so that files would not be copied to that directory, but to a new one instead: /t31/TAZ/SN/.

SN refers to the recid (record id) and it is a variable.

I also decided I did not want to create a BibRecDocs object for my TAZ fulltexts, because I wanted the fulltext URLs (stored in the marcxml 856u tag) to be built as if the fulltext were outside the CDS Invenio system (an external URL, in CDS terms).

This allows me to avoid the Python handler (invenio.webinterface_layout, defined in /soft/cds-invenio/etc/apache/invenio-apache-vhost.conf) on which Invenio relies.

My Move_Files_To_Storage_TAZ is as follows:

from invenio.bibdocfile import BibRecDocs, decompose_file, InvenioWebSubmitFileError
import os
import re
from invenio.websubmit_icon_creator import create_icon, InvenioWebSubmitIconCreatorError
from invenio.websubmit_config import InvenioWebSubmitFunctionWarning
from invenio.websubmit_functions.Shared_Functions import get_dictionary_from_string, \
     createRelatedFormats
from invenio.errorlib import register_exception
 
def Move_Files_to_Storage_TAZ(parameters, curdir, form, user_info=None):
    """
    The function moves files received from the standard submission's form through
    file input element(s).
    Websubmit_engine built the following file organization in the directory curdir/files
 
                  curdir/files
                        |
      _______________________________________________________________________________
            |                                   |                          |
      ./file input 1 element's name      ./file input 2 element's name    ....
         |                                     |
      test1.pdf                             test2.pdf
 
 
    There is only one instance of all possible extension(pdf, gz...) in each part
    otherwise we may encount problems when renaming files.
    +parameters['rename']: if given, all the files in curdir/files are renamed.
     parameters['rename'] is of the form: <PA>elemfilename[re]</PA>* where re is
     an regexp to select(using re.sub) what part of the elem file has
     to be selected.e.g <PA>file:TEST_FILE_RN</PA>
    +parameters['documenttype']: if given, other formats are created.
     It has 2 possible values: - if "picture" icon in gif format is created
                               - if "fulltext" ps, gz .... formats are created
    +parameters['paths_and_suffixes']: directories to look into and corresponding
    suffix to add to every file inside. It must have the same structure as a
     python dictionnary of the following form
     {'FrenchAbstract':'french', 'EnglishAbstract':''}
     The keys are the file input element name from the form <=> directories in curdir/files
     The values associated are the suffixes which will be added to all the files
     in e.g. curdir/files/FrenchAbstract
    +parameters['iconsize'] need only if "icon" is selected in parameters['documenttype']
    """
    global sysno
    paths_and_suffixes = parameters['paths_and_suffixes']
    rename = parameters['rename']
    documenttype = parameters['documenttype']
    iconsize = parameters['iconsize']
 
    ## Create an instance of BibRecDocs for the current recid(sysno)
# we do not want this anymore
#    bibrecdocs = BibRecDocs(sysno)
 
    paths_and_suffixes = get_dictionary_from_string(paths_and_suffixes)
    ## Go through all the directory specified in the keys
    ## of parameters['paths_and_suffixes']
    for path in paths_and_suffixes.keys():
        ## Check if there is a directory for the current path
        if os.path.exists("%s/files/%s" % (curdir, path)):
            ## Go through all the files in curdir/files/path
            for current_file in os.listdir("%s/files/%s" % (curdir, path)):
                ## retrieve filename and extension
                ## Edited by Teresa and Miguel: we copy the TAZ files to /t31 and skip everything else
                dummy, filename, extension = decompose_file(current_file)
                if extension and extension[0] != ".":
                    extension = '.' + extension
                if len(paths_and_suffixes[path]) != 0:
                    extension = "_%s%s" % (paths_and_suffixes[path], extension)
                ## Build the new file name if rename parameter has been given
                if rename:
                    filename = re.sub('<PA>(?P<content>[^<]*)</PA>', \
                                      get_pa_tag_content, \
                                      parameters['rename'])
                if rename or len(paths_and_suffixes[path]) != 0:
                    ## Rename the file
                    try:
                        # Write the log rename_cmd
                        fd = open("%s/rename_cmd" % curdir, "a+")
                        fd.write("%s/files/%s/%s" % (curdir, path, current_file) + " to " +\
                                  "%s/files/%s/%s%s" % (curdir, path, filename, extension) + "\n\n")
                        ## Rename
                        os.rename("%s/files/%s/%s" % (curdir, path, current_file), \
                                  "%s/files/%s/%s%s" % (curdir, path, filename, extension))
                        fd.close()
                        ## Save the new name in a text file in curdir so that
                        ## the new filename can be used by templates to created the recmysl
                        fd = open("%s/%s_RENAMED" % (curdir, path), "w")
                        fd.write("%s%s" % (filename, extension))
                        fd.close()
 
                        fd = open("%s/SN" % (curdir) )
                        numeroreg = fd.read()
                        fd.close()
 
                        fd = open("%s/CENTRO" % (curdir) )
                        centro = fd.read()
                        fd.close()
 
                        fd = open("%s/DEMOTHE_DATE" % (curdir) )
                        year = fd.read()
                        fd.close()
 
                        ## Same as the original up to here. Now copy the file to /t31, which is what we want
                        destino = "/t31/TAZ/%s/%s/%s" % (centro, year, numeroreg)
                        os.system("mkdir /t31/TAZ/%s/" % (centro))
                        os.system("mkdir /t31/TAZ/%s/%s/" % (centro, year))
                        os.system("mkdir /t31/TAZ/%s/%s/%s/" % (centro, year, numeroreg))
                        if path == "PFC_MEM":
                            os.system("cp %s/files/%s/%s%s %s/%s%s" % (curdir, path, filename, extension, destino, filename, extension) )
                        if path == "PFC_ANE":
                            os.system("cp %s/files/%s/%s%s %s/%s_ANE%s" % (curdir, path, filename, extension, destino, filename, extension) )
                        ## The destination directory now exists and we know its path,
                        ## so call the function that creates the .htaccess in it
                        if path == "PFC_MEM":
                            create_htaccess(curdir,destino)
 
                    except OSError, err:
                        msg = "Cannot rename the file.[%s]"
                        msg %= str(err)
                        raise InvenioWebSubmitFunctionWarning(msg) 
    return ""
 
def get_pa_tag_content(pa_content):
    """Get content for <PA>XXX</PA>.
    @param pa_content: MatchObject for <PA>(.*)</PA>.
    return: the content of the file possibly filtered by an regular expression
    if pa_content=file[re]:a_file => first line of file a_file matching re
    if pa_content=file*p[re]:a_file => all lines of file a_file, matching re,
    separated by - (dash) char.
    """
    pa_content = pa_content.groupdict()['content']
    sep = '-'
    out = ''
    if pa_content.startswith('file'):
        filename = ""
        regexp = ""
        if "[" in pa_content:
            split_index_start = pa_content.find("[")
            split_index_stop =  pa_content.rfind("]")
            regexp = pa_content[split_index_start+1:split_index_stop]
            filename = pa_content[split_index_stop+2:]## ]:
        else :
            filename = pa_content.split(":")[1]
        if os.path.exists(os.path.join(curdir, filename)):
            fp = open(os.path.join(curdir, filename), 'r')
            if pa_content[:5] == "file*":
                out = sep.join(map(lambda x: re.split(regexp, x.strip())[-1], fp.readlines()))
            else:
                out = re.split(regexp, fp.readline().strip())[-1]
            fp.close()
    return out

The relevant part is the block inside the try that reads SN (together with CENTRO and DEMOTHE_DATE) and copies the files to the desired location under /t31/TAZ/.
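The copy step shells out to mkdir and cp through os.system, which works but ignores errors and breaks on odd characters in the values. A possible alternative, as a sketch only: copy_to_storage and its parameters are hypothetical, and it is written for a current Python 3 (the exist_ok argument does not exist on the Python 2.4 this repository runs on).

```python
import os
import shutil

def copy_to_storage(src_file, base_dir, centro, year, recid, new_name=None):
    """Copy src_file into <base_dir>/<centro>/<year>/<recid>/ and return the new path."""
    destino = os.path.join(base_dir, centro, year, recid)
    # Creates every missing level at once; no error if the directory already exists
    os.makedirs(destino, exist_ok=True)
    target = os.path.join(destino, new_name or os.path.basename(src_file))
    shutil.copy(src_file, target)
    return target
```

With base_dir="/t31/TAZ" this produces the same layout as the destino variable in the function above, and a failed copy raises an exception instead of passing silently.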

The APACHE configuration

In this post you will find the explanations to the following configuration:

/soft/cds-invenio/etc/apache/invenio-apache-vhost.conf:

AddDefaultCharset UTF-8
ServerSignature Off
ServerTokens Prod
NameVirtualHost 155.210.5.41:80
<Files *.pyc>
   deny from all
</Files>
<Files *~>
   deny from all
</Files>
<VirtualHost 155.210.5.41:80>
        ServerName aneto.unizar.es
        ServerAdmin teresa@unizar.es
        DocumentRoot /soft/cds-invenio/var/www
        ErrorLog "/soft/cds-invenio/var/log/apache/ldap-error_log"
        CustomLog "/soft/cds-invenio/var/log/apache/ldap-access_log" common
        LogLevel debug
        <Directory /soft/cds-invenio/var/www>
           Options FollowSymLinks MultiViews
        </Directory>
        Alias /TAZ/ "/t31/TAZ/"
        <Directory /t31/TAZ/>
           AllowOverride AuthConfig
           AuthType Basic
           AuthBasicProvider ldap file
           AuthzLDAPAuthoritative off
           AuthName "Aneto accediendo a PDF sin aprobar"
           AuthLDAPURL "ldap://ldapmail.unizar.es/dc=unizar,dc=es?uid?sub?(objectClass=person)"
        </Directory>
        DirectoryIndex index.en.html index.html index.php
        <LocationMatch "^(/+$|/index|/collection|/record|/author|/search|/browse|/youraccount|/youralerts|/yourbaskets|/yourmessages|/yourgroups|/submit|/getfile|/comments|/error|/oai2d|/rss|/help|/journal|/openurl|/stats|/ourcode)">
           SetHandler python-program
           PythonHandler invenio.webinterface_layout
           PythonDebug On
        </LocationMatch>
        <Directory /soft/cds-invenio/var/www>
           AddHandler python-program .py .cgi
           PythonHandler mod_python.publisher
           PythonDebug On
        </Directory>
</VirtualHost>

Just as an example of .htaccess I provide the one related to recid 3481:

[root@aneto cdsadmin]# more /t31/TAZ/3481/.htaccess
# Generate your .htpasswd files using the following online service
# http://www.kxs.net/support/htaccess_pw.html
 
AuthUserFile /soft/cds-invenio/var/.htpasswd
Require user cdsadmin miguelm

The cdsadmin user will be present in ALL the .htaccess files. This user is authenticated using AuthUserFile.

The second user, miguelm (authenticated with the LDAP service), is the user that originally submitted the record (this name is part of the submitter email, usually referred to as SuE).

With this config, user cdsadmin will be allowed to download all fulltexts and SuE will also be allowed to download his own fulltext, but other users won’t. What’s more, if the MBI (modification) action is configured properly, these two users will be the only ones able to modify the record (refer to the function Is_Original_Submitter).

The .htaccess-creation function

You will also need to write a new function which is responsible for creating the proper .htaccess file for each record.

This function will be explained in the next few days; I’m still at the development stage ;)

def create_htaccess(curdir,destino):
    """Create the .htaccess for privileged users (cdsadmin, etc.),
    who are validated against the general .htpasswd file, and for
    the submitter, who is validated against LDAP.
    """
    # Open curdir/SuE and read its value
    fd = open("%s/SuE" % (curdir))
    SuE = fd.read()
    fd.close()
 
    # At this point we have something like SuE = miguelm@unizar.es
    # Split it on the @
    user = SuE.rsplit('@',1)
    usuario = user[0]
 
    # We now have the 'usuario' (user name). Now build the .htaccess
    htaccess = """AuthUserFile /soft/cds-invenio/var/.htpasswd
               Require user cdsadmin %s""" % (usuario)
 
    # Now write the .htaccess where it belongs, that is, in 'destino'
    fd = open("%s/.htaccess" % (destino), "w")
    fd.write("%s" % (htaccess))
    fd.close()
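The text-building part of the function above can be tried on its own, without touching any file. A minimal sketch: htaccess_body and its parameters are hypothetical names, and the strip() call is an extra precaution of mine in case the SuE file ends with a newline.

```python
def htaccess_body(submitter_email, privileged_users=("cdsadmin",),
                  passwd_file="/soft/cds-invenio/var/.htpasswd"):
    """Return the .htaccess text that lets the privileged users and the submitter in."""
    # miguelm@unizar.es -> miguelm
    usuario = submitter_email.strip().rsplit("@", 1)[0]
    users = list(privileged_users) + [usuario]
    return "AuthUserFile %s\nRequire user %s\n" % (passwd_file, " ".join(users))
```

Keeping the string building separate from the file writing makes it easy to check the result for a few submitter addresses before letting it loose on real records.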

CDS-Invenio: Understanding WEBSUBMIT

A few posts back I talked about restricting access to fulltexts to an IP range. In those articles I gave some tips about how websubmit works. Now I want to make another modification to my CDS Invenio repository, so I needed a deeper understanding of the websubmit workflow.

Before we begin this journey through websubmit, it is a good idea to define some terminology I’ll be using:

Terminology

Recid: the registry number of a record. For instance, for record http://zaguan.unizar.es/record/2000 the registry number is 2000. This number is stored in the marcxml 001 tag.

Report_Number: another way to identify a record. For the example above, this number is INPRO–2009-038 and it is stored in the marcxml 037 tag. It is also called reference or rn.

access: a randomly generated (I am not so sure it’s totally random) number which is created every time you begin an action. You can see it in the URL when, for instance, you submit a record. It looks like this: 1260870578_1753. It is NOT stored in the marcxml, but it is in the database (for instance in the id field of table sbmSUBMISSIONS).

act: the action you are executing. This parameter can be seen in your browser’s URL. Invenio comes with several pre-made actions, like:

SBI for submit new record
APP for approve submitted record
MBI for modify existing (that is, already approved!) record
SRV for changes in attached fulltext files.

doctype: the document type to which act is applied. This can be seen in the browser’s URL too. doctype refers to the string in brackets that you can see in your websubmit admin menu. For instance, [DEMOBOO].

indir: each action is directly connected to a directory on the system. For example, for action MBI (modify existing record) the working directory is modify. This is the value of the indir parameter. Further details can be found in the following section.
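About the access number defined above: the part before the underscore looks a lot like a Unix timestamp. That is only a guess of mine, not something I have checked in the Invenio code, but if it is right, 1260870578 decodes to 15 December 2009 (UTC). A minimal sketch to decode one, written for a current Python 3 (parse_access is a hypothetical helper):

```python
from datetime import datetime, timezone

def parse_access(access):
    """Split an access value such as 1260870578_1753 into (datetime in UTC, suffix).

    Assumes the first part is a Unix timestamp in seconds.
    """
    seconds, suffix = access.split("_", 1)
    created = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return created, suffix
```

What the second part means I cannot tell from the outside, so the sketch just returns it untouched.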

Actions

You can see your defined actions in
http://www.yourrepositoryname.com/admin/websubmit/websubmitadmin.py/actionlist

Each action is connected to a working directory. For instance, SBI is attached to the running directory, which is located under $PATH_TO_CDS/var/data/submit/storage/running/

By default, CDS Invenio comes with an example of a refereed doctype, which executes the following functions. Next to each function are its step and score values.

  Function                     Step  Score
  Create_Recid                 1     10
  Report_Number_Generation     1     20
  Make_Dummy_MARC_XML_Record   1     30
  Move_Files_to_Storage        1     40
  Mail_Submitter               1     50
  Update_Approval_DB           1     60
  Send_Approval_Request        1     70
  Print_Success                1     80
  Move_to_Pending              1     90
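
As far as I can tell, WebSubmit runs the functions of an action ordered first by step and then by score, which would be why Move_to_Pending (score 90) runs after Print_Success (score 80). The snippet below is only a sketch of that ordering using the values listed above; it does not read anything from the Invenio database, and execution_order is a made-up helper name.

```python
# (function name, step, score), taken from the list above but shuffled on purpose.
FUNCTIONS = [
    ("Move_to_Pending", 1, 90),
    ("Mail_Submitter", 1, 50),
    ("Create_Recid", 1, 10),
    ("Print_Success", 1, 80),
    ("Report_Number_Generation", 1, 20),
    ("Send_Approval_Request", 1, 70),
    ("Make_Dummy_MARC_XML_Record", 1, 30),
    ("Update_Approval_DB", 1, 60),
    ("Move_Files_to_Storage", 1, 40),
]

def execution_order(functions):
    """Return the function names sorted by (step, score)."""
    ordered = sorted(functions, key=lambda item: (item[1], item[2]))
    return [name for name, _step, _score in ordered]

for name in execution_order(FUNCTIONS):
    print(name)
```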

The SUBMIT NEW RECORD (SBI) workflow

Suppose we enter our repository, log in, and then click “submit” and select a referee’d doctype (in my case doctype=PFC). Then a form appears. What is really happening?

Well, websubmit_engine.py and websubmit_webinterface.py are working in the shadows.

Here is what really happens:
websubmit_engine creates a new access number.

websubmit_webinterface creates a new directory using several parameters. Let’s see an example:

$BASE/$indir/$doctype/$access
$BASE=$PATH_TO_CDS/var/data/submit/storage
$indir=running
$doctype=PFC
$access=1260870578_1753
 
So, the system creates $PATH_TO_CDS/var/data/submit/storage/running/PFC/1260870578_1753
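
The way the working directory name is put together can be sketched in a few lines. This is not the actual websubmit_webinterface code, just an illustration of the $BASE/$indir/$doctype/$access pattern; build_curdir is a hypothetical helper and /path/to/storage is a placeholder for your own storage directory.

```python
import posixpath

def build_curdir(base, indir, doctype, access):
    """Join the four parts into the working directory path (curdir)."""
    return posixpath.join(base, indir, doctype, access)

# "/path/to/storage" is a placeholder, not a real Invenio default.
curdir = build_curdir("/path/to/storage", "running", "PFC", "1260870578_1753")
print(curdir)  # /path/to/storage/running/PFC/1260870578_1753
```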

Then it copies to that directory (from now on called curdir) all of the parameters shown in your browser’s URL (indir, doctype, access and so on).

It is important to note that none of the PFC-SBI functions have been executed yet!

Now the user begins to fill in the submit form. When it is filled in, the user pushes the “submit” button. This is the moment in which the PFC-SBI functions begin to run.

Let’s see what happens before Move_to_Pending is executed:

All of the form’s fields are stored in curdir. If your form has a field called PFC_AUTHOR with value ‘Miguel Martín’, a file called PFC_AUTHOR is created in curdir and it contains the string ‘Miguel Martín’. Since Make_Dummy_MARC_XML_Record has also been executed, a file called dummy_marcxml is also created (the Make_Dummy_MARC_XML_Record function takes into account the $doctype.tpl and $doctypeCREATE.tpl files and builds your dummy_marcxml according to that information).
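
The “one file per form field” idea can be tried outside Invenio. The sketch below uses a temporary directory as a stand-in for curdir; write_field and read_field are made-up helper names, not WebSubmit functions, and the UTF-8 encoding is an assumption.

```python
import io
import os
import shutil
import tempfile

def write_field(curdir, name, value):
    """Store one form field as a file named after the field."""
    with io.open(os.path.join(curdir, name), "w", encoding="utf-8") as handle:
        handle.write(value)

def read_field(curdir, name):
    """Read the value of a form field back from its file."""
    with io.open(os.path.join(curdir, name), encoding="utf-8") as handle:
        return handle.read()

curdir = tempfile.mkdtemp()  # stand-in for the real curdir
try:
    write_field(curdir, "PFC_AUTHOR", u"Miguel Mart\u00edn")
    author = read_field(curdir, "PFC_AUTHOR")
    stored_files = sorted(os.listdir(curdir))
finally:
    shutil.rmtree(curdir)  # clean up the temporary directory

print(stored_files)  # ['PFC_AUTHOR']
```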

Once this is done, a “Congratulations!” message is shown to the user as a result of the execution of Print_Success. It seems that the whole submission process is over, but it really isn’t.

Something ‘unexpected’ happens:
The Move_to_Pending function moves your curdir/* files to the /var/data/submit/storage/pending/doctype/Report_Number directory! That is, in my case:
/var/data/submit/storage/pending/PFC/PFC–2009-005

Here we can take a detailed look at Move_to_Pending.py:

import os
 
from invenio.config import CFG_WEBSUBMIT_STORAGEDIR
from invenio.websubmit_config import InvenioWebSubmitFunctionError
 
def Move_to_Pending(parameters, curdir, form, user_info=None):
    # rn (the report number) is a global shared between WebSubmit functions
    global rn
    doctype = form['doctype']
    PENDIR = "%s/pending/%s" % (CFG_WEBSUBMIT_STORAGEDIR,doctype)
    # Create pending/<doctype> the first time it is needed
    if not os.path.exists(PENDIR):
        try:
            os.makedirs(PENDIR)
        except:
            raise InvenioWebSubmitFunctionError("Cannot create pending directory %s" % PENDIR)
    # A slash in the report number would act as a path separator, so replace it
    rn = rn.replace("/","-")
    namedir = rn
    FINALDIR = "%s/%s" % (PENDIR,namedir)
    # Move the whole curdir to pending/<doctype>/<report number>
    os.rename(curdir,FINALDIR)
    return ""

As shown above, this function creates the pending directory if it does not exist yet. In my example that is /var/data/submit/storage/pending/PFC/.

The rn variable holds the Report Number, that is, PFC–2009-005. So the function moves all the contents of curdir to the /var/data/submit/storage/pending/PFC/PFC–2009-005/ directory.
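
If you want to try the move without touching a real repository, here is a small stand-alone sketch of the same steps. It works in a temporary directory, has no Invenio imports, and the report number PFC/2009/005 is a made-up value chosen only to show the slash replacement.

```python
import os
import shutil
import tempfile

def move_to_pending(curdir, storage_dir, doctype, rn):
    """Move curdir to <storage_dir>/pending/<doctype>/<rn with slashes replaced>."""
    pending_dir = os.path.join(storage_dir, "pending", doctype)
    if not os.path.isdir(pending_dir):
        os.makedirs(pending_dir)
    final_dir = os.path.join(pending_dir, rn.replace("/", "-"))
    os.rename(curdir, final_dir)
    return final_dir

storage = tempfile.mkdtemp()  # stand-in for the real storage directory
try:
    curdir = os.path.join(storage, "running", "PFC", "1260870578_1753")
    os.makedirs(curdir)
    open(os.path.join(curdir, "PFC_AUTHOR"), "w").close()

    final_dir = move_to_pending(curdir, storage, "PFC", "PFC/2009/005")
    moved = os.listdir(final_dir)
    relative = os.path.relpath(final_dir, storage)
    curdir_still_exists = os.path.exists(curdir)
finally:
    shutil.rmtree(storage)  # clean up the temporary directory

print(moved)                # ['PFC_AUTHOR']
print(curdir_still_exists)  # False
```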


*** Edit: let’s look at part of the MBI (modify record AFTER it is approved) pipeline.

APP (approve record)

When a record has been submitted and then approved, some information is stored in /var/data/submit/storage/done/PFC/$Report_Number and a new entry is created in the MySQL database, more precisely in table sbmSUBMISSIONS.

This table stores a lot of information. Let’s see what is stored for one record, selected by its Report_Number:

SELECT * FROM sbmSUBMISSIONS WHERE reference = 'PFC-2009-018';


Let’s go through the stored fields:

email: who performed the action? (It was me!)
doctype: the doctype of the document on which the action was performed (PFC).
action: the action performed on that doctype (SBI, MBI, APP, SRV).
status: the status of the task, in this case finished (it could be pending).
id: the access (number) of the task: 1245755804_15279
reference: the report number (or rn).
cd: a date. In my case, the creation date of /soft/cds-invenio/var/data/submit/storage/pending/PFC/PFC-2009-018/mainmenu
md: a date. In my case, the creation date of the /soft/cds-invenio/var/data/submit/storage/pending/PFC/PFC-2009-018 directory

The fulltext document (in pdf or whatever format you use) is stored in /soft/cds-invenio/var/data/submit/storage/pending/PFC/PFC-2009-018/files/PFC_MEM/PFC-2009-018.pdf but a compressed version is also stored in /soft/cds-invenio/var/data/files/g0/405/PFC-2009-018.pdf;1

MBI (modify AFTER approval)

The functions that are run are listed below:

    Get_Report_Number
    Get_Recid
    Is_Original_Submitter
    Create_Modify_Interface
    Get_Report_Number
    Get_Recid
    Make_Modify_Record
    Insert_Modify_Record
    Print_Success_MBI
    Send_Modify_Mail
    Move_to_Done

Wow, a lot of functions! I will only comment on the ones that are most important and hardest to understand.

I’ll begin with Create_Modify_Interface:

This function reads the stored metadata of a record and creates an interface to modify the fields that the user selects. Where does the Create_Modify_Interface function read the stored values from? From the database? From the system directories? Well, it depends: this function goes into curdir and looks for a file named Create_Modify_Interface_Done.

If it exists, then the record metadata is loaded from FILES (the curdir system directory) using Create_Modify_Interface_getfieldval_fromfile.

If it does not exist, then the record metadata is loaded from the DATABASE, using Create_Modify_Interface_getfieldval_fromDBrec.
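
The decision described above boils down to checking for a marker file. This is only a sketch of that check, not the real Create_Modify_Interface code; metadata_source is a made-up name and a temporary directory stands in for curdir.

```python
import os
import shutil
import tempfile

MARKER_FILE = "Create_Modify_Interface_Done"  # name as given in the text above

def metadata_source(curdir):
    """Return 'files' if the marker file exists in curdir, else 'database'."""
    if os.path.exists(os.path.join(curdir, MARKER_FILE)):
        return "files"
    return "database"

curdir = tempfile.mkdtemp()  # stand-in for a real curdir
try:
    without_marker = metadata_source(curdir)
    open(os.path.join(curdir, MARKER_FILE), "w").close()
    with_marker = metadata_source(curdir)
finally:
    shutil.rmtree(curdir)  # clean up the temporary directory

print(without_marker)  # database
print(with_marker)     # files
```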

**** WILL CONTINUE WRITING THIS POST IN FUTURE DAYS ****

CDS Invenio: Configuring LDAP to login into repository

CDS Invenio software is configured by default to use the internal login method. That is, a user who does not have an account must register in the repository in order to get a valid login and password.

If your institution has LDAP (Lightweight Directory Access Protocol) you can use this login system to access the repository.

These are the steps you should follow to configure LDAP access.

Adding LDAP authentication

First of all, edit $PATH_TO_CDS/cds-invenio/lib/python/invenio/access_control_config.py.
Just after these lines:

from invenio.config import CFG_SITE_NAME, CFG_SITE_URL, CFG_SITE_LANG, \
CFG_SITE_SECURE_URL, CFG_SITE_SUPPORT_EMAIL, CFG_CERN_SITE
import cPickle

add the following line:

from invenio.external_authentication_ldap import ExternalAuthLDAP

Now you should add your new login method (LDAP) to CFG_EXTERNAL_AUTHENTICATION:

CFG_EXTERNAL_AUTHENTICATION = {"%s (internal)" % CFG_SITE_NAME: (None, False), "LDAP (external)": (ExternalAuthLDAP(), True)}

This adds a new select box to your login page (my login URL is https://zaguan.unizar.es/youraccount/login) which allows the user to select the login method: internal (just as it was before the modifications) or external (LDAP). The True and False values in the previous line indicate which method is the default one. In my case, I set LDAP (external) as default.
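
To make the role of the True/False flag concrete, here is a toy version of that dictionary together with a helper that picks the default method. The helper default_login_method is made up for illustration (Invenio has its own logic for this), “My Repository” stands in for CFG_SITE_NAME, and object() stands in for ExternalAuthLDAP().

```python
def default_login_method(methods):
    """Return the name of the first method flagged as default, or None."""
    for name, (handler, is_default) in sorted(methods.items()):
        if is_default:
            return name
    return None

methods = {
    "My Repository (internal)": (None, False),
    "LDAP (external)": (object(), True),  # object() stands in for ExternalAuthLDAP()
}

print(default_login_method(methods))  # LDAP (external)
```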

Now, proceed to edit $PATH_TO_CDS/cds-invenio/lib/python/invenio/external_authentication_ldap.py. This file should exist in your repository, as CERN provides it as an example. You should now query your LDAP to know how it is organised.
Mine is organised as follows:

Our DN: uid=xxx,ou=Accounts,dc=unizar,dc=es
    |
    +--ou=Accounts
    |     |
    |     |
    |     +---uid= some local id for users (ex: grfavre)
    |         cn
    |         mail=xxx@xxx.xx
    |         uidNumber
    |         gidNumber
    |
    +

So my external_authentication_ldap.py file starts like:

import ldap
from invenio.external_authentication import ExternalAuth, \
                                            InvenioWebAccessExternalAuthError
 
# Our LDAP server...
CFG_EXTERNAL_AUTH_LDAP_SERVERS = ['ldap://ldapmail.unizar.es']
 
# Our query string...
CFG_EXTERNAL_AUTH_LDAP_CONTEXT = "ou=Accounts,dc=unizar, dc=es"
 
# Which parameters should be checked to auth a user
CFG_EXTERNAL_AUTH_LDAP_USER_UID  = ["uid","mail"]
 
# Which is the mail parameter name...
CFG_EXTERNAL_AUTH_LDAP_MAIL_ENTRY = 'mail'
 
# If you like LDAP users to be added to a group when they first access the system, set the following parameter. I provide this just an example, as I do NOT want users to be added to groups.
#CFG_EXTERNAL_AUTH_LDAP_GROUP_MEMBERSHIP = 'gidNumber'
#CFG_EXTERNAL_AUTH_LDAP_GROUP_UID = 'uidNumber'
#CFG_EXTERNAL_AUTH_LDAP_GROUP_NAME = 'gidNumber'
 
# Which groups should be hidden...
CFG_EXTERNAL_AUTH_LDAP_HIDDEN_GROUPS = ['users']

Now you should customize your ExternalAuthLDAP class (defined also in external_authentication_ldap.py). Mine is as follows:

class ExternalAuthLDAP(ExternalAuth):
    """
    External authentication example for a custom LDAP-based
    authentication service.
    """
    def __init__(self):
        """Initialize stuff here"""
        ExternalAuth.__init__(self)
        self.enforce_external_nicknames = True
 
    def _ldap_try (self, command):
        """ Try to run the specified command on the first LDAP server that
        is not down."""
        for server in CFG_EXTERNAL_AUTH_LDAP_SERVERS:
            try:
                connection = ldap.initialize(server)
                return command(connection)
            except ldap.SERVER_DOWN, error_message:
                continue
        raise InvenioWebAccessExternalAuthError
 
 
    def auth_user(self, username, password, req=None):
        """
        Check USERNAME and PASSWORD against the LDAP system.
        Return None if authentication failed, or the email address of the
        person if the authentication was successful.
        Raise InvenioWebAccessExternalAuthError in case of external troubles.
        Note: for SSO the parameter are discarded and overloaded by Shibboleth
        variables
        """
        if not password:
            return None
 
        query = '(|' + ''.join (['(%s=%s)' % (attrib, username)
                                 for attrib in
                                     CFG_EXTERNAL_AUTH_LDAP_USER_UID]) \
                 + ')'
 
        def _check (connection):
            users = connection.search_s(CFG_EXTERNAL_AUTH_LDAP_CONTEXT,
                                        ldap.SCOPE_SUBTREE,
                                        query)
 
            # We pick the first result, as all the data we are interested
            # in should be the same in all the entries.
            if len(users):
                user_dn, user_info = users [0]
            else:
                return None
            try:
                connection.simple_bind_s(user_dn, password)
            except ldap.INVALID_CREDENTIALS:
                # It is enough to fail on one server to consider the credential
                # to be invalid
                return None
            # The following lines, added by Teresa and Miguel, avoid a
            # KeyError when an account exists in LDAP but its 'mail' key
            # (the one named by CFG_EXTERNAL_AUTH_LDAP_MAIL_ENTRY)
            # has no value
            try:
                mail = user_info[CFG_EXTERNAL_AUTH_LDAP_MAIL_ENTRY][0]
            except KeyError:
                return None
            return mail
 
        return self._ldap_try(_check)
 
    def user_exists(self, email, req=None):
        """Check the external authentication system for existance of email.
        @return True if the user exists, False otherwise
        """
        query = '(%s=%s)' % (CFG_EXTERNAL_AUTH_LDAP_MAIL_ENTRY, email)
        def _check (connection):
            users = connection.search_s(CFG_EXTERNAL_AUTH_LDAP_CONTEXT,
                                        ldap.SCOPE_SUBTREE,
                                        query)
            return len(users) != 0
        return self._ldap_try(_check)
 
    def fetch_user_nickname(self, username, password=None, req=None):
        """Given a username and a password, returns the right nickname belonging
        to that user (username could be an email).
        """
        query = '(|' + ''.join (['(%s=%s)' % (attrib, username)
                                 for attrib in
                                     CFG_EXTERNAL_AUTH_LDAP_USER_UID]) \
                 + ')'
        def _get_nickname(connection):
            users = connection.search_s(CFG_EXTERNAL_AUTH_LDAP_CONTEXT,
                                        ldap.SCOPE_SUBTREE,
                                        query)
            # We pick the first result, as all the data we are interested
            # in should be the same in all the entries.
            if len(users):
                user_dn, user_info = users [0]
            else:
                return None
            emails = user_info[CFG_EXTERNAL_AUTH_LDAP_MAIL_ENTRY]
            if len(emails):
                email = emails[0]
            else:
                return False
            (left_part, right_part) = email.split('@')
            nickname = left_part.replace('.', ' ').title()
 
           # if right_part != 'unizar.es' and right_part != 'celes.unizar.es':
           #     nickname += ' - ' + right_part
            return nickname.lower()
        return self._ldap_try(_get_nickname)
 
#    def fetch_user_groups_membership(self, username, password=None, req=None):
#        """Given a username and a password, returns a dictionary of groups
#        and their description to which the user is subscribed.
#        Raise InvenioWebAccessExternalAuthError in case of troubles.
#        """
#
#        query_person = '(|' + ''.join (['(%s=%s)' % (attrib, username)
#                                 for attrib in
#                                     CFG_EXTERNAL_AUTH_LDAP_USER_UID]) \
#                        + ')'
#        def _get_groups(connection):
#            users = connection.search_s(CFG_EXTERNAL_AUTH_LDAP_CONTEXT,
#                                        ldap.SCOPE_SUBTREE,
#                                        query_person)
#            if len(users):
#                user_dn, user_info = users [0]
#            else:
#                return {}
#            groups = {}
#            group_ids = user_info[CFG_EXTERNAL_AUTH_LDAP_GROUP_MEMBERSHIP]
#            for group_id in group_ids:
#                query_group = '(%s=%s)' % (CFG_EXTERNAL_AUTH_LDAP_GROUP_UID,
#                                           group_id)
#                ldap_group = connection.search_s(CFG_EXTERNAL_AUTH_LDAP_CONTEXT,
#                                                 ldap.SCOPE_SUBTREE,
#                                                 query_group)
#                if len(ldap_group):
#                    group_dn, group_infos = ldap_group[0]
#                    group_name = group_infos[CFG_EXTERNAL_AUTH_LDAP_GROUP_NAME][0]
#                    if group_name in CFG_EXTERNAL_AUTH_LDAP_HIDDEN_GROUPS:
#                        continue
#                    groups[group_id] = group_name
#            return groups
#        return self._ldap_try(_get_groups)
 
    def fetch_user_preferences(self, username, password=None, req=None):
        """Given a username and a password, returns a dictionary of keys and
        values, corresponding to external infos and settings.
 
        userprefs = {"telephone": "2392489",
                     "address": "10th Downing Street"}
 
        (WEBUSER WILL erase all prefs that starts by EXTERNAL_ and will
        store: "EXTERNAL_telephone"; all internal preferences can use whatever
        name but starting with EXTERNAL). If a pref begins with HIDDEN_ it will
        be ignored.
        """
        query = '(|' + ''.join (['(%s=%s)' % (attrib, username)
                                 for attrib in
                                     CFG_EXTERNAL_AUTH_LDAP_USER_UID]) \
                 + ')'
        def _get_personal_infos(connection):
            users = connection.search_s(CFG_EXTERNAL_AUTH_LDAP_CONTEXT,
                                        ldap.SCOPE_SUBTREE,
                                        query)
            if len(users):
                user_dn, user_info = users [0]
                return user_info
            else:
                return {}
        return self._ldap_try(_get_personal_infos)
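
A note on the search filter that auth_user, fetch_user_nickname and fetch_user_preferences build: it is an OR over the attributes listed in CFG_EXTERNAL_AUTH_LDAP_USER_UID. The sketch below rebuilds it outside the class so you can see the resulting string. It also escapes the special filter characters in the username, which the class above does not do; that escaping is my addition, written by hand after the usual LDAP filter escaping rules (python-ldap offers ldap.filter.escape_filter_chars for the same purpose). The function names are made up.

```python
USER_ATTRIBUTES = ["uid", "mail"]  # same values as CFG_EXTERNAL_AUTH_LDAP_USER_UID

def escape_filter_value(value):
    """Escape the characters that have a special meaning inside an LDAP filter."""
    value = value.replace("\\", r"\5c")  # backslash first, so later escapes survive
    value = value.replace("*", r"\2a")
    value = value.replace("(", r"\28")
    value = value.replace(")", r"\29")
    value = value.replace("\x00", r"\00")
    return value

def build_user_filter(username, attributes=USER_ATTRIBUTES):
    """Build '(|(attr1=username)(attr2=username)...)' for the given attributes."""
    safe_name = escape_filter_value(username)
    clauses = ["(%s=%s)" % (attrib, safe_name) for attrib in attributes]
    return "(|" + "".join(clauses) + ")"

print(build_user_filter("grfavre"))  # (|(uid=grfavre)(mail=grfavre))
print(build_user_filter("*"))        # (|(uid=\2a)(mail=\2a))
```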

Note:
(a) The fetch_user_groups_membership function is commented out. I did this because I do NOT want LDAP users to be added to any group (in your MySQL database, table usergroup).

(b) The _get_nickname function builds the nickname that will be stored in the MySQL database (table user). Suppose your mail is loginname@mail.com. This function has been modified so that it never appends the trailing “ - mail.com” part to the stored nickname value, and it also keeps the first character from being capitalised.
If you are wondering what the other functions do, refer to your CDS documentation.
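
The nickname rule from note (b) can be checked on its own. This sketch repeats the string operations of _get_nickname without any LDAP connection; nickname_from_email is a made-up name and the second address is an invented example.

```python
def nickname_from_email(email):
    """Derive the stored nickname from an email address, as _get_nickname does."""
    left_part, right_part = email.split("@")
    nickname = left_part.replace(".", " ").title()
    # The domain (right_part) is deliberately not appended.
    return nickname.lower()

print(nickname_from_email("loginname@mail.com"))       # loginname
print(nickname_from_email("Miguel.Martin@unizar.es"))  # miguel martin
```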

There is one last step you should be aware of. In $PATH_TO_CDS/cds-invenio/etc/invenio.conf there is a variable called CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS which you should set.

Refer to the CDS documentation for a deeper understanding of this variable. I will just say that in my case I first set it to 4 and then noticed some problems when users who already had an internal account tried to log in using LDAP. A message like “This is not the default login method for %s user” was shown. Ugly.

This warning is produced in function loginUser (more precisely line 592 of webuser.py) and is related to the user_preferences defined in the MySQL user table. This field is defined as a blob, so it is not easy to see what is inside using tools like phpMyAdmin, Toad for MySQL or similar. If you want to check what is inside this field, use the get_user_preferences function (defined in $PATH_TO_CDS/cds-invenio/lib/python/invenio/webuser.py).

Once you have made all these modifications, run:

sudo -u apache inveniocfg --update-all; /etc/init.d/httpd restart

Now go to your repository’s login page and try to log in using LDAP. Use your LDAP login (with or without @yourwebsite.com) and password. Everything should be working.

Summing up:
- Modify access_control_config.py, external_authentication_ldap.py and invenio.conf.
- Run inveniocfg --update-all. Restart your web server.

Removing the “internal” login method

I wanted to force all the users to use the same login method: LDAP. Some users already had an internal account in the repository. The best solution I could come up with was:

1. Remove internal login method from CFG_EXTERNAL_AUTHENTICATION (defined in $PATH_TO_CDS/cds-invenio/lib/python/invenio/access_control_config.py).

CFG_EXTERNAL_AUTHENTICATION = {"LDAP (external)": (ExternalAuthLDAP(), True)}

2. In invenio.conf set CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS to 0:

CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS = 0

3. Edit websession_templates.py to clearly tell the user which credentials have to be used to log in to the repository.

Questions? Tips? Bugs? Please comment!

CDS Invenio: usage graphs customization

Introduction

CDS Invenio offers the possibility to show some usage graphs of your records. For instance, refer to http://zaguan.unizar.es/record/4162/usage


There you will notice several graphs. Let’s focus on the last one, labelled Download user distribution.

If you installed CDS with the default settings, this last graph will show the download history of your fulltext divided by client IP. The first thing you will have to modify is your institution’s iprange. IPs are stored in MySQL using the inet_aton() function.

How does the inet_aton() function work?

Suppose a client with IP=A.B.C.D downloads a fulltext. How is this information stored in your MySQL database? Easy. It is stored as inet_aton('A.B.C.D')=A*256^3+B*256^2+C*256+D.

For instance, my institution’s iprange is [155.210.0.0 ... 155.210.255.255]. So I have to calculate the associated inet_aton values:
inet_aton('155.210.0.0')=2614231040
inet_aton('155.210.255.255')=2614296575
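
If you do not have a MySQL prompt at hand, the same numbers can be obtained in Python. The sketch below computes the value twice: once by hand, weighting each octet by a power of 256, and once with the standard socket and struct modules as a cross-check. The function names are made up.

```python
import socket
import struct

def ip_to_int(ip):
    """Convert 'A.B.C.D' to A*256^3 + B*256^2 + C*256 + D."""
    a, b, c, d = [int(part) for part in ip.split(".")]
    return a * 256 ** 3 + b * 256 ** 2 + c * 256 + d

def ip_to_int_stdlib(ip):
    """Same conversion using the standard library (network byte order)."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

print(ip_to_int("155.210.0.0"))      # 2614231040
print(ip_to_int("155.210.255.255"))  # 2614296575
```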

How do I customize CDS Invenio with my institution’s iprange?

1. Edit $PATH_TO_CDS/cds-invenio/lib/python/invenio/bibrank_downloads_grapher.py and modify line number 278, replacing CERN’s iprange with your institution’s iprange:

Before:

    270 def create_users_analysis_graph(id_bibrec, ips):
    271     """For a given id_bibrec, classify cern users and other users
    272     Draw a percentage graphic reprentation"""
    273     cern_users = 0
    274     other_users = 0
    275     coordinates_list = []
    276     #compute users repartition
    277     for i in range(len(ips)):
    278         if 2307522817 <= ips[i] <= 2307588095 or 2156724481 <= ips[i] <= 2156789759:
    279         ...

After (note the new values, they are the ones calculated in previous section!):

    270 def create_users_analysis_graph(id_bibrec, ips):
    271     """For a given id_bibrec, classify cern users and other users
    272     Draw a percentage graphic reprentation"""
    273     cern_users = 0
    274     other_users = 0
    275     coordinates_list = []
    276     #compute users repartition
    277     for i in range(len(ips)):
    278         # if 2307522817 <= ips[i] <= 2307588095 or 2156724481 <= ips[i] <= 2156789759:
    279         if 2614231040 <= ips[i] <= 2614296575:
    280             ...

Then run:
sudo -u apache inveniocfg --update-all; /etc/init.d/httpd restart
and your bars will change. There is still something missing: the labels on the X axis are wrong! The graph says “CERN Users” where it should say “Your institution Users”.
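
Before moving on, the range test from step 1 can be tried in isolation. This is a simplified sketch, not the real create_users_analysis_graph: it only counts how many integer IPs fall inside the institution’s range and turns that into percentages. The sample IP values are invented.

```python
LOW = 2614231040   # inet_aton('155.210.0.0')
HIGH = 2614296575  # inet_aton('155.210.255.255')

def user_distribution(ips, low=LOW, high=HIGH):
    """Return (institution %, other %) for a list of integer IPs."""
    if not ips:
        return 0.0, 0.0
    inside = sum(1 for ip in ips if low <= ip <= high)
    total = float(len(ips))
    return 100.0 * inside / total, 100.0 * (len(ips) - inside) / total

# Two addresses inside the range and two outside (made-up sample).
sample = [2614231041, 2614296575, 2307522817, 16909060]
print(user_distribution(sample))  # (50.0, 50.0)
```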

2. Modify “CERN Users” to (in this case) “UZ users”: you will have to edit $PATH_TO_CDS/cds-invenio/lib/python/invenio/bibrank_grapher.py, line 165.

Before (line 165 is the one to change!):

    160     elif kind_of_graphe == 'download_users':
    161         g('set size 0.25,0.5')
    162         g('set xrange [0:4]')
    163         g('set yrange [0:100]')
    164         g('set format y "%g %%"')
    165         g("""set xtics ("" 0, "CERN\\n Users" 1, "Other\\n Users" 3, "" 4)""")
    166         g('set ytics 0,10,100')
    167         g('set boxwidth 0.7 relative')
    168         g('set style fill solid 0.25')
    169         plot_text = 'plot "%s" using 1:2 title "" with boxes lt 7 lw 2' % data_file

After (line 165 has changed!):

    160     elif kind_of_graphe == 'download_users':
    161         g('set size 0.25,0.5')
    162         g('set xrange [0:4]')
    163         g('set yrange [0:100]')
    164         g('set format y "%g %%"')
    165         g("""set xtics ("" 0, "UZ\\n Users" 1, "Other\\n Users" 3, "" 4)""")
    166         g('set ytics 0,10,100')
    167         g('set boxwidth 0.7 relative')
    168         g('set style fill solid 0.25')
    169         plot_text = 'plot "%s" using 1:2 title "" with boxes lt 7 lw 2' % data_file

Again run:
sudo -u apache inveniocfg --update-all; /etc/init.d/httpd restart

There you go, customized graphics for everyone! ;)
