[NOTE: Only available in English; my apologies to Spanish speakers]
[Edit note: if you want to be able to change the status of a fulltext (public/private) from the APP (approval) pipeline, you should check the second part of this article]
Introduction
One of the most common issues when publishing scientific production (that is, for instance, published articles) in an OAI repository is that fulltexts are usually under restrictive licensing. In other words, journals are not very happy with the idea of their fulltexts being public.
But this goes against my idea of OAI repositories, in which fulltexts are supposed to be public and accessible to everyone. Most universities are engaged in a debate about how to solve this issue. Meanwhile, we IT people must provide a solution to the problem.
In our case the solution is simple: restrict access to copyrighted fulltexts to University members only (students, professors, researchers…). Now you might be wondering: how do you do this with CDSInvenio?
What we want to achieve is this: in the submission form (and maybe in the approval form too) we show a select box which asks the submitter whether the fulltext is public or private. If the document is marked as private, the associated fulltext will only be available from a certain IP range (in my case I slightly modified this behaviour using EZProxy so that our users can also access the contents from home).
Restrict access to some fulltexts using CDSInvenio: step by step
1. I’ve defined a new function
called Fulltext_Status.py.
Tip: make sure that the function name is the same as the file name without the trailing .py
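The body of Fulltext_Status is not reproduced in this post. As a rough, stand-alone sketch of the decision it has to make, the snippet below maps the PRV_PUB value ("private" or "public", as offered by the select box in step 4) to a document status, using "prv" as the restricted status (the code matched later in step 7). Everything else is an assumption made for illustration: the helper names, the idea that an empty status means "not restricted", and the FakeDoc class, which is a hypothetical stand-in for the documents attached to a record and not part of Invenio.

```python
PRIVATE_STATUS = "prv"  # status code later matched by the viewrestrdoc rule
PUBLIC_STATUS = ""      # assumption: an empty status means "not restricted"


def status_for(prv_pub):
    """Map the PRV_PUB form value to a document status code."""
    if prv_pub.strip().lower() == "private":
        return PRIVATE_STATUS
    return PUBLIC_STATUS


class FakeDoc(object):
    """Hypothetical stand-in for a document attached to a record."""

    def __init__(self, name):
        self.name = name
        self.status = ""

    def set_status(self, new_status):
        self.status = new_status


def apply_status(docs, prv_pub):
    """Set the same status on every document of the record."""
    new_status = status_for(prv_pub)
    for doc in docs:
        doc.set_status(new_status)
    return new_status
```

Note that this sketch treats any value other than "private" as public; a stricter site might prefer the opposite default.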
1.5 I’ve modified the templates associated with the submission of ART (article) files
The associated files (in my case) are ART.tpl and ARTcreate.tpl.
In ART.tpl I’ve added the following line:
PRV_PUB---<:PRV_PUB:>
In ARTcreate.tpl I modified the line that generates the 8564_u MARC tag as follows:
8564u::IFDEFP(DEMOART_FILE_RENAMED,,0)---<datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://zaguan.unizar.es/record/<:SN::SN:>/files/<:DEMOART_FILE_RENAMED::DEMOART_FILE_RENAMED:></subfield><subfield code="z">Fulltext</subfield></datafield>
This basically means: write the URL of the file in the 8564_u tag if there is an attached file.
I also added a 984__a tag (9xx tags are used for administrative purposes), which indicates whether the record is considered private or public:
984a::DEFP---<:PRV_PUB::PRV_PUB:>
2. I’ve added the new function to the SBI process
Tip: the position of the function is quite important. It has to come *AFTER* the Move_Files_to_Storage call, because it is Move_Files_to_Storage that allocates the bibdocs and associates them with the record.

3. And to the APP process.
Tip 1: the position of the function in the approval process is also important. It has to come just *BEFORE* the Move_to_Done function, because Move_to_Done should always be the last function called (it packs everything up and archives it).
Tip 2: Moreover, in the APP action, due to an architectural limitation of WebSubmit, functions executed in a step other than the first one will not have the form dictionary (will improve the documentation on this).

To be able to read the PRV_PUB parameter in your function, you should instead fall back on the filesystem, as in:
import os

if "PRV_PUB" in form:
    # the value came with the submitted form
    prv_pub = form['PRV_PUB']
elif os.path.exists(os.path.join(curdir, 'PRV_PUB')):
    # otherwise read the copy stored in the submission directory
    prv_pub = open(os.path.join(curdir, 'PRV_PUB')).read()
else:
    prv_pub = ""
This way, if the form element is not there, you fall back on the filesystem where, if everything went correctly, the form element should have been stored in a file just before entering step 1.
4. Then I have created a new element description
called PRV_PUB

This is just the important part:
<select name="PRV_PUB">
<option>Seleccione el carácter de su publicación:</option>
<option value="private">Private</option>
<option value="public">Public</option>
</select>
5. I’ve added this new element to the SBI and APP page
The image below shows only the SBI page (adding the element to the APP page is very similar).

6. Next I have created the role IP_UZ
Just go to the $CFG_SITE_URL/admin/webaccess/webaccessadmin.py URL and add a new role with this definition:
allow email "miguelm@unizar.es"
allow remote_ip "155.210."
deny all
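To make the intent of this role definition concrete, here is a small stand-alone Python 3 sketch of the check it expresses. It is not Invenio's own rule engine, only an illustration. It assumes that the rules are tried in order and the first match decides, and that "155.210." is meant to cover the whole 155.210.0.0/16 network. The function name is_allowed is made up for the example.

```python
import ipaddress

# Assumed meaning of the "155.210." rule: the whole 155.210.0.0/16 network.
UNIVERSITY_NET = ipaddress.ip_network("155.210.0.0/16")
ALLOWED_EMAILS = {"miguelm@unizar.es"}


def is_allowed(email, remote_ip):
    """Mimic the three rules of the role, tried in order."""
    # allow email "miguelm@unizar.es"
    if email in ALLOWED_EMAILS:
        return True
    # allow remote_ip "155.210."
    if ipaddress.ip_address(remote_ip) in UNIVERSITY_NET:
        return True
    # deny all
    return False
```

With this logic a guest browsing from 155.210.3.4 is let in, a guest browsing from 8.8.8.8 is not, and the listed email address is let in from anywhere.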
7. Then I have connected this new role
… to the viewrestrdoc action with status=prv (prv is the status code I used in the Fulltext_Status function; see step 1).

Then I submit a new element, approve it and check whether I can see the fulltext PDF from my computer (155.210.XX.YY). Great, I can.
Tip: You can check the status of recently submitted files using:
/soft/cds-invenio/bin/bibdocfile --get-info --recid 3277
You should see something like this:
3277::::total bibdocs attached=1
3277::::total size latest version=714.1 KB
3277::::total size all files=714.1 KB
3277:225:::docname=ART--2009-009
3277:225:::doctype=DEMOART_FILE
3277:225:::status=prv
3277:225:::basedir=/soft/cds-invenio/var/data/files/g0/225
3277:225:::creation date=2009-05-13 13:07:55
3277:225:::modification date=2009-05-13 13:08:40
3277:225:::total file attached=1
3277:225:::total size latest version=714.1 KB
3277:225:::total size all files=714.1 KB
3277:225:1:.pdf:fullpath=/soft/cds-invenio/var/data/files/g0/225/ART--2009-009.pdf;1
3277:225:1:.pdf:fullname=ART--2009-009.pdf
3277:225:1:.pdf:name=ART--2009-009
3277:225:1:.pdf:status=prv
3277:225:1:.pdf:checksum=abccd8b54af1c1fb10f4ad3a7e93151a
3277:225:1:.pdf:size=714.1 KB
3277:225:1:.pdf:creation time=2009-06-15 13:28:44
3277:225:1:.pdf:modification time=2009-05-13 13:07:55
3277:225:1:.pdf:encoding=None
3277:225:1:.pdf:url=http://zaguan.unizar.es/record/3277/files/ART--2009-009.pdf
3277:225:1:.pdf:description=None
3277:225:1:.pdf:comment=Texto completo
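If you need to check many records, the status lines can be pulled out of this output with a few lines of Python. The sketch below is only an illustration: it assumes the colon-separated layout shown above (recid:docid:version:format:key=value) and has not been checked against other Invenio versions. The function name file_statuses is made up for the example.

```python
def file_statuses(info_text):
    """Return {fullname: status} for the files listed in --get-info output."""
    names, statuses = {}, {}
    for line in info_text.splitlines():
        # Split on the first four colons only, values may contain colons too.
        parts = line.strip().split(":", 4)
        if len(parts) != 5:
            continue
        recid, docid, version, fmt, rest = parts
        if not version:
            # Record-level or bibdoc-level line, not a file line.
            continue
        key, _, value = rest.partition("=")
        file_id = (docid, version, fmt)
        if key == "fullname":
            names[file_id] = value
        elif key == "status":
            statuses[file_id] = value
    return {names[f]: statuses.get(f, "") for f in names}
```

Fed with the output above, it should report the status prv for ART--2009-009.pdf.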
After that I use a proxy to connect to my repository and, without being logged in to Invenio, I try to access the fulltext. A “this file is restricted” text appears. Cool, it worked!
Going further: using EZProxy
At this point we know how to restrict access to fulltexts to an IP range. If we want our users (who are validated against an LDAP system) to be able to access the fulltexts from home (and not only from the university’s IPs) we can use something like EZProxy. From a high-level point of view, this software gives an internal IP to outside connections (only if the user is able to log in to EZProxy).
I will not explain here how to install, use or configure this software because there is plenty of documentation on their website. Let’s suppose we already have it running.
The steps I took to make CDS work with EZProxy were:
8. I slightly modified the Bibformat element which shows URLs
(in my case, called bfe_fulltext_light.py) so that it ends up looking something like this (I only show the main_urls part; the lines I changed are marked with an "EZProxy" comment. Please ignore the dx.doi.org part):
ezproxy_url = 'http://roble.unizar.es:9090/login?url='  # EZProxy: your URL to ezproxy
if main_urls:
    last_name = ""
    for descr, urls in main_urls.items():
        url_list = []
        urls.sort(lambda (url1, name1, format1), (url2, name2, format2): url1 < url2 and -1 or url1 > url2 and 1 or 0)
        for url, name, format in urls:
            last_name = name
            if show_icons.lower() == 'yes':
                file_icon = '<img src="%s/img/%s" alt="%s"/>' % \
                            (CFG_SITE_URL, icon(url), _("Download fulltext"))
            else:
                file_icon = ''
            # first of all, see if it is public, private or dx.doi.org link
            pub_prv = bfo.field('984__a')  # EZProxy: read the public/private flag
            # EZProxy: private fulltexts (and DOI links) go through the proxy
            if (url.find("dx.doi.org") != -1) or (pub_prv.find("private") != -1):
                url_list.append('<a ' + style + ' href="' + ezproxy_url + escape(url) + '">' + \
                                file_icon + '</a> ')
            else:
                url_list.append('<a ' + style + ' href="' + escape(url) + '">' + \
                                file_icon + '</a> ')
        out += separator.join(url_list) + additional_str
Then run:
echo "DELETE FROM bibfmt WHERE format='HB'" | /soft/cds-invenio/bin/dbexec
sudo -u apache bibreformat -c "YOUR_COLLECTION_NAME"
sudo -u apache bibsched
Now your URLs will point to ezproxy_url instead of directly to the fulltext, which means users (who can log in to EZProxy) will be able to access fulltexts from any IP.
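The link-building decision can also be looked at in isolation. The following stand-alone Python 3 sketch mirrors the rule used in the element (DOI links and records whose 984__a contains "private" go through EZProxy, everything else is linked directly). The function name fulltext_href is made up for the example, html.escape is used here in place of whatever escape function the element imports, and the DOI in the usage note is a made-up placeholder.

```python
from html import escape

EZPROXY_URL = "http://roble.unizar.es:9090/login?url="  # your URL to ezproxy


def fulltext_href(url, pub_prv, ezproxy_url=EZPROXY_URL):
    """Return the href for a fulltext link, via EZProxy when needed."""
    needs_proxy = "dx.doi.org" in url or "private" in pub_prv
    escaped = escape(url, quote=True)
    if needs_proxy:
        return ezproxy_url + escaped
    return escaped
```

For example, fulltext_href("http://zaguan.unizar.es/record/3277/files/ART--2009-009.pdf", "private") returns the EZProxy login URL followed by the file URL, while the same call with "public" returns the file URL unchanged. A link such as http://dx.doi.org/10.1234/example is always sent through the proxy.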
You can see a working example in the Zaguan repository.
Thanks a lot to the whole CDS Support Team, and especially to Samuele Kaplun.