A few posts back I talked about restricting access to fulltexts to an IP range. In those articles I gave some tips about how websubmit works. Now I want to make another mod to my CDS Invenio repository, so I needed a deeper understanding of the websubmit workflow.
Before we begin this journey through websubmit, it is a good idea to define some terminology I’ll be using:
Recid: the registry number of a record. For instance, for the record http://zaguan.unizar.es/record/2000 the registry number is 2000. This number is stored in marcxml’s 001 tag.
Report_Number: another way to identify a record. For the example above, this number is INPRO–2009-038 and it is stored in marcxml’s 037 tag. It is also called reference or rn.
access: a number which is created every time you begin an action. I am not sure it is totally random: the part before the underscore looks like a Unix timestamp. You can see it in the URL when, for instance, you submit a record. It looks like this: 1260870578_1753. It is NOT stored in marcxml, but it is in the database (for instance in field id of table sbmSUBMISSIONS).
act: the action you are executing. This parameter can be seen in your browser’s URL. Invenio comes with several pre-made actions, like:
SBI for submit new record
APP for approve submitted record
MBI for modify existing (that is, already approved!) record
SRV for changes in attached fulltext files.
doctype: the document type to which act is applied. This can be seen in the browser’s URL too. doctype refers to the string in brackets that you can see in your websubmit admin menu. For instance, [DEMOBOO].
indir: each action is connected directly to a directory on the system. For example, for action MBI (modify existing record) the working directory is modify. This is the value of the indir parameter. Further details can be read in the following section.
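To make the terminology a bit more concrete, here is a small Python 3 sketch that pulls these parameters out of a submission URL. The URL itself is made up for illustration (host and exact layout will differ on your install), and reading the first half of access as a Unix timestamp is only my guess from how the numbers look.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Hypothetical URL, built only from the parameter names described above.
EXAMPLE_URL = ("http://example.org/submit?doctype=PFC&act=SBI"
               "&access=1260870578_1753&indir=running")

def submission_params(url):
    """Return the query-string parameters of a submission URL as a dict."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; each parameter appears only once here.
    return {name: values[0] for name, values in query.items()}

def access_timestamp(access):
    """Read the part before '_' as a Unix timestamp (an assumption)."""
    seconds = int(access.split("_")[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

params = submission_params(EXAMPLE_URL)
print(params["act"], params["doctype"], params["indir"])
print(access_timestamp(params["access"]).isoformat())
```

If the guess is right, the access number in my example was generated in December 2009, which matches the date of the submission.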
You can see your defined actions in
Each action is connected to a working directory. For instance, SBI is attached to the running directory, which is located under
By default, CDS Invenio comes with an example of a referee’d doctype, which executes the functions listed here. Each line shows the function name followed by its step and its score.
Create_Recid 1 10
Report_Number_Generation 1 20
Make_Dummy_MARC_XML_Record 1 30
Move_Files_to_Storage 1 40
Mail_Submitter 1 50
Update_Approval_DB 1 60
Send_Approval_Request 1 70
Print_Success 1 80
Move_to_Pending 1 90
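As far as I can tell, the functions are run in ascending step and, within the same step, in ascending score. The sketch below just sorts the list above that way; the tuple layout and the helper name execution_order are mine, not something taken from Invenio.

```python
# (function name, step, score), copied from the list above.
PFC_SBI_FUNCTIONS = [
    ("Create_Recid", 1, 10),
    ("Report_Number_Generation", 1, 20),
    ("Make_Dummy_MARC_XML_Record", 1, 30),
    ("Move_Files_to_Storage", 1, 40),
    ("Mail_Submitter", 1, 50),
    ("Update_Approval_DB", 1, 60),
    ("Send_Approval_Request", 1, 70),
    ("Print_Success", 1, 80),
    ("Move_to_Pending", 1, 90),
]

def execution_order(functions):
    """Return function names sorted by step first, then by score."""
    ordered = sorted(functions, key=lambda item: (item[1], item[2]))
    return [name for name, _step, _score in ordered]

# Feed the list in reverse to show that only step and score decide the order.
for name in execution_order(reversed(PFC_SBI_FUNCTIONS)):
    print(name)
```

Note that Print_Success (score 80) comes before Move_to_Pending (score 90), which matters for what follows.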
The SUBMIT NEW RECORD (SBI) workflow
Suppose we enter our repository, log in, and then click “submit” and select a referee’d doctype (in my case doctype=PFC). Then a form appears. What is really happening?
Well, websubmit_engine.py and websubmit_webinterface.py are working in the shadows.
Here is what really happens:
websubmit_engine creates a new access number.
websubmit_webinterface creates a new directory using several parameters. Let’s see an example:
So, the system creates $PATH_TO_CDS/var/data/storage/running/PFC/1260870578_1753
Then it writes to that directory (from now on called curdir) all of the parameters shown in your browser’s URL (indir, doctype, access, and so on).
It is important to note that none of the PFC-SBI’s functions have been executed yet!
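A rough sketch of this step in Python 3: build the curdir path from indir, doctype and access, create it, and write one file per URL parameter. This is not the real websubmit_webinterface code; the helper names are mine and a temporary directory stands in for the real storage directory.

```python
import os
import tempfile

def create_curdir(storage_dir, indir, doctype, access):
    """Build <storage_dir>/<indir>/<doctype>/<access> and create it."""
    curdir = os.path.join(storage_dir, indir, doctype, access)
    os.makedirs(curdir, exist_ok=True)
    return curdir

def write_params(curdir, params):
    """Write one file per parameter; the file name is the parameter name."""
    for name, value in params.items():
        with open(os.path.join(curdir, name), "w", encoding="utf-8") as handle:
            handle.write(value)

# A temporary directory stands in for the real storage directory.
storage = tempfile.mkdtemp()
params = {"indir": "running", "doctype": "PFC",
          "access": "1260870578_1753", "act": "SBI"}
curdir = create_curdir(storage, params["indir"], params["doctype"], params["access"])
write_params(curdir, params)
print(sorted(os.listdir(curdir)))
```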
Now the user begins to fill in the submit form. When it is filled in, the “submit” button is pushed. This is the moment in which PFC-SBI’s functions begin to run.
Let’s see what happens before Move_to_Pending is executed:
All of the form’s fields are stored in curdir. If your form has a field called PFC_AUTHOR with value ‘Miguel Martín’, a file called PFC_AUTHOR is created in curdir, containing the string ‘Miguel Martín’. Since Make_Dummy_MARC_XML_Record has also been executed, a file called dummy_marcxml is created too (this function takes the $doctype.tpl and $doctypeCREATE.tpl files into account and builds your dummy_marcxml according to that information).
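Since every form field ends up as a plain file in curdir, inspecting a submission is just a matter of reading those files. Here is a minimal Python 3 sketch; read_curdir is a helper name I made up, and the directory is a temporary one that simulates curdir.

```python
import os
import tempfile

def read_curdir(curdir):
    """Return {file name: file content} for every regular file in curdir."""
    values = {}
    for name in sorted(os.listdir(curdir)):
        path = os.path.join(curdir, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                values[name] = handle.read()
    return values

# Simulate a curdir holding one submitted form field.
demo_curdir = tempfile.mkdtemp()
with open(os.path.join(demo_curdir, "PFC_AUTHOR"), "w", encoding="utf-8") as handle:
    handle.write("Miguel Martín")

fields = read_curdir(demo_curdir)
print(sorted(fields))
```

This is handy when debugging: if a value is wrong in the final record, the first place to look is the file with the field’s name.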
Once this is done a “Congratulations! blablabla” message appears to the user, as a result of the execution of Print_Success. It seems that the whole submission process is over, but it really isn’t.
Something ‘unexpected’ happens:
The Move_to_Pending function moves your curdir/* files to the /var/data/submit/storage/pending/doctype/Report_Number directory! That is, in my case:
Here we can take a detailed look at Move_to_Pending.py:
import os

from invenio.config import CFG_WEBSUBMIT_STORAGEDIR
from invenio.websubmit_config import InvenioWebSubmitFunctionError

def Move_to_Pending(parameters, curdir, form, user_info=None):
    # rn (the report number) is assumed to be a global set by WebSubmit
    global rn
    doctype = form['doctype']
    PENDIR = "%s/pending/%s" % (CFG_WEBSUBMIT_STORAGEDIR, doctype)
    # Create the pending directory for this doctype if it is missing
    if not os.path.exists(PENDIR):
        try:
            os.makedirs(PENDIR)
        except OSError:
            raise InvenioWebSubmitFunctionError("Cannot create pending directory %s" % PENDIR)
    # Moves the files to the pending directory
    rn = rn.replace("/", "-")
    namedir = rn
    FINALDIR = "%s/%s" % (PENDIR, namedir)
    os.rename(curdir, FINALDIR)
As shown above, this function creates the following directory if it does not exist (in my example):
The rn variable is the Report Number, that is, PFC–2009-005. So it moves all the contents of curdir to
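If you want to play with this behaviour without touching a real repository, the following Python 3 sketch reproduces the same idea on a temporary directory. It is my own standalone version, not the Invenio function: move_to_pending, its arguments and the report number with slashes are all invented for the example.

```python
import os
import tempfile

def move_to_pending(storage_dir, doctype, rn, curdir):
    """Move curdir to <storage_dir>/pending/<doctype>/<rn with '/' as '-'>."""
    pending_dir = os.path.join(storage_dir, "pending", doctype)
    os.makedirs(pending_dir, exist_ok=True)
    # A '/' in the report number would be read as a path separator.
    final_dir = os.path.join(pending_dir, rn.replace("/", "-"))
    os.rename(curdir, final_dir)
    return final_dir

# Simulate a running submission with one form field in it.
storage = tempfile.mkdtemp()
curdir = os.path.join(storage, "running", "PFC", "1260870578_1753")
os.makedirs(curdir)
with open(os.path.join(curdir, "PFC_AUTHOR"), "w", encoding="utf-8") as handle:
    handle.write("example author")

# Hypothetical report number containing slashes, to show the replacement.
final_dir = move_to_pending(storage, "PFC", "PFC/2009/005", curdir)
print(os.path.basename(final_dir))
print(os.listdir(final_dir))
```

After the call the original curdir no longer exists: the whole directory has been renamed, not copied.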
*** Edit: let’s see some of the MBI (modify record AFTER it is approved) pipeline.
APP (approve record)
When a record has been submitted and then approved, some information is stored in
/var/data/submit/storage/done/PFC/$Report_Number and a new entry is created in the MySQL database, more precisely in table
This table stores a lot of information. Let’s see what is stored for a record (for instance, for the one whose Report_Number=
SELECT * FROM sbmSUBMISSIONS WHERE reference = 'PFC-2009-018';
Let’s look at the stored information and fields.
who performed the action? (It was me!).
the doctype of the document on which the action was performed (PFC).
the action performed on that doctype (SBI, MBI, APP, SRV).
the status of the task, in this case, finished (could be pending).
the access (number) of the task: 1245755804_15279
the report number (or rn).
The date of cd. That is, the date of creation of
The date of md. That is, the date of creation of
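To try the query without a live Invenio database, here is a Python 3 sketch that uses an in-memory SQLite table as a stand-in for MySQL. The column names are my assumption of how the table looks (only id, reference, cd and md are mentioned in this post), and the e-mail address and dates in the sample row are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Assumed column names; check them against your own database.
conn.execute("""CREATE TABLE sbmSUBMISSIONS (
    email TEXT, doctype TEXT, action TEXT, status TEXT,
    id TEXT, reference TEXT, cd TEXT, md TEXT)""")

# Sample row: e-mail and dates are placeholders made up for this sketch.
conn.execute(
    "INSERT INTO sbmSUBMISSIONS VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("me@example.org", "PFC", "SBI", "finished",
     "1245755804_15279", "PFC-2009-018", "2000-01-01", "2000-01-01"),
)

def submissions_for(connection, reference):
    """Return every row stored for one report number, as plain dicts."""
    rows = connection.execute(
        "SELECT * FROM sbmSUBMISSIONS WHERE reference = ?", (reference,))
    return [dict(row) for row in rows]

for row in submissions_for(conn, "PFC-2009-018"):
    print(row["doctype"], row["action"], row["status"], row["id"])
```

Passing the report number as a bound parameter (the ? placeholder) instead of pasting it into the SQL string avoids quoting problems.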
The fulltext document (in pdf or whatever format you use) is stored in
/soft/cds-invenio/var/data/submit/storage/pending/PFC/PFC-2009-018/files/PFC_MEM/PFC-2009-018.pdf but a compressed version is also stored in
MBI (modify AFTER approval)
The functions that are run are listed below:
Wow, a lot of functions! I will only comment on the most important ones and the ones that are hardest to understand.
I’ll begin with Create_Modify_Interface:
This function reads the stored metadata of a record and creates an interface to modify the fields that the user selects. Where does the Create_Modify_Interface function read the stored values from? Is it from the database? Is it from the system directories? Well, it depends: this function goes into curdir and looks for a file named
If it exists, then the record metadata is loaded from FILES (the curdir system directory) using Create_Modify_Interface_getfieldval_fromfile.
If it does not exist, then the record metadata is loaded from the DATABASE, using Create_Modify_Interface_getfieldval_fromDBrec.
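The decision described above can be sketched in a few lines of Python 3. This is not the real Create_Modify_Interface code: the marker file name is a placeholder, and read_from_file and read_from_db are simple stand-ins I wrote for the two Invenio helpers.

```python
import os
import tempfile

MARKER = "HYPOTHETICAL_MARKER"  # placeholder, not the real file name

def read_from_file(curdir, fieldname):
    """Stand-in for the 'fromfile' helper: read the field's file in curdir."""
    with open(os.path.join(curdir, fieldname), encoding="utf-8") as handle:
        return handle.read()

def read_from_db(fieldname):
    """Stand-in for the 'fromDBrec' helper: pretend to query the database."""
    return "value of %s from the database" % fieldname

def load_field(curdir, fieldname, marker_name=MARKER):
    """Use the files in curdir if the marker file exists, else the database."""
    if os.path.exists(os.path.join(curdir, marker_name)):
        return read_from_file(curdir, fieldname)
    return read_from_db(fieldname)

demo = tempfile.mkdtemp()
first = load_field(demo, "PFC_AUTHOR")   # no marker yet: database branch

# Create the marker and a field file, then load the same field again.
open(os.path.join(demo, MARKER), "w").close()
with open(os.path.join(demo, "PFC_AUTHOR"), "w", encoding="utf-8") as handle:
    handle.write("value from curdir")
second = load_field(demo, "PFC_AUTHOR")  # marker present: file branch

print(first)
print(second)
```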
**** WILL CONTINUE WRITING THIS POST IN FUTURE DAYS ****