Invenio 1: BibRank exception

I was getting several exceptions in BibRank:

* 2015-01-15 08:32:17 -> NoOptionError: No option 'citation_loss_limit' in section: 'citation' (ConfigParser.py:618:get)

** User details
No client information available

** Traceback details 

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/invenio/bibtask.py", line 606, in task_init
    ret = _task_run(task_run_fnc)
  File "/usr/lib64/python2.7/site-packages/invenio/bibtask.py", line 1146, in _task_run
    if callable(task_run_fnc) and task_run_fnc():
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank.py", line 159, in task_run_core
    func_object(key)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_tag_based_indexer.py", line 443, in citation
    return bibrank_engine(run)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_tag_based_indexer.py", line 356, in bibrank_engine
    func_object(rank_method_code, cfg_name, config)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_tag_based_indexer.py", line 68, in citation_exec
    dic, index_update_time = get_citation_weight(rank_method_code, config)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_citation_indexer.py", line 141, in get_citation_weight
    weights = process_and_store(updated_recids, config, chunk_size)
  File "/usr/lib64/python2.7/site-packages/invenio/bibrank_citation_indexer.py", line 157, in process_and_store
    citation_loss_limit = int(config.get(function, "citation_loss_limit"))
  File "/usr/lib64/python2.7/ConfigParser.py", line 618, in get
    raise NoOptionError(option, section)
NoOptionError: No option 'citation_loss_limit' in section: 'citation'

** Stack frame details

I solved this by adding citation_loss_limit = 50 to /opt/invenio/etc/bibrank/citation.cfg, and I also included a few more options:

[...]
reference_via_doi = 999C5a
reference_via_record_id = 990C50
reference_via_isbn = 999C5i
[...]
citation_loss_limit = 50
collections =

After that, the exception was gone 🙂
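The failing line in the traceback is int(config.get(function, "citation_loss_limit")), and ConfigParser raises NoOptionError whenever the requested option is missing from the section. The sketch below reproduces this with Python 3's configparser (the traceback itself comes from Python 2.7's ConfigParser module, which raises the same exception). The cfg contents are a hypothetical, trimmed-down version limited to the [citation] section named in the error, and read_loss_limit is a hypothetical helper, not an Invenio function.

```python
import configparser

# Hypothetical, trimmed-down citation.cfg contents: only the [citation]
# section named in the error message, before and after the fix.
CFG_BEFORE = """
[citation]
reference_via_doi = 999C5a
"""

CFG_AFTER = """
[citation]
reference_via_doi = 999C5a
citation_loss_limit = 50
collections =
"""


def read_loss_limit(cfg_text, section="citation"):
    """Hypothetical helper mirroring the failing line in process_and_store."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    # Raises configparser.NoOptionError if the option is absent.
    return int(config.get(section, "citation_loss_limit"))


try:
    read_loss_limit(CFG_BEFORE)
except configparser.NoOptionError as err:
    print("before fix:", err)

print("after fix:", read_loss_limit(CFG_AFTER))
```

In Python 3 one could instead call config.getint(section, "citation_loss_limit", fallback=50) to default the value rather than raise, but that would mean patching Invenio itself; adding the option to citation.cfg is the less intrusive fix.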

CDS Invenio: bibrank exceptions [SOLVED]

The problem

Our Invenio installation was throwing exceptions like the following:

Forced traceback (most recent call last)
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank.py", line 150, in task_run_core
    func_object(key)
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank_word_indexer.py", line 1210, in word_similarity
    return word_index(run)
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank_word_indexer.py", line 839, in word_index
    update_rnkWORD(options["table"], options["modified_words"])
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/invenio/bibrank_word_indexer.py", line 1114, in update_rnkWORD
    Nj[j] = Nj.get(j, 0) + math.pow(Gi[t] * (1 + math.log(tf[0])), 2)
OverflowError: math range error
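The failing statement accumulates, for record j, the sum of squared term weights Gi[t] * (1 + log tf). math.pow raises OverflowError ("math range error") as soon as a squared weight exceeds the largest representable float (about 1.8e308), so one runaway global weight is enough to break indexing for every record containing that term. A minimal sketch mirroring the expression from the traceback; norm_contribution and doc_norm are hypothetical names, not Invenio functions.

```python
import math


def norm_contribution(global_weight, term_freq):
    """Squared weight of one term, as in the update_rnkWORD line above.

    Assumes term_freq >= 1; math.log(0) would raise ValueError instead.
    """
    return math.pow(global_weight * (1 + math.log(term_freq)), 2)


def doc_norm(terms):
    """Sum the contributions for one record, given (global_weight, tf) pairs."""
    total = 0.0
    for global_weight, term_freq in terms:
        total += norm_contribution(global_weight, term_freq)
    return total


print(doc_norm([(1.0, 1), (2.0, 1)]))  # sane weights: 1.0 + 4.0 = 5.0

try:
    doc_norm([(1.0, 1), (1e200, 3)])  # one runaway global weight
except OverflowError as err:
    print("OverflowError:", err)
```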

This particular exception was triggered by record 4573:

Error when analysing the record 4573 (((3350, 'x\x9cu\x93\xdbr\xdc\x0c\x867\xa7\xa69\xf59z\xd9W\xd2b\xd9V\x02\x88\n ...
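The binary blob in that message starts with the bytes 'x\x9c', which is the standard zlib header for default compression, so it is most likely a compressed copy of the data being indexed for that record (an assumption; the message itself does not say). The truncated blob cannot be decompressed, but a header check like the hypothetical looks_like_zlib below tells such payloads apart from plain text.

```python
import zlib


def looks_like_zlib(data):
    """Hypothetical check: True if data starts with a valid zlib header.

    Per the zlib format, the CMF byte's low nibble is 8 (deflate) and the
    16-bit big-endian value CMF * 256 + FLG is a multiple of 31.
    """
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (cmf & 0x0F) == 8 and (cmf * 256 + flg) % 31 == 0


payload = zlib.compress(b"<record><controlfield tag=\"001\">4573</controlfield></record>")
print(payload[:2])                     # default compression level gives b'x\x9c'
print(looks_like_zlib(payload))
print(looks_like_zlib(b"x\x9cu\x93"))  # first bytes of the blob in the log
print(looks_like_zlib(b"plain text"))
```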

But it was not just that one: many records (4573, 4336, 4487, 4337, …) produced similar errors every time my scheduled bibrank task ran.

The fix

[Thank you so much, Samuele Kaplun!]

This is normally caused by word similarity indexes that have accumulated too
many errors and need to be rebalanced. You can safely ignore these exceptions
(though it's true they can quickly fill your mailbox 🙂).

BibRank word similarity indexes are usually updated in a fast way that works
most of the time, but it is approximate, so approximation errors build up
after a certain period of use.
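To make "approximation errors build up" concrete, here is a toy model, not Invenio's actual code: assume each record's norm is stored once using the global term weights valid at indexing time (IDF-like here, log(N / df)), and that those weights are not recomputed for older records as the collection grows. The stored norm then drifts away from what a full recomputation would give, and "rebalancing" in this model simply means recomputing with current weights. All names and numbers are hypothetical.

```python
import math


def global_weights(n_docs, doc_freqs):
    """Hypothetical IDF-style global weight per term: log(N / df)."""
    return {t: math.log(n_docs / df) for t, df in doc_freqs.items()}


def record_norm(term_freqs, weights):
    """Sum of squared weights G_t * (1 + log tf), as in update_rnkWORD."""
    return sum((weights[t] * (1 + math.log(tf))) ** 2 for t, tf in term_freqs.items())


record = {"higgs": 3, "boson": 1}  # term -> frequency in one record

# Norm stored when the record was indexed into a small collection.
stale = record_norm(record, global_weights(10, {"higgs": 2, "boson": 5}))

# The collection has since grown, so the global weights have changed.
current = global_weights(1000, {"higgs": 20, "boson": 400})
fresh = record_norm(record, current)
print("stale:", stale, "fresh:", fresh)

# "Rebalancing" in this toy model: recompute the norm with current weights.
rebalanced = record_norm(record, current)
print("rebalanced matches fresh:", rebalanced == fresh)
```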

To solve this, you usually just need to schedule a periodic (e.g. weekly)
bibrank -R run to rebalance these indexes.

Try, for example:

$ sudo -u apache /opt/cds-invenio/bin/bibrank -R -wwrd -s7d -uadmin

and see whether this solves your problem.

(See also this link, where this is
also briefly explained.)