CDS Invenio: avoid duplicate content on comments and other record tabs
Using Invenio 0.99.x and interested in SEO? You should definitely try to avoid duplicate titles! When viewing a record (for instance, http://yourinveniopage.com/record/XXXX), you will notice several tabs at the top: Information, Discussion, Usage Statistics…
These links show different content but share the same <title>, which is not great for SEO purposes. Google’s Webmaster Tools will report this (Diagnose > HTML Suggestions > Duplicate title tags).
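If you want to check for the problem yourself before waiting for Webmaster Tools, here is a minimal sketch that groups pages by their <title>. The sample pages, the /comments and /usage URL suffixes and the function names are made up for illustration; in practice you would fetch the HTML of your own record tabs.

```python
import re
from collections import defaultdict

# Matches the first <title>...</title> element, case-insensitively.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace so formatting differences do not matter.
    return " ".join(match.group(1).split())


def find_duplicate_titles(pages):
    """Map each title shared by more than one URL to its sorted URL list.

    `pages` is a dict of url -> html source.
    """
    by_title = defaultdict(list)
    for url, html in pages.items():
        title = extract_title(html)
        if title is not None:
            by_title[title].append(url)
    return dict((t, sorted(urls)) for t, urls in by_title.items() if len(urls) > 1)


# Hypothetical sample data standing in for fetched pages.
pages = {
    "http://yourinveniopage.com/record/1": "<head><title>Some record</title></head>",
    "http://yourinveniopage.com/record/1/comments": "<head><title>Some record</title></head>",
    "http://yourinveniopage.com/record/1/usage": "<head><title>Some record</title></head>",
    "http://yourinveniopage.com/record/2": "<head><title>Another record</title></head>",
}

duplicates = find_duplicate_titles(pages)
for title, urls in sorted(duplicates.items()):
    print("%s: %d pages" % (title, len(urls)))
```

A regular expression is enough for a quick check like this; for messy real-world HTML a proper parser would be more robust.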

There are several ways to avoid duplicate titles. The easiest is to use robots.txt to keep those pages from being indexed, or to add rel="nofollow" (or even noindex) to the tab links.
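For the robots.txt route, a small sketch that generates the rules could look like this. The tab suffixes (comments, reviews, usage) are an assumption about how your installation names its tab URLs, so check them against your own links first. The rules also rely on the `*` wildcard, which major crawlers such as Googlebot understand but which not every bot is guaranteed to honor.

```python
# Assumed tab URL suffixes; adjust to the URLs your installation really uses.
TAB_SUFFIXES = ["comments", "reviews", "usage"]


def build_robots_txt(tab_suffixes, user_agent="*"):
    """Build robots.txt content that disallows the given record tabs."""
    lines = ["User-agent: %s" % user_agent]
    for suffix in tab_suffixes:
        # '/record/*/<suffix>' is meant to match that tab for any record number.
        lines.append("Disallow: /record/*/%s" % suffix)
    return "\n".join(lines) + "\n"


print(build_robots_txt(TAB_SUFFIXES))
```

Paste the output into the robots.txt served from the root of your site.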
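Using nofollow (or noindex) to keep comments pages from being indexed
You just have to add rel="noindex,nofollow" to the HTML a tag. Keep in mind that only nofollow is a recognized rel value on links; noindex is normally given in a robots meta tag on the target page itself, so crawlers may ignore it here.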
Edit $PATH_TO_cds-invenio/lib/python/invenio/webstyle_templates.py
Search for this:
        elif label != _('Fulltext') and label != _('References') and label != _('Citations'):
            out_tabs += '<li%(class)s><a href="%(url)s">%(label)s</a></li>' % \
                        {'class':css_class, 'url':url, 'label':label}
Change it to:
        elif label != _('Fulltext') and label != _('References') and label != _('Citations'):
            out_tabs += '<li%(class)s><a href="%(url)s" rel="noindex,nofollow">%(label)s</a></li>' % \
                        {'class':css_class, 'url':url, 'label':label}
As usual, run the following for the changes to take effect:
inveniocfg --update-all; /etc/init.d/httpd restart
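The change above adds the rel attribute to every tab handled by that branch. If you would rather tag only some tabs, the idea can be sketched as a standalone function. This is not Invenio's code: render_tab and nofollow_labels are hypothetical names, and only the format string mirrors the template line. As in the template, css_class is expected to carry its own leading space, and the values are assumed to be HTML-escaped already.

```python
def render_tab(label, url, css_class="", nofollow_labels=()):
    """Render one tab as an <li>, adding rel="nofollow" for selected labels."""
    # Only the labels listed in nofollow_labels get the rel attribute.
    rel = ' rel="nofollow"' if label in nofollow_labels else ""
    return '<li%(class)s><a href="%(url)s"%(rel)s>%(label)s</a></li>' % \
           {"class": css_class, "url": url, "rel": rel, "label": label}


# Hypothetical tabs: (label, url) pairs for one record.
tabs = [
    ("Information", "/record/1"),
    ("Discussion", "/record/1/comments"),
    ("Usage Statistics", "/record/1/usage"),
]

for label, url in tabs:
    print(render_tab(label, url, nofollow_labels=("Discussion", "Usage Statistics")))
```

This keeps the main Information tab followable while the secondary tabs are marked nofollow.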
Changing the titles of comments pages to avoid duplicate titles
What if we want bots to index these pages? No worries, it can be done by hacking the code a bit.
For instance, let's look at the comments tab. Open /cds-invenio/lib/python/invenio/webcomment_webinterface.py and look for:
title, description, keywords = websearch_templates.tmpl_record_page_header_content(req, self.recid, argd['ln'])
Let's see how this line works (see the last lines of the session):
python
>>> import urllib
>>> from invenio.webcomment import check_recID_is_in_range, \
...     perform_request_display_comments_or_remarks,\
...     perform_request_add_comment_or_remark,\
...     perform_request_vote,\
...     perform_request_report
>>> from invenio.config import CFG_SITE_LANG, \
...     CFG_SITE_URL, \
...     CFG_SITE_SECURE_URL, \
...     CFG_WEBCOMMENT_ALLOW_COMMENTS,\
...     CFG_WEBCOMMENT_ALLOW_REVIEWS
>>> from invenio.webuser import getUid, page_not_authorized, isGuestUser, collect_user_info
>>> from invenio.webpage import page, pageheaderonly, pagefooteronly
>>> from invenio.search_engine import create_navtrail_links, \
...     guess_primary_collection_of_a_record, \
...     get_colID, check_user_can_view_record
>>> from invenio.urlutils import get_client_ip_address, \
...     redirect_to_url, \
...     wash_url_argument, make_canonical_urlargd
>>> from invenio.messages import wash_language, gettext_set_language
>>> from invenio.webinterface_handler import wash_urlargd, WebInterfaceDirectory
>>> from invenio.websearchadminlib import get_detailed_page_tabs
>>> from invenio.access_control_config import VIEWRESTRCOLL
>>> from invenio.access_control_mailcookie import mail_cookie_create_authorize_action
>>> import invenio.template
>>> webstyle_templates = invenio.template.load('webstyle')
>>> websearch_templates = invenio.template.load('websearch')
>>> title, description, keywords = websearch_templates.tmpl_record_page_header_content('http://zaguan.unizar.es/record/6765',6765,'es')
>>> print title
Implementación de una pasarela entre el protocolo RT-WMP y TCP/IP | Trabajos academicos
>>>
How to fix duplicate titles in comments
Edit webcomment_webinterface.py:
Look for:
title, description, keywords = websearch_templates.tmpl_record_page_header_content(req, self.recid, argd['ln'])
Add after it (the separator keeps the prefix from running straight into the record title):
title = _("Comments") + ": " + title
Then run this from the command line:
inveniocfg --update-all; /etc/init.d/httpd restart
Proceed in a similar fashion to fix the ‘Usage Statistics’ tab and any other pages.
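To make the idea concrete for several tabs at once, here is a small standalone sketch of the title-prefixing logic. The tab keys, the prefixes and the tab_title helper are hypothetical and not part of Invenio. In the real code you would apply the equivalent one-line change in each tab's web interface file and wrap the prefix in _() so it gets translated.

```python
# Hypothetical mapping from tab name to the prefix shown in its <title>.
TAB_TITLE_PREFIXES = {
    "comments": "Comments",
    "reviews": "Reviews",
    "usage": "Usage statistics",
}


def tab_title(tab, record_title, prefixes=TAB_TITLE_PREFIXES):
    """Return a title that is distinct for each known tab of a record."""
    prefix = prefixes.get(tab)
    if prefix is None:
        # Unknown tab (or the main record page): keep the plain record title.
        return record_title
    return "%s: %s" % (prefix, record_title)


record_title = "Some record title | Some collection"
titles = [tab_title(tab, record_title) for tab in ("information", "comments", "reviews", "usage")]
for title in titles:
    print(title)
```

Each tab now ends up with its own title, which is what the duplicate title report is looking for.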
