CDS Invenio: avoid duplicate content on comments and other record tabs

Using Invenio 0.99.x and interested in SEO? You should definitely try to avoid duplicate titles! When viewing a record (for instance, http://yourinveniopage.com/record/XXXX), you will notice several tabs on top: Information, Discussion, Usage Statistics…

[Figure: CDS Invenio record tabs]

These links show different content, but share the same <title>. Not great for SEO purposes. If you use Google's Webmaster Tools, you will see this reported under Diagnose > HTML Suggestions > Duplicate title tags.
[Figure: Duplicate titles in CDS Invenio (SEO)]
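If you want to check your own pages before Webmaster Tools reports them, a rough sketch like the one below groups URLs by their <title> text. It assumes you have already fetched the HTML of each tab into a dict (how you fetch it is up to you). The sample pages are made up, and the /comments and /usage suffixes are assumptions about how the tabs are addressed. A regular expression is good enough to pull out a single <title> element, but it is not a general HTML parser.

```python
import re
from collections import defaultdict

# Matches the first <title>...</title> element; good enough for a quick check.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def find_duplicate_titles(pages):
    """Return {title: [urls]} for every title shared by more than one URL.

    pages maps a URL to the HTML source of that page.
    """
    by_title = defaultdict(list)
    for url, html in pages.items():
        match = TITLE_RE.search(html)
        if match:
            # Collapse runs of whitespace so layout differences don't hide duplicates.
            title = " ".join(match.group(1).split())
            by_title[title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Made-up sample: three record tabs that all share one title.
SAME_HEAD = "<html><head><title>Example record  |  Example collection</title></head></html>"
pages = {
    "/record/6765": SAME_HEAD,
    "/record/6765/comments": SAME_HEAD,
    "/record/6765/usage": SAME_HEAD,
}

for title, urls in find_duplicate_titles(pages).items():
    print(title, "->", ", ".join(urls))
```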

There are several ways to avoid duplicate titles. The easiest are to keep crawlers away from these pages with robots.txt, or to add rel="nofollow" to the tab links.
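For the robots.txt route, keep in mind that the tab pages sit under each record's URL, so a plain prefix rule cannot target them; you need the * wildcard that Google's crawler understands. Python's urllib.robotparser only does prefix matching, so this sketch implements the wildcard matching by hand, to let you test rules before deploying them. The /record/*/comments and /record/*/usage paths are assumptions; check the URLs your tabs actually use. It only handles Disallow lines and ignores User-agent groups and Allow precedence.

```python
import re

# Hypothetical rules; adjust the paths to the URLs your tabs really use.
ROBOTS_TXT = """\
User-agent: *
Disallow: /record/*/comments
Disallow: /record/*/usage
"""


def disallow_patterns(robots_txt):
    """Collect the non-empty Disallow values (ignores User-agent groups and Allow)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern):
    """Translate a Google-style path pattern: '*' matches anything, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, robots_txt=ROBOTS_TXT):
    """True if some Disallow pattern matches the beginning of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns(robots_txt))


for path in ["/record/6765", "/record/6765/comments", "/record/6765/usage"]:
    print(path, "blocked" if is_blocked(path) else "crawlable")
```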

Using nofollow to keep comments pages from being indexed

You just have to add a rel="nofollow" attribute to the HTML a tag of each tab link. (noindex is not a link relation, so it does not belong in rel; it is a directive for the robots meta tag or the X-Robots-Tag HTTP header.)

Edit $PATH_TO_cds-invenio/lib/python/invenio/webstyle_templates.py

Search for this:

elif label != _('Fulltext') and label != _('References') and label != _('Citations'):
    out_tabs += '<li%(class)s><a href="%(url)s">%(label)s</a></li>' % \
                {'class':css_class,
                 'url':url,
                 'label':label}

Change it to:

elif label != _('Fulltext') and label != _('References') and label != _('Citations'):
    out_tabs += '<li%(class)s><a href="%(url)s" rel="nofollow">%(label)s</a></li>' % \
                {'class':css_class,
                 'url':url,
                 'label':label}
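To see the markup the modified line produces without restarting Apache, here is a standalone sketch that reproduces the same format string. render_tab is a hypothetical helper written for illustration, not an Invenio function, and it assumes css_class is either an empty string or a fragment like ' class="on"', since the template pastes it straight after <li.

```python
def render_tab(label, url, css_class="", nofollow=True):
    """Build one record tab the way the patched template line does (hypothetical helper)."""
    # rel="nofollow" asks crawlers not to follow the link; it is optional here
    # so you can compare the patched and unpatched markup.
    rel = ' rel="nofollow"' if nofollow else ''
    return '<li%(class)s><a href="%(url)s"%(rel)s>%(label)s</a></li>' % {
        'class': css_class,
        'url': url,
        'rel': rel,
        'label': label,
    }


print(render_tab("Discussion", "/record/6765/comments"))
print(render_tab("Information", "/record/6765", css_class=' class="on"', nofollow=False))
```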

And, as usual, do not forget to run the following to see the changes:

inveniocfg --update-all; /etc/init.d/httpd restart

Changing the titles of comments pages to avoid duplicate titles

What if we want bots to index these pages? No worries, it can be done by hacking the code a bit.

For instance, let's take the comments tab. Open $PATH_TO_cds-invenio/lib/python/invenio/webcomment_webinterface.py and look for:

title, description, keywords = websearch_templates.tmpl_record_page_header_content(req, self.recid, argd['ln'])

Let's see what this line returns by trying it in a Python shell (the interesting part is the last call):

python
 
>>> import urllib
>>> from invenio.webcomment import check_recID_is_in_range, \
...                                perform_request_display_comments_or_remarks,\
...                                perform_request_add_comment_or_remark,\
...                                perform_request_vote,\
...                                perform_request_report
>>> from invenio.config import CFG_SITE_LANG, \
...                            CFG_SITE_URL, \
...                            CFG_SITE_SECURE_URL, \
...                            CFG_WEBCOMMENT_ALLOW_COMMENTS,\
...                            CFG_WEBCOMMENT_ALLOW_REVIEWS
>>> from invenio.webuser import getUid, page_not_authorized, isGuestUser, collect_user_info
>>> from invenio.webpage import page, pageheaderonly, pagefooteronly
>>> from invenio.search_engine import create_navtrail_links, \
...      guess_primary_collection_of_a_record, \
...      get_colID, check_user_can_view_record
>>> from invenio.urlutils import get_client_ip_address, \
...                              redirect_to_url, \
...                              wash_url_argument, make_canonical_urlargd
>>> from invenio.messages import wash_language, gettext_set_language
>>> from invenio.webinterface_handler import wash_urlargd, WebInterfaceDirectory
>>> from invenio.websearchadminlib import get_detailed_page_tabs
>>> from invenio.access_control_config import VIEWRESTRCOLL
>>> from invenio.access_control_mailcookie import mail_cookie_create_authorize_action
>>> import invenio.template
>>> webstyle_templates = invenio.template.load('webstyle')
>>> websearch_templates = invenio.template.load('websearch')
 
>>> title, description, keywords = websearch_templates.tmpl_record_page_header_content('http://zaguan.unizar.es/record/6765',6765,'es')
>>> print title
Implementación de una pasarela entre el protocolo RT-WMP y TCP/IP  |  Trabajos academicos
>>>

How to fix duplicate titles in comments

Edit webcomment_webinterface.py:

Look for:

title, description, keywords = websearch_templates.tmpl_record_page_header_content(req, self.recid, argd['ln'])

Add this line right after it, with the same indentation:

title = _("Comments") + " | " + title

Then run this from the command line:

inveniocfg --update-all; /etc/init.d/httpd restart

Proceed in a similar fashion to fix the 'Usage Statistics' tab and the other pages ;)
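The same idea generalizes to any tab: put the tab name in front of the record title. Here is a minimal sketch, where `_` is a stand-in for the translation function (in webcomment_webinterface.py it presumably comes from gettext_set_language) and tab_title is a hypothetical helper, not part of Invenio:

```python
def _(text):
    """Stand-in for Invenio's translation function; returns the text unchanged."""
    return text


def tab_title(tab_label, record_title, separator=" | "):
    """Prefix the record title with the (translated) tab label so each tab has its own <title>."""
    return _(tab_label) + separator + record_title


record_title = "Example record  |  Example collection"
for label in ("Comments", "Usage statistics"):
    print(tab_title(label, record_title))
```

In the real module, keep literal calls such as _("Comments") rather than passing a variable to _, because xgettext only extracts literal strings for translation.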
