Institutional Repositories

Scientific, research and higher academic institutes generate a significant number of internal research publications that contain valuable and often detailed information such as data, observations, analysis, conclusions, recommendations, principles and best practices. These publications, which generally appear as journal articles, constitute the institutes’ ‘intellectual capital’. As producers of research, the institutes are expected to retain their research output and provide internal and external access to it. Publishers, whether learned societies or commercial firms, treat the international journal as a product with considerable commercial value in the knowledge society and keep raising subscription charges. As a result, scientists, researchers and academicians are deprived of access to important core journals. The financial burden of subscribing to print or online copies from for-profit companies, together with the emergence of the Internet, has encouraged the search for alternative models.


The major lacuna in scientific communication in India is the poor citation of Indian publications in international journals. Two reasons often mentioned for this situation are inadequate access to international journals and a lack of quality publishing channels for local research. Thus, two important and interrelated challenges before the scientific and academic community of India, as observed by T.B.Rajasekhar (2003), are:




i. How do we improve local access to global research? And

ii. How do we improve global access to local research?




To improve access to international scientific publications, e-journal consortia have emerged, e.g. U.G.C. INFONET, INDEST and CSIR Consortia. These formal means of communication are further supported by ‘blogs’, which facilitate informal communication among researchers and academicians. Digital publishing technologies have thus brought a fundamental change in scholarly communication by offering instant and wider access to research literature. However, the focus of this article is on the second problem, i.e. how to globalize Indian scientific literature through digital or institutional repositories (I.R.).








Technological change has opened up another possibility: using digital technologies so that the creator can also be the publisher. A significant amount of information is available in electronic form, and attempts have been made to provide access to it through network file sharing, the web, etc. Optimum use of this development will provide wider access to scholarly communication and ease the problems of periodical subscriptions. This has led to the development of open access publishing and of institutional repositories as digital collections that capture, preserve and provide access to the intellectual output of an institutional community.

Open Access is online access to ‘public domain’ scientific literature, without access charges to readers or libraries, dispensing with financial, technical and legal barriers. Different experts and organizations have defined it variously, e.g. the Budapest Initiative, the Bethesda Statement and the Berlin Declaration. The more appropriate definition is that of the “International Symposium on Open Access and the Public Domain in Digital Data and Information for Science” (2003). ‘Public domain’ is defined in legal terms as sources and types of data and information whose uses are not restricted by statutory intellectual property laws and which are available to the public for use without authorization. ‘Open access’ is defined as proprietary information that is made openly and freely available on the Internet or on other media by the rights holder, but that retains some or all of the exclusive property rights that are granted under the statutory IP laws. There are two main forms of open access:

(i) Open access publishing: journals that make their articles accessible immediately on publication, e.g. Public Library of Science (PLoS).

(ii) Author self-archiving, where authors make copies of their articles openly accessible, generally in a subject or institutional repository. A leading proponent of this school of thought is Stevan Harnad. In this type of open access, academicians and librarians have a vital role.

Several agencies are associated with the promotion of open access initiatives at the global level. Significant contributors include the Budapest OAI, SPARC, the Open Society Institute (OSI), the Berlin Declaration and the Wellcome Trust. Moreover, research funders such as the U.S. National Institutes of Health and the Wellcome Trust also expect open access to the research they fund.

The open access movement, coupled with open source licensing, greatly benefits the scientific community at almost no cost. This has attracted developed and developing countries alike, and more and more interoperable institutional archives are being established that are internationally accessible via the Internet for free. Institutes benefit by sharing their research with others that have similar research priorities and by making their research ‘visible’ internationally. Therefore, many universities in the West have established freely available online institutional repositories. In India too, many institutions are planning and developing strategies to exploit open access publishing technologies and make their research literature accessible in the global online environment.


The precedents for institutional repositories are found in the subject-based repositories that became popular, particularly in the sciences, during the 1990s as an attempt to use the power of the Internet to provide an alternative and cheaper form of access to research literature. A leading proponent of this school of thought is Stevan Harnad. These repositories were based on the model established by the creators of arXiv.org (http://arXiv.org/), a repository based at Cornell University. Subject repositories have therefore evolved primarily as archives of published material united by discipline, and some of them also encourage the archiving of pre-refereed drafts (Yeates, 2003). From 2002, I.R. development gained momentum in research and university libraries with the availability of open source digital library software such as EPrints and DSpace.

A major development that facilitated the growth of IRs has been the interoperability framework provided by the Open Archives Initiative (OAI - www.openarchives.org). Through its Protocol for Metadata Harvesting (OAI-PMH), the framework enables digital libraries to expose their metadata so that it can be harvested automatically from all participating digital libraries and used to build various services, including a central metadata index and search system. Several open source digital library software packages are available for setting up and managing institutional digital repositories.
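As an illustration of how a harvester addresses a data provider, the following minimal Python sketch builds OAI-PMH request URLs. The six verbs and the `oai_dc` metadata prefix come from version 2.0 of the protocol; the base URL `http://repository.example.org/oai` and the helper name `build_oai_request` are hypothetical, chosen only for this example, and the sketch does not contact any server.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

# Ask the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

# Ask for all records in unqualified Dublin Core.
list_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_url)
```

A service provider would fetch such URLs periodically and merge the returned XML records into its central metadata index.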

Thus institutional repositories have emerged primarily as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members” (Clifford Lynch, 2003).

Scholarly archives or institutional repositories are an established medium for communicating peer-reviewed (postprints) and non-peer-reviewed (preprints) scholarly literature. Crow (2002) made a substantial case for IRs in the Scholarly Publishing & Academic Resources Coalition (SPARC, Washington) position paper on the case for institutional repositories, where he discussed the rationale for institutional repositories in universities and colleges and identified it as twofold: (i) a new scholarly publishing paradigm; and (ii) institutional visibility and prestige. A convenient definition from Crow is “a digital collection capturing and preserving the intellectual output of a single or multi-university community”.

The Joint Information Systems Committee (JISC, 2005) explained the concept of a digital repository as:

“In simplest terms, a digital repository is where digital content, assets, are stored and can be searched and retrieved for later use. A repository supports mechanisms to import, export, identify, store and retrieve digital assets. Putting digital content into a repository enables staff and institutions to then manage and preserve it, and therefore derive maximum value from it. Digital repositories may include research outputs and journal articles, theses, elearning objects and teaching materials or research data.”

Institutional repositories have been described by Crow (2002) as having four key attributes:

• institutionally defined;

• scholarly;

• cumulative and perpetual; and

• open and interoperable.



The SPARC (2002) Institutional Repositories Checklist and Resource Guide stated that “Such repositories:

• Provide a critical catalyst and component in reforming the system of scholarly communication by expanding access to research, reasserting control over scholarship by the academy, and bringing heightened relevance to the institutions and libraries that support them; and

• Have the potential to serve as tangible indicators of an institution's quality and to demonstrate the scientific, societal, and economic relevance of its research activities, thus increasing the institution's visibility, status, and public value.”

Subbaiah Arunachalam (2004) states that “archiving already published research in interoperable institutional archives greatly benefits global science at almost no cost. This can be done without changing established publishing practices and offers enormous opportunities for scientific and medical research in developing country like India.” Hence, by making institutionally generated information accessible online to on-campus and global users, institutions can gain several advantages:

• Improved ‘visibility’ of the intellectual output of institutions and of the results of investment.

• An interoperable repository supports researchers’ ability to search seamlessly and facilitates interdisciplinary research and discovery.

• The heterogeneous data held in repositories can be mined for new ideas on a global platform.

• It contributes to sound protection and preservation of the institution’s intellectual property.

• It enhances access by removing access barriers, which in turn improves research capacity, including collaborative research.

• It brings together the intellectual output and the organization, which otherwise get separated in the conventional publishing system.

Advantages to faculty / contributors

• A digital archive of their research publications that is accessible anywhere through the Internet

• Improved citation of research publications, as the repository is interoperable (compliant with OAI-PMH) and accessible globally

• Preservation and control of one’s own publications

Robin Yeates’s (2003) observations also support this analysis; he set out the advantages of I.R. for users, for institutions and for all. Thus, institutional digital repositories have a potential role in the structure of scholarly communication. Scholars, academic institutions, their libraries and publishers are all stakeholders in the emerging initiatives.

Universities and research centres throughout the world are actively planning and implementing institutional repositories. The literature indicates that a number of projects have been undertaken to digitize the intellectual output of institutions and preserve it for wider access by present and future generations. The number of repositories is increasing very fast. A survey (ARL, 2006) of I.R. activities in ARL member institutions found that, of the 87 member libraries that responded (out of a total of 123), 37 have an operational institutional repository, 31 are planning one and 19 have no immediate plans.
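The ARL figures above are easier to compare as proportions. The short Python sketch below only restates the numbers quoted from the survey and derives percentages from them; the variable names and the `percent` helper are illustrative, and the rounding to one decimal place is a presentation choice.

```python
# Figures quoted from the ARL (2006) survey in the text above.
total_members = 123
responded = 87
operational = 37
planning = 31
no_plans = 19

# The three categories should account for every respondent.
assert operational + planning + no_plans == responded


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


response_rate = percent(responded, total_members)
share_operational = percent(operational, responded)
share_planning = percent(planning, responded)
share_no_plans = percent(no_plans, responded)

print("Response rate:", response_rate, "%")
print("Operational IR:", share_operational, "% of respondents")
print("Planning an IR:", share_planning, "% of respondents")
print("No immediate plans:", share_no_plans, "% of respondents")
```

On these figures, roughly four in ten responding libraries already run a repository and about a third more are planning one.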

In the U.K., the Joint Information Systems Committee (JISC) Digital Repositories Programme (2005) is initiating a programme of work to assist the deployment of digital repositories within the learning and research communities. The report made a number of recommendations for incorporating shared content into a more managed repository framework.

According to OpenDOAR, the numbers of digital repositories developed and operational so far are: Africa 7; Asia 37 (India 16; Japan 15); Europe 401 (Germany 108; U.K. 89; Netherlands 43); Australia 49; New Zealand 9; Central America 5; North America 249; Canada 29; South America 38. However, only 556 repositories are registered as data providers with OAI.

The role of institutional repositories in reforming scholarly publication has been strongly emphasized in reports for SPARC (Scholarly Publishing & Academic Resources Coalition) (Crow, 2002) and for PALS (Publisher and Library/Learning Solutions) (Ware, 2004). The contribution of Peter Suber (2005) to the open access movement through ‘Peter Suber’s Open Access News blog’ is praiseworthy. He has tried to create awareness among academicians, librarians and management in order to promote open access. Several others have focused their research on I.R., including Clifford Lynch, Bailey, Gibbons, Nixon and Crow, to name a few.


Basic issues to be considered while planning to develop an institutional repository include:

Users: Whom is the IR aimed at? For any endeavour to survive, it must have an interested audience that is willing to use it and evolve with it. The audience is paramount for the growth of an IR. In a typical university environment, the faculty and researchers are both the contributors to and the users of the I.R. Hence it is essential for LIS professionals to assess the contributions of faculty in terms of quality and quantity, and to create awareness among them of the role they have to play in the creation of the I.R.

Content issues: “Content-related issues will be crucial to the success of these repositories, be they in universities or some other institutional setting. To date, however, there has been remarkably little consideration given to the issue of exactly what type of material might be suitable for inclusion in repositories; who should be responsible for selecting this material; or how the task of content development for repositories might relate to other content selection responsibilities managed by a library.” (Paul Genoni, 2004).

SPARC (2002) has also endorsed the need to take a broad view of the material that might be included in repositories. Its Institutional Repositories Checklist and Resource Guide suggested that published material, grey literature, preprints, curriculum support and teaching material, and electronic theses and dissertations be included in digital repository collections. Crow (2002) opined that “... content may include pre-prints and other works-in-progress, peer-reviewed articles, monographs, enduring teaching materials, data sets and other ancillary research material, conference papers, electronic theses and dissertations, and gray literature.”

The content of institutional repositories needs to be more diverse than is appropriate for subject-based repositories (Genoni et al., 2004). The content of an I.R. therefore includes both formal and informal material and generally holds preprints; working papers; theses and dissertations; research and technical reports; conference proceedings; departmental and research center newsletters and bulletins; papers in support of grant applications; status reports to funding agencies; committee reports and memoranda; statistical reports; and technical documentation.

How to create digital content? If the material is already in digital form, little further work is needed. Otherwise it has to be digitized, with quality controlled throughout, and this requires expensive equipment and trained operators. In practice a combination of existing digital files and newly digitized material is most likely, and care is needed to avoid pitfalls (such as file size and format) that follow from seemingly obvious decisions made at the early stages.

IPR Issues: Posting a pre-published article to an I.R. may raise copyright issues that have legal implications. "Three different organizations – MIT, Science Commons (through its Scholar's Copyright project), and SPARC –have worked with lawyers to develop self-sufficient addenda that address these issues. These addenda can be attached to the publishing contracts received by publishers and are likely to be legally binding." (Hirtle, 2006). Subbaiah Arunachalam focuses on the issue and opines that ‘there is no need to surrender copyright to work one has performed to a journal publisher. One can always negotiate with the publishers with alternate contracts.’ (LIS FORUM, 16th December 2006)

The RoMEO Project (Rights MEtadata for Open archiving), funded by the Joint Information Systems Committee, and the new project called Partnering on Copyright investigate the rights issues surrounding the 'self-archiving' of research. The SHERPA RoMEO listing of publisher copyright policies and self-archiving provides a summary of the permissions that are normally given as part of each publisher's copyright transfer agreement. (http://www.sherpa.ac.uk/romeo.php)

Creative Commons, a non-profit organization, provides free tools that let authors, scientists, artists, and educators easily mark their creative work (websites, scholarship, music, film, photography, literature, courseware etc.) with the freedoms they want it to carry i.e. to build a layer of reasonable, flexible copyright instead of restrictive default rules. One can use CC to change copyright terms from "All Rights Reserved" to "Some Rights Reserved." (http://creativecommons.org/)

The GNU General Public License, often called the GNU GPL, is used by most GNU programs and many free software packages. Its aim is to give all users the freedom to redistribute and change GNU software (e.g. GNU EPrints) and to keep it ‘copyleft’. (http://www.gnu.org/licenses)

Technical Issues: Which technologies should be used? Ideally the best available, but deciding what is best is always difficult given the pace of change in high technology. Basic system design requires server platform support and web server support.

Hardware requirements: The following is a suggestive list of prerequisites:

Computer:

– Intel Pentium 4, clock speed 3.0 GHz

– 1 MB L2 cache; 800 MHz FSB

– 512 MB PC 2700 ECC DDR RAM

– 2 x 10/100 Mbps LAN card

– Intel Advance Graphic cards; integrated audio

– Serial and parallel I/O ports

– Windows software

Scanner:

– A4 size, flatbed, 24-bit colour scanner

– 4800x9600 dpi; USB connectivity

– Bundled with OCR software

Printer:

– HP LaserJet 1020


Software: Selection of digital library software that provides access to the full text of articles is a critical step. The Open Society Institute (2004) has published “A Guide to Institutional Repository Software”, which discusses software that is available under an open source license and complies with the latest version of OAI-PMH. The list includes Archimede, Arno, CERN Document Server Software, DSpace, Eprints, Fedora, i-Tor, MyCoRe and OPUS.

Software available under an open source license and compliant with the latest version of OAI-PMH:

• Archimede - Canadian software solution for institutional repositories

• Arno - Academic Research in the Netherlands Online, Tilburg University, The Netherlands. http://www.uba.uva.nl/arno

• CDSware - CERN Document Server Software, Geneva

• DSpace - MIT Libraries and the HP Labs, USA

• Eprints - University of Southampton, U.K.

• Fedora - digital object repository management system, University of Virginia, USA. http://www.fedora.info

• i-Tor - Tools and technologies for Open Repositories, Netherlands Institute for Scientific Information Services. http://www.i-tor.org/en/toon

• MyCoRe - Essen Univ. Library, Univ. of Duisburg-Essen, Germany. http://www.mycore.de/engl/index.html

• OPUS - Online Publications University of Stuttgart, used by several other German universities. http://elib.uni-stuttgart.de/opus

The software features of digital repositories include:

• Capture and describe digital material using a workflow

• Provide interface for online submission of research material (Intranet)

• Provide access to this material over the web (metadata and/or full pub)

• Preserve digital material over long period of time

• Expose metadata through OAI-PMH protocol


– Default: Unqualified Dublin Core

– Other metadata standards
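To make the default metadata format concrete, here is a minimal Python sketch that serializes one record as unqualified Dublin Core in the `oai_dc` container used by OAI-PMH. The two namespace URIs and the fifteen element names are those of the standards; the function name `to_oai_dc` and the sample field values are hypothetical, and a real repository package would generate this XML itself.

```python
import xml.etree.ElementTree as ET

# Namespaces used for unqualified Dublin Core inside OAI-PMH responses.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(fields):
    """Build an <oai_dc:dc> element from {element name: [values]}."""
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError("not a Dublin Core element: " + name)
        for value in values:  # every element is optional and repeatable
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
            child.text = value
    return root


# Hypothetical sample record, for illustration only.
record = to_oai_dc({
    "title": ["A sample preprint on institutional repositories"],
    "creator": ["Author, First", "Author, Second"],
    "date": ["2006"],
    "type": ["Preprint"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every element is optional and repeatable, even sparse local records can be exposed this way, which is one reason unqualified Dublin Core serves as the common baseline.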

In India, however, DSpace, EPrints and Greenstone are more widely used, and staff should be trained in the chosen software. All of these offer the required features: support for all document formats, multilingual content, indexing, search and retrieval, access management and security, usage management and reporting, etc. The choice of software depends on the size and nature of the collection, the technologies and technical expertise available, the nature and number of users, and the services to be offered.

Network: Intranet and Internet facilities are essential for IRs to provide internal and external access. Nowadays almost all libraries have Internet connectivity and an Intranet through the campus network, as they already provide access to e-consortia. The existing provisions can be used to connect the digital repository to its stakeholders.

H.R. Issues: The most important resource for the whole exercise is staff time and expertise. Creating and running a digital library is a complex task that demands technical and professional skills. It involves librarianship and computing along with a fair amount of marketing and administration (Noerr, 2000). Operations such as precise data and record conversion and metadata creation require skill, as does demonstrating the digital library in different environments. The stakeholders of an I.R. are libraries, faculty and researchers, and funding agencies; coordination among them is therefore necessary to achieve the expected performance.

Workflow and Process: The workflow or process of an I.R. includes user identification, document submission, content selection, moderation, organization, archiving, networking and interoperation, content access and usage monitoring, and content maintenance and preservation.


Basic workflow of a digital repository:

1. User identification / membership

2. Document submission / content development

3. Review / moderation / approval

4. Organization / archiving

5. Networking / interoperability

6. Content access and delivery

7. Usage monitoring

8. Content maintenance and preservation
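The workflow stages listed above can be modelled as a simple ordered sequence. The Python sketch below is only an illustration: the `Stage` enum and the `next_stage` helper are hypothetical names, and the rule that a rejected item returns to submission is an assumption made for this example, not something the workflow itself specifies.

```python
from enum import Enum


class Stage(Enum):
    """Workflow stages of a digital repository, in order."""
    USER_IDENTIFICATION = "User identification / membership"
    SUBMISSION = "Document submission / content development"
    REVIEW = "Review / moderation / approval"
    ARCHIVING = "Organization / archiving"
    INTEROPERABILITY = "Networking / interoperability"
    ACCESS = "Content access and delivery"
    USAGE_MONITORING = "Usage monitoring"
    PRESERVATION = "Content maintenance and preservation"


STAGES = list(Stage)  # Enum iteration preserves definition order


def next_stage(current, approved=True):
    """Return the stage that follows `current`, or None after the last one.

    Assumption for this sketch: an item rejected at review goes back
    to submission for revision.
    """
    if current is Stage.REVIEW and not approved:
        return Stage.SUBMISSION
    position = STAGES.index(current)
    if position + 1 < len(STAGES):
        return STAGES[position + 1]
    return None


# Walk one approved item through the whole workflow.
path = []
stage = STAGES[0]
while stage is not None:
    path.append(stage)
    stage = next_stage(stage)

for step, reached in enumerate(path, start=1):
    print(step, reached.value)
```

Repository software typically lets each institution configure which of these steps apply to a given collection, so a fixed sequence like this is a simplification.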



Access and Use of Digital Repositories: What type of delivery will be offered? The web is the primary channel. Internal visibility of and access to research output can be provided through the Intranet or the existing campus network, and external access through the Internet. The organizations and services that support the cause of open access to I.R.s are:

1. Open Archives Initiative (OAI): The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. Data providers who support OAI-PMH may choose to list their repository in the OAI registry. (http://www.openarchives.org/data/registerasprovider.htm)

OAIster is a project of the University of Michigan Digital Library Production Service. Its aim is to create a collection of previously difficult-to-access, academically oriented digital resources, and it is updated weekly. It holds 9,931,910 records from 726 institutions (updated 21 Dec. 2006).

2. Open Directory of Open Access Repositories (OpenDOAR): OpenDOAR is an authoritative directory of academic open access repositories. It provides a simple repository list and allows users to search for repositories or to search repository contents. OpenDOAR now presents a trial search service for the full text of material held in the open access repositories listed in the Directory. This has been made possible through the recent launch of Google Custom Search Engine, which allows OpenDOAR to define a search service based on the Directory holdings. (http://www.opendoar.org/index.html)

3. Scientific Commons: ScientificCommons is a project of the University of St.Gallen (Switzerland), hosted and developed at the Institute for Media and Communications Management. It aims to provide the most comprehensive and freely available access to scientific knowledge on the internet. (http://en.scientificcommons.org)

4. Registry of Open Access Repositories (ROAR): ROAR, maintained by Tim Brody of the University of Southampton, operates under a Creative Commons licensing policy. It was earlier known as the Institutional Archives Registry and changed its name in 2006. (http://roar.eprints.org)

5. SCIRUS: Institutional repositories of digital data at universities and other research institutions may now receive deeper, more thorough indexing and full-text delivery through Elsevier’s free sci-tech search engine, Scirus (http://www.scirus.com). The Scirus engine already reaches content at many institutional repositories, but those joining the new Scirus Repository Search service will receive more extensive and sophisticated indexing of a wider range of content.

Standards: The efforts to develop standards for digital repositories are broad and varied. This has resulted in the emergence of an impressive number of competing standards. The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate efficient dissemination of content. (http://www.openarchives.org/). Two important documents from OAI that provide guidance and standards for I.R.s are:

• Open Archives Initiative Protocol for Metadata Harvesting - Version 2.0 (2002)

• Open Archives Initiative Protocol for Metadata Harvesting Implementation Guidelines (http://www.openarchives.org/OAI/2.0/guidelines.htm)

Another important document is ‘Best Practices for OAI Data provider Implementations and shareable metadata; A joint initiative between the Digital Library Federation and the National Science Digital Library’, (2006) (http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?)

Technical Standards (V.1.1) for JISC Information Environment provides a list of the key standards and protocols that make up the JISC IE technical architecture. Further, JISC supports the work of both UKOLN and CETIS (Centre for Educational Technology and Interoperability Standards), which play active roles in the creation, maintenance and deployment of open standards.

Evaluation: As the boundaries of institutional or digital repositories become more clearly defined and expressed, there is a greater need to have useful methods for evaluating repository software applications and the role they play in the broader context of repository services. Regarding digital preservation specifically, the 2005 RLG/NARA Audit Checklist for the Certification of a Trusted Digital Repository, Draft for Public Comment (Audit Checklist) is under consideration for determining an institution's ability to be a Trusted Digital Repository. The NDIIPP-sponsored ECHO DEPository project is proposing a framework of evaluation for repository software applications based on the Audit Checklist in conjunction with a common software evaluation scoring methodology. (Kaczmarek, Joanne, 2006). The Fourth DELOS Workshop on Evaluation of Digital Libraries stressed the need for Testbeds, Measurements, and Metrics for evaluation of DLs. (Hungarian Academy of Science, 2002)

Implementation issues: There are few barriers to system implementation:

– No exotic equipment

– Use of existing staff expertise, which is limited

– Inadequate support from user groups and community

There are a number of hurdles for content gathering too:

– Inertia of existing practice

– Need to demonstrate author benefits

6. ISSUES FOR DISCUSSION: In India there are only 17 IRs (Fernandez, 2006) among hundreds of research and academic institutions. There are discussions about mandating open access for all publicly funded research output. The open access movement is gaining momentum in India too, through the efforts of Subbaiah Arunachalam (MSSRF), A.R.D.Prasad (DRTC), Sukhdev Singh (NIC), the late T.B.Rajasekhara (NCSI, IISc) and Francis Jayakant (NCSI, IISc), to name a few.

Issue 1: In spite of their immense value for present and future information distribution, digital repositories are not getting the patronage they deserve from the academic and scientific community. After evaluating the digital repositories in India, Fernandez (2006) stated that “there are many institutions of national importance which do not provide open access to their research.” The reasons for this situation may be:

• Non-availability of Internet connectivity and other infrastructure at the workplace;

• Uncertainty over copyright issues;

• Confusion and doubts over the development and maintenance of repositories;

• Lack of technical knowledge;

• No specific advantage or immediate personal benefit;

• It is not mandatory, rather optional.

As Ramesh C.Gaur (2006) puts it, there is a need for a national-level mechanism to promote and coordinate open access and public domain digital library systems.

Issue 2: The year 2006 was in fact the year of the open access mandate in many countries: eight research councils in the UK, Germany’s DFG, Austria's FWF, two agencies in Australia, China's Ministry of Science and Technology, the Canadian Breast Cancer Research Alliance, France's CNRS and Ifremer, Sweden's BIBSAM and the US National Endowment for the Humanities adopted OA mandates (Peter Suber, 2007). 100% O.A. is possible if mandated by research institutions and funders (Harnad, 2006). There is no evidence of such attempts in India by any government agency or research institute. It is therefore high time that academic and scientific funding agencies and institutions made it mandatory to deposit research output in institutional repositories. Further, there is a need to develop best practices for designing, developing and managing I.R.s in the context of the country's information environment. (OADL Wiki, ‘Sukhdev’s World’ (blog), Dec.19, 2006).

For their part, LIS professionals have to be proactive and develop interoperable IRs, besides creating awareness among their user communities. Through digital repositories, the existing library and information system can redefine its role in the knowledge society, along with the role of librarians and information specialists.





