Download Free Software Amherst Graduate Programsrubackup

The following material is available for download from the CIIR. It is provided without warranty and without support. If there are problems accessing or using any of this material, we would appreciate being told (info at ciir.cs.umass.edu), in case we can address the issue.

Lemur Project Downloads - June 2020


The Lemur Project (an NSF-funded collaboration with CMU and the CIIR) develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset. Lemur software and datasets are available at: http://www.lemurproject.org.

Open-Retrieval Conversational Question Answering - May 2020

Chen Qu, Liu Yang, Cen Chen, Minghui Qiu, W. Bruce Croft, Mohit Iyyer

The OR-QuAC dataset enhances QuAC by adapting it to an open-retrieval setting. It is an aggregation of three existing datasets:

(1) the QuAC dataset that offers information-seeking conversations,
(2) the CANARD dataset that consists of context-independent rewrites of QuAC questions, and
(3) the Wikipedia corpus that serves as the knowledge source for answering questions.

Link to the dataset: https://ciir.cs.umass.edu/downloads/ORConvQA/

Incorporating Hierarchical Domain Information to Disambiguate Very Short Queries - Dataset - ICTIR - October 2019

Hamed Bonab, Mohammad Aliannejadi, John Foley, James Allan

This dataset is based on 50 of the NIST relevance assessors' queries from the TREC 2017 Core Track (https://trec-core.github.io/2017/) data collection. The collection has over 1.8 million annotated New York Times (NYT) articles, containing nearly every article published between January 1, 1987, and June 19, 2007. We provide Rel and RetRel query-domain pairs for each query. We also provide the NYT corpus hierarchy in XML format. For more information, please refer to the published paper.

Link to the paper: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1313
Link to the dataset: https://ciir.cs.umass.edu/downloads/ictir19_nyt_hierarchy/

Simulating CLIR Translation Resource Scarcity using High-resource Languages - Dataset - ICTIR - October 2019

Hamed Bonab, James Allan, and Ramesh Sitaraman

This dataset is based on 200 queries from the Cross-Language Evaluation Forum (CLEF) 2000-2003 campaign for bilingual ad-hoc retrieval tracks (http://catalog.elra.info/en-us/repository/browse/ELRA-E0008/). The Swahili and Somali queries are translations of the English queries from the C001-C200 topic set. We hired a translation organization to translate the title and description of each topic into Somali and Swahili. For more information, please refer to the published paper.

Link to the paper: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1357
Link to the dataset: https://ciir.cs.umass.edu/downloads/ictir19_simulate_low_resource/

ANTIQUE: A Non-Factoid Question Answering Benchmark - May 2019

This dataset contains 2,626 non-factoid questions and 34k manual judgments on a scale from 1 to 4. All questions and answers were collected from a CQA website. Note that the test relevance labels were obtained through depth-k pooling (k=10).
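Depth-k pooling, mentioned above, can be sketched as follows. This is a minimal illustration of the general technique, not the procedure or code used to build ANTIQUE: the function name `depth_k_pool`, the run format, and the toy rankings are all assumptions made for this example.

```python
from collections import defaultdict


def depth_k_pool(runs, k=10):
    """Union the top-k results of every run, per query.

    runs: list of dicts mapping query_id -> ranked list of answer ids
          (best first). This input format is an assumption for the sketch.
    Returns a dict mapping query_id -> set of pooled answer ids, i.e. the
    candidates that would be sent to human judges.
    """
    pool = defaultdict(set)
    for run in runs:
        for query_id, ranked_ids in run.items():
            pool[query_id].update(ranked_ids[:k])
    return dict(pool)


# Two hypothetical rankers for a single query.
run_a = {"q1": ["a1", "a2", "a3"]}
run_b = {"q1": ["a3", "a4", "a5"]}

pool = depth_k_pool([run_a, run_b], k=2)
print(sorted(pool["q1"]))  # ['a1', 'a2', 'a3', 'a4']
```

Answers outside the pool receive no judgment, which is why the pooling depth matters when interpreting the test labels.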

Link to the dataset: https://ciir.cs.umass.edu/downloads/Antique/
Link to the paper: https://arxiv.org/abs/1905.08957

PsgRobust - Answer Passage Retrieval Dataset - December 2018

Keping Bi, Qingyao Ai, W. Bruce Croft

PsgRobust is an answer passage collection built based on the Robust04 collection without manual annotation. It was built for research on iterative relevance feedback based on two assumptions:

1. For the passages that are ranked in the top positions by a powerful ranker, if they are in the relevant documents, we assume that they are relevant.
2. All passages in non-relevant documents are irrelevant.
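The two assumptions can be sketched as a labeling rule. This is an illustrative sketch only, not the authors' code: the cutoff `top_n` (what counts as a "top position"), the function name, and the input format are assumptions, and the sketch leaves lower-ranked passages from relevant documents unlabeled (`None`) because the two assumptions do not cover them.

```python
def label_passages(ranked_passages, relevant_docs, top_n):
    """Label passages for one query using the two assumptions.

    ranked_passages: list of (passage_id, doc_id) pairs, best first,
                     as produced by some strong ranker (hypothetical input).
    relevant_docs:   set of doc ids judged relevant for the query.
    top_n:           hypothetical cutoff defining the "top positions".
    Returns a dict passage_id -> 1 (relevant), 0 (irrelevant), or None
    (not covered by either assumption).
    """
    labels = {}
    for rank, (passage_id, doc_id) in enumerate(ranked_passages):
        if doc_id not in relevant_docs:
            labels[passage_id] = 0      # assumption 2
        elif rank < top_n:
            labels[passage_id] = 1      # assumption 1
        else:
            labels[passage_id] = None   # outside both assumptions
    return labels


ranked = [("p1", "d1"), ("p2", "d2"), ("p3", "d1")]
labels = label_passages(ranked, relevant_docs={"d1"}, top_n=2)
print(labels)  # {'p1': 1, 'p2': 0, 'p3': None}
```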

There are 383,036 passages from 22,403 unique documents in total. For the 246 queries, there are 6,589 relevant passages in total, which come from 3,544 documents. For detailed information, please see the readme.txt file (https://ciir.cs.umass.edu/downloads/PsgRobust/readme.txt) and our paper.

Keping Bi, Qingyao Ai, W. Bruce Croft. 'Iterative Relevance Feedback for Answer Passage Retrieval with Passage-level Semantic Match.' in the Proceedings of the European Conference on Information Retrieval (ECIR 19), Cologne, Germany, April 14-18, 2019, pp. 558-572.

Link to the dataset: https://ciir.cs.umass.edu/downloads/PsgRobust/
Link to the paper: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1295

Citation Worthiness Dataset - SIGIR - July 2018

Hamed Bonab, Hamed Zamani, Erik Learned-Miller, James Allan

This dataset is based on the SEPID corpus (http://pars.ie/lr/sepid-corpus). All sentence IDs are consistent with the SEPID corpus, in case you need to do other research with this data. It includes sentence-level segmentation of 10,921 articles from ACL ARC 1.0, up to February 2007.

The citation worthiness dataset contains 85,778 sentences with the 'cite' label and 1,142,275 sentences with the 'not_cite' label. 10% of the data was randomly selected as test data and divided by section name into 7 chunks. See the paper for more explanation.

Link to the paper: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1314
Link to the dataset: https://ciir.cs.umass.edu/downloads/sigir18_citation/

Hierarchical Embedding Model for Personal Product Selection - April 2017 Software

Qingyao Ai, Bruce Croft

This software is an implementation of the Hierarchical Embedding Model deep neural network trained to learn personal product selections through user product queries and selections. The probability of a product item being purchased by a user with a query is computed using their corresponding latent representations.
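As a rough illustration of scoring products from latent representations, the sketch below combines a user vector and a query vector and applies a softmax over dot products with item vectors. The text above only states that the probability is computed from latent representations; the dot-product scoring, the softmax, the mixing parameter `weight`, and all names here are assumptions for illustration and may not match the released software.

```python
import math


def purchase_probabilities(user_vec, query_vec, item_vecs, weight=0.5):
    """Return a dict item_id -> probability under a softmax over dot products.

    user_vec, query_vec: latent vectors (lists of floats, same length).
    item_vecs:           dict item_id -> latent vector.
    weight:              hypothetical mixing weight between query and user.
    """
    # Combine query and user into a single latent vector (assumed form).
    combined = [weight * q + (1.0 - weight) * u
                for u, q in zip(user_vec, query_vec)]
    # Score each item by its dot product with the combined vector.
    scores = {item: sum(a * b for a, b in zip(vec, combined))
              for item, vec in item_vecs.items()}
    # Numerically stable softmax over all items.
    max_score = max(scores.values())
    exps = {item: math.exp(s - max_score) for item, s in scores.items()}
    total = sum(exps.values())
    return {item: e / total for item, e in exps.items()}


# Toy two-dimensional example with made-up vectors.
probs = purchase_probabilities(
    user_vec=[1.0, 0.0],
    query_vec=[1.0, 0.0],
    item_vecs={"item_a": [1.0, 0.0], "item_b": [0.0, 1.0]},
)
print(probs["item_a"] > probs["item_b"])  # True
```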

The software is Python and Java based: Python handles neural network processing via TensorFlow, and Java is used for the Galago indexing and search software from the Lemur Project.

See the README.txt file for this release or the following paper for more details.

DeepMerge: Merging Multiple Search Result Lists - December 2015

C.J. Lee, Bruce Croft

nfL6: Yahoo Non-Factoid Question Dataset - November 2015

Daniel Cohen, Bruce Croft

This dataset contains 87,361 questions derived from the Yahoo Webscope L6 collection and their corresponding best, and additional, answers submitted by users. Only the best answer was reviewed in determining answer quality.

The dataset file is in JSON format and may be downloaded in rar or gzip compressed format.

'Search Engines' Galago Source Code

Trevor Strohman

An old and now obsolete version (1.04) of the Galago Java source code that was referenced as a learning resource in the textbook 'Search Engines: Information Retrieval in Practice' by Croft, Metzler and Strohman (2009).

It has been moved from its former Google Code repository to here.

Web Annotated Passages - September 2015 Dataset

Liu Yang, Bruce Croft

This dataset contains 7,499 documents with four grades of sentence-level relevance judgment annotations for 82 queries derived from the TREC Gov2 web collection. The dataset is archived in rar or gzipped tar file formats. The dataset is described and used in Keikha, Park and Croft, SIGIR 2014.

Wikipedia Bullet Points June 2013 Dataset

John Foley, James Allan

This dataset contains over 40,000 bullet-point 'facts' mined from English Wikipedia year pages in the June 2013 English XML dump. (3.3M, gzipped JSON)

Twitter June - July 2014 Dataset

Nada Naji, James Allan

This dataset consists of 71,564,914 Twitter IDs that have been automatically crawled over the period from mid June through early July 2014.

Query Facet and Facet Feedback Annotations

Weize Kong, James Allan

This dataset consists of the query facet and facet feedback annotations used in Weize Kong and James Allan, 'Extending Faceted Search to the General Web,' CIKM 2014.

KB Bridge Entity Linking System

Jeffrey Dalton, Pat Verga, Laura Dietz

KB Bridge is an entity linking system which identifies named entities in free text and links them to entries in a semistructured knowledge base, such as Freebase or Wikipedia. See also Dalton, J. and Dietz, L., 'A Neighborhood Relevance Model for Entity Linking,' OAIR 2013.


Online Appendix for Entity Query Feature Expansion with Knowledge Base Links

Jeffrey Dalton, Laura Dietz, James Allan

This online appendix provides additional material for entity query feature expansion, such as additional gold standard annotations, produced rankings, entity-based features, and software. See also Dalton, J., Dietz, L. and Allan, J., 'Entity Query Feature Expansion using Knowledge Base Links,' SIGIR 2014.

Controversy Annotation Dataset

Shiri Dori-Hacohen, James Allan

This collection consists of controversy annotations for 445 webpages and 2060 Wikipedia articles. See also Dori-Hacohen, S. and Allan, J., 'Detecting Controversy on the Web,' CIKM 2013; Dori-Hacohen, S. and Allan, J., 'Automated Controversy Detection on the Web,' ECIR 2015.

Open Library

Henry Feild

This collection consists of 46,561,553 metadata records crawled from the Open Library on November 30, 2011 and click distributions over records for 22,622 queries recorded over the year October 2010 through September 2011.

RETAS OCR Evaluation Dataset

Zeki Yalniz, R. Manmatha

This dataset was created to evaluate the optical character recognition (OCR) accuracy of scanned books. It is provided here for research purposes. The dataset is extracted from books in Project Gutenberg and the Internet Archive.

Book Translation Detection Dataset

Zeki Yalniz, R. Manmatha, Kriste Krstovski, David A. Smith

These datasets were created to evaluate the effectiveness of the translation detection frameworks. The 2K dataset was created by Krstovski and Smith (2011), and the list of translation pairs was updated for use in Yalniz and Manmatha (2012). The 30-book dataset was created for use in Yalniz and Manmatha (2012). This dataset is for research purposes only.

Book Duplicate Detection Dataset

Zeki Yalniz, E. F. Can, R. Manmatha

This dataset was created to evaluate the effectiveness of the partial duplicate detection framework for scanned books proposed by Yalniz, Can and Manmatha (2011). This dataset is for research purposes only.

Searcher Frustration User Study Data

Henry Feild

This is a dataset collected during a user study of frustration during web search at the University of Massachusetts Amherst in October 2009. The study consists of query logs and sensor readings for thirty participants.

This is available under an Open Database/Database Content license. Feel free to use, redistribute, and modify the dataset, but make sure to make it available under the same license and to give due attribution in any public use of the dataset.

Word Image Data Sets

Toni Rath

Data sets containing word images from the George Washington collection with meta-data for retrieval performance evaluation.

Stemming Class from Stemming and Cooccurrence on a Larger Corpus

Jeremy Pickens

Three sets of experiments were done, using initial classes created by (1) the Porter stemmer, (2) K-Stem, and (3) the Porter stemmer classes merged in a connected component manner with the K-Stem classes.
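Merging two sets of stem classes "in a connected component manner" can be sketched with a union-find structure: any two words that share a class in either input end up in the same merged class. This is a minimal sketch of that general idea, not the original experimental code; the function name and the toy classes below are made up and are not actual Porter or K-Stem output.

```python
def merge_classes(*class_lists):
    """Merge several collections of word classes into connected components.

    Each argument is an iterable of classes, where a class is an iterable
    of words. Returns a sorted list of sorted word lists.
    """
    parent = {}

    def find(word):
        # Register unseen words as their own root, then walk to the root
        # with path halving.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for classes in class_lists:
        for cls in classes:
            words = list(cls)
            if not words:
                continue
            find(words[0])  # make sure singleton classes are registered
            for word in words[1:]:
                union(words[0], word)

    merged = {}
    for word in list(parent):
        merged.setdefault(find(word), set()).add(word)
    return sorted(sorted(component) for component in merged.values())


# Toy classes (hypothetical, for illustration only).
classes_one = [{"general", "generous"}, {"run", "running"}]
classes_two = [{"generous", "generosity"}, {"run", "runs"}]

merged = merge_classes(classes_one, classes_two)
print(merged)
# [['general', 'generosity', 'generous'], ['run', 'running', 'runs']]
```

Note that this kind of merge is transitive, so one shared word is enough to fuse two otherwise unrelated classes.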

Event Threading Experiment

Nallapati, R., Feng, A., Peng, F., and Allan, J.

This is the experimental data from 'Event Threading within News Topics' in the Proceedings of the CIKM 2004 conference, pp. 446-453.

Novelty Track, TREC

Alvaro Bolivar

This is a collection-building toolkit to assemble the training set used by the CIIR in its participation in the Novelty track at TREC 2002. For details, see the conference proceedings.

Software

Digital Media Lab Operations Change Due to COVID-19

The Digital Media Lab has modified its offerings due to COVID-19. You can find information about these specific changes on the services pages. Please contact DML staff with any questions at dml@library.umass.edu.

Software Availability Fall 2020


Our computers will not be available for the Fall 2020 semester.


Video Editing

iMovie is basic video editing software available on Macs.

Final Cut Pro X is professional video editing software available on Macs.


Adobe Premiere Pro CC is professional video editing software available on Macs and PCs.


Movie Maker is basic video editing software available on PCs.

Audio Editing & Composing


Audacity is free, open source software used for recording and editing audio.


GarageBand is music editing software that is popular for creating music and podcasts. It has built-in audio filters that can be used for recording instruments and features over 100 virtual, synthesized instruments.


Logic Pro X is an audio editing software application and MIDI sequencer that can be used to create high-end instrumental and voice recordings.


Animation and Motion Graphics


Autodesk Maya is 3D animation software for animation, modeling, simulation, rendering, and visual effects.


After Effects CC is compositing and motion graphics software that is installed on the PCs in the lab.


Blender is a free and open source 3D creation suite. It supports the entire 3D pipeline: modeling, rigging, animation, simulation, rendering, compositing and motion tracking, and even video editing and game creation.

Character Animator CC (Beta) brings to life 2D characters created in Adobe Photoshop CC and Illustrator CC. Act out movements and record your voice using your webcam and microphone.

Motion is compositing and motion graphics software that ties seamlessly into Final Cut Pro X.


3D Printing


TinkerCad is free 3D design software that is web-based. It is great for beginners and has interactive tutorials.


MeshMixer is free 3D modeling and analysis software that is often used to check a file's printability.

Sculptris is free 3D digital sculpting software that is great for organic modeling.

SketchUp is free 3D modeling software that is very popular and easy to use.

Autodesk Mudbox is digital painting and sculpting software that provides 3D artists with an intuitive and tactile toolset for creating and modifying 3D geometry and textures. It is free for educational use.

Autodesk Fusion 360 is professional-level modeling software that is accessible and feature rich. It is free for educational use.

Presentation, Graphics, & Design

Microsoft PowerPoint is popular presentation software used to create slideshows. Users can import images and sounds and embed video for enhanced presentations.


Adobe Photoshop CC is graphic design software used for editing and manipulating raster-based images, such as photos, scans, or video. It is installed on both the iMacs and the PCs.


Adobe Illustrator CC is illustration software that is vector-based. Images can be drawn and edited from scratch using a variety of simple tools that can be easily resized, reshaped and adjusted. It is installed on both the iMacs and the PCs.


Adobe InDesign CC is desktop publishing software that can be used to create posters, flyers, brochures, magazines, newspapers and books. It is installed on both the iMacs and the PCs.


Digital Media Lab

Contact Us

Email: dml@library.umass.edu
Phone: 413-545-6258

Digital Media Lab
W.E.B. Du Bois Library, 3rd Floor
University of Massachusetts Amherst
154 Hicks Way
Amherst, MA 01003