2013 CI Days
The second annual Michigan State University Cyberinfrastructure (CI) Days event was held on October 24-25, 2013.
CI Days presents an opportunity for faculty and students to understand the benefits that cyberinfrastructure can bring to their scholarly pursuits, to see what others are doing with cyberinfrastructure, and to learn what resources are available on campus, across institutions, and nationally.
The event featured:
- Nationally renowned research leaders from multiple disciplines of study addressing how advanced technologies enable their scholarly work;
- Posters showcasing CI-enabled research at MSU; and
- Resource fair featuring CI resources available to MSU researchers.
What is CI?
The National Science Foundation defines CI as a collection of advanced technologies and services to support scientific inquiry. This includes:
- Computing clusters and high performance computing systems;
- Data management, data integration, and data storage systems;
- High speed networks;
- Data mining and data visualization;
- Collaboration and security tools; and
- The people who design, build, and run these systems.
CI Days is coordinated by the Institute for Cyber Enabled Research (iCER) and IT Services Research Support. CI Days is jointly sponsored by the Vice President for Research and Graduate Studies and the Acting CIO and Director for Information Technology. CI Days is based on an initiative originally supported by the National Science Foundation in 2010 to support the implementation and use of cyberinfrastructure at research institutions.
Python for Research Computing
Computing is, without a doubt, now an integral part of every research discipline. This is both an advantage and a hindrance: an advantage because more, and higher-quality, work is being done; a hindrance because many researchers feel somewhat inadequate in the face of the programming such research entails.
Python is a programming language that is now being used widely in research computing because:
- it is relatively straightforward to learn and understand;
- once learned, it is very useful, and people who have learned Python can put it to use in their discipline; and
- it is widely supported in many disciplines.
This tutorial will go over the basic concepts of Python, show some examples of how it is useful, and point attendees to resources for further study.
Dr. Bill Punch is director of the MSU High Performance Computing Center and co-director of the Genetic Algorithms Research and Applications Group or GARAGe. His main interests are genetic algorithms and genetic programming, including theoretical issues (parallel GA/GP) and application issues (design, layout, scheduling, etc.). He also has active research in data mining, mostly focused on intelligent search approaches based on pattern-recognition techniques and GA/GP search.
Crash Course in High Performance Computing
This two-hour tutorial will give new users a "crash course" to get started using the MSU High Performance Computing Center (HPCC). This tutorial will give a brief overview of key topics, from getting an account to running your first job on HPCC's advanced computing systems. Participants will do hands-on examples including:
- Apply for an account*
- Install needed software (SSH, SCP, X11)
- Transfer input files and source code
- Compile/test programs on a developer node
- Write a submission script
- Submit the job
- Access job results
This session is focused on the basic knowledge needed to start using the HPCC resources, and is intended for those who may have been uncertain about how to get started. Current HPCC users are also welcome to attend.
Please bring your own laptops.
*It is strongly recommended that you create an account prior to this tutorial. To create an account, have an MSU Faculty or Staff Member go to the following page and fill out the request form: https://contact.icer.msu.edu/account. Please allow 2 work days for processing the new account. Temporary accounts will be available during the tutorial for users who do not have an account or need help applying for an account.
Dr. Dirk Colbry joined Michigan State University in 2009 as a research specialist within the Institute for Cyber Enabled Research. At iCER, Dr. Colbry helps the MSU community utilize Computational Infrastructure in research, through classroom instruction, one-on-one consulting and research collaboration.
An alumnus of MSU, Colbry has a Ph.D. in Computer Science, and his principal areas of research include machine vision and pattern recognition (specializing in scientific imaging) and high performance computing. Dr. Colbry collaborates with scientists from multiple disciplines including Engineering, Zoology, Mathematics, Statistics and Biology. Recent projects include research in Image Phenomics; developing a commercially-viable 3D face verification system; adapting pattern recognition processes for tire engineering; and exploring uses of face processing to help individuals who are blind in social interactions. Dr. Colbry has taught a range of courses in computer science, including microprocessors, artificial intelligence, compilers, and courses in programming and algorithm analysis.
Statistical Tools for Research at MSU
Modern research is a complex process that has become highly dependent on specialized software that allows researchers to accomplish a number of important tasks. This presentation will review the different steps in the research process beginning with study design and proceeding through data collection, processing, analysis, modeling, and preparation of scientific reports. At each step, there are a number of software resources available to researchers at MSU. These resources will be identified and discussed. A summary of these resources will be provided to participants.
In this session, you will learn how MATLAB can be used to visualize and analyze data, perform numerical computations, and develop algorithms. Through live demonstrations and examples, you will see how MATLAB can help you become more effective in your coursework as well as in research. Some of the highlights include:
• Accessing data from many sources (files, other software, hardware, etc.)
• Using interactive tools for iterative exploration, design, and problem solving
• Automating and capturing your work in easy-to-write scripts and programs
• Sharing your results with others by automatically creating reports
The target audience for the workshop is graduate students, faculty, and academic staff at MSU interested in using statistical programs available to the MSU community. There are no prerequisites for participants.
Dr. Brian A. Maurer is Director of the Center for Statistical Training and Consulting (CSTAT) and Professor in the Department of Fisheries and Wildlife at Michigan State University. He has an MS in Statistics and a PhD in Wildlife Ecology from the University of Arizona. His research has focused on analyzing large geographic data sets on animal and plant populations to understand patterns and processes underlying biological diversity.
Mehernaz Savai holds a Bachelor's in Electrical Engineering from Pune University, India, and a Master's in Aerospace Engineering from Purdue University, USA. Her research involved optimizing air traffic sectors for dynamic airspace configuration. After graduating, she joined the MathWorks Engineering Development Group as an Application Support Engineer in 2011 and moved to Application Engineering in 2013.
Morning Keynote: Months to Minutes: Moving the Craft of IT to the Clouds
The history of university IT has been to adapt technology built for other industries and customize it to work well within each of our unique campuses. Technology staff have spent countless hours, weeks, and months creating secure, functional solutions for key campus areas, often at great expense both in development and in ongoing management. This discussion will focus on the changes needed in higher education IT to support adoption of cloud services while minimizing risk, and on how higher education IT needs to change to take advantage of the benefits of cloud computing while meeting the unique needs of each campus.
Dr. Shelton Waggener
Shelton Waggener is the Senior Vice President of Internet2 responsible for the NET+ portfolio of services including commercial and community cloud services, middleware development and Trusted Identity in Education initiatives. Shelton leads Internet2’s pursuit of emerging cloud and advanced services above the network involving demand aggregation and brokering with campus and commercial providers, and distributed offerings among its members. Internet2 NET+ includes compute, storage, platform and software services, the integration of identity management and federation with key applications for the research and education community, deployment of collaboration infrastructure among members, including video, coordination and sharing tools, content management, and domain applications.
Afternoon Keynote: The Promise of Digital Humanities and the Future of the Liberal Arts
This talk explores the promise of the digital humanities in the larger context of the liberal arts in higher education. It argues that we are in a propitious period for the liberal arts broadly despite the narrative of decline in the humanities described most prominently in contemporary journalism. Numerous recent reports have addressed the state of the humanities, including "The Teaching of the Arts and Humanities at Harvard College: Mapping the Future" and the American Academy's "Heart of the Matter" report. Despite the accomplishments of a twenty-year surge in the digital humanities, we need now to consider ways to organize our curriculum and liberal arts institutions for the digital age. This talk will examine alternatives for broader definitions of scholarship, teaching, and service in the liberal arts and challenge the myth of the decline of the humanities.
Dr. William G. Thomas III is the John and Catherine Angle Professor in the Humanities and Professor of History at the University of Nebraska. He currently serves as the Chair of the Department of History. His research has been funded by the National Endowment for the Humanities and the American Council of Learned Societies. He is a co-editor with Edward L. Ayers, Anne S. Rubin, and Andrew Torget of the award-winning Valley of the Shadow: Two Communities in the American Civil War. His most recent book is The Iron Way: Railroads, the Civil War, and the Making of Modern America (Yale University Press, 2011). With Patrick D. Jones he leads The History Harvest, a partner project of the Digital Public Library of America that connects undergraduate students with communities to digitize family history. He is currently writing a history of the black and white families of early Washington, D.C. and the problem of slavery and freedom in post-Revolutionary America.
Morning Breakout Sessions
Amazon Web Services for Research and Scientific Computing
The AWS Education Team will present to attendees a background on Amazon Web Services and a technical deep dive on how researchers and Universities are moving to Amazon’s cloud for scientific computing and research. The AWS presentation will explore the Global Infrastructure AWS manages and the available services. AWS will discuss performing scientific computing on AWS for genomics and bioinformatics and highlight AWS best practices for using Galaxy, Globus Genomics and MIT StarCluster as examples. The presentation will conclude with a live HPC demonstration and time for Q&A. Attendees do not need any previous AWS experience and will leave with a better understanding of AWS’ core Infrastructure Services and how they can apply them to scientific computing and research.
Angel Pizarro: Angel Pizarro is a Senior Solutions Architect at AWS, with 14 years of experience in bioinformatics, high-throughput genomics research, and scientific computing.
KD Singh: KD Singh is a Senior Solutions Architect in the Amazon Web Services (AWS) Worldwide Public Sector team. He provides thought leadership, technical design, education, and training, and architects cloud-based solutions for public sector customers including state and local governments, education (universities and K-12 schools), and other public sector agencies. He is also the subject matter expert for High Performance Computing (HPC) and Big Data Analytics domains. He has over 15 years of experience in software and hardware research, architecture, design, development and related project management.
Steve Elliott: Steve is a Senior Account Manager for the AWS Education team, working with multiple Universities on migrating to Amazon’s cloud and focusing on scientific computing and research. Steve is a Geographer by education, graduating from Ohio Wesleyan University, and brings over 10 years of experience with enterprise and platform technologies, ranging from Geographic Information Systems, Server Based Computing and Cloud
Using the MSU Code Repository for Collaborative Research
Research can be difficult, but sharing research and collaborating with others on research can be even more difficult. Furthermore, how do you track who did what? What happens if the data is lost? What if you want to work off-campus or off-line? The new MSU Code Repository service helps to solve these problems by using the cross-platform git distributed version control system as its backbone and a user-friendly web interface to lower the barrier for entry to collaborating with others. This presentation will show how researchers, academics and developers can use this service to make collaboration easier.
Troy Murray, a member of the MSU family since 2000, has successfully developed applications within the healthcare, research and education fields. He is a self-professed neophyte, always working to solve problems and make things work better with new technologies that he's learning. His goal is to tip over existing silos within the MSU community that isolate people, and to bring people together to work more collaboratively. When he's not melding with his Mac he enjoys his most challenging new project, fatherhood.
Power up your Research with XSEDE as a Resource and Service for Researchers
XSEDE (Extreme Science and Engineering Discovery Environment) is a single virtual system that scientists can use to interactively share computing resources, data and expertise. People around the world use these resources and services — things like supercomputers, collections of data and new tools — to improve our planet.
Dr. Dan Stanzione, Jr. is the deputy director of the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. He is the Co-Director of The iPlant Collaborative, an ambitious endeavor to build cyberinfrastructure to address the grand challenges of plant science. He is also a Co-investigator for TACC’s 10 PetaFlop Stampede supercomputer, and has previously been involved in the deployment and operation of the Ranger and Lonestar supercomputers at TACC. Prior to joining TACC, Dr. Stanzione was the founding director of the Fulton High Performance Computing Initiative at Arizona State University. Before ASU, he served as an AAAS Science Policy Fellow in the Division of Graduate Education at NSF and as a research professor at Clemson University, his alma mater.
Research-enabling tools and technologies: Challenges and experiences from an MSU PI perspective
Moderated by Shawn Nicholson, Assistant Director for Digital Information, MSU Libraries
Dr. Robert Geier, Create for STEM.
Dr. Bradley Marks, Department of Biosystems and Agricultural Engineering.
Dr. Emilee Rader, Department of Telecommunication, Information Studies and Media.
Dr. Patricia Soranno, Department of Fisheries and Wildlife.
Afternoon Breakout Sessions
Virtual Computing Laboratory (VCL) Technology
The concept of cloud computing has become a popular term to describe a flexible system that provides users with access to hardware, software, applications and services. Because there is no one generic user and the hardware, software, and services may be grouped in various combinations, this cloud computing concept quickly fractures into many individualized descriptions and perspectives. Regardless of the particular details, a cloud computing installation should provide every user with the power to seamlessly provision the hardware, operating systems, and application software across a network to provide a rich set of customizable information technology services. Whether a cloud computing implementation is applied toward education/research or commercial use, a cloud computing system should be designed around a service-oriented architecture. This design should be able to allocate resources on-demand in a location and device independent way, incorporate technical efficiency and scalability through relative centralization of infrastructure, efficiently manage cloud services, and allow either explicit or implicit self-provisioning by users to reduce administration overhead.
This presentation discusses key characteristics of a cloud environment suitable for a research-intensive academic institution, and provides production-level examples from the NC State University private VCL cloud environment. The talk also discusses proof-of-concept implementations of very complex next-generation services, such as complex equipment and space virtualization (e.g., high-end microscopes and biomanufacturing laboratories).
Dr. Mladen Vouk
Mladen A. Vouk received his Ph.D. from King's College, University of London, U.K. He is Department Head and Professor of Computer Science, and Associate Vice Provost for Information Technology at N.C. State University, Raleigh, N.C., U.S.A.
Dr. Vouk has extensive experience in both commercial software production and academic computing. He is the author/co-author of over 300 publications. His research and development interests include software engineering (process and risk management, testing, reliability, fault-tolerance, security), scientific and cloud computing (management of scientific data and knowledge, application of engineering methods to genetics, bioinformatics, biophysics, development of numerical and scientific software-based systems, "cloud computing," parallel and grid computing, support for scientific problem-solving workflows, middleware), information technology (IT) assisted education (IT workforce, network-based education, distance learning, education workflows), and high-performance networks and computing (end-to-end quality of service, security, forward error correction, empirical evaluation of networking solutions, "computational clouds").
Dr. Vouk has taught courses and tutorials in cloud computing, scientific workflow management, software engineering, software testing, software reliability and fault-tolerance, software process and risk management, networking, data structures, operating systems, numerical software, and programming languages.
Dr. Vouk is a co-founder of NC State's cloud computing solution, the Virtual Computing Laboratory. Dr. Vouk is closely associated with the Computer Science Computer-Based Education Laboratory and the Undergraduate and Graduate Networking Laboratories. He is the co-founder, former co-director and current member of the Computer Science Software Systems and Engineering Laboratory. He is the founder, former director, and current member of the NC State Multimedia and Networking Laboratory. He is a member and former Technical Director of the N.C. State Center for Advanced Computing and Communication. Dr. Vouk is an associate graduate faculty member in the Department of Electrical and Computer Engineering at NC State and an Affiliated Faculty member in the BioMedical Engineering Department. He is also a member of the NC State Information Security Faculty, NC State Genomics Faculty, NC State Bioinformatics Program Faculty, and NC State Operations Research Program Faculty.
Dr. Vouk is a member, former chairman, and former secretary of the IFIP Working Group 2.5 on Numerical Software, and a recipient of the IFIP Silver Core award. He is an IEEE Fellow, recipient of the IEEE Distinguished Service Award and the IEEE Gold Core Award, and a member of IEEE Reliability, Communications, Computer, and Education Societies, and of the IEEE TC on Software Engineering. He is a member of ASEE, a senior member of ASQ, and a member of ACM and Sigma Xi. For over 10 years, he was an associate editor of IEEE Transactions on Reliability. He is a member of the Editorial Board for the Journal of Computing and Information Technology.
Research Data Management-as-a-Service with Globus Online
Managing massive volumes of data throughout their lifecycle is rapidly becoming an inhibitor to research progress, due in part to the complex and costly IT infrastructure required – infrastructure that is typically out of reach for the hundreds of thousands of small and medium labs that conduct the bulk of scientific research.
Globus Online is a powerful system that aims to provide easy-to-use services and tools for research data management – as simple as the cloud-hosted Netflix for streaming movies, or Gmail for e-mail – and make advanced IT capabilities available to any researcher with access to a Web browser. Globus Online provides software-as-a-service (SaaS) for research data management, including data movement, storage, sharing, and publication.
We will describe how researchers can deal with data management challenges in a simple and robust manner. Globus Online makes large-scale data transfer and synchronization easy by providing a reliable, secure, and highly-monitored environment with powerful and intuitive interfaces. Globus also provides federated identity and group management capabilities for integrating Globus services into campus systems, research portals, and scientific workflows.
New functionality includes data sharing, simplifying collaborations within labs or around the world. Tools specifically built for IT administrators on campuses and computing facilities give additional features, controls, and visibility into users’ needs and usage patterns.
Dr. Steve Tuecke is Deputy Director at The University of Chicago's Computation Institute (CI), where he is responsible for leading and contributing to projects in computational science, high-performance and distributed computing, and biomedical informatics.
Prior to CI, Steve was co-founder, Chief Technology Officer, and a board member of Univa Corporation from 2004 to 2008, and also served as Univa's first Chief Executive Officer. Univa provides open source and proprietary software for the high-performance computing and cloud computing markets. Steve helped lead Univa through several new product launches, multiple venture capital investment rounds, and the acquisition of United Devices. He continues to serve on Univa's board and as CTO advisor.
Prior to Univa, Steve co-founded the Globus Project, with Dr. Ian Foster and Dr. Carl Kesselman. He was responsible for managing the architecture, design, and development of Globus software, as well as the Grid and Web Services standards that underlie it.
He began his career in 1990 as a software engineer for Foster in the Mathematics and Computer Science division at Argonne National Laboratory. In 1995, Tuecke helped create the Distributed Systems Laboratory at Argonne which, under his management and technology leadership, became the premier Grid research and development group in the world. In 2001, Tuecke focused on Globus architecture and design, creating Grid and Web Services standards, and expanding corporate relationships.
In 2002, Tuecke received Technology Review magazine's TR100 award, which recognized him as one of the world's top 100 young innovators. The same year, he also was named to Crain's Chicago Business "Forty Under 40" and described as one of the Chicago area's "best and brightest." In 2003, he was named (with Foster and Kesselman) by InfoWorld magazine as one of its Top 10 Technology Innovators of the year.
Tuecke graduated summa cum laude with a B.A. in mathematics and computer science from St. Olaf College.
Image Phenomics and Animal Behavior
The explosion of image data afforded by digital technologies creates important opportunities and challenges for biology. The opportunities arise from the ability to quantify biological patterns that provide insights into the biological processes that generate these phenotypes. More images mean more data and hence greater ability to discern variation and to correlate it with other relevant information about the phenotype. Unfortunately, as easy as digital images are to acquire, the sheer quantity of data creates a major bottleneck for analysis. Lacking adequate tools, biologists commonly use manual techniques (including hired students) to annotate images, a slow process subject to variations in quality, detail, and inter-observer bias. It is common for behavioral researchers to accumulate hundreds of hours of video, but ultimately to use only a fraction of it for analysis. Thus there is a critical need for flexible computational tools that can assist researchers in automating the analysis of large datasets of digital images.
A variety of tools have been developed for filtering or for low-level detection and quantification of image features, but they fall short of solving the core problem that many biologists face: measuring and understanding the structure of variation in images. This problem has been tackled for specific target systems involving either static images or video. These solutions typically rely upon a priori computational models of the target system, models that may be incomplete and in any case are hard to adapt for other biological systems. A further difficulty is that biologists typically do not have the expertise to develop such systems.
To tackle these various challenges, we are exploring a novel integration of biological, mathematical, statistical and computational tools applied to biological images. The approach is data-driven, using image data to build the models of phenotypic patterns rather than using models to determine which features are selected. Our goal is then to use cutting-edge mathematical concepts to develop biologically-informed metrics suitable for describing and explaining patterns of variation in phenotypes. Overall, our goal is to develop an integrated computational framework in which the researcher is “in the loop,” providing feedback for computational tools that both speed up automation and model patterns of variation that may elude the human eye.
Dr. Fred Dyer is a Professor of Zoology at Michigan State University and is currently Chairperson of the Zoology Department. He received his Ph.D. from Princeton University in 1984, and was a post-doctoral research associate at Yale University before coming to MSU in 1986. His research focuses on animal behavior. A major theme in his work is to understand the evolution and survival value of complex behavior. He was one of the original members of MSU's interdepartmental graduate program in Ecology, Evolutionary Biology, and Behavior, and evolution is a major emphasis of his teaching at the graduate and undergraduate levels. He has had extensive experience living and working in the tropics, including Guatemala, Peru, India, and Thailand.