eXist: the indispensable guide
eXist has powered history.state.gov, my office’s public website, since its launch. I discovered eXist seven years ago, in 2007, and I still use it every day in my work. I have watched the software grow both in power and ease of use, and I have taught others how to use it in the fields I work in: digital humanities, scholarly publishing, and government. But in both learning and teaching eXist, I have long lamented that eXist lacked a proper book of its own: a well-conceived and well-executed on-ramp for new users and a comprehensive guide for practitioners at all levels. Adam Retter and Erik Siegel have given us just that in eXist: A NoSQL Document Database and Application Platform, from O’Reilly.
eXist before eXist
In 2007, the stars were in alignment for beginners like me to learn and exploit eXist. That January, the XQuery programming language had achieved full 1.0 status as a W3C Recommendation, signaling that the language was ready for prime time. This high-level standard, built specifically for querying and manipulating XML, largely abstracted away the underpinnings of computer systems, allowing someone with an understanding of XML and a need to work with it to become proficient, even without the computer science background traditionally needed to make computers do one’s bidding. That June, the standard bearer of technology publishing, O’Reilly Media, published Priscilla Walmsley’s comprehensive introduction to XQuery. And XQuery was already available in a number of software packages. Among these, the oldest and most mature package, developed with the needs of the TEI and digital humanities communities in mind, was eXist.
With Walmsley’s manual on the XQuery language on my bookshelf, all I needed was a guide for developing XQuery-driven applications with eXist. I needed an on-ramp, a reference for true beginners like me. eXist came chock-full of technical documentation geared toward seasoned software developers, but it was sorely lacking in tutorials or guides for beginners. Besides an interactive tool for testing simple queries against an XML edition of several Shakespeare plays, the closest thing to a tutorial on creating a full website was the source code driving eXist’s own documentation site. But expecting a beginner to learn to program basic database-driven web apps by reading source code has about as much chance of success as a kindergartener learning arithmetic from a calculus textbook.
Despite lacking a book, I and many others have thrived with eXist, thanks to its vibrant and knowledgeable user community. Many of my days at work that first year ended in frustration and a desperate email to the mailing list inquiring about the roadblock I had hit, but by the next morning I had received a reply explaining the solution. Also, building upon our early successes, my colleagues and I secured resources to receive in-person instruction from experts in XML technologies, such as Dan McCreary and C.M. Sperberg-McQueen, as well as the assistance and guidance of eXist’s core developers, including creator Wolfgang Meier, co-founder of the eXist Solutions consultancy, which supplies enterprise support and development services. Community members, myself now included, do their best to improve the project’s documentation, write tutorials, lecture, and answer questions on online forums. And eXist’s own documentation and facilities for learning have improved drastically since 2007.
Still, as helpful and vital as these resources were and are, the absence of a book has loomed large. Given a field (in this case, a technology stack) as powerful and potentially complex as eXist, even seasoned practitioners need a reference guide to areas outside their expertise. And beginners need a straightforward text upon which to build the foundation of their knowledge.
The book
Finally, with the publication of Retter and Siegel’s eXist, we have that on-ramp, that practical companion to Walmsley’s XQuery [1], to guide you in applying XQuery and XML to develop real-world desktop or web applications, soup to nuts. Far from just a beginner’s guide, the book offers ambitious, comprehensive, even encyclopedic coverage of core through advanced aspects of eXist that will earn it a lasting space on your bookshelf.
The first chapters walk you through download and installation of the software, offering tips for every major platform eXist supports: Mac, PC, and Linux. They explain how to navigate the built-in documentation and resources, how to get data into and out of eXist, and how to connect eXist to popular tools for XML and XQuery work, such as oXygen. By the end of Chapter 3 (“eXist 101”), you’ll have built a searchable, browsable website around a collection of Shakespeare plays encoded as XML. Readers can take these lessons and apply them to their own project as a simple proof of concept. On-ramp? Check.
In the remainder of the book, Retter and Siegel methodically survey all aspects of eXist, offering material of both immediate utility and long-term reference value. Far from offering a dry technical catalog, the authors identify the best practices that have emerged from a broad consensus of eXist users. These chapters can be read out of order, as driven by the reader’s needs during a project’s lifecycle. Essential chapters cover how to use eXist’s various indexes to speed queries, how to craft queries for maximum efficiency, and how to configure the server and troubleshoot problems. Another chapter explains how to use eXist’s Unix-inspired permissions system to control user access to resources and code, with compelling examples based on a publishing workflow with disgruntled employees and semi-trusted external partners. Another provides a sober audit of eXist’s attack surfaces, the aspects of the software that need special consideration when moving eXist from a desktop system to a public server on the Internet. Throughout, the book provides better diagrams and more comprehensive descriptions of eXist’s internals than eXist’s own documentation, often filling in the gaps where no documentation existed in the first place. If some of these examples sound esoteric, rest assured that at some point when you are using eXist, you will need to use this information yourself or provide it to someone (e.g., a system administrator) who will. It’s all there, along with pointers to additional resources.
As much as I wish this book had been available when I first used eXist, the book I had hoped for in 2007 would certainly have needed a drastic revision by 2014. Many features in eXist have been dramatically improved in the past several years. The additions include a free browser-based XQuery editor called eXide, an Apache Lucene-backed full-text and range indexing system, a tightening and comprehensive upgrade of the security and access control system, two URL rewriting and API design frameworks, the betterFORM and XSLTForms frameworks for building interactive forms, and a data replication system based on Apache ActiveMQ for spreading data across many servers instead of just one (addressing concerns about eXist’s scalability and “single point of failure”). At the same time, eXist has morphed from a rather bloated assortment of community contributions atop the database core into a streamlined system with modular extensions and a packaging system for libraries and applications. The book treats these additions not as an appendix but as part of the comprehensive introduction and guide to eXist, circa 2014.
Speaking of progress, an important caveat for readers is that the book covers eXist version 2.1, but version 2.2 was released just last month. Unlike some software you may use, whose frequent updates are accompanied by whole-number version upgrades, a point-release upgrade like this one, from 2.1 to 2.2, is a major event for eXist. The new version remains compatible with the old one, but it offers significant refinements and additions. For example, version 2.2 did away with the “old” web-based server administration tool, which the book refers to in several places for key functions, and replaced it with a new application called “Monex,” which offers these functions in addition to many new features and a slick new interface. One of Retter’s own additions to 2.2, SetUID and SetGID, overcomes a limitation of the security model in 2.1 and earlier that is mentioned in the book, but the new feature couldn’t be included in the book. The book is clear that such change is to be expected, but readers may be confused by or, perhaps worse, unaware of some of the changes that the book could not take into account. In time, assuming the book is sufficiently successful in the marketplace, the authors will surely account for these changes in a new edition. But in the meantime, readers will have to monitor announcements from the eXist developers about these changes. Such is the price of progress in the open source community, where raw enthusiasm and itch, not just corporate priorities, drive rapid evolution.
eXist is a pleasure to read. The authors write in clear, plain English and employ humor judiciously. The book offers insights, available nowhere else, into how eXist works and how to get things done in eXist. Complete versions of the code introduced in the book are available for free download on GitHub. The code samples are compelling, not perfunctory, and worth downloading and exploring. The index is comprehensive. The electronic edition (DRM-free when purchased direct from O’Reilly) is exceptionally well-produced, with a few typos that I’m sure will be fixed shortly, at least in the electronic editions. O’Reilly offers its typical bundle discount for buying the print and electronic editions together, or, if purchased separately, a discount for “upgrading” from one edition to get the other.
The eXist community owes Adam Retter and Erik Siegel, as well as the book’s contributors and reviewers, a huge debt of gratitude. Writing and producing it surely took thousands of hours of labor (not including the decades of combined experience that shaped the contributors’ perspectives), even though authors of technical publications know their efforts aren’t likely to yield direct returns close to the time invested. Perhaps this book will be different. The book should be assigned in classrooms and digital humanities workshops. It is essential for anyone looking to learn eXist. For many young software developers and scholars, eXist offers a unique and compelling set of capabilities, and this book will help them harness its power to build great things. I look forward to the next seven years of eXist.
-
[1] Retter and Siegel are clear that their book is not an introduction to XQuery. They recommend that all eXist users still buy Priscilla Walmsley’s XQuery as an introduction to the language, and I second this recommendation. A second edition of XQuery is scheduled to be released in 2015, with coverage of the 3.0 (hopefully even 3.1) version of the language.