(This is a talk I gave as part of the Global Philology Project’s Named Entities workshop in Leipzig in January 2017.)
As some of you are already well aware, Syriaca.org is a suite of digital reference works for the study of cultural heritage linked to the Syriac language. I am the lead co-editor of the geographic module, The Syriac Gazetteer, and I contributed to the early stages of the database of authors. In this presentation I will give an overview of Syriaca.org, some methodological difficulties we have encountered, and our attempts to overcome those. My purpose is first to solicit ideas from you for overcoming some of these difficulties, and secondly to suggest that these challenges are not accidental, but inherent in attempts to identify named concepts, whether digitally or using other means. Names are not qualities of things, but instead naming is a linguistic strategy always situated in particular contexts, discursive, social, and historical. Therefore naming must always be interpreted in particular contexts. There can be no mathematical identity relation over the domain of named historical concepts, and naming must be seen as an action rather than as a status. This is not a weakness of names for human and scholarly purposes. Rather, this reflects the cultural richness, flexibility, and strength of naming as a discursive strategy which communicates, in particular contexts, more meaning than merely identifying what is referenced.
But first, let me introduce you to Syriaca.org. Syriac is a Middle Eastern language, a dialect of Aramaic. Originally spoken in upper Mesopotamia, it became a language of merchants and churchmen from Palestine to Iran, with outposts in India, Central Asia, and even China. Mongolian, Uyghur, and Sogdian alphabets were derived from the Syriac script. Syriac is still used today as a religious language in Turkey, Syria, Iraq, India, and the diaspora, although those uses are endangered by diasporic acculturation, language politics, and the violence in the contemporary Middle East. Despite this long and illustrious career, however, Syriac entered western academic study as an adjunct to biblical studies, classical philology, and comparative Semitics. More recently, it has received increasing attention from ecclesiastical historians, scholars of early Islam, late antique historians, and anthropologists.
Syriaca.org is a suite of reference works pertaining to Syriac cultural heritage and the modern academic study of Syriac history. It is a collaborative project, with an editorial board and contributions from dozens of scholars. Syriaca.org started from the observation that the reference works that exist for Syriac, apart from lexica, were almost all authored by individual German orientalists so long ago that they have lapsed from copyright, and no single scholar alive today has the breadth of knowledge to replace these. Our goal was to create a structure for recording information about core entities (authors, saints, places, and texts), which could be incrementally expanded by contributions from various scholars in the field. Following the example of the Pleiades gazetteer, we minted Uniform Resource Identifiers (URIs) for people, places, written works, and manuscripts related to Syriac. We have interpreted “relevance” broadly, including anything mentioned in a Syriac text, anything mentioned in scholarship which discusses Syriac, and all places where such scholarship is produced, over the last two millennia. And once we have identified an entity as related to Syriac, we are interested in all information about it, not only information from Syriac sources. So we have information from Greek, Latin, and Arabic sources, and we are very interested in additional contributions, for example, from Hebrew or Chinese.
Syriaca.org is divided into modules, one for each type of entity. The geographical database was published first, in 2014, in part because places seemed somewhat more straightforward than the complexities of works and authorship. In 2016, the Syriac Biographical Dictionary was published as the master collection of all person records in our database, with sub-volumes for saints and authors. As the first textual reference work, a database of saints’ lives, the Bibliotheca Hagiographica Syriaca Electronica (BHSE), was published alongside the saints’ module. The BHSE is the first sub-volume of the New Handbook of Syriac Literature, designed to be a reference guide for all Syriac literature. Our scholarly model requires citations of authorities and provenance of assertions, so Syriaca.org’s bibliographic module maintains records and URIs for all citations. Two other modules are in active development: one is a union catalog of Syriac manuscripts, and the other uses a factoid approach to encode assertions about named entities in a text.
These different modules are bound together not merely by their connections to the Syriac language and the human relationships of the team members. They also share a set of methodological approaches. At first, we began with the naïve assumption that it would be relatively straightforward to compile lists of persons, places, and works, and to assign URIs to each. We quickly encountered definitional issues, no doubt parallel to those many of you have faced in your respective domains. Out of these difficulties and discussions, we reached certain technical and theoretical decisions which characterize Syriaca.org as a whole. Thus, each module of Syriaca.org publishes its information in TEI XML, with a future goal to serialize it in RDF. Each module links to URIs minted by other projects, such as the Virtual International Authority File for persons, and for geography, Pleiades and DBpedia, to facilitate data discovery and disambiguation. There is also a theoretical approach common to most Syriaca modules, that the entities which are the subjects of Syriaca records are typically named concepts rather than the physical entities that might once have existed. Thus the URI http://syriaca.org/place/139 identifies not directly or solely the city of Mosul, but also or even primarily the widely shared concepts that people have had of that city. http://syriaca.org/person/13 refers not only to the historical Syriac poet Ephrem, but also to the wide range of ideas later people have developed about him.
We came to this named concept approach due to the confluence of difficulties defining each variety of entities, and the limitations of a geographic approach characterized exclusively by Geographic Information Systems, GIS. One of the earliest definitional difficulties, which arose in sharpest form for the Gazetteer, was the question of what defines a place, or what distinguishes one place from another. GIS is a powerful computational tool for analyzing space, but it relies upon an extensive set of measurements of known precision, and therefore it works best for modern applications and for archaeological sites. It works less well for applications with mixed levels of precision and unavailable measurements, and not at all for places which cannot be located. As others here have pointed out, the Cartesian approach to space required by GIS does not match the ways in which historical texts refer to places. Texts name places, they do not delineate their boundaries. So as editors of the Syriac Gazetteer, we came to be interested in the concepts named in texts rather than the precise delineation of location. In this we followed the “un-GIS” approach, so-called by Tom Elliott and Sean Gillies, for the development of Pleiades. A similar move was necessary for dealing with records of Syriac saints and (to a lesser degree) authors, whose named concepts do not always correspond precisely to historical persons, such as in the case of pseudepigraphy or developing legends. Perhaps this approach is most significant when dealing with literary works, in which case the shared scholarly construct of “the work” unifies debates about the precise readings of particular manuscripts, quotations, and editions. Yet this “named concept” approach raised an additional series of challenges, many of which we have not resolved to our entire satisfaction, which I will spend the rest of my presentation discussing.
The first difficulty posed by the notion of named entities or named concepts is that bare names are never sufficient to specify the referent. The problem of homonymy is familiar to all: different places, and even more commonly different people, often share names. Even famous names that one might assume to be unique, such as Baghdad, have come to be shared: there are towns named Baghdad in Iran, Afghanistan, Pakistan, and Australia, in addition to provinces in the Ottoman Empire and the modern Republic of Iraq named after the capital city. Among persons, this can be even more trying. Syriaca.org has a record for a deacon of Edessa named Ḥabīb, floruit 6th C (http://syriaca.org/person/43). The US Library of Congress has an entry for “Abibus of Edessa,” but since this Abibus died in 322, these two people are clearly distinct, despite their shared name. This homonymy is what drove Syriaca.org to assign URIs in the first place, to enable disambiguation, so that one can specify which person named Ḥabīb or which place named Baghdad is intended. Syriaca.org further introduced place types to distinguish cities and regions which share a name, as well as between regions, secular administrative provinces, and ecclesiastical administrative dioceses, which were usually conceptually parallel but whose boundaries could develop independently.
Yet homonymy is not merely a technical challenge to unambiguous identification. The phenomenon of shared names raises deeper questions about how reference and disambiguation work. The fact that entities often share names implies that to identify which entity is named in a text, or in another database, requires more than the name, even the “complete” name. It typically requires a human reader who is aware of the context of the passage, and of the text as a whole. Indeed, in some cases the context is insufficient even for an expert reader to identify which entity is referred to. Ancient and medieval authors often did not distinguish sharply between a large city and its surrounding region, or between a region as an extent of ground and the administrative structures which claimed jurisdiction over that extent. Nor is this a problem exclusive to pre-digital texts, because despite claims to the contrary, URIs never fully replace ambiguous natural language names. The intrinsic arbitrariness of URIs means that humans must rely upon other features of the record, other names, to identify what the URI refers to, and even modern databases sometimes have insufficient information for disambiguation. To return to the curious case of Ḥabīb of Edessa, the Deutsche Nationalbibliothek has an entry for Abibus Edessenus, with no date. Is this “Abibus” the same as that of the Library of Congress, or the one referred to by Syriaca.org? Should we link our URI for Ḥabīb of Edessa with the German library’s Abibus Edessenus, or not? Such ambiguities may never be resolved, or even resolvable, and will affect all attempts to link data about named entities. We need some means to indicate uncertainties and ambiguities in all assertions of identity, even those involving URIs, because of the inherent ambiguity and uncertainty of our source texts.
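One way to picture such a mechanism is the following minimal Python sketch. It is an illustration only, not Syriaca.org's actual data model (which is published in TEI XML): the class name, the certainty levels, and the two `example:` identifiers standing in for the library records are all hypothetical. Each link records a relation, a certainty, and the reason for the judgment, so that an unresolved case such as Abibus Edessenus stays visible rather than being silently linked or dropped.

```python
from dataclasses import dataclass

# Hypothetical certainty scale, ordered from weakest to strongest.
CERTAINTY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class IdentityAssertion:
    """A claim about how one identifier relates to another."""
    subject: str    # the URI we are describing
    other: str      # an identifier minted by another project
    relation: str   # "same", "distinct", or "undetermined"
    certainty: str  # a key of CERTAINTY_RANK
    basis: str      # the evidence behind the judgment


HABIB = "http://syriaca.org/person/43"

assertions = [
    IdentityAssertion(
        subject=HABIB,
        other="example:loc/abibus-of-edessa",  # placeholder, not a real identifier
        relation="distinct",
        certainty="high",
        basis="The Library of Congress Abibus died in 322; Habib flourished in the 6th century.",
    ),
    IdentityAssertion(
        subject=HABIB,
        other="example:dnb/abibus-edessenus",  # placeholder, not a real identifier
        relation="undetermined",
        certainty="low",
        basis="The Deutsche Nationalbibliothek entry carries no date.",
    ),
]


def safe_to_link(items, minimum="high"):
    """Return identifiers asserted to be the same with at least `minimum` certainty."""
    threshold = CERTAINTY_RANK[minimum]
    return [
        a.other
        for a in items
        if a.relation == "same" and CERTAINTY_RANK[a.certainty] >= threshold
    ]


def needs_review(items):
    """Return identifiers whose relation to the subject is still open."""
    return [a.other for a in items if a.relation == "undetermined"]


print(safe_to_link(assertions))
print(needs_review(assertions))
```

In this sketch nothing is linked automatically, and the undated German entry is returned by `needs_review` as a question for a human reader.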
Another challenge is that the boundaries between concepts are not stable over time. For example, places may merge. The city of Mosul (http://syriaca.org/place/139), newly founded by early Muslims, was initially distinct from the older town of Nineveh (http://syriaca.org/place/144) across the Tigris River. But at a later period Mosul engulfed Nineveh. They were two distinct places at one time and became identical at another. As a different example, sometimes events in the life of a settlement caused the entire population to move a short or long distance while retaining the place name, as when the Mamluks moved Tripoli (http://syriaca.org/place/203) several kilometers inland after conquering it in the thirteenth century. Syriaca.org mints distinct URIs for places that were originally distinct and then merged, which raises the question which URI should be used for the later combined place. Syriaca.org does not typically mint multiple URIs for places whose location changed by a small distance, if they retained their original name, but how might one distinguish between the old site and the new? We face the problem of places splitting as well as merging over time. This problem of mismatched conceptual boundaries is even more acute across languages. Is the region known as Beth Nahrīn in Syriac the same place as the regions termed Mesopotamia in Greek, or al-Jazīra in Arabic? While all three names refer to the area between the Tigris and the Euphrates, the northern and southern boundaries delimited by those names do not line up exactly. Syriac authors often used the term “Nineveh” for Mosul throughout the medieval period, although Arabic authors never did.
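The dependence of a name on the language of the source can be sketched as a lookup keyed on both the name and the language. This is a hypothetical illustration, not a description of how the Gazetteer stores names: the table `USAGE` and the function `candidates` are invented here, and only the two URIs and the Syriac and Arabic usage of “Nineveh” are taken from the discussion above. The language tags `syr` and `ar` are the ISO 639 codes for Syriac and Arabic.

```python
NINEVEH = "http://syriaca.org/place/144"
MOSUL = "http://syriaca.org/place/139"

# (name, language of the source text) -> URIs the name may refer to.
# Assumption for illustration: medieval Syriac "Nineveh" may mean either
# place, whereas Arabic "Nineveh" is never used for Mosul.
USAGE = {
    ("Nineveh", "syr"): frozenset({NINEVEH, MOSUL}),
    ("Nineveh", "ar"): frozenset({NINEVEH}),
}


def candidates(name, language):
    """Return the set of URIs a name may refer to in a source of the given language."""
    return USAGE.get((name, language), frozenset())


print(sorted(candidates("Nineveh", "syr")))
print(sorted(candidates("Nineveh", "ar")))
```

The same string yields two candidates in a Syriac source and one in an Arabic source, so the name alone, without its context, does not determine the URI.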
The same failure to maintain boundaries between conceptual places also happens with conceptual people. In the Syriac tradition, the works of three authors named Isḥāq came to be amalgamated into a single corpus, and Isḥāq of Amid (http://syriaca.org/person/17), Isḥāq of Antioch (http://syriaca.org/person/33), and Isḥāq of Edessa (http://syriaca.org/person/34) all came to be known under the single designation, Isḥāq the Syrian. Translation issues also arise with concepts of persons: when a text asserts that the semi-legendary St. George and the legendary Muslim al-Khiḍr are the same, while another text asserts that they are different, what are we to consider the named concept to which we assign a URI? Should we include information about al-Khiḍr in the record on St. George, or assign separate URIs to both and record that some texts assert their identity?
The approach of “named concepts” also runs into the problem that scholars are often interested in entities without names. We know of a convent for nuns in the northern Iraqi city of Balad, but we do not know what it was called. But since nunneries were very rare in the Syriac Middle East, the existence of this institution is significant, even in its anonymity. Similarly, many texts allude to individuals who played key roles in historical events, but whose names were not recorded. Anonymous persons and places can be significant. Yet the number of anonymous individuals referred to in texts can quickly grow into an unwieldy commitment. So Syriaca.org has decided that we need to be able to mint URIs for select unnamed persons and places, but we have not yet formulated any policy for distinguishing which are important enough to merit inclusion in our databases of named concepts. Our default approach has been to evaluate this on an individual basis, depending on the utility of a particular encoding.
One final challenge confronting this approach is that named concepts are not just objectively “out there” waiting to be captured, but are often themselves byproducts of scholarly investigation. The degree of specificity of an identification often depends upon the purpose for which the identification is made. In scholarship on persons, some scholars need to consider both the historical person and how she or he was subsequently regarded, while other scholars are interested only in one or the other. For example, the fourth-century poet Ephrem, never himself a monk, became a monk within a century after his death, but only after coenobitic monasticism was introduced into his region of upper Mesopotamia. “The monk Ephrem” is a significant figure for understanding the Nachleben of Ephrem’s poetry in the Syriac churches, and yet it is a deeply misleading figure for understanding those same texts in the milieu in which they were composed. This is common for places as well. The Turkish city of Malatya today is a modern displacement of the older classical and medieval city of Malaṭya, Greek Melitene. The former site of the city, long known as Eski (“Old”) Malatya, is 11 km north of today’s city center. For many scholarly investigations, the fact that Melitene/Malaṭya is on the upper Euphrates is sufficient, but for precise questions regarding developing urban structures, for example, it is important to look at the correct site, rather than modern Malatya. The fact that degree of precision is dependent upon the intended use of the data has meant that Syriaca.org has not attempted to resolve every case like Malaṭya into separate URIs for old and new. Indeed, our inability to predict all future scholarly questions makes even an attempt in that direction futile. Database entries for named concepts are always pragmatic, provisional, and open to further elucidation in the service of future scholarly investigations. 
But this very openness and intentionality raise questions regarding interoperability and mutual recognition.
The issue underlying all of these challenges is that of determining when two historical sources refer to “the same” concept, or by extension when two URIs identify “the same” concept. There can be no mathematical identity relation for concepts. Mathematical identity relations are very useful, guaranteeing symmetry (if A = B, then B = A), and even more importantly transitivity (if A = B and B = C, then A = C as well). But mathematical equations like these are context-free assertions, true always, everywhere, and for everyone. We have discovered, by contrast, that the identification of named entities is always reliant upon the contexts, discursive, social, and historical, not only upon the names themselves. The truth or falsity of “Malaṭya = Melitene” depends on time, language, and purpose, as much as it depends upon the names involved. All solutions to these issues are therefore also likely to be pragmatic, provisional, contextual, and dependent upon the purposes for which they were developed. Syriaca.org has not yet determined how best to resolve them for our purposes, but we warmly welcome your suggestions and feedback.
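The danger of treating such links as mathematical identities can be shown with a small Python sketch. It is an illustration under stated assumptions, not a procedure Syriaca.org uses: the identifier for the collective designation “Isḥāq the Syrian” is a hypothetical placeholder, and the three links are invented here to stand for the later tradition's attributions. If every link is read as a context-free identity and closed transitively, the three authors named Isḥāq collapse into one.

```python
from collections import defaultdict

ISHAQ_OF_AMID = "http://syriaca.org/person/17"
ISHAQ_OF_ANTIOCH = "http://syriaca.org/person/33"
ISHAQ_OF_EDESSA = "http://syriaca.org/person/34"
ISHAQ_THE_SYRIAN = "example:label/ishaq-the-syrian"  # hypothetical placeholder

# Assumed links: the later tradition calls each author "Ishaq the Syrian".
links = [
    (ISHAQ_THE_SYRIAN, ISHAQ_OF_AMID),
    (ISHAQ_THE_SYRIAN, ISHAQ_OF_ANTIOCH),
    (ISHAQ_THE_SYRIAN, ISHAQ_OF_EDESSA),
]


def naive_clusters(pairs):
    """Group identifiers by the transitive closure of the links (union-find)."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


clusters = naive_clusters(links)
print(len(clusters))
```

The closure yields a single cluster containing all three authors, which is exactly the amalgamation that the separate URIs were minted to undo. Keeping each link attached to its context, rather than closing over all of them, avoids this collapse.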