THE UNCENSORED VERSION
Being a conversation between dragonmaster Gabriel Bodard and your beloved DataNinjas about the SNAP:DRGN project and Social Network Analysis
Cross-posted to SNAP:DRGN
Gabriel Bodard: So, damsels, given my newbie SNA state, tell me: what is Social Network Analysis, and how is it useful for prosopography projects?
Silke Vanbeselaere: Social Network Analysis (SNA) is basically the study of relationships between people through network theory. First used in sociology, it’s now become popular in many other disciplines, with a budding group of enthusiasts (*exuberant roar*) in (ancient) history.
What it does is focus on relations (of whatever kind) rather than on individual actors. Visualising the network graph and computing network statistics reveal the structure of the network and the roles individuals play in it.
Visualising these network graphs can be especially useful for prosopography projects, as it can help disambiguate people. Individuals are represented by nodes, and their relationships by ties (or links) between those nodes. Instead of dealing with one source at a time, the network lets you see the whole web of relationships at once.
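To make the node/tie vocabulary concrete, here is a minimal sketch in plain Python (standard library only). The persons, relationships and "sources" are invented for illustration; the point is only that ties harvested from several separate sources end up in one graph, where each person's connections can be counted.

```python
from collections import defaultdict

# Hypothetical relationships, each taken from a different (invented) source.
ties = [
    ("Apollonios", "Herakleides", "father of", "source A"),
    ("Herakleides", "Tryphon", "brother of", "source B"),
    ("Apollonios", "Didymos", "witness with", "source C"),
]

def build_graph(ties):
    """Return an undirected adjacency map: person (node) -> set of tied persons."""
    graph = defaultdict(set)
    for a, b, _kind, _source in ties:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

graph = build_graph(ties)
# Degree = number of ties per person, one of the simplest network statistics.
degree = {person: len(neighbours) for person, neighbours in graph.items()}
print(sorted(degree.items()))
# [('Apollonios', 2), ('Didymos', 1), ('Herakleides', 2), ('Tryphon', 1)]
```

Here Herakleides only becomes linked to Didymos (through Apollonios) once sources A and C sit in the same graph, which is the "whole web" view described above.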
GB: Can you perhaps illustrate that with an example and how it could help us?
Yanne Broux: Off the top of my head: one thing our extremely nifty disambiguation methods could help you with is identifying high-ranking Roman officials across the different datasets. Consuls were often mentioned in dating formulas, and procurators, proconsuls, legati and the like were pretty mobile, so chances are they appear in texts across the empire. They’ll light up like Christmas trees once we shape them into a network.
What is the SNAP:DRGN project about, Gabby, and what sort of prosopographical data (especially relationships) does it contain?
GB: I’m glad you asked me that (YB: dear reader, have you noticed the oh so true spontaneity of this conversation by now? *sniggers*). SNAP:DRGN stands for “Standards for Networking Ancient Prosopographies: Data and Relations in Greco-Roman Names” (not an artificial backronym at all!). Very briefly, the aim of the project is to bring together person-records from as many online prosopographies of the ancient world as possible, using linked data to record only the most basic information (person identifiers, names, citations, date, place and hopefully relationships with other persons). We only plan to store this very summary data, along with links back to the richer records in the contributing data source, and enable annotation on top of that.
SV: Ok, and what will people be able to do with this limited data, then?
GB: In particular, scholars will be able to (1) join together records originally from different databases that clearly refer to the same person; (2) point out relationships between persons, e.g. person XYZ from this database is the daughter of person ABC from that one; and (3) annotate their own texts (archaeological or library records, etc.) to disambiguate a personal name using SNAP as an authority list.
At the moment there are relatively few co-references between the datasets in SNAP, i.e. people who appear in more than one database (although there will be plenty among the library catalogues), and the only explicitly encoded personal relationships are the ones imported from the Trismegistos database (yes, you rule), but we’re working to improve both of these. How amazing does that sound to you?
YB: Well, Gabby, I’m glad you ask. Basically, what we need for SNA is a link between the people and the texts in which they appear. Now, I have no idea how sophisticated these other datasets are, but to avoid confusion/mistakes/whatever kind of apocalyptic disaster, you need unique identifiers, both for your individuals and for your texts. Trismegistos Texts is now slowly expanding beyond the Egyptian borders, so perhaps we already have some of the texts incorporated in the other datasets, and then it should be pretty easy to link them (I hope. Oh god, Mark, please don’t make me do this, it sounds so boring). But I suspect that for most of the data, new identifiers will have to be created.
GB: Unique identifiers for all persons we have, of course. SNAP mints URIs for all persons we have data for, whether they had dereferenceable (YB: is this even a word, Gabby?!) URIs in the source datasets or not. In some cases we have identifiers for texts too (TM uses Papyri.info URIs, as you know); in other cases, we’ve had to hope that parsing text strings will be sufficiently unambiguous to be useful. (We’ve identified a few hundred co-references between LGPN and TM using text strings.) We also have a lot of persons from library catalogues (VIAF, the British Museum, Zenon and Trismegistos Authors) among whom co-references ought to be plentiful.
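The text-string matching Gabby mentions can be pictured with a small sketch. This is not SNAP's actual procedure: it assumes, hypothetically, that each record carries a name and a free-text citation, normalises both (accents, case, punctuation), and proposes a co-reference whenever both keys agree. The record identifiers and citations are invented.

```python
import re
import unicodedata

def normalise(text: str) -> str:
    """Crude normalisation: strip accents, collapse punctuation/whitespace, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", no_accents).strip().lower()

def co_reference_candidates(source_a, source_b):
    """Pair up records whose normalised name AND normalised citation both match."""
    index = {}
    for rec in source_a:
        key = (normalise(rec["name"]), normalise(rec["citation"]))
        index.setdefault(key, []).append(rec["id"])
    pairs = []
    for rec in source_b:
        key = (normalise(rec["name"]), normalise(rec["citation"]))
        for other_id in index.get(key, []):
            pairs.append((other_id, rec["id"]))
    return pairs

# Invented records standing in for two prosopographical datasets.
dataset_a = [
    {"id": "A-1", "name": "Apollónios", "citation": "P.Oxy. 1 43"},
    {"id": "A-2", "name": "Dionysios", "citation": "IG II2 1000"},
]
dataset_b = [
    {"id": "B-7", "name": "Apollonios", "citation": "P. Oxy. 1, 43"},
    {"id": "B-8", "name": "Dionysios", "citation": "IG II2 2000"},
]
print(co_reference_candidates(dataset_a, dataset_b))  # [('A-1', 'B-7')]
```

Exact matching on normalised strings only absorbs trivial variation; "P.Oxy. I 43" (Roman volume number) would still be missed, which is why such string-based co-references need to be sufficiently unambiguous to be trusted.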
So this seems to be a little circular at the moment, doesn’t it? One of the things SNA might help with is identifying co-references, which in turn will help us build a graph of relationships. But you’re telling me that SNA isn’t really feasible on our data until we have a much better graph of co-references, relationships, and text co-occurrences. Is there anything useful we could do together in the meantime?
YB: Go out for a drink?
GB: Excellent idea!
Three hours later…
GB: So, where were we? Oh yes, so, anything useful we can do regarding SNAP? Snap, snap snappety snap. Dragons breathe fire. I wish I could breathe fire.
(next to Yanne, Silke is slowly sliding off her chair)
YB: Well Gabby, I’m glad you ask. *hiccup*
Since we are enriching Trismegistos, by adding new texts from around the Mediterranean, by identifying individuals in the Egyptian texts, and by adding extra information such as titles, ethnics and status designations, and at the same time you are enriching SNAP, we are actually feeding into each other symbiotically, like … (v. long pause) … like … (*v. pensive look*) … like … (grabs computer and starts Googling) … like meat ants and leafhoppers that find each other over sugary sap in the Australian outback!
And hey, Silke, what about that “Structural Equivalence” hoodoo you’ve been learning in London, could that be of any help?
SV (from under her desk, slurring): We-ell, that’ss a fery intreshting conssept that ekssploress the ssoshal e-viroment of a persson, but assush, iss no ssuitable to uze on the data that we would be pressented with in ShNAP-p-p!
GB: Are there any improvements you can suggest for the Trismegistos database to perform your SNA hocus-pocus?
YB: Well, Gabby, I’m glad you ask. Since Trismegistos is pure perfection: no enhancements needed there.
Okaaay… Maybe there’s just a teeny tiny little bit of room for improvement when it comes to titles. You see, we hardly have any. Asking the computer to retrieve them, like we did for the names, proved to be next to impossible, and it’s a hell of a lot of work if you have to go through some 500,000 attestations manually. I’ve already gone through more than 10,000 of them while working on my double names and municipal officials, so I’ve done my share, methinks. Also, it’s not exactly easy to standardize titles, what with all the different languages in Egypt and all. But I guess that if one of the other datasets has a list or something we could look into, that might help us out a bit… Ideas for extraction from the texts are also welcome, as long as they don’t involve child labor. The Belgian government is pretty strict on that.
GB: So, in a hopefully not too distant future, when all these relationships are implemented through SNAP:DRGN, how can the participating projects in turn be of service to you and other researchers who would like to use SNA? When SNAP is ready for SNA to be performed on it, what questions will you ask of it?
SV: Prosopographies are ideal datasets for SNA research, as the people in them have been collected or selected because of some common feature (mentioned in the same source, members of the same ethnic/social group, contemporaries…). Once the technical infrastructure is in place, it will be relatively straightforward to convert the virtual two-mode networks linking texts to the people appearing in them into the one-mode (person–person) networks needed for actual SNA.
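The two-mode to one-mode conversion Silke describes can be sketched in a few lines. This assumes, hypothetically, that the text-person links are available as a mapping from text identifier to the set of persons attested in it (identifiers invented). Two persons get a tie when they appear in the same text, weighted by the number of texts they share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-mode data: text identifier -> persons attested in that text.
attestations = {
    "text-1": {"P1", "P2", "P3"},
    "text-2": {"P2", "P3"},
    "text-3": {"P3", "P4"},
}

def project_to_persons(attestations):
    """Project a text-person (two-mode) network onto persons (one-mode).

    Each pair of persons co-occurring in a text gets a tie; the tie weight
    counts how many texts the pair shares.
    """
    weights = Counter()
    for persons in attestations.values():
        # Sorting gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(persons), 2):
            weights[(a, b)] += 1
    return dict(weights)

print(project_to_persons(attestations))
# {('P1', 'P2'): 1, ('P1', 'P3'): 1, ('P2', 'P3'): 2, ('P3', 'P4'): 1}
```

Weighting by shared texts is one common choice; depending on the question, ties could instead be kept binary (co-occurs or not) or restricted to texts of a given type.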
YB *whispering to Silke*: That’s not a question. He wants questions. Give him questions! What if he’s, like, North Korean secret police or something?! For God’s sake, give him questions, woman!
SV (*screaming in terror*): WHY DID THE CHICKEN CROSS THE ROAD??
YB: Oh god, we’re toast! *makes a run for it*