TAI Safety Bibliographic Database

By Jess_Riedel @ 2020-12-22T16:03 (+61)

Authors: Jess Riedel and Angelica Deibel

Cross-posted to LessWrong
 

In this post we present the first public version of our bibliographic database of research on the safety of transformative artificial intelligence (TAI).  The primary motivations for assembling this database were to:

  1. Aid potential donors in assessing organizations focusing on TAI safety by collecting and analyzing their research output.
  2. Assemble a comprehensive bibliographic database that can be used as a base for future projects, such as a living review of the field.

The database contains research works motivated by, and substantively informing, the challenge of ensuring the safety of TAI, including both technical and meta topics.  This initial version of the database has attempted comprehensive coverage only for traditionally formatted research produced in 2016-2020 by organizations with a significant safety focus (~360 items). The database also has significant but non-comprehensive coverage (~570 items) of earlier years, less traditional formats (e.g., blog posts), and non-safety-focused organizations. Usefully, we also have citation counts for essentially all the items for which that is applicable.

The core database takes the form of a Zotero library. Snapshots are also available as a Google Sheet, CSV, and Zotero RDF. (Compact version for easier human reading: Google Sheet, CSV.)

The rest of this post describes the composition of the database in more detail and presents some high-level quantitative analysis of the contents.  In particular, our analysis includes:

In 2020 we observe a year-over-year drop in technical safety research, but not in meta safety research, and we do not understand why (Figure 1).  We suggest some possible causes, but without a convincing explanation we must caution against drawing strong conclusions from any of our data.

If you are interested in building on this work, we encourage you to contact us (or just grab the data from the links above). Please see the section “Feedback & Improvements”.

Composition

Inclusion & categorization

We use “paper” and “item” interchangeably to refer to any written piece of research, such as an article, book, blog post, or thesis. For this initial version of the database, we have divided all papers into two subject areas: technical safety (e.g., alignment, amplification, decision theoretic foundations) and meta safety (e.g., forecasting, governance, deployment strategy).

Our inclusion criteria do not represent an assessment of quality, but we do require that the intended audience is other researchers (as opposed to the general public). Our detailed criteria for including and categorizing papers can be found in the Appendix. 

Safety organizations

Where appropriate, papers were associated with one or more of the following organizations that have an explicit focus, at least in part, on the safety of transformative artificial intelligence: AI Impacts, AI Safety Camp, BERI, CFI, CHAI, CLR, CSER, CSET, DeepMind, FHI, FLI, GCRI, GPI, Median Group, MIRI, OpenAI, and Ought. We refer to all these as “safety organizations” hereafter.

Note that AI Impacts, AI Safety Camp, BERI, and MIRI use unconventional research/funding mechanisms and particular care must be taken when using our data to assess their impact. Further detail on this issue can be found in the section “Caveats” in the Appendix.

Coverage

The papers in our database can be partitioned by four binary properties: 

For this initial version of our database we aimed for near-comprehensive coverage of traditionally formatted research produced by safety organizations since 2016, i.e., “Traditional” AND “Safety org” AND “Recent” AND (“Tech” OR “Meta”). As discussed in the later subsection “Papers over time”, we are nevertheless probably missing many papers from 2020 for various reasons, but we expect our coverage of 2020, and indeed of any given year, to improve over time.

Table 1: We aimed for comprehensive coverage in the light blue region: “Traditional”, “Safety org”, “Tech” & “Meta”, “Recent”.  Here, “Recent” means since 2016, inclusive.
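The coverage criterion is a simple Boolean filter. As a minimal sketch, assuming a hypothetical `Paper` record whose field names (`traditional`, `safety_org`, `subject`) are illustrative and do not match the columns of the released CSV, it could be written as follows. The five toy papers are made up and are not entries from the database.

```python
from dataclasses import dataclass

# Hypothetical record type; field names are illustrative, not the CSV's columns.
@dataclass
class Paper:
    title: str
    year: int
    traditional: bool  # intended for traditional review (e.g., not a blog post)
    safety_org: bool   # associated with at least one safety organization
    subject: str       # "Tech", "Meta", or another tag such as "NotSafety"

def in_comprehensive_scope(p: Paper) -> bool:
    """Traditional AND Safety org AND Recent AND (Tech OR Meta)."""
    recent = p.year >= 2016  # "Recent" means since 2016, inclusive
    return p.traditional and p.safety_org and recent and p.subject in ("Tech", "Meta")

# Made-up examples, one per way of failing the filter.
papers = [
    Paper("A", 2018, True, True, "Tech"),
    Paper("B", 2015, True, True, "Meta"),       # too early
    Paper("C", 2019, False, True, "Tech"),      # not traditionally formatted
    Paper("D", 2020, True, False, "Meta"),      # no safety-org association
    Paper("E", 2017, True, True, "NotSafety"),  # fails the subject criterion
]
in_scope = [p.title for p in papers if in_comprehensive_scope(p)]
print(in_scope)  # ['A']
```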

Future versions of this database may aim for comprehensive coverage in other categories; see “Feedback & improvements” below.

Analysis

Our analysis in this section is largely restricted to the categories for which we are attempting comprehensive coverage, i.e., the ~360 items of TAI safety research produced since 2016 by safety organizations and intended for traditional review, indicated by the blue background in Table 1.

Warning: The drawbacks to quantifying research output by counting papers or citations are massive.  At best, citations are a measure of the amount of discussion about a paper, and this is a poor measure of importance for many of the same reasons that the most discussed topics on Twitter are not the most important.  

Top papers

To get a sense of the most read and discussed TAI safety research, we list in Tables 2 and 3 the most cited papers for each of the years 2016-2020.

The set of TAI safety papers at the top of the rankings is very dependent on one's criteria for deciding whether a paper is about TAI safety.  Some papers that are on the borderline of TAI safety (e.g., being more closely connected to near-term topics like autonomous vehicles and DL interpretability) have wider audiences and can dominate the citation counts.  Nevertheless, it seems useful to have an inclusive list of most cited papers to get a sense of what’s out there. One can start at the top of the list and move down, ignoring those papers that are not sufficiently devoted to TAI safety.

 

Table 2. The top cited TAI technical safety research for years 2016-2020, including work produced by non-safety organizations. Sorry it’s an image.

 

Table 3. The top cited TAI meta safety research for years 2016-2020, including work produced by non-safety organizations.  I really do feel bad about the fact that it's an image.

If you want to go deeper into these lists, open up the compact version of the database and sort by year and number of cites: Google Sheet, CSV.

Papers over time

In this section we look at the number of TAI safety research papers produced each year in the span 2016-2020.  As shown in Figure 1, we find a surprising drop in the number of technical safety works in 2020 relative to 2019.  This potentially indicates an effect of the pandemic, a large number of missing papers from this year, a shift in research format, and/or something else; we’re not sure.  In any case, this lowered our confidence in the current state of this dataset and dissuaded us from looking at how the output of individual orgs changes over time until we can understand the drop better.  The drop was seen in many organizations, including the largest contributors to TAI technical safety (CHAI, FHI, DeepMind, and OpenAI), so it appears to be systemic.

Figure 1. The number of traditionally formatted TAI safety research papers produced by safety organizations from 2016 to 2020.

Because we don’t have a satisfactory explanation for this, we simply list some complicating considerations:

It’s possible that there’s some genuine trend here, with research moving away from the relatively narrow category of technical safety research published in non-blog-post form and supported by safety organizations, and instead towards meta-safety research, towards being published as blog posts, or towards being supported by other organizations or done without organizational support. It’s also possible that the pandemic had an impact on the amount of research done (or published) in 2020 (the drop in technical safety conference papers from 2019 to 2020 makes this a tempting conclusion to jump to, but, again, meta safety conference papers actually increased). However, we don’t think our data provides strong evidence on any of these points. 

Separately, we didn't make a corresponding plot of citations in different years because citation counts are naturally lower for recent papers, which haven’t had time to accumulate as many cites.  (Our automated solution for scraping Google Scholar only records the current citation total, not when each citation was made, so we cannot correct for this.)

Collaboration between organizations

The following matrix illustrates the extent to which different AI-safety organizations collaborate on research. 

Table 4. How the safety organizations collaborate on traditionally-formatted TAI safety research, 2016-2020.  The left half of each off-diagonal box contains the fraction of all technical safety papers associated with the organization for that row that were also associated with the organization for that column.  The right half is the same for meta safety.  The intensity of the green color of each box is set by the fraction of all safety papers (both technical and meta combined).  Likewise, each on-diagonal box contains the fractions of papers where the organization did not collaborate with any of the other organizations. Box halves are left empty when the organization on that row had no papers of the corresponding type.
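To make the box definitions concrete, here is a minimal sketch of how the fractions for one half of the table (one subject area) could be computed. It assumes each paper is represented as a hypothetical `(subject, set_of_orgs)` pair. The toy papers are made up, and the color intensity (based on technical and meta papers combined) is not reproduced.

```python
from collections import defaultdict
from itertools import product

# Made-up toy data (not entries from the database): (subject, associated safety orgs).
papers = [
    ("Tech", {"CHAI"}),
    ("Tech", {"CHAI", "DeepMind"}),
    ("Tech", {"DeepMind"}),
    ("Meta", {"FHI", "CSER"}),
    ("Meta", {"FHI"}),
]

def collaboration_fractions(papers, subject):
    """Return {(row, col): fraction} for papers of one subject area.

    Off-diagonal: fraction of the row org's papers also associated with the col org.
    Diagonal: fraction of the row org's papers associated with no other safety org.
    Orgs with no papers of this subject are omitted (left empty in the table);
    a missing off-diagonal key therefore means a fraction of zero.
    """
    by_org = defaultdict(list)
    for subj, orgs in papers:
        if subj == subject:
            for org in orgs:
                by_org[org].append(orgs)
    result = {}
    for row, col in product(sorted(by_org), repeat=2):
        row_papers = by_org[row]
        if row == col:
            hits = sum(1 for orgs in row_papers if orgs == {row})
        else:
            hits = sum(1 for orgs in row_papers if col in orgs)
        result[(row, col)] = hits / len(row_papers)
    return result

tech = collaboration_fractions(papers, "Tech")
print(tech[("CHAI", "DeepMind")])  # 0.5
print(tech[("CHAI", "CHAI")])      # 0.5
```

Note that the matrix is not symmetric in general: each row is normalized by that row organization's own paper count.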

 

Output by organization

In Figure 2 we break down the research papers associated with each organization into different types: conference papers, books, manuscripts, etc.  Recall that for this analysis we are ignoring web content not intended for review (e.g., blog posts).  

Figure 2. Type of item produced for traditionally-formatted TAI safety research by the safety organizations. Note that “manuscript” is the only type of item that has not (yet) undergone peer or editorial review.

In Figures 3 and 4 we display the total number of papers and citations associated with each organization.  We call these “fractional papers” and “fractional cites” to emphasize that we have divided credit for each item and its citations equally among all the organizations that are associated with it.  Needless to say, for any given item the support it received from different organizations is unlikely to be equal, but we didn’t have a way to account for that with a reasonable amount of work.

Figure 3. The total number of research papers produced by each safety organization between 2016 and 2020, with credit for multi-organization papers divided equally.

 

Figure 4. The total number of citations accumulated by each safety organization between 2016 and 2020, with credit for citations to multi-organization papers divided equally. Because of the large disparity in citation counts, the plots in the second row have been zoomed in to show detail.
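The equal-split credit rule behind “fractional papers” and “fractional cites” can be sketched in a few lines. This is an illustration only, assuming each paper is a hypothetical `(citations, list_of_orgs)` pair; the numbers are made up.

```python
from collections import defaultdict

# Made-up toy data (not from the database): (citation count, associated orgs).
papers = [
    (30, ["CHAI"]),
    (12, ["CHAI", "DeepMind"]),
    (9, ["FHI", "CSER", "GCRI"]),
]

def fractional_totals(papers):
    """Split each paper, and its citations, equally among its organizations."""
    frac_papers = defaultdict(float)
    frac_cites = defaultdict(float)
    for cites, orgs in papers:
        share = 1 / len(orgs)
        for org in orgs:
            frac_papers[org] += share
            frac_cites[org] += cites * share
    return dict(frac_papers), dict(frac_cites)

fp, fc = fractional_totals(papers)
print(fp["CHAI"], fc["CHAI"])  # 1.5 36.0
```

A useful property of this rule is that the fractional papers summed over all organizations equal the number of papers, so no paper is double-counted in Figure 3.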

 

This concludes our analysis.

Feedback & improvements

If there is sufficient interest, we would like to make some of the following improvements in the future:

Want to help?

Accurately classifying 1,500+ papers is not an easy task, and there will be mistakes.  Please bring them to our attention.  You can email us or just write down your suggestion here. If you are interested in looking at a batch of papers that we found ambiguous and recommending a classification, please contact us, especially if you are an AI safety researcher.

Please let us know if you find this database useful and how you would like it to be improved.  We are certainly willing to adjust our inclusion criteria based on strong reasons, but it is important we do not let the database grow to an unmanageable size.

Simple comments like "this database informed my donation decision" or "I found an interesting AI Safety paper here that I didn't know about before" are very useful and help us decide whether to put more effort into this in the future.  

Please also contact us if you are interested in helping to improve this database more generally, either one-off or in an ongoing capacity.  We think that maintaining and expanding the database for a year would be an excellent side-project for a grad student in AI safety to help them gain a broader perspective on the field and contribute to the community. In particular, there seems to be some interest in collecting brief summaries and assessments of the most important TAI safety papers in one place, and this database could be a foundation for that.

We also could use help modifying the Zotero Scholar Citations plugin to more intelligently avoid tripping Google Scholar’s rate-limit ban.

Add your organization?

If you would like your organization included in future versions of the database, please send us a list of papers that ought to be associated with your organization, either as a text file, Excel file, CSV file, or Google Sheet.  It is not necessary to give us the complete bibliographic information, just enough for us to identify the works reliably and uniquely.  (For instance, a list of DOIs and arXiv numbers is fine.) Optionally, you may also include a suggested categorization and an explanation for why you believe a certain paper fits our criteria, which would be especially helpful for papers where that might be ambiguous.

Acknowledgements

We thank Andrew Critch, Dylan Hadfield-Menell, and Larks for feedback. We also thank the organizations who responded to our inquiries and helpfully provided additions and corrections to our database. We are, of course, responsible for the remaining mistakes.

Appendices

Inclusion & categorization

This section includes more details on what sort of research we include in our database, and how it is categorized.  

These are our inclusion criteria:

Here is how research papers are classified as technical safety vs. meta safety:

Note that near the popular level, or at the very speculative level, it's sometimes hard to distinguish meta safety and technical safety.

Here is how we define the different item types:

Criteria for our quantitative analysis

We err on the side of inclusion in our database, but when we analyze it quantitatively in this post we exclude some categories that are hard to cover comprehensively. This makes the conclusions we draw less dependent on the vagaries of what managed to make it into the database.  In particular, the following items are excluded:

Dates

Our convention is to (1) date manuscripts (preprints) using the date on which they first appeared and (2) date published articles by the publication date.  This has the unfortunate property that the date must be updated when a manuscript gets published, potentially biasing our data in weird ways.  In particular, it makes it difficult to judge whether a published article has a lot of citations given its age, since a longer time between preprint and publication will yield more citations.  (See for instance "Cheating Death in Damascus" by Levinstein & Soares, which has been available since at least 2017 but was only published this year.)  It also means there will appear to be more papers produced in the last year relative to earlier years, even with constant research output in the field, since the number of papers attributed to a fixed year (say, 2017) can decrease over time as things are published and have their dates updated.

Unfortunately, this bias seems hard to avoid since the alternative is to track down the year that a preprint appeared for all published articles by hand, and using that year would be a non-standard convention in any case.

Discoverability

Our data might be less complete for 2020 compared to previous years because papers released recently by researchers at these organizations have not yet been reported to the organization’s management; that would mean the papers are not listed on the organization website and the organizations don't know about them when we ask by email.  (Almost all the organizations replied when we asked them for help filling in papers missing from our database.) A related factor is that many papers in our database were found in existing review articles and bibliographies (see “Sources” in the Appendix).  This process necessarily only turns up papers released before those sources were written.

Unfortunately, some of this effect is probably unavoidable. It could potentially be reduced by finding (and confirming organization affiliation for) all papers by authors associated with each organization we cover, but this would be quite laborious, especially since there is a fair amount of author migration.  Such effort would also be largely wasted, since by next year authors will have finished reporting their 2020 papers to the orgs anyway.

Therefore, we have elected not to account for this effect.  We encourage organizations to improve their collection of this data, and we caution the reader to put even less faith in the 2020 numbers than earlier years.

Organizations

Here are the organizations that we associated with papers in our database:

In general, we associated a paper with an organization when, in the paper, the organization was listed as an author affiliation or was explicitly acknowledged as providing funding. In a few cases we associated a paper with an organization because the organization told us the author was highly likely to be supported by the organization during preparation of the paper, or removed such an association because the organization told us the support provided was very minimal during preparation of the paper.

Many papers are associated with multiple organizations. When necessary to "count" the credit, we gave each organization equal weight. Of course we expect that sometimes one organization deserved much more credit, in some sense, but it wasn't feasible for us to determine this.

Caveats

Here are some comments on individual organizations that are important for interpreting our data, especially with respect to the organizations’ overall impact:

Zotero library details

The information in this section may be useful if you are going through the details of our Zotero library.

Getting a copy

The Zotero library is here. Unfortunately it is not possible to export a copy of the library from the web interface.  If you would like an up-to-date copy of the library, please contact us. If you are satisfied with a static snapshot from the day we released this post, you can download it as a Google Sheet, CSV, or Zotero RDF. Note that Zotero libraries can easily be imported into Mendeley, but going the other direction is much harder.

Tagging details

The actual Zotero library contains both papers that satisfy our inclusion criteria (tagged “TechSafety” and “MetaSafety”) and papers that we imported from other review articles and bibliographies but decided did not satisfy our criteria (“NotSafety”).  We also marked papers that we decided on a first pass were borderline (tagged “AmbiguousSafety”) so that they can easily be re-considered in the future.  Papers associated with a safety organization are tagged as such, while papers not associated with any safety organization are tagged “Other-org”.  We also have some miscellaneous tags for a few blog posts that are unlikely to pass our criteria (tagged “non-notable”),  AI Safety Camp papers produced during the AI Safety Research Program (tagged “AISRP2019”), and AI Impacts web pages that are not featured articles (tagged “AI-Impacts-NotFeatured”).

For technical reasons within Zotero, all web content is formally categorized as a “blog post”, and all white papers are formally categorized as a “report”.

Citation count scraping

To scrape citation count information from Google Scholar, we use the Zotero Scholar Citation plugin written by Anton Beloglazov and Max Kuehn.  Citation information is recorded in the “extra” field and prefaced by “ZSCC:” (which stands for Zotero Scholar Citation Count).  When this information is unavailable (“ZSCC: NoCitationData”), we count citations by hand and denote the result with “ACC:” or “JCC:”.  In a couple of cases we noticed that the plugin was mistaking a paper for another published paper, so we used “JCCoverride:” to denote the correct hand counts.
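If you want to pull a single citation count out of the “extra” field programmatically, a minimal sketch is below. It assumes each entry sits on its own line as “KEY: value” and that the plugin's count may be zero-padded; the exact formatting the plugin writes is an assumption, as is the precedence order (a hand override first, then the scraped count, then hand counts). The function name `citation_count` is hypothetical.

```python
import re
from typing import Optional

# Assumed format: one "KEY: value" entry per line of the Zotero "extra" field.
# [ \t]* (not \s*) keeps an empty value from swallowing the next line.
_ENTRY = re.compile(r"^(JCCoverride|ZSCC|ACC|JCC):[ \t]*(\S+)", re.MULTILINE)

# Assumed precedence: a hand override beats the scraped count, which beats hand counts.
_PRECEDENCE = ("JCCoverride", "ZSCC", "ACC", "JCC")

def citation_count(extra: str) -> Optional[int]:
    """Return the preferred citation count in an "extra" field, or None."""
    counts = {}
    for key, value in _ENTRY.findall(extra):
        digits = re.match(r"\d+", value)  # no match for "NoCitationData"
        if digits:
            counts[key] = int(digits.group())  # int() drops any zero-padding
    for key in _PRECEDENCE:
        if key in counts:
            return counts[key]
    return None

print(citation_count("ZSCC: NoCitationData\nJCC: 17"))  # 17
```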

Copyright

Added May 29, 2021: We release the Zotero database under the Creative Commons Attribution-ShareAlike 4.0 International License.  In short, this means you are free to use, modify, and reproduce the database for anything so long as you cite us and release any derivative works under the same license.

Sources

The following sources are not traditional review articles, but are good sources of TAI safety research articles and summaries thereof (“maps”):

Here are some AI Safety review articles:

Here are the online lists of sponsored publications or researchers maintained by the safety organizations, a few of which are filtered by AI safety.  

Unfortunately, they are only sporadically updated and difficult to consume using automated tools.  We encourage organizations to start releasing machine-readable bibliographies to make our lives easier.


Larks @ 2020-12-22T17:20 (+19)

I just wanted to say thanks very much to Jess and Angelica for putting all this together in addition to the analytics above, they were extremely helpful in providing me with lists of relevant papers from relevant organisations that I would have likely missed otherwise. 

jsteinhardt @ 2020-12-22T23:11 (+4)

Thanks for curating this! You sort of acknowledge this already, but one bias in this list is that it's very tilted towards large organizations like DeepMind, CHAI, etc. One way to see this is that you have AugMix by Hendrycks et al., but not the Common Corruptions and Perturbations paper, which has the same first author and publication year and 4x the number of citations (in fact it would top the 2019 list by a wide margin). The main difference is that AugMix had DeepMind co-authors while Common Corruptions did not.

I mainly bring this up because this bias probably particularly falls against junior PhD students, many of whom are doing great work that we should seek to recognize. For instance (and I'm obviously biased here), Aditi Raghunathan and Dan Hendrycks would be at or near the top of your citation count for most years if you included all of their safety-relevant work.

In that vein, the verification work from Zico Kolter's group should probably be included, e.g. the convex outer polytope [by Eric Wong] and randomized smoothing [by Jeremy Cohen] papers (at least, it's not clear why you would include Aditi's SDP work with me and Percy, but not those).

I recognize it might not be feasible to really address this issue entirely, given your resource constraints. But it seems worth thinking about if there are cheap ways to ameliorate this.

Also, in case it's helpful, here's a review I wrote in 2019: AI Alignment Research Overview.

Jess_Riedel @ 2020-12-22T23:52 (+12)

Thanks Jacob.  That last link is broken for me, but I think you mean this?

 You sort of acknowledge this already, but one bias in this list is that it's very tilted towards large organizations like DeepMind, CHAI, etc.

Well, it's biased toward safety organizations, not large organizations.  (Indeed, it seems to be biased toward small safety organizations over larger ones since they tend to reply to our emails!)  We get good coverage of small orgs like Ought, but you're right we don't have a way to easily track individual unaffiliated safety researchers and it's not fair.

I look forward to a glorious future where this database is so well known that all safety authors naturally send us a link to their work when it's released, but for now the best way we have of finding papers is (1) asking safety organizations for what they've produced and (2) taking references from review articles.  If you can suggest another option for getting more comprehensive coverage per hour of work we'd be very interested to hear it (seriously!).

For what it's worth, the papers by Hendrycks are very borderline based on our inclusion criteria, and in fact if I were classifying them today I think I would not include them.  (Not because it's not high quality work, but just because I think it still happens in a world where no research is motivated by the safety of transformative AI; maybe that's wrong?) For now I've added the papers you mention by Hendrycks, Wong, and Cohen to the database, but my guess is they get dropped for being too near-term-motivated when they get reviewed next year.

More generally, let me mention that we do want to recognize great work, but our higher priority is to (1) recognize work that is particularly relevant to TAI safety and (2) help donors assess safety organizations.

Thanks again!  I'm adding your 2019 review to the list.

 

jsteinhardt @ 2020-12-23T05:48 (+7)

Well,  it's biased toward safety organizations, not large organizations.

Yeah, good point. I agree it's more about organizations (although I do think that DeepMind is benefiting a lot here, e.g. you're including a fairly comprehensive list of their adversarial robustness work while explicitly ignoring that work at large--it's not super-clear on what grounds, for instance if you think Wong and Cohen should be dropped then about half of the DeepMind papers should be too since they're on almost identical topics and some are even follow-ups to the Wong paper).

Not because it's not high quality work, but just because I think it still happens in a world where no research is motivated by the safety of transformative AI; maybe that's wrong?

That seems wrong to me, but maybe that's a longer conversation. (I agree that similar papers would probably have come out within the next 3 years, but asking for that level of counterfactual irreplaceability seems kind of unreasonable imo.) I also think that the majority of the CHAI and DeepMind papers included wouldn't pass that test (tbc I think they're great papers! I just don't really see what basis you're using to separate them).

I think focusing on motivation rather than results can also lead to problems, and perhaps contributes to organization bias (by relying on branding to assess motivation). I do agree that counterfactual impact is a good metric, i.e. you should be less excited about a paper that was likely to soon happen anyways; maybe that's what you're saying? But that doesn't have much to do with motivation.

Also let me be clear that I'm very glad this database exists, and please interpret this as constructive feedback rather than a complaint.

Jess_Riedel @ 2020-12-23T15:39 (+9)

for instance if you think Wong and Cohen should be dropped then about half of the DeepMind papers should be too since they're on almost identical topics and some are even follow-ups to the Wong paper).

Yea, I'm saying I would drop most of those too.

I think focusing on motivation rather than results can also lead to problems, and perhaps contributes to organization bias (by relying on branding to assess motivation).

I agree this can contribute to organizational bias.

I do agree that counterfactual impact is a good metric, i.e. you should be less excited about a paper that was likely to soon happen anyways; maybe that's what you're saying? But that doesn't have much to do with motivation.

Just to be clear: I'm using "motivation" here in the technical sense of "What distinguishes this topic for further examination out of the space of all possible topics?", i.e., "Is the topic unusually likely to lead to TAI safety results down the line?" (It's not anything to do with the author's altruism or whatever.)

I think what would best advance this conversation would be for you to propose alternative practical inclusion criteria which could be contrasted with the ones we've given.

Here's how I arrived at ours. The initial desiderata are:

  1. Criteria are not based on the importance/quality of the paper. (Too hard for us to assess.)

  2. Papers that are explicitly about TAI safety are included.

  3. Papers are not automatically included merely for being relevant to TAI safety. (There are way too many.)

  4. Criteria don't exclude papers merely for failure to mention TAI safety explicitly. (We want to find and support researchers working in institutions where that would be considered too weird.)

(The only desiderata that we could potentially drop are #2 or #4. #1 and #3 are absolutely crucial for keeping the workload manageable.)

So besides papers explicitly about TAI safety, what else can we include given the fact that we can't include everything relevant to safety? Papers that TAI safety researchers are unusually likely (relative to other researchers) to want to read, and papers that TAI safety donors will want to fund. To me, that means the papers that are building toward TAI safety results more than most papers are. That's what I'm trying to get across by "motivated".

Perhaps that is still too vague. I'm very interested in your alternative suggestions!

jsteinhardt @ 2020-12-23T06:00 (+9)

Also in terms of alternatives, I'm not sure how time-expensive this is, but some ideas for discovering additional work:

- Following citation trails (esp. to highly-cited papers)

- Going to the personal webpages of authors of relevant papers, to see if there's more (also similarly for faculty webpages)

Jess_Riedel @ 2020-12-23T15:43 (+5)

Sure, sure, we tried doing both of these. But they were just taking way too long in terms of new papers surfaced per hour worked. (Hence me asking for things that are more efficient than looking at reference lists from review articles and emailing the orgs.) Following the correct (promising) citation trail also relies more heavily on technical expertise, which neither Angelica nor I have.

I would love to have some collaborators with expertise in the field to assist on the next version. As mentioned, I think it would make a good side project for a grad student, so feel free to nudge yours to contact us!