Häggström hävdar

Friday 19 April 2024

Future of Humanity Institute 2005-2024

The news that University of Oxford's Future of Humanity Institute (FHI), after nearly two decades of existence, closed down earlier this week (Tuesday, April 16) made me very sad. The institute was Nick Bostrom's brainchild, and it was truly pioneering in terms of formulating some of the most profound and important questions about how to ensure a flourishing future for mankind, as well as in beginning the work of answering them. Their work has more or less uninterruptedly been at the forefront of my mind for more than a decade, and although I only visited their physical headquarters twice (in 2012 and 2016), it is clear to me that it was a uniquely powerful and creative research environment.

In the first draft of this blog post I used the acronym RIP in the headline, but decided to change that, because I hope that what remains of the institute - the minds and the ideas that it fostered - will not rest in peace, but will instead continue to sparkle and help create a splendid future. They can do this at the many subsequent research institutes and think tanks that FHI helped inspire, such as The Centre for the Study of Existential Risk in Cambridge, The Future of Life Institute in Massachusetts, The Global Priorities Institute in Oxford, and The Mimir Center at the Institute for Future Studies in Stockholm. And elsewhere.

My friend Anders Sandberg was a driving force at the institute almost from the start and then until the very end. His personal memoir of the institute, entitled Future of Humanity Institute 2005-2024: Final Report, offers a summary and many wonderful glimpses from their successful work, including a generous collection of photographs.¹ Reading it is great consolation at this moment. Along with the successes, Anders also tells us briefly about the institute's downfall:

Starting in 2020, the Faculty [of Philosophy] imposed a freeze on fundraising and hiring. Unfortunately, this led to the eventual loss of lead researchers and especially the promising and diverse cohort of junior researchers, who have gone on to great things in the years since. While building an impressive alumni network and ecosystem of new nonprofits, these departures severely reduced the Institute. In late 2023, the Faculty of Philosophy announced that the contracts of the remaining FHI staff would not be renewed. On 16 April 2024, the Institute was closed down. [p 19]

Later, on pp 60-61, he offers three short paragraphs about which failings on the FHI's side may have led to such harsh treatment from the Faculty. What he offers is hardly the full story, and I have no specific insight into their organization that could add anything. Still, let me offer a small speculation, based mostly on introspection into my own mind and experience, about the kind of psychological and social mechanisms that may have contributed:

If you are an FHI kind of person (as I am), it will likely seem to you that lowering P(doom) by as little as a ppm is so obviously urgent and important that it appears superfluous and almost perverse to argue for such work using more traditional academic measuring sticks and rituals. That may lead you to ignore (some of) those rituals. If this clash of cultures continues for sufficiently long without careful intervention, relations with the rest of the university are likely to decline and eventually collapse.

Footnote

1) See also his latest blog post.

Tuesday 2 April 2024

Interviewed about AI risk in two new episodes of The Evolution Show

Three years ago, shortly after the release of the first edition of my book Tänkande maskiner, I was interviewed about AI risk by Johan Landgren in his YouTube podcast The Evolution Show. The amount of water that has flowed under the bridge since then is absolutely stupendous, and the issue of AI risk has become much more urgent, so last month Johan decided it was time to record two more episodes with me.

In his marketing of our discussion he put much emphasis on "7 years" as a timeline until the decisive AI breakthrough that will make or break humanity. I'm not sure I even mentioned that figure explicitly in our conversations, but admittedly it was implicit in some of the imagery I held forth. Still, I should emphasize that timelines are so uncertain that a precise figure like 7 years needs to be taken with a huge grain of salt. It could happen in 2 years, or 5, or 10, or - provided either some severe unforeseen technical obstacle or a collective decision to pause the development of frontier AI - even 20 years or more. This uncertainty subtracts nothing, however, from the urgency of mitigating AI existential risk.

Another part of Johan's marketing of our conversation that I'd like to quote is his characterization of it as "kanske det viktigaste jag haft i mitt liv" ("perhaps the most important one in my entire life"). This may or may not be an exaggeration, but I do agree with him that the topics we discussed are worth paying attention to.

Tuesday 26 March 2024

What Max Tegmark really said about AI risk

In a recent article in the Australian online magazine Quillette, Swedish AI pundit Mathias Sundin portrays his more famous compatriot Max Tegmark as a doomsayer. He quotes Tegmark as saying the following in the Swedish public radio show Sommar on August 1, 2023:

I’ve been thinking a lot about life and death lately. Now, it’s probably my turn next in my family. But I guess the rest of humanity will perish about the same time — after over a hundred thousand years on our planet. I believe the artificial intelligence that we’re trying to build will probably annihilate all of humanity pretty soon.

The quote is authentic and the translation from Swedish is fine. But look at Sundin's comment immediately after the quote:

There were no ifs or buts and no “10 percent risk” or other disclaimers — just the promise of certain doom.

The careful reader might here notice the contrast between Sundin's talk of “certain doom”, and Tegmark's use of the word “probably” — which normally signifies not certainty but uncertainty.¹ If the reader trusts that Sundin writes in good faith, he will likely conclude that the word “probably” is used here by Tegmark not in its real meaning, but as meaningless sentence filler and random noise, and that if one looks carefully at what else he has to say in the Sommar show, Sundin's summary about “the promise of certain doom” turns out to be accurate.

So let's have a look then, at what Tegmark has to say in Sommar, to determine whether Sundin's summary statement is warranted. (Spoiler: it is not.)

Less than half a minute after the passage quoted by Sundin, Tegmark says the following:

What can we do today to change direction towards a more hopeful and inspiring future?²

This is already an indication that, far from preaching “certain doom”, Tegmark thinks that “a more hopeful and inspiring future” is possible, and wants to engage his listeners in the great project of navigating towards such a future.

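Soon after that, Tegmark goes on to compare an AI catastrophe wiping out humanity to a more down-to-Earth and familiar event: the death at old age of his own parents. He feels his parents had full lives, and thinks the possibility of an AI apocalypse compares unfavorably to this:

But if all of humanity dies out because we have botched things with artificial intelligence, then I feel, on the contrary, that we have neither lived our lives to the full nor died with dignity. Rather, it would feel tragic and unnecessary, like a child accidentally cycling over the edge of a cliff despite many warnings. We are so young from a cosmic perspective, with billions of promising years ahead of us. Moreover, if we avoid that cliff, our future can become so much better than our past. We are on the verge of finally taking control of our own destiny, and of solving many of the biggest problems that we humans have so far been unable to solve.³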

“Our future can become so much better than our past” are hardly the words of someone who predicts “certain doom”, and the entire passage is on the contrary rather hopeful. Later in the show, Tegmark spells out his hopeful vision about a flourishing future in slightly more detail:

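I have talked a lot about the threat from artificial intelligence, but what happens if we succeed in changing course, and end up in a future with a superintelligence under our control, one that is not used maliciously, is not misaligned and does not compete with us? First and foremost, it will be like Christmas for everyone who likes research and technology, since the research of a superintelligence will be limited not by what we humans can figure out, but only by what is physically possible. Discoveries that I as a teenager thought would take thousands of years, or forever, could happen within our lifetime. Both my uncle Erik and his daughter Anna, for example, died of cancer. I am convinced that all cancer can be cured; the problem is that we humans have so far not managed to figure out how. I believe that superintelligence would quickly find cures for all diseases, lift everyone out of poverty, stabilize the climate, and solve all the classic world problems that we have so far been unable to solve.⁴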

A more striking counterexample to Sundin's claim that Tegmark offers “no ifs and buts” in his talk about AI risk would be hard to imagine. And there's more:

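My AI activism is driven not only by wanting to avoid problems, but also by wanting to preserve these inspiring possibilities.⁵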

And this:

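We don't need to compete with AI. That is just Moloch trying to trick us into believing it. For it is we who are building AI, and if we make sure that we keep control, then it will be we, and not AI, who call the shots.⁶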

It is clear from these quotes that far from predicting “certain doom”, Tegmark thinks a bright future awaits us if only we get our act together and play our cards right. It is furthermore clear that he doesn't even think this right way of playing our cards necessarily involves giving up on the project of building superintelligent AI. To the contrary, such AI plays a key role in the grand visions for the future that he paints. It's just that he understands that there are risks involved and that we therefore need to proceed with a suitable level of caution, something that he judges the leading AI developers to lack. Hence his talk about the need to change course.

This is a highly reasonable position (and one that I share),⁷ but conveying it accurately goes counter to Sundin's ambition of painting Tegmark in the worst possible light. Instead, Sundin shamelessly decides to gamble on the readers of Quillette not knowing Swedish and therefore not being able to check his story, and to simply lie about what Tegmark says in Sommar.

What else can be said about Sundin's Quillette article? Well, it is quite bad. As is sadly typical for large parts of the AI debate as a whole and for his own writings in particular, Sundin is not interested in engaging seriously with the arguments of his opponents, and the article is full of inaccuracies and misrepresentations. I spent so much ink in 2023 — one newspaper op-ed and two blog posts — on Sundin's poor writing that frankly speaking I am sick of it, and so I will not say anything more about his Quillette article. I will not even comment on his choice (which is especially bizarre given the long and tedious email exchange we had about it in December last year) to continue to grossly misrepresent the point of my Lex Luthor thought experiment.

This is mostly not about Sundin himself, but more intended as an example of the abysmally low level of discourse among the category of accelerationists and AI risk deniers that he represents. I do worry, however, that his being chosen in December last year as a member of the Swedish government's AI commission is a small sign of governmental dysfunction.

Footnotes

1) The word used in the Swedish original is “antagligen” (01:37 into the radio show), which has very similar connotations in the present context as the English “probably”.

2) The Swedish original, beginning at 02:05: Vad kan vi göra idag, för att ändra riktning mot en mer hopfull och inspirerande framtid?

3) The Swedish original, at 03:48: Men om hela mänskligheten dör ut, för att vi klantat till det med artificiell intelligens, så känner jag däremot att vi varken levt färdigt eller dött värdigt. Snarare skulle det kännas tragiskt och onödigt, som om ett barn av misstag cyklar över kanten på ett stup, trots många varningar. Vi är så unga ur ett kosmiskt perspektiv, med miljarder lovande år framför oss. Om vi undviker det där stupet så kan vår framtid dessutom bli så mycket bättre än vårt förflutna. Vi är på vippen att äntligen ta kontroll över vårt eget öde, och lösa många av de största problem som vi människor hittills gått bet på.

4) The Swedish original, at 54:45: Jag har pratat mycket om hotet från artificiell intelligens, men vad händer om vi lyckas ändra kurs, och hamnar i en framtid med en superintelligens under vår kontroll, som inte används illvilligt, är felriktad eller konkurrerar med oss? Först och främst blir det som julafton för alla som gillar forskning och teknik, eftersom superintelligensens forskning blir begränsad inte av vad vi människor kan lista ut, utan bara av vad som är fysiskt möjligt. Upptäckter som jag som tonåring trodde skulle ta tusentals år eller en evighet skulle kunna ske under vår livstid. Både min morbror Erik och hans dotter Anna dog t.ex. av cancer. Jag är övertygad om att all cancer går att bota, problemet är att vi människor hittills inte lyckats lista ut hur. Jag tror att superintelligens snabbt skulle hitta botemedel för alla sjukdomar, och lyfta alla ur fattigdom, stabilisera klimatet, och lösa alla de klassiska världsproblemen om vi hittills gått bet på.

5) The Swedish original, at 56:10: Min AI-aktivism drivs inte bara av att jag vill undvika problem utan också av att jag vill bevara de här inspirerande möjligheterna.

6) The Swedish original, at 01:04:00: Vi behöver inte konkurrera med AI. Det är bara Moloch som försöker lura i oss det. Det är nämligen vi som bygger AI, och om vi ser till att vi behåller kontrollen så blir det vi och inte AI som bestämmer var skåpet skall stå.

7) For a summary statement of where I stand on AI issues, readers who know Swedish are encouraged to consult the bonus chapter (downloadable for free) in the 2023 edition of my book Tänkande maskiner. For those who prefer English, there are various video recordings of talks I've given that can serve a similar purpose; see here, here and here.

Thursday 29 February 2024

On OpenAI's report on biorisk from their large language models

Aligning AIs with whatever values it is we need them to have in order to ensure good outcomes is a difficult task. Already today's state-of-the-art Large Language Models (LLMs) present alignment challenges that their developers are unable to meet, and yet they release their poorly aligned models in their crazy race with each other where first prize is a potentially stupendously profitable position of market dominance. Over the past two weeks, we have witnessed a particularly striking example of this inability, with Google's release of their Gemini 1.5, and the bizarre results of their attempts to make sure images produced by the model exhibit an appropriate amount of demographic diversity among the people portrayed. This turned into quite a scandal, which quickly propagated from the image generation part of the model to the likewise bizarre behavior in parts of its text generation.¹

But while the incident is a huge embarrassment to Google, it is unlikely to do much real damage to society or the world at large. This can quickly change with future, more capable LLMs and other AIs. The extreme speed at which AI capabilities are currently advancing is therefore a cause for concern, especially as alignment is expected to become not easier but more difficult as the AIs become more capable at tasks such as planning, persuasion and deception.² I think it's fair to say that the AI safety community is nowhere near a solution to the problem of aligning a superhumanly capable AI. As an illustration of how dire the situation is, consider that when OpenAI in July last year announced their Superalignment four-year plan for solving the alignment problem in a way that scales all the way up to such superhumanly capable AI, the core of their plan turned out to be essentially "since no human knows how to make advanced AI safe, let's build an advanced AI to which we can delegate the task of solving the safety problem".³ It might work, of course, but there's no denying that it's a huge leap into the dark, and it's certainly not a project upon which we should feel comfortable hinging the future survival of our species.

Given the lack of clear solutions to the alignment problem on the table, in combination with how rapidly AI capabilities are advancing, it is important that we have mechanisms for carefully monitoring these advances and making sure that they do not cross a threshold where they become able to cause catastrophe. Ideally this monitoring and prevention should come from a state or (even better) intergovernmental actor, but since for the time being no such state-sanctioned mechanisms are in place, it's a very welcome thing that some of the leading AI developers are now publicizing their own formalized protocols for this. Anthropic pioneered this in September last year with their so-called Responsible Scaling Policy,⁴ and just three months later OpenAI publicized their counterpart, called their Preparedness Framework.⁵

Since I recorded a lecture about OpenAI's Preparedness Framework last month - concluding that the framework is much better than nothing, yet way too lax to reliably protect us from global catastrophe - I can be brief here. The framework is based on evaluating their frontier models on a four-level risk scale (low risk, medium risk, high risk, and critical risk) along each of four dimensions: Cybersecurity, CBRN (Chemical, Biological, Radiological, Nuclear), Persuasion and Model Autonomy.⁶ The overall risk level of a model (which then determines how OpenAI may proceed with deployment and/or further capabilities development) is taken to be the maximum among the four dimensions. All four dimensions are in my opinion highly relevant and in fact indispensable in evaluating the riskiness of a frontier model, but the one to be discussed in what follows is CBRN, which is all about the model's ability to create or deploy (or to assist in the creation or deployment of) non-AI weapons of mass destruction.
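
The aggregation rule just described (the overall risk is the worst score across the four categories) can be sketched in a few lines of Python. This is my own illustration, not OpenAI's code; the category keys and the example scorecard are hypothetical, and the sketch deliberately says nothing about what the framework prescribes at each level.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """The four-level scale named in the Preparedness Framework, ordered by severity."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# The four tracked risk categories (hypothetical key names).
CATEGORIES = ("cybersecurity", "cbrn", "persuasion", "model_autonomy")


def overall_risk(scores: dict[str, RiskLevel]) -> RiskLevel:
    """Return the maximum risk level over the tracked categories.

    Raises KeyError if a category is missing, so that an unevaluated
    dimension cannot silently pull the overall score down.
    """
    return max(scores[c] for c in CATEGORIES)


if __name__ == "__main__":
    # Hypothetical scorecard, for illustration only.
    scorecard = {
        "cybersecurity": RiskLevel.LOW,
        "cbrn": RiskLevel.MEDIUM,
        "persuasion": RiskLevel.MEDIUM,
        "model_autonomy": RiskLevel.LOW,
    }
    print(overall_risk(scorecard).name)  # MEDIUM
```

Taking the maximum rather than, say, an average is what makes the rule conservative: a single high-risk dimension cannot be diluted by low scores elsewhere.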

The Preparedness Framework report contains no concrete risk analysis of GPT-4, the company's current flagship LLM. A partial such analysis did however appear later, in the report Building an early warning system for LLM-aided biological threat creation released in January this year. The report is concerned with the B aspect of the CBRN risk factor - biological weapons. It describes an ambitious study in which 100 subjects (50 biology undergraduates, and 50 Ph.D. level experts in microbiology and related fields) are given tasks relevant to biological threats, and randomly assigned to have either access to both GPT-4 and the Internet, or just to the Internet. The question is: does GPT-4 access make subjects more capable at their tasks? The answer seems to be yes, but the report remains inconclusive.
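
For readers unfamiliar with this kind of design, here is a minimal Python sketch of two-arm randomization. I assume, for illustration only, that assignment was balanced within each expertise cohort, so that each arm gets 25 undergraduates and 25 experts; the report's actual procedure may differ, and the arm labels are hypothetical.

```python
import random


def assign_arms(participants, seed=None):
    """Randomly split participants into two arms, balanced within each cohort.

    participants: list of (participant_id, cohort) pairs, e.g. cohort
    "student" or "expert". Returns a dict mapping participant_id -> arm.
    """
    rng = random.Random(seed)

    # Group participant ids by cohort so that each cohort is split separately.
    by_cohort = {}
    for pid, cohort in participants:
        by_cohort.setdefault(cohort, []).append(pid)

    assignment = {}
    for ids in by_cohort.values():
        rng.shuffle(ids)  # random order within the cohort
        half = len(ids) // 2
        for pid in ids[:half]:
            assignment[pid] = "internet_only"  # hypothetical control-arm label
        for pid in ids[half:]:
            assignment[pid] = "gpt4_and_internet"  # hypothetical treatment-arm label
    return assignment


if __name__ == "__main__":
    people = [(f"s{i}", "student") for i in range(50)] + [(f"e{i}", "expert") for i in range(50)]
    arms = assign_arms(people, seed=2024)
    print(sum(1 for a in arms.values() if a == "gpt4_and_internet"))  # 50
```

Balancing within cohorts keeps expertise from being confounded with GPT-4 access, at the price of leaving only about 25 subjects per cell.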

It's an interesting and important study, and many aspects of it deserve praise. Here, however, I will focus on two aspects where I am more critical.

The first aspect is how the report goes on and on about whether the observed positive effect of GPT-4 on subjects' skills in biological threat creation is statistically significant.⁷ Of course, this obsession with statistical significance is shared with a kazillion other research reports in virtually all empirically oriented disciplines, but in this particular setting it is especially misplaced. Let me explain.

In all scientific studies meant to detect whether some effect is present or not, there are two distinct ways in which the result can come out erroneous. A type I error is to deduce the presence of a nonzero effect when it is in fact zero, while a type II error is to fail to recognize a nonzero effect which is in fact present. The concept of statistical significance is designed to control the risk of type I errors; roughly speaking, employing statistical significance methodology at significance level 0.05 means making sure that if the effect is zero, the probability of erroneously concluding a nonzero effect is at most 0.05. This gives a kind of primacy to avoiding type I errors over avoiding type II errors, laying the burden of proof on whoever argues for the existence of a nonzero effect. This makes a good deal of sense in a scientific community where an individual scientific career tends to consist largely of discoveries of various previously unknown effects, creating an incentive that in the absence of a rigorous system for avoiding type I errors might overwhelm scientific journals with a flood of erroneous claims about such discoveries.⁸ In a risk analysis context such as in the present study, however, it makes no sense at all, because here it is type II errors that mainly need to be avoided - because they may lead to irresponsible deployment and global catastrophe, whereas the consequences of a type I error are comparatively trivial. The burden of proof in this kind of risk analysis needs to be laid on whoever argues that risk is zero or negligible, whence the primary focus on type I errors that follows implicitly from a statistical significance testing methodology gets things badly backwards.⁹
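
The asymmetry can be made concrete with a textbook power calculation. The Python sketch below uses a two-sided two-sample z-test with known variance, a simplification I have chosen for illustration (it is not the analysis in OpenAI's report), together with hypothetical numbers: 25 subjects per arm and a true effect of half a standard deviation. By construction the type I error rate is exactly 0.05, whereas the type II error rate, the one that matters most in risk analysis, comes out at more than one half.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_two_sample_z(effect_size: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Probability that a two-sided two-sample z-test rejects the null hypothesis.

    effect_size is the true difference in means, in units of the (assumed known,
    common) standard deviation. With effect_size == 0 the result equals the
    type I error rate alpha; otherwise 1 minus the result is the type II error rate.
    """
    se = (2 / n_per_arm) ** 0.5  # standard error of the difference in means, in SD units
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size / se
    # Rejection probability in the upper tail plus the lower tail.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    print(f"type I error rate:  {power_two_sample_z(0.0, 25):.3f}")
    beta = 1 - power_two_sample_z(0.5, 25)  # hypothetical effect of 0.5 SD
    print(f"type II error rate: {beta:.2f}")
```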

Here it should also be noted that, given how widely useful GPT-4 has turned out to be across a wide range of intellectual tasks, the null hypothesis that it would be of zero use to a rogue actor wishing to build biological weapons is highly far-fetched. The failure of the results in the study to exhibit statistical significance is best explained not by the absence of a real effect but by the smallness of the sample size. To the extent that the failure to produce statistical significance is a real problem (rather than a red herring, as I think it is), it is exacerbated by another aspect of the study design, namely the use of multiple measurements on each subject. I am not at all against such multiple measurements, but if one is fixated on the statistical significance methodology, they lead to dependencies in the data that force the statistical analyst to employ highly conservative¹⁰ p-value calculations, as well as multiple inference adjustments. Both of these complications lead to worse prospects for statistically significant detection of nonzero effects.
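
To see how a multiple inference adjustment eats into statistical power, here is a Python sketch using the Bonferroni correction, one common such adjustment (I am not claiming it is the one the report used), together with the textbook normal-approximation formula for sample size. With a hypothetical effect of half a standard deviation and 80% power, the unadjusted requirement is 63 subjects per arm, and it grows with the number of tests. The sketch ignores the dependence between repeated measurements on the same subject, which would push the numbers further in the same direction.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / num_tests


def required_n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects per arm for a two-sided two-sample z-test.

    Uses n = 2 * ((z_{1 - alpha/2} + z_{power}) / effect_size) ** 2, which
    ignores the negligible rejection probability in the opposite tail.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    d = 0.5  # hypothetical effect size, in standard deviations
    for m in (1, 5, 10):
        a = bonferroni_alpha(0.05, m)
        print(f"{m:>2} tests: alpha per test = {a:.3f}, n per arm = {required_n_per_arm(d, a)}")
```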

The second aspect is what the study does to its test subjects. I'm sure most of them are fine, but what is the risk that one of them gets the wrong idea and draws inspiration from the study to later go on to develop and deploy their own biological pathogens? I expect the probability to be low, but given the stakes at hand, the expected cost might be large. In effect, the study can serve as a miniature training camp for potential bioterrorists. Before being admitted to the study, test subjects were screened, e.g., for criminal records. That is of course a good thing, but it would be foolish to trust OpenAI (or anyone else) to have an infallible way of catching every would-be participant with a potential bioterrorist living in the back of their mind.

One possible reaction OpenAI might have to the above discussion about statistical aspects is that in order to produce more definite results they will scale up their study with more participants. To which I would say please don't increase the number of young biologists exposed to your bioterrorism training. To which they might reply by insisting that this is something they need to do to evaluate their models' safety, and surely I am not opposed to such safety precautions? To which my reply would be that if you've built something you worry might cause catastrophe if you deploy it, a wise move would be to not deploy it. Or even better, to not build it in the first place.

Footnotes

1) See the twin blogposts (first, second) by Zvi Mowshowitz on the Gemini incident for an excellent summary of what has happened so far.

2) Regarding the difficulty of AI alignment, see, e.g., Roman Yampolskiy's brand-new book AI: Unexplainable, Unpredictable, Uncontrollable for a highly principled abstract approach, and the latest 80,000 Hours Podcast conversation with Ajeya Cotra for an equally incisive but somewhat more practically oriented discussion.

3) As with so much else happening in the AI sphere at present, Zvi Mowshowitz has some of the most insightful comments on the Superalignment announcement.

4) I say "so-called" here because the policy is not stringent enough to make their use of the term "responsible" anything other than Orwellian.

5) The third main competitor, besides OpenAI and Anthropic, in the race towards superintelligent AI is Google/DeepMind, who have not publicized any corresponding framework. However, in a recent interview with Dwarkesh Patel, their CEO Demis Hassabis assures us that they employ similar frameworks in their internal work, and that they will go public with these some time this year.

6) Wisely, they emphasize the preliminary nature of the framework, and in particular the possibility of adding further risk dimensions that future work may show to be relevant.

7) Statistical significance is mentioned 20 times in the report.

8) Unfortunately, this has worked less well than one might have hoped; hence the ongoing replication crisis in many areas of science.

9) The burden of proof location issue here is roughly similar to the highly instructive Fermi-Szilard disagreement regarding the development of nuclear fission, which I've written about elsewhere, and where Szilard was right and Fermi was wrong.

10) "Conservative" is here in the Fermian rather than the Szilardian sense; see Footnote 9.

Thursday 11 January 2024

Video talk about the Grace et al surveys on what AI researchers think about AI risk

There is much disagreement out there regarding what future progress in AI may have in store for us, including AI risk. What could then be more natural than asking AI researchers what they think? This is what Katja Grace and various collaborators have done in a series of surveys in 2016, 2022 and 2023. The results are interesting, partly in unexpected ways. Back in 2017, Katja gave an excellent talk about the first of those surveys, at a Gothenburg meeting that I hosted. Today I recorded a video lecture in which I comment on her findings back then and in the later surveys, with particular emphasis on whether and how much we can trust these AI researchers to make sound judgements about our future with AI.

Friday 5 January 2024

Video talk on OpenAI's Preparedness Framework

Here's my latest video lecture on what OpenAI does regarding AI risk and AI safety, following up on the ones from December 2022 and March 2023. This time I focus on the so-called Preparedness Framework that they announced last month, and criticize it along lines heavily influenced by Zvi Mowshowitz's writing on the same topic.

Thursday 21 December 2023

On open source in Ny Teknik

Open source is good! That, at least, is what informatics researchers Johan Magnusson and Claire Ingram Bogusz think, judging from their op-ed in Ny Teknik the day before yesterday. In many cases, indeed most, I agree with them about the excellence of open source, but in some AI contexts open source is, on the contrary, really, really bad. This prompted me to take up my pen and write a reply, which was published yesterday in the same paper under the headline Öppen källkod för AI-modeller är en samhällsrisk ("Open source for AI models is a societal risk"), and which begins as follows:


Nevertheless, it is wrong, I would argue, to praise the application of open source to the AI field in the one-sided and unnuanced manner of Magnusson and Ingram Bogusz. If open source were applied to the largest and most powerful AI models (such as OpenAI's GPT-4, Anthropic's Claude 2 and Google DeepMind's Gemini) and to the even stronger AIs that can be expected to be released during 2024, this would entail serious and, in my opinion, unacceptable societal risks.

For AI is not like other technologies.

Read the exciting continuation of my reply in Ny Teknik!