
OpenAI improves safety processes and gives board veto power over risky AI

OpenAI is expanding its internal safety processes to defend against the threat of potentially risky or harmful AI. A new "safety advisory group" will sit above the technical teams and make recommendations to leadership — and to the board, which has been given veto power. Of course, given the precedents, whether it will actually use that power is another question entirely.

Typically, the ins and outs of policies like these don't need coverage, since in practice they amount to a bunch of closed-door meetings with obscure roles and flows of responsibility that outsiders will rarely be privy to. While that's likely true here as well, the recent leadership fight and the evolving debate over AI risks warrant a look at how the world's leading AI developer is approaching safety considerations.

In a new document and blog post, OpenAI discusses its updated "Preparedness Framework," which likely got a bit of a shakeup after November's reorganization removed the board's two most "decelerationist" members: Ilya Sutskever (still at the company in a somewhat changed role) and Helen Toner (gone entirely).

The main goal of the update seems to be to show a clear path to identify, analyze and decide what to do about the "catastrophic" risks inherent in the models they are developing. This is how they define it:

"By catastrophic risk we mean any risk that could result in hundreds of billions of dollars in economic damage or lead to the serious harm or death of many people; this includes, but is not limited to, existential risk." (Existential risk being something along the lines of "rise of the machines.")

Models in production are governed by a "safety systems" team; this covers, say, systematic abuses of ChatGPT that can be mitigated with restrictions or API tweaks. Models in development get the "preparedness" team, which tries to identify and quantify risks before a model is released. And then there is the "superalignment" team, which is working on theoretical guardrails for "superintelligent" models, which we may or may not be anywhere near.

The first two categories, being real and not hypothetical, have a relatively easy-to-understand rubric. Their teams rate each model across four risk categories: cybersecurity, "persuasion" (e.g., disinformation), model autonomy (i.e., acting on its own), and CBRN (chemical, biological, radiological and nuclear threats, e.g., the ability to create novel pathogens).

Various mitigations are assumed: for example, a reasonable reluctance to describe the process of making napalm or homemade bombs. After taking into account known mitigations, if a model is still assessed as having a “high” risk, it cannot be deployed, and if a model has “critical” risks, it will not be developed further.

Example of evaluating the risks of a model using the OpenAI rubric.

These risk levels are actually documented internally, in case you're wondering if they're at the discretion of some engineer or product manager.

For example, in the cybersecurity section, the most practical of the four, it is a "medium" risk to "increase the productivity of operators… in key cyber operation tasks" by a certain factor. A high-risk model, on the other hand, would "identify and develop proofs of concept for high-value exploits against hardened targets without human intervention." At the critical level, "the model can devise and execute novel end-to-end strategies for cyberattacks against hardened targets given only a high-level desired objective." Obviously we don't want that out in the world (although it would sell for a considerable sum).
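To make the gating logic concrete, here is a minimal sketch in Python of how post-mitigation scores in the four categories could feed the deploy/continue decisions described above. It is purely illustrative and not anything from OpenAI's framework: the class and method names are hypothetical, and the assumption that the worst-scoring category sets the overall rating is ours.

```python
# Illustrative sketch only; not OpenAI's implementation.
from dataclasses import dataclass
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class ModelScorecard:
    # Post-mitigation scores in the four categories named in the framework.
    cybersecurity: RiskLevel
    persuasion: RiskLevel
    model_autonomy: RiskLevel
    cbrn: RiskLevel

    @property
    def overall(self) -> RiskLevel:
        # Assumption: the worst-scoring category drives the overall rating.
        return max(self.cybersecurity, self.persuasion,
                   self.model_autonomy, self.cbrn)

    def can_deploy(self) -> bool:
        # Only "medium" or lower post-mitigation risk can be deployed.
        return self.overall <= RiskLevel.MEDIUM

    def can_continue_development(self) -> bool:
        # A "critical" score halts further development entirely.
        return self.overall <= RiskLevel.HIGH


if __name__ == "__main__":
    card = ModelScorecard(
        cybersecurity=RiskLevel.HIGH,
        persuasion=RiskLevel.MEDIUM,
        model_autonomy=RiskLevel.LOW,
        cbrn=RiskLevel.MEDIUM,
    )
    print(card.overall.name)                # HIGH
    print(card.can_deploy())                # False
    print(card.can_continue_development())  # True
```

With these example scores the model lands at "high" overall, so under the rule above it could keep being developed but could not be deployed.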

OpenAI has not yet responded to questions about how these categories are defined and refined, for example whether a new risk, such as photorealistic fake video of people, falls under "persuasion" or a new category.

Therefore, one way or another, only medium and high risks are to be tolerated. But the people building those models aren't necessarily the best ones to evaluate them and make recommendations. For that reason, OpenAI is creating a cross-functional "safety advisory group" that will sit on top of the technical side, review the experts' reports, and make recommendations from a higher vantage point. Hopefully (they say) this will uncover some "unknown unknowns," although by their nature those are quite difficult to catch.

The process requires that these recommendations be sent simultaneously to the board and to leadership, which we understand to mean CEO Sam Altman and CTO Mira Murati, plus their lieutenants. Leadership will make the call on whether to ship a model or shelve it, but the board will be able to reverse those decisions.

This will hopefully short-circuit something similar to what was rumored to have happened before the big drama: a high-risk product or process getting the green light without the board's knowledge or approval. Of course, the result of said drama was the sidelining of two of the most critical voices and the appointment of some money-minded guys (Bret Taylor and Larry Summers) who are smart but nowhere near experts in artificial intelligence.

If a panel of experts makes a recommendation and the CEO decides based on that information, will this friendly board really feel empowered to contradict them and hold them back? And if they do, will we find out? Transparency is not actually addressed beyond the promise that OpenAI will request independent third-party audits.

Let's say a model is developed that warrants a "critical" risk category. OpenAI hasn't been shy about bragging about this sort of thing in the past: talking about how tremendously powerful its models are, to the point of refusing to release them, is great advertising. But do we have any kind of guarantee that this will happen, if the risks are so real and OpenAI is so worried about them? Either way, it isn't really mentioned.
