Friday 20 December 2013

Groovy in the Cloud: sandboxing and more


The Scenario  


So you have a hosted product that allows users to enhance or customise functionality with Groovy scripts. They have an online script editor and a rich DSL making the scripts both expressive and readable.

The product is used for a wide variety of integration tasks.  Although the DSL has evolved over time, occasionally new projects require a more 'free form' scripting approach to deliver on requirements.

This means that you don't want to throttle the power and capabilities of Groovy too much or you'll lose the ability to deliver on ad-hoc requirements.

So you find yourself in a situation where you give your users a loaded gun and hope they don't shoot themselves (or you) in the foot!


You don't want to do that...  


You're not alone in adding a scripting capability to a product.  Take a look at Apache Camel or Jasper Reports, to name but two.  What makes this scenario especially dangerous is the ability of a user to enter script content via an online form and then have it execute on demand.
How we all laughed when we saw the Web App that allowed its users to enter SQL expressions and pass them directly to the database for execution!
A pragmatic approach may be to execute each script in a separate process space, but then script/product integration would need to be conducted via remote interface(s).  And then there is the management of the processes and the overhead of creating them to consider and...

Alternative approaches aside, if you find yourself in a situation where you embed the Groovy script engine in your product (or its container) and you'd rather not have it execute 'System.exit(0)' or something equally nasty, read on.


Sandboxing Groovy  


Groovy is a dynamically typed language and as such there is only so much validation you can perform at compilation time.  An effective sandbox must inspect code at runtime.  This does, however, come with a performance overhead, one we were willing to live with.
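To illustrate why checks made before the script runs are not enough, here is a minimal sketch of a naive text-based check.  The class NaiveStaticCheck and its pattern are hypothetical and are not part of the sandbox described in this post.  The second script relies on Groovy's ability to compute a method name at runtime, so the literal text 'System.exit(' never appears in it.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: a check that looks at script text before it runs.
class NaiveStaticCheck {
    // Matches only the literal spelling of the forbidden call.
    private static final Pattern FORBIDDEN =
            Pattern.compile("System\\s*\\.\\s*exit\\s*\\(");

    static boolean looksSafe(String scriptText) {
        return !FORBIDDEN.matcher(scriptText).find();
    }

    public static void main(String[] args) {
        String direct = "println 'hi'; System.exit(0)";
        // Groovy lets the method name be built at runtime from a string.
        String dynamic = "def m = 'ex' + 'it'; System.\"$m\"(0)";

        System.out.println(looksSafe(direct));   // false: the literal call is spotted
        System.out.println(looksSafe(dynamic));  // true: the check is fooled
    }
}
```

Only a check made at the moment of invocation sees the resolved class and method name, which is what the runtime interception described here is for.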


1.  Using the sandbox


1:      // The name of my script  
2:      String scriptName = "MyScript.groovy";  
3:    
4:      // Register a validator - using regex or explicit script name above  
5:      InterceptorRegistry.getInstance().register( scriptName, new TestValidator() );  
6:    
7:      // Add the custom scanning AST transformation  
8:      CompilerConfiguration config = new CompilerConfiguration( CompilerConfiguration.DEFAULT );  
9:      config.addCompilationCustomizers( new ASTTransformationCustomizer( ScriptScanner.class ) );  
10:    
11:      // Load the script text  
12:      GroovyShell shell = new GroovyShell( config );  
13:      String theScript = "println 'hello from MyScript';System.exit(0)";  
14:      GroovyCodeSource cs = new GroovyCodeSource( theScript, scriptName, "/groovy/sandbox" );  
15:    
16:      // Compile  
17:      Script parsedScript = shell.parse( cs );  
18:      // and run...  
19:      parsedScript.run();  

The sandbox uses a singleton registry to install instances of InterceptorValidator.  Each validator is registered against either an explicit script name or a regex pattern.  The validator is called before each method call in the script is executed.


  public class TestValidator implements InterceptorValidator {
    @Override
    public boolean canInvoke( String sourceName, int lineNumber, String className, String methodName ) throws SandboxSecurityException {
      // Veto any call to java.lang.System.exit()
      if ( className.equals( "java.lang.System" ) && methodName.equals( "exit" ) ) {
        throw new SandboxSecurityException( "System.exit() not allowed at line: " + lineNumber );
      }
      // Everything else is allowed to run
      return true;
    }
  }

You can return true to allow execution to continue, false to skip the method execution or throw a SandboxSecurityException.
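The name-or-pattern registration described above can be pictured with a small sketch.  This is not the code from the project: SketchValidator and SketchRegistry are hypothetical stand-ins, and I'm assuming that an explicit script name is simply treated as a quoted regex.  A validator that throws is not handled specially; the exception just propagates to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the InterceptorValidator interface used in this post.
interface SketchValidator {
    boolean canInvoke(String sourceName, int lineNumber, String className, String methodName);
}

// Hypothetical registry: every entry is a regex matched against the script name.
class SketchRegistry {
    private final Map<String, SketchValidator> validators = new LinkedHashMap<>();

    // An explicit name is stored as a quoted (literal) pattern.
    void registerName(String scriptName, SketchValidator validator) {
        validators.put(Pattern.quote(scriptName), validator);
    }

    void registerPattern(String regex, SketchValidator validator) {
        validators.put(regex, validator);
    }

    // Returns false as soon as one matching validator vetoes the call.
    // Scripts with no matching validator are allowed to proceed.
    boolean isValid(String sourceName, int lineNumber, String className, String methodName) {
        for (Map.Entry<String, SketchValidator> entry : validators.entrySet()) {
            boolean applies = Pattern.matches(entry.getKey(), sourceName);
            if (applies && !entry.getValue().canInvoke(sourceName, lineNumber, className, methodName)) {
                return false;
            }
        }
        return true;
    }
}
```

Whether an unmatched script should be allowed or denied by default is a design choice; the sketch allows it, which may well be too permissive for a hosted product.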

Running the MyScript.groovy example above produces the following output:
2013-12-20 09:54:12,935 TRACE [com.amalto.groovy.ScanTransform] (main) Visiting methods of: run (ScanTransform.java:31)
2013-12-20 09:54:12,939 TRACE [com.amalto.groovy.ScriptScanningVisitor] (main) Wrapping method call: println args: org.codehaus.groovy.ast.expr.ArgumentListExpression@7bafb0c7[ConstantExpression[hello from MyScript]] (ScriptScanningVisitor.java:55)
2013-12-20 09:54:12,943 TRACE [com.amalto.groovy.ScriptScanningVisitor] (main) Wrapping method call: exit args: org.codehaus.groovy.ast.expr.ArgumentListExpression@3e68cd79[ConstantExpression[0]] (ScriptScanningVisitor.java:55)
2013-12-20 09:54:13,332 TRACE [com.amalto.groovy.RuntimeScriptInterceptor] (main) ** Groovy Runtime ** (MyScript.groovy) Method: println Class: MyScript Line: 1 (RuntimeScriptInterceptor.java:32)
hello from MyScript
2013-12-20 09:54:13,357 TRACE [com.amalto.groovy.RuntimeScriptInterceptor] (main) ** Groovy Runtime ** (MyScript.groovy) Method: exit Class: java.lang.System Line: 1 (RuntimeScriptInterceptor.java:32)

com.amalto.groovy.interception.SandboxSecurityException: System.exit() not allowed at line: 1
     at com.amalto.groovy.TestValidator.canInvoke(TestValidator.java:28)
     at com.amalto.groovy.interception.InterceptorRegistry.isValid(InterceptorRegistry.java:32)
     at com.amalto.groovy.RuntimeScriptInterceptor.invokeMethod(RuntimeScriptInterceptor.java:35)
     at com.amalto.groovy.RuntimeScriptInterceptor$invokeMethod.callStatic(Unknown Source)
     at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:53)
     at <snip>


2.  How it works


The addition of an ASTTransformationCustomizer at line 9 of the usage listing allows code injection (via the visitor pattern) during the SEMANTIC_ANALYSIS stage of compilation.

What this means is that, as you 'visit' each expression, you can wrap it with a proxy invoker.  It's the proxy invoker that contains the validator call.
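The idea of a proxy invoker can be sketched in plain Java.  This is a simplified illustration under my own assumptions, not the project's RuntimeScriptInterceptor: CallCheck and SketchInvoker are hypothetical names, methods are matched by name and argument count only, and closure delegates are ignored entirely.  Conceptually, the transformation rewrites a call such as target.name(args) into SketchInvoker.invoke(check, target, "name", args).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical validation hook consulted before every wrapped call.
interface CallCheck {
    boolean canInvoke(String className, String methodName);
}

// Hypothetical proxy invoker: validates first, then performs the real call.
class SketchInvoker {
    static Object invoke(CallCheck check, Object target, String methodName, Object... args) {
        // A Class target means a static call, e.g. System.exit(0).
        boolean isStatic = target instanceof Class;
        Class<?> type = isStatic ? (Class<?>) target : target.getClass();

        if (!check.canInvoke(type.getName(), methodName)) {
            return null; // the validator said no: skip the call
        }
        // Simplification: pick the first public method with a matching name and arity.
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName)
                    && m.getParameterCount() == args.length
                    && Modifier.isStatic(m.getModifiers()) == isStatic) {
                try {
                    return m.invoke(isStatic ? null : target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Call failed: " + methodName, e);
                }
            }
        }
        throw new IllegalArgumentException("No method " + type.getName() + "." + methodName);
    }
}
```

With a check that rejects java.lang.System and exit, SketchInvoker.invoke(check, System.class, "exit", 0) returns null without terminating the VM, while other calls pass straight through to the target.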

A custom AST transformation is standard stuff; however, things became a little tricky when trying to resolve execution targets in the proxy invoker.  You need to resolve 'closure delegates' and for this I used a bit of magic suggested by the Groovy User List.  (Thanks guys)

The source code is available on GitHub.  It's still a prototype and I'd welcome suggestions and improvements.


What else?


I've concentrated so far on wrapping method calls providing simple white/black list validator support.  I could also wrap other operations in my sandbox such as property access; although I'd need to identify a 'dangerous' property access first.
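As a rough idea of what simple white/black list support might look like, here is a sketch.  ListValidator is a hypothetical class, and the "className#methodName" key format is my own assumption rather than anything taken from the project.  The blacklist is consulted first, and an empty whitelist is treated as 'allow any class'.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical list-based validator; not part of the project's code.
class ListValidator {
    private final Set<String> allowedClasses; // whitelist of receiver class names
    private final Set<String> deniedCalls;    // blacklist entries: "className#methodName"

    ListValidator(Set<String> allowedClasses, Set<String> deniedCalls) {
        this.allowedClasses = new HashSet<>(allowedClasses);
        this.deniedCalls = new HashSet<>(deniedCalls);
    }

    boolean canInvoke(String className, String methodName) {
        // The blacklist always wins.
        if (deniedCalls.contains(className + "#" + methodName)) {
            return false;
        }
        // An empty whitelist places no restriction on the receiver class.
        return allowedClasses.isEmpty() || allowedClasses.contains(className);
    }

    public static void main(String[] args) {
        ListValidator v = new ListValidator(
                new HashSet<>(Arrays.asList("java.lang.String", "java.lang.System")),
                new HashSet<>(Arrays.asList("java.lang.System#exit")));
        System.out.println(v.canInvoke("java.lang.System", "exit"));              // false
        System.out.println(v.canInvoke("java.lang.System", "currentTimeMillis")); // true
        System.out.println(v.canInvoke("java.io.File", "delete"));                // false
    }
}
```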

However, picking up on what Jim Driscoll of Oracle suggested in his 'Groovy in the cloud' presentation: if you wrap both expressions and property access, you then have the tools to provide a simple online debugger for Groovy!

I should stress at this point that I'm not suggesting any online debugger be as feature rich as a desktop client using the VM's JDI.

What I have in mind is an online debugger that allows multiple users to debug scripts at the same time (something the VM's JDI does not support).  The online debugger would provide simple expression stepping and the ability to view all variables whose values have been derived by observation of property assignment.
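The 'observation of property assignment' part could be sketched as below.  Nothing here exists in the project yet: DebugSessions, onAssign and snapshot are hypothetical names, and I'm assuming that each wrapped assignment reports to a per-user session so that several users can debug at once.  Expression stepping is not covered.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-session variable store fed by wrapped property assignments.
class DebugSessions {
    private final Map<String, Map<String, Object>> variablesBySession = new ConcurrentHashMap<>();

    // Would be called by the wrapper around each property assignment.
    void onAssign(String sessionId, String name, Object value) {
        variablesBySession
                .computeIfAbsent(sessionId,
                        id -> Collections.synchronizedMap(new LinkedHashMap<String, Object>()))
                .put(name, value);
    }

    // Returns a copy of the variables observed so far for one session.
    Map<String, Object> snapshot(String sessionId) {
        Map<String, Object> vars = variablesBySession.get(sessionId);
        if (vars == null) {
            return new LinkedHashMap<>();
        }
        synchronized (vars) {
            return new LinkedHashMap<>(vars);
        }
    }
}
```

Because each session has its own map, one user's variables never show up in another user's view, even when both run the same script.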

What's next... the creation of a new ASTTransformationCustomizer for use when debugging online.

Links I found useful:


A great presentation by Jim Driscoll of Oracle:
http://www.slideshare.net/jimdriscoll/groovy-in-the-cloud

Kohsuke Kawaguchi sandbox:
http://groovy-sandbox.kohsuke.org/

Using Java security policy to sandbox Groovy:
http://blog.datenwerke.net/2013/06/sandboxing-groovy-with-java-sandbox.html
http://www.sdidit.nl/2012/12/groovy-dsl-executing-scripts-in-sandbox.html

Interesting stuff but I did not pursue this approach as I found it tricky to configure and deploy in our container.



The code discussed in this article is available from GitHub here: