.NET Compiler Platform from A to Z
There are situations in which you need to write your own compiler, interpreter or analyzer for a programming language. Building compilers and interpreters is regarded as the “aerobatics” of programming, and the process itself is seen as very complicated and time-consuming. However, the .NET platform has long included tools that greatly simplify this task.
What we had before Roslyn
The .NET Framework can compile source code without Visual Studio installed on the machine. The .NET Framework (starting with version 2.0) includes the command-line compilers csc.exe and vbc.exe. These compilers can build .NET applications from any text file containing C# or Visual Basic source code. They are run from the command line, and their parameters enable you to:
- Set the name of the compiled file (/out);
- Build console applications (/target:exe);
- Build applications with a graphical interface that do not use a console (/target:winexe);
- Build dynamically linked libraries (/target:library);
- Add references to external assemblies (/r);
- Store command-line arguments in a response file (*.rsp) and pass the name of that file as a command-line argument (@file.rsp).
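As a rough illustration of how these options combine, the Python sketch below assembles a csc command line and writes the same arguments to a response file. The helper names build_csc_args and write_response_file, as well as the file names Program.cs, Hello.exe and build.rsp, are hypothetical; only the switches themselves (/out, /target, /r, @file.rsp) come from the list above.

```python
from pathlib import Path
import tempfile


def build_csc_args(sources, out, target="exe", references=()):
    """Return an argument list for csc.exe built from the switches listed above."""
    if target not in ("exe", "winexe", "library"):
        raise ValueError(f"unsupported target: {target}")
    args = [f"/out:{out}", f"/target:{target}"]
    args += [f"/r:{ref}" for ref in references]  # one /r switch per referenced assembly
    args += list(sources)                        # source files follow the switches here
    return args


def write_response_file(path, args):
    """Write one argument per line and return the '@file' form for the command line."""
    Path(path).write_text("\n".join(args) + "\n", encoding="utf-8")
    return f"@{path}"


args = build_csc_args(["Program.cs"], out="Hello.exe", references=["System.Xml.dll"])
print("csc " + " ".join(args))

with tempfile.TemporaryDirectory() as tmp:
    rsp_argument = write_response_file(Path(tmp) / "build.rsp", args)
    print("csc " + rsp_argument)
```

The sketch only prints the command lines; actually running them assumes that csc.exe is available on the machine.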
The csc and vbc compilers handle the task of compiling source code contained in a single file perfectly well. For the more complex tasks of compiling and assembling whole projects, MSBuild is used. The Visual Studio project files *.csproj, *.vbproj and *.vcxproj are XML files that serve as input for MSBuild, and Visual Studio itself uses MSBuild to build projects. In addition, MSBuild can be called from the command line or from .NET application code via its APIs.
It is also possible to generate low-level MSIL code using System.Reflection.Emit. Alternatively, you can generate code dynamically for .NET programming languages using CodeDOM and then compile the generated code with the help of code providers (for example, CSharpCodeProvider, which is a wrapper around the csc compiler).
All the approaches listed above were used for code generation before the emergence of the .NET Compiler Platform, better known as Roslyn.
Roslyn is a collection of open-source compilers and code analysis and refactoring tools that work with C# and Visual Basic source code. This set of compilers and tools can be used to build full-fledged compilers and, first and foremost, source code analysis tools.
The History of Roslyn
The name “Roslyn” for the new source code compilation platform was first used in writing by Eric Lippert, a former Microsoft employee, when he started recruiting developers for a new project. Lippert named the compiler in honor of Roslyn, a town in Washington State.
The first version of Roslyn was released in October 2011 as a Community Technology Preview (CTP), an extension for Visual Studio 2010 SP1. The CTP update of September 2012, despite its large scale, was not very successful. It contained so-called “breaking changes”, that is, changes in Roslyn components that could break other components depending on them. Besides, not all the features of the CTP APIs were implemented for both C# and Visual Basic.
At its Build conference in April 2014, Microsoft announced Roslyn as an open-source project and also provided a way to integrate Roslyn into Visual Studio 2013. Since then, Roslyn has been distributed under the Apache 2.0 license. However, even then not all Roslyn features were implemented; their deployment was planned for C# 6.0 and Visual Basic 14.0.
Starting with the 2015 version, Visual Studio uses Roslyn to compile and build projects. However, to date, Roslyn supports only two languages: C# and Visual Basic.
In January 2015, Microsoft moved the Roslyn source code to GitHub.
Installing Roslyn
Roslyn is a part of Visual Studio 2015 and is installed together with it. It is also a part of Visual Studio 2017, which was released in March 2017.
However, Roslyn is not included in the .NET Framework. Even the .NET Framework 4.6 still ships the traditional csc.exe and vbc.exe compilers, which keeps it compatible with previous .NET Framework versions.
To install the Roslyn compilers without installing Visual Studio, download and install Microsoft Build Tools. Alternatively, you can download the Roslyn source code from GitHub and build it to get the csc.exe and vbc.exe binaries, which can then be run from the command line.
APIs for Roslyn compilers
Most of the existing traditional compilers come as “black boxes”, which “magically” convert the source code into an executable file or library. Unlike them, Roslyn allows you to access each stage of the code compilation and application creation process via its own APIs.
Compilers are often supplied together with other “black boxes”: integrated development environments (IDEs), which speed up development with convenient tools such as code highlighting, IntelliSense, refactoring tools, performance analysis tools (profilers) and other complex tools. Roslyn takes over these features and also exposes them through APIs. Moreover, with Roslyn, developers can work with the compiler from their own applications, using the compiler as a service to:
- Generate code in C# and Visual Basic (like CodeDOM);
- Analyze code;
- Refactor code;
- Use C# and Visual Basic as scripting languages, interpreting the code instead of compiling it.
Roslyn APIs are represented by three sets (Figure 1).
Fig. 1 – Roslyn APIs
The compiler APIs allow you to get an object model of processes that occur at each stage of the compilation process, regardless of the Visual Studio components installed (Figure 2).
Fig. 2 – Compiler APIs
The Roslyn compiler pipeline is represented by four phases, each of which has its own object representation:
- The parsing phase exposes the source text in the form of a syntax tree;
- The symbol declaration phase exposes a hierarchical symbol table;
- The binding phase exposes the results of semantic analysis;
- The emitting phase provides APIs for generating low-level MSIL code (similar to what System.Reflection.Emit does).
Language services use these APIs to perform their own functions. For example, code highlighting uses a syntax tree, while an object browser uses a hierarchical symbol table.
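To make the four phases more tangible, here is a small sketch that walks through comparable stages using Python's own standard library (ast, symtable and compile). It is only an analogy under stated assumptions: none of these names are Roslyn APIs, the sample source is made up, and the final stage emits Python bytecode rather than MSIL.

```python
import ast
import symtable

source = "def area(w, h):\n    return w * h\n\nresult = area(3, 4)\n"

# Phase 1: parsing produces a syntax tree.
tree = ast.parse(source, filename="<demo>")
function_names = [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]

# Phase 2: declarations are collected into a hierarchical symbol table.
table = symtable.symtable(source, "<demo>", "exec")
top_level = sorted(symbol.get_name() for symbol in table.get_symbols())
area_scope = table.get_children()[0]
nested = {area_scope.get_name(): sorted(s.get_name() for s in area_scope.get_symbols())}

# Phase 3: binding resolves what an identifier refers to.
w_is_parameter = area_scope.lookup("w").is_parameter()

# Phase 4: emitting turns the tree into low-level code (bytecode in this analogy).
code = compile(tree, "<demo>", "exec")
namespace = {}
exec(code, namespace)

print(function_names, top_level, nested, w_is_parameter, namespace["result"])
```

Each intermediate result (the tree, the symbol table, the compiled code object) stays available to the caller, which is the property the text attributes to the compiler APIs.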
Roslyn diagnostic APIs allow you to handle errors and warnings that occur at all the compilation stages. Roslyn also allows you to process errors through analysis tools written by the user.
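The idea of a user-written analysis tool that reports its own diagnostics can be sketched as follows. This is a toy analogy built on Python's ast module, not the Roslyn diagnostic API; the Diagnostic record, the rule identifier DEMO001 and the rule itself (flagging bare except clauses) are all invented for the illustration.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    rule_id: str    # hypothetical rule identifier
    severity: str   # "error" or "warning"
    line: int
    message: str


class BareExceptAnalyzer(ast.NodeVisitor):
    """User-written rule: warn about 'except:' clauses that catch everything."""

    def __init__(self):
        self.diagnostics = []

    def visit_ExceptHandler(self, node):
        if node.type is None:
            self.diagnostics.append(
                Diagnostic("DEMO001", "warning", node.lineno, "bare 'except:' hides errors"))
        self.generic_visit(node)


def analyze(source):
    """Report a syntax error if parsing fails; otherwise run the custom rule."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [Diagnostic("SYNTAX", "error", exc.lineno or 0, exc.msg)]
    analyzer = BareExceptAnalyzer()
    analyzer.visit(tree)
    return analyzer.diagnostics


warnings = analyze("try:\n    risky()\nexcept:\n    pass\n")
errors = analyze("x = (1 +")
print(warnings)
print(errors)
```

Both the parser's own errors and the custom rule's findings come back in the same Diagnostic shape, so a caller can handle them uniformly.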
Scripting APIs allow executing C# or Visual Basic code without compilation – something similar to the REPL interactive environment in Perl, Python, Haskell, Erlang, and others.
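The REPL-like behaviour mentioned here, where each submission sees the state left by the previous ones, can be illustrated with a few lines of Python. The ScriptSession class is a hypothetical helper written for this sketch and says nothing about how the Roslyn scripting APIs are shaped.

```python
class ScriptSession:
    """Keeps variables alive between submissions, like an interactive window."""

    def __init__(self):
        self._globals = {}

    def run(self, text):
        try:
            # Expressions produce a value ...
            compiled = compile(text, "<script>", "eval")
        except SyntaxError:
            # ... while statements only change the session state.
            exec(compile(text, "<script>", "exec"), self._globals)
            return None
        return eval(compiled, self._globals)


session = ScriptSession()
session.run("x = 20")
session.run("def double(n): return n * 2")
print(session.run("double(x) + 2"))  # prints 42
```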
Workspace APIs give direct access to the application’s object model in the compiler without parsing the source code files a second time. These APIs also allow you to configure projects, manage project dependencies and generate source code without using Visual Studio components.
Syntax trees
The syntax tree is the basic structure used by Roslyn for compilation, code analysis, binding, refactoring, code generation and other operations. Roslyn syntax trees have three key properties:
- They contain all the source information: grammatical constructs, tokens, directives, comments and even whitespace;
- The syntax tree, or any part of it, can be converted back to source code: you can build syntax trees and generate code from them, or edit a syntax tree and generate the modified code from it;
- They are thread-safe and immutable. This means that you cannot directly change the data in a syntax tree; the tree completely reflects the state of the source code at the time of construction.
These three properties allow you to work with the syntactic structure of the source code through APIs, including from your own projects. They have also greatly simplified complex refactoring operations: a refactoring is expressed as an edit of the syntax tree, which produces a new tree, rather than as direct editing of the source text. Each syntax tree consists of the following elements:
- Syntax Nodes – they represent complex syntactic constructs, such as declarations or expressions;
- Syntax Tokens – they represent the simplest constructs from which syntax nodes are built, for example an identifier or an operator;
- Syntax Trivia – it represents parts of the source text that are mainly insignificant for the compiler, such as comments, directives or whitespace;
- Spans – they record the position of each node, token or trivia within the source text, and its length;
- Kinds – they identify the type of each syntax unit in the tree;
- Errors – they are represented in the syntax tree in two ways: either by inserting the expected token, or by attaching a token that is unknown to the compiler as trivia.
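The properties and elements described above can be tried out on a deliberately tiny model. The Python sketch below is not Roslyn's data structure: it omits syntax nodes and keeps only a flat, immutable sequence of tokens, each carrying its kind and leading trivia. The toy grammar (identifiers, numbers, a few punctuation characters, // comments) and all names are assumptions made for the example.

```python
import re
from dataclasses import dataclass, replace, FrozenInstanceError

TRIVIA_RE = re.compile(r"(?:\s+|//[^\n]*)*")           # whitespace and // comments
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[=+\-*/;()]")  # identifier, number, punctuation


@dataclass(frozen=True)
class Token:
    kind: str             # "identifier", "number", "punctuation" or "end"
    text: str
    leading_trivia: str   # everything skipped before the token text


def kind_of(text):
    if text[0].isalpha() or text[0] == "_":
        return "identifier"
    return "number" if text[0].isdigit() else "punctuation"


def tokenize(source):
    tokens, pos, trivia = [], 0, ""
    while pos < len(source):
        skipped = TRIVIA_RE.match(source, pos).group()
        trivia, pos = trivia + skipped, pos + len(skipped)
        if pos == len(source):
            break
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # Unknown character: keep it as trivia so that no text is lost.
            trivia, pos = trivia + source[pos], pos + 1
            continue
        tokens.append(Token(kind_of(match.group()), match.group(), trivia))
        trivia, pos = "", match.end()
    tokens.append(Token("end", "", trivia))  # carries any trailing trivia
    return tuple(tokens)


def to_source(tokens):
    """Full fidelity: concatenating trivia and text restores the original."""
    return "".join(t.leading_trivia + t.text for t in tokens)


def spans(tokens):
    """Compute (start, length) of every token text from the tokens themselves."""
    result, pos = [], 0
    for t in tokens:
        pos += len(t.leading_trivia)
        result.append((pos, len(t.text)))
        pos += len(t.text)
    return result


source = "x = 1 + 2; // sum\n"
tokens = tokenize(source)
assert to_source(tokens) == source          # round trip keeps comments and whitespace

try:
    tokens[0].text = "total"                # direct mutation is rejected
except FrozenInstanceError:
    pass

# An "edit" builds a new sequence; the original one is left untouched.
edited = (replace(tokens[0], text="total"),) + tokens[1:]
print(to_source(edited), spans(edited)[:2])
```

Spans are computed on demand rather than stored, so they stay correct for the edited sequence as well. Unknown characters are folded into trivia, which loosely mirrors the second error-handling strategy listed above.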
Semantic model and Workspace APIs
Whereas syntax trees represent the structure of the source code, semantics is the meaning of the source code and all its constructs. It covers declarations of variables, classes, objects, fields and methods, function calls and the parameters passed to them, the types of operands and operation results, and operator precedence. Semantic analysis checks the source code (in Roslyn, the syntax tree) for compliance with the rules of the language. The semantic model provides the following information about the source code:
- Semantic symbols: source elements or elements imported from libraries (types, methods, properties, fields, events, etc.);
- The resulting type of an expression;
- Diagnostic data: errors, warnings, exceptions, etc.
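The three kinds of information listed above can be mimicked with a toy semantic model. The sketch below analyzes a few Python assignments with the ast module and applies its own, deliberately simplified typing rules (only int, float and str combined with +). The SemanticModel class and its rules are assumptions for illustration and do not reflect the Roslyn semantic model API.

```python
import ast


class SemanticModel:
    """Toy model: declared symbols, expression types and diagnostics."""

    def __init__(self, source):
        self.symbols = {}       # symbol name -> inferred type name
        self.diagnostics = []   # (line, message) pairs
        for statement in ast.parse(source).body:
            if isinstance(statement, ast.Assign) and isinstance(statement.targets[0], ast.Name):
                self.symbols[statement.targets[0].id] = self.type_of(statement.value)

    def type_of(self, expr):
        """Return the resulting type of an expression, or 'unknown'."""
        if isinstance(expr, ast.Constant):
            return type(expr.value).__name__
        if isinstance(expr, ast.Name):
            if expr.id not in self.symbols:
                self.diagnostics.append((expr.lineno, f"name '{expr.id}' is not defined"))
                return "unknown"
            return self.symbols[expr.id]
        if isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Add):
            left, right = self.type_of(expr.left), self.type_of(expr.right)
            if "unknown" in (left, right):
                return "unknown"            # already reported, avoid duplicates
            if left == right and left in ("int", "float", "str"):
                return left
            if {left, right} == {"int", "float"}:
                return "float"
            self.diagnostics.append((expr.lineno, f"cannot add {left} and {right}"))
        return "unknown"


model = SemanticModel(
    "width = 3\n"
    "ratio = 1.5\n"
    "scaled = width + ratio\n"
    "label = 'w' + width\n"
    "total = scaled + missing\n"
)
print(model.symbols)
print(model.diagnostics)
```

Here symbols plays the role of the semantic symbols, type_of answers the question about the resulting type of an expression, and diagnostics collects the errors found along the way.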
Workspace APIs represent the object model of solutions, projects in solutions and documents in projects. All the objects and methods listed above can be called from any .NET application working with Roslyn as a service and using Roslyn APIs.
Working with Roslyn: samples
There are many examples of working with Roslyn. Here are some of them:
- Using the Roslyn syntax analyzer to create syntax trees and to generate source code from them;
- GitHub samples;
- Scripting APIs samples;
- Syntax visualizer;
- Roslyn source code on GitHub.
Future development of Roslyn
Roslyn will be developed further in two important areas: creating new features and improving existing algorithms. The following qualitative improvements of algorithms are expected:
- Increasing the performance and speed of algorithms in the compiler platform;
- Creating a new implementation of PDB Writer with a high degree of parallelism when writing to a PDB file;
- Increasing the test coverage with the help of new testing tools;
- Eliminating Roslyn’s dependence on the full version of the .NET Framework so that Roslyn can be deployed, for example, on WinRT.
Some of the features of the Roslyn compilers are still considered experimental and are being tested publicly. Others have already been implemented but can still be improved in terms of performance, speed and quality. Still others, associated with new functionality, require a decision by Microsoft and the .NET Foundation community before intensive development and implementation can start. Here are some of the planned improvements for upcoming versions of the Roslyn compilers:
- New features for the programming languages C# 6.0 and Visual Basic 14.0;
- APIs for creating XML documentation from code comments;
- Improvement of diagnostic APIs for live code analysis while the code is being written, for example identifying and indicating errors and warnings without running a compilation;
- Increasing the performance of code analyzers via Roslyn APIs;
- Increasing the number of rules for the static code analysis tool FxCop;
- Creating APIs for writing custom static code analyzers;
- Modifying the semantics of some expressions for scripting languages (C# Script and VB Script);
- Improving the REPL interface, that is, the interactive environment windows for programming within command-line interface tools;
- Improving APIs for working with scripting languages (C# Script and VB Script);
- Increasing the performance of the FindAllReferences operation;
- Improving the algorithms for finding conflicts when renaming.
A few more words about Roslyn
Despite a large number of flaws, Microsoft’s new compiler platform Roslyn is gaining popularity, and this is no accident. Roslyn is one of the few compilers that let you observe all the compilation and assembly stages, access any intermediate results and internal compiler constructs, and use the compiler’s language services, refactoring and diagnostics tools. Thanks to the interpretation capabilities of Roslyn, C# and Visual Basic can now be used as scripting languages. Despite its relatively short history, Roslyn is already being used in large projects, such as the Visual Studio 2015 IDE, the PVS-Studio static code analyzer, and the cross-platform .NET Core framework. It is also used as an alternative to the Windows PowerShell scripting system. In the future, the number of such projects will only increase.
Some life hacks for using Roslyn
Roslyn provides a huge set of tools for building your own compilers, code analyzers, interpreters and scripting languages. A significant shortcoming of Roslyn is that it works with only two programming languages: C# and Visual Basic. Nevertheless, Roslyn makes it easier to create your own language on the .NET platform. In this case, you only need to translate the code into C# or Visual Basic, or create a syntax tree, and then use the Roslyn compiler APIs to build a full-fledged application on the .NET platform. Another option is to execute (interpret) the generated code as a script. If you need to generate and compile source code using C# as a scripting language, the best solution is to use the Roslyn compiler APIs. If you do not like the source code analyzers built into Visual Studio, the Roslyn APIs enable you to create your own. You can even create your own IDE, using the features of this compiler platform and connecting it to your project as a service.
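The first step of the "own language" route, translating custom code into C#, can be sketched independently of Roslyn. The Python function below turns a made-up toy language with only let and print statements into C# source text. The toy language, the function name translate_to_csharp and the class name Generated are all hypothetical, and expressions are passed through unchecked. Compiling or scripting the resulting text with Roslyn would be a separate step that is not shown.

```python
import re

LET_RE = re.compile(r"let\s+([A-Za-z_]\w*)\s*=\s*(.+)")
PRINT_RE = re.compile(r"print\s+(.+)")


def translate_to_csharp(program, class_name="Generated"):
    """Translate toy 'let' and 'print' statements into C# source text."""
    body = []
    for number, line in enumerate(program.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue                                   # blank lines are ignored
        if match := LET_RE.fullmatch(line):
            body.append(f"var {match.group(1)} = {match.group(2)};")
        elif match := PRINT_RE.fullmatch(line):
            body.append(f"System.Console.WriteLine({match.group(1)});")
        else:
            raise SyntaxError(f"line {number}: cannot translate {line!r}")
    statements = "\n".join("        " + statement for statement in body)
    return (
        f"public static class {class_name}\n"
        "{\n"
        "    public static void Main()\n"
        "    {\n"
        f"{statements}\n"
        "    }\n"
        "}\n"
    )


csharp_source = translate_to_csharp("let x = 1 + 2\nprint x * 10")
print(csharp_source)
```

Generating text is the simplest of the two options mentioned above; building a syntax tree directly avoids string handling but requires the compiler's own tree-construction APIs.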
Roslyn is not just another Microsoft compiler; it is an off-the-shelf framework that you can use to create your own source code tools. Roslyn gives .NET developers many new capabilities. It is a great tool that helps you write your own compiler, interpreter or analyzer for a programming language. We advise you to study how the compiler works, because this will simplify your tasks. We are interested in Roslyn because it can be used to create your own programming language on the .NET platform.