This manual describes RECODER-CS, the port of the RECODER framework to the C# language (RECODER originally worked on JAVA files).
In this description we assume that you already know how to write programs using RECODER; we focus only on the changes and differences. We will not explain C# terms and definitions either.
This document is not a full manual. It only explains the differences between RECODER-CS and RECODER: briefly, what has been changed and how (mostly what is possible and what is not). We do not go into details, since there were a large number of changes, and there are still many todos.
If you want more detailed information about these changes, you need to look at the sources. We have marked all code pieces that have been disabled or are incomplete. Disabled code is commented out with // and is always marked with a // DISABLED comment and a description of the reason. If there is a todo item, you will find a // TODO comment before the critical section. If you find commented-out code without any tag, it is probably not our change (it likely comes from RECODER).
RECODER-CS is a modification of RECODER, but it is neither compatible nor interoperable with RECODER (although it still uses recoder as the main namespace). However, during the modifications we tried to make as few changes as possible in order to maintain maximal compatibility with RECODER. This means that you will not have to change much in your programs, since RECODER-CS is almost API-compatible with RECODER.
RECODER-CS can parse C# code, build an AST, and run a semantic analysis on it. It can show you which classes are available and which methods and fields they have, handle variables, and resolve references to variables, fields and methods. Sources can be pretty printed and transformed (in a limited way).
The abilities of RECODER-CS are still limited. The biggest limitation is that you cannot parse code which contains preprocessor directives or unsafe extensions, or which is not available in source form. You can use transactions on the program model, but you cannot roll them back. The kits in recoder.kit are still pretty incomplete, and the semantic analysis still has some bottlenecks (see later).
You must also consider that RECODER-CS implements a parser for the ECMA C# language, not for any vendor-specific extension.
Just like RECODER, RECODER-CS is based on a central program model of the sources and on service modules which manipulate, analyze, build and update that model.
These services have almost the same API as in RECODER, except for the services that worked on bytecode - those have been removed. Some additions have been made to SourceInfo to deal with the new types and constructs of C#.
In the following sections we briefly describe each service and the changes we made to it.
Source file repositories are responsible for loading and saving compilation units, loading classes, and so on.
First of all: RECODER-CS supports neither bytecode parsing nor loading classes by reflection (which would be impossible to implement, since we program in JAVA...).
Another important change has been made to the source file repository as well. Since compilation units in C# may contain multiple public class declarations in multiple namespaces, it is impossible to find files (compilation units) by their class names. We solved this problem by having the source file repository parse each and every compilation unit available in the input path when it is created.
This means, however, that you must have the sources of the core library (e.g. System.Object, System.String) available in the input path to get the source info working normally. In the CVS you can find the corlib of the MONO project, which should be usable for this purpose.
The input path may (and should) be specified by the input.path system property; the CLASSPATH environment variable is ignored. To set the input path property, you can either use the -D parameter at program start, call System.getProperties().put("input.path", "<whatever>") at the beginning of your program, or alternatively use the ServiceConfiguration.getProjectSettings().getSearchPathList().add("<whatever>") method. As usual, the path must be specified as a list separated by a semicolon (;) or a colon (:), according to your platform (Windows/UNIX).
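As a short sketch (the path entries and class name here are hypothetical, not part of RECODER-CS), the property can be set programmatically before the service configuration is created:

```java
// Sketch: set "input.path" before creating the ServiceConfiguration,
// because the repository parses all sources on initialization.
// The path entries "src" and "mono-corlib" are hypothetical examples.
public class InputPathSetup {

    // Join path entries with the platform separator (';' or ':').
    public static String buildPath(String... entries) {
        return String.join(System.getProperty("path.separator"), entries);
    }

    public static void main(String[] args) {
        System.getProperties().put("input.path", buildPath("src", "mono-corlib"));
        System.out.println(System.getProperty("input.path"));
    }
}
```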
In some cases the source file repository may die if it finds an error (for example an unresolved reference) while loading the classes in the input path. This is because the repository uses its own error handler during initialization, and this error handler terminates at the first error. A workaround is to place only correct files into the input path and to read the other files later using ServiceConfiguration.getProjectSettings().getSearchPathList().add("<filename>"). Before those classes are loaded, you should replace the DefaultErrorHandler with your own handler that tolerates errors. Note that this "bug" should have been fixed by now...
ProjectSettings no longer ensures that the system classes are in the path. It is your responsibility to add their sources to the input path.
When creating a new compilation unit in a transformation, you have to create its DataLocation first. (This is because the repository cannot decide by itself where the file belongs.)
Currently, the repository loads all compilation units and runs the TypeFinderVisitor to collect the types in each compilation unit. However, it runs the visitor for each type separately; we should add a cache here.
The changes in the recoder.io architecture made some changes in the ServiceConfiguration classes necessary. Because the SourceFileRepository now parses all classes on initialization, we needed to introduce an InitializeException into the ServiceConfigurations. For tasks that do not require semantic analysis or special source repository handling (like pretty printing), it is now possible to define ServiceConfigurations that only create the ProgramFactory and ProjectSettings services. We use such a configuration in our test suite. This allows pretty printing of sources with incomplete semantic information (e.g. with missing references). TODO: Clean up the structure of the ServiceConfigurations.
The parser is still an LL-parser generated by JAVACC.
Since we could not find a suitable grammar for C# (the only available ECMA grammar was left-recursive and thus not usable with JAVACC), we decided to derive the parser from the JAVA parser instead of implementing the ECMA specification directly. The two parsers have a very similar set of rules, but internally they do not have much in common.
Using the same parser approach as in RECODER allowed a large reuse of the tree classes already written for JAVA (the classes in recoder.java). You should not forget, though, that almost all files have changed a little, so the JAVA and C# classes are not compatible with each other.
At some places (because of the left recursion) we had to use a pretty big lookahead, which may make the parser somewhat slower than the JAVA parser. We do not think these problems can be solved by any LL parser, so we left it as it is.
There were a large number of changes, which we cannot list in full. See the API documentation and/or the parser grammar for reference. (Since this part is also very weakly documented in RECODER, these are your only hopes...) Here we only mention the most important changes.
C# allows associating metadata, stored in attributes, with some program elements.
The interface AttributableElement is implemented by all elements which can have attributes. On those, you can use getAttributeSectionCount() and getAttributeSectionAt() to get the attribute sections of the element. On the attribute sections you can then use getAttributeCount() and getAttributeAt() to get the attributes defined by the section, and you can decompose the attributes as well (see the API doc; it is very straightforward). You can also read the AttributeTarget of an attribute section (those modifiers are stored in recoder.csharp.attributes.modifiers).
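The traversal order can be sketched as follows. The classes below are minimal stand-ins mirroring the documented method names, NOT the real RECODER-CS types, so that the example is self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch, not the real RECODER-CS classes: minimal types
// mirroring the documented API (getAttributeSectionCount()/At(),
// getAttributeCount()/At()), just to show how attributes are reached.
class Attribute {
    final String name;
    Attribute(String name) { this.name = name; }
}

class AttributeSection {
    private final List<Attribute> attributes;
    AttributeSection(Attribute... attrs) { this.attributes = Arrays.asList(attrs); }
    int getAttributeCount() { return attributes.size(); }
    Attribute getAttributeAt(int i) { return attributes.get(i); }
}

class AttributableElement {
    private final List<AttributeSection> sections;
    AttributableElement(AttributeSection... secs) { this.sections = Arrays.asList(secs); }
    int getAttributeSectionCount() { return sections.size(); }
    AttributeSection getAttributeSectionAt(int i) { return sections.get(i); }
}

public class AttributeWalk {
    // Collect all attribute names of an element, section by section.
    static List<String> attributeNames(AttributableElement e) {
        List<String> names = new ArrayList<>();
        for (int s = 0; s < e.getAttributeSectionCount(); s++) {
            AttributeSection sec = e.getAttributeSectionAt(s);
            for (int a = 0; a < sec.getAttributeCount(); a++) {
                names.add(sec.getAttributeAt(a).name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        AttributableElement cls = new AttributableElement(
                new AttributeSection(new Attribute("Serializable")),
                new AttributeSection(new Attribute("Obsolete"), new Attribute("Flags")));
        System.out.println(attributeNames(cls)); // [Serializable, Obsolete, Flags]
    }
}
```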
Note: there is no semantic analysis on attributes (you cannot obtain the meaning of the metadata); only type references are resolved. Interpreting the metadata would require knowledge of all system attributes, and would therefore be a lot of work.
Existing expressions have not been changed. New operators have been introduced, such as CheckedOperator, UncheckedOperator, TypeofOperator, and AsOperator.
Literals have been changed to implement the ReferencePrefix interface, since in C# 123.ToString() is a valid expression (this is called boxing and is resolved by the semantic analysis).
ArrayLengthReference is obsolete. In C# arrays are boxed to the System.Array type when used as a prefix; System.Array then has a Length property (among others).
A new reference is the UncollatedMethodCallReference (a subclass of UncollatedReferenceQualifier), which is created instead of the MethodReference. This is needed because you cannot distinguish a delegate call from a method call during the plain analysis. The UncollatedMethodCallReference will be resolved by the semantic analysis and replaced by either a MethodReference or a DelegateCallReference (also a new class).
Multidimensional arrays have been added to the model. In RECODER the dimensions of an array were stored in a single integer (a[][][] was stored as dimension 3); C#, however, distinguishes real multidimensional arrays from arrays of arrays (as in JAVA). So you can write something like a[,][], which is not equivalent to a[][,] (although the dimension is 3 in both cases).
Our concept for storing the new dimensions is to use an array of integers instead of a single integer. (Another possible solution would have been to introduce a type reference to a type whose base type is also an array; this would have been more complex.) With our solution the dimension of a[][,,][,] maps to the integer array {1, 3, 2}, while the single expression a has a dimension of either null or int[0] (an int array of length 0); both are possible.
This kind of dimension mapping is used in FieldDeclarations, VariableDeclarations, and TypeReferences.
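As a hypothetical illustration (this helper is not part of the RECODER-CS API), the mapping from a C# array-type suffix to the integer-array encoding can be sketched as:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RECODER-CS: encodes a C# array-type
// suffix such as "[][,,][,]" as the int[] dimension list described above.
// Each bracket group contributes (number of commas + 1).
public class DimensionEncoding {

    public static int[] encode(String suffix) {
        List<Integer> dims = new ArrayList<>();
        int i = 0;
        while (i < suffix.length()) {
            if (suffix.charAt(i) != '[') throw new IllegalArgumentException(suffix);
            int commas = 0;
            i++; // skip '['
            while (suffix.charAt(i) != ']') {
                if (suffix.charAt(i) == ',') commas++;
                i++;
            }
            i++; // skip ']'
            dims.add(commas + 1);
        }
        int[] result = new int[dims.size()];
        for (int k = 0; k < result.length; k++) result[k] = dims.get(k);
        return result;
    }

    public static void main(String[] args) {
        // a[][,,][,]  ->  {1, 3, 2}
        System.out.println(java.util.Arrays.toString(encode("[][,,][,]")));
    }
}
```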
In ArrayReference (the array indexing operator) we used a different solution, however. The reason is that here you can also use expressions to index the reference. Here we really use references to array references, and we added dimensions too, because the index expressions also had to be stored. So a[3,2][4,5][6] maps to an array reference with dimension 1 and expression "6", whose base is an array reference with dimension 2 and expressions "4" and "5", whose base in turn is an array reference to a with dimension 2 and expressions "3" and "2". Basically, we nest the references into each other.
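The nesting can be sketched with a stand-in class (NOT the real RECODER-CS ArrayReference, just a self-contained model of the structure described above):

```java
// Stand-in sketch, not the real RECODER-CS classes: shows how
// a[3,2][4,5][6] is represented as nested array references.
public class ArrayRefDemo {

    static class ArrayRef {
        final Object base;       // a name (String) or another ArrayRef
        final String[] indexes;  // index expressions, kept as strings here

        ArrayRef(Object base, String... indexes) {
            this.base = base;
            this.indexes = indexes;
        }

        int dimension() { return indexes.length; }
    }

    // Builds the nested structure for a[3,2][4,5][6].
    static ArrayRef build() {
        ArrayRef inner = new ArrayRef("a", "3", "2");     // a[3,2]
        ArrayRef middle = new ArrayRef(inner, "4", "5");  // ...[4,5]
        return new ArrayRef(middle, "6");                 // ...[6]
    }

    public static void main(String[] args) {
        ArrayRef outer = build();
        System.out.println(outer.dimension()); // 1
    }
}
```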
In C# there is no longer a fixed relation between namespace (package), compilation unit and assembly (library).
PackageReference is now NamespaceReference (which is, by the way, more logical) and PackageDeclaration was replaced by NamespaceSpecification. When declaring namespaces you must be aware that C# uses completely different semantics. In C# namespaces are not implicitly specified; instead, there can be multiple NamespaceSpecifications in a unit. These specifications can be nested, and every specification may have its own imports (called usings). So you can write something like
using x.y; namespace a { namespace b.c { using z; namespace d { class A {} } } }
Here A is in the namespace a.b.c.d, while the three NamespaceSpecifications only have the names "a", "b.c", and "d". To make life easier, there is a method getFullName(), which returns the full name of the namespace (for example "a.b.c" for the second specification). TODO: a faster (but less robust) implementation of this method could cache the full name.
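A minimal sketch (with a hypothetical stand-in type, not the real NamespaceSpecification class) of how getFullName() can be derived from the nesting:

```java
// Stand-in sketch, not the real RECODER-CS class: a namespace specification
// knows its own (possibly dotted) name and its enclosing specification.
public class NamespaceSpec {
    final String name;          // e.g. "b.c"
    final NamespaceSpec parent; // null for a top-level specification

    NamespaceSpec(String name, NamespaceSpec parent) {
        this.name = name;
        this.parent = parent;
    }

    // Full name = full name of the parent + "." + own name.
    String getFullName() {
        return parent == null ? name : parent.getFullName() + "." + name;
    }

    public static void main(String[] args) {
        NamespaceSpec a = new NamespaceSpec("a", null);
        NamespaceSpec bc = new NamespaceSpec("b.c", a);
        NamespaceSpec d = new NamespaceSpec("d", bc);
        System.out.println(d.getFullName()); // a.b.c.d
    }
}
```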
Another problem was the different structure of C# and JAVA compilation units. When working on the AST, you should keep this in mind.
C# also supports namespace and type aliases, called using-aliases (e.g. using ws = System.Web.Services makes a namespace alias). There is no support for these in the semantic analysis yet, but adding it would not be too difficult.
In RECODER there was an assertion that types are either primitive types, array types, or class types; this is true in JAVA, but not in C#. So we had to introduce a new level of abstraction called declared type (DeclaredType), which represents a type that is declared in the program. Class types are declared types which can have members; enums and delegates are also type declarations, but they do not have members. We therefore introduced the abstract classes TypeDeclaration and ClassTypeDeclaration. TypeDeclaration implements DeclaredType only, while ClassTypeDeclaration implements ClassType.
However, we had to leave the member declarations in TypeDeclaration instead of pushing them down into ClassTypeDeclaration, because there would have been too many modifications, and enums (which are not class types) can also have fields (but not methods). This "cheat" is hidden from the outside world, however, since only ClassTypeDeclarations have methods to report members.
Inheritance was also a problem, since C# makes no syntactic difference between inheriting from a class and implementing an interface.
A new class type is the StructDeclaration, which declares a struct. There are some semantic differences between classes and structs, but in the AST the only difference is that a struct can have no destructor and cannot inherit from other classes. These constraints are currently not checked by the parser, which means they must be checked by the semantic analysis.
Two new classes are DelegateDeclaration and EnumDeclaration.
Delegates are types for methods with a defined signature (parameters). A variable with a delegate type can have a number of methods assigned to it, and these methods can be invoked by using the variable as if it were a method.
Enums have members (with optional initializers) and a base type. Enum members (EnumMemberDeclaration / EnumMemberSpecification) behave exactly like fields and therefore extend FieldDeclaration and FieldSpecification. The initializers are not checked for semantic correctness by the parser, but by the type analysis.
About enum members you should also know that - since enum members are always declared one by one, without a type declaration - there is always exactly one specification and one declaration per member. And since EnumMemberDeclarations have no explicitly given type, they return the base type of the enum as their type (which is correct). A more elegant solution would have been to introduce only EnumMemberDeclaration, acting both as a declaration and as a specification, but we found that too complicated (the semantic analysis would have had to be rewritten).
In the future, RECODER-CS and RECODER should be refactored so that they no longer distinguish between specifications and declarations (since this distinction is senseless both in JAVA and in C#). Then we could also correct this problem.
Fields and methods are the same.
Properties are special fields with accessors (base class Accessor), which you can get with the getGetAccessor() and getSetAccessor() methods. As with enums, the same rule applies here: since properties are fields, they have a declaration and a specification subclassing the field declarations and specifications, but C# allows only one specification at a time. So again, there is always one PropertyDeclaration with one PropertySpecification.
Events are also fields, with a delegate type and support for some "advanced" operations. Therefore they have EventDeclaration and EventSpecification. However (to make life a bit harder), C# also allows events to be defined like properties. In this case we have one EventDeclaration and exactly one EventSpecification, with two accessors of course. For convenience, the AST makes no difference between normal and property-like events (maybe it should?), but stores the two accessors in EventDeclaration. (If it is a normal event, these accessors are of course null.) We might want to change this hack in the future.
Operator overloads overload operators. Instead of implementing many classes (e.g. PlusOperatorOverload), we have one field indicating the type of the operator. We consider this the better solution, since it allows a switch statement instead of a bunch of instanceof operators. And again: there is no semantic check here either (e.g. we should check that a binary operator has exactly two arguments, one of which has the same type as the class itself).
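The design choice can be sketched like this (the constants and method below are hypothetical, not the real RECODER-CS API): a single overload node with an operator-type field lets clients dispatch with one switch instead of one instanceof test per operator subclass.

```java
// Sketch of the design choice described above, not the real API:
// one operator-type field instead of a subclass per operator.
public class OperatorOverloadDemo {

    // Hypothetical operator-type constants.
    static final int PLUS = 0;
    static final int MINUS = 1;
    static final int MULTIPLY = 2;

    // One switch replaces a chain of instanceof checks.
    static String symbolOf(int operatorType) {
        switch (operatorType) {
            case PLUS:     return "+";
            case MINUS:    return "-";
            case MULTIPLY: return "*";
            default:       return "?";
        }
    }

    public static void main(String[] args) {
        System.out.println(symbolOf(MINUS)); // -
    }
}
```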
Indexers are basically overloads of the array ([]) operator, but they differ from the other operators, so they get their own class (IndexerDeclaration).
The abstract model is largely self-explanatory; the following figure should make it clear.
Methods and fields are the same, but since methods do not declare their exceptions in C#, getExceptions() is deprecated and returns null.
The most important change was to introduce DeclaredType as an abstraction for delegates, enums and class types (which now also include structs), and to change classes and namespaces so that they hold declared types instead of class types.
New are the abstract interfaces Enum, Delegate, OperatorOverload, Indexer, Event and Property; they are very self-explanatory. Currently, however, there are no methods to report these more specific members; you can only use the usual getMethods() and getFields() methods to get them. Events and properties are reported as fields, operators and indexers as methods. Similarly, you can use the getDeclaredTypes() method to get all enums, delegates and class types in a class or namespace, but you cannot query them one by one.
The DelegateConstructor class is needed because a variable with a delegate type must also be constructed (imagine this as a default constructor of the delegate), with a method as parameter, e.g.

public delegate void MyMethod(int a);

class X {
    MyMethod mm;              // Field with delegate type

    public void m(int a) {}   // This is the method which will be bound

    public void Main(string[] args) {
        mm = new MyMethod(m); // This is not a real constructor!
        mm(1);                // Calls m(1)
    }
}

So here there is a virtual call to a delegate constructor which takes the name of the method as its parameter.
These services are used by the abstract model to synthesize information.
Name analysis (DefaultNameInfo) can now handle the new primitive types, deal with C#'s namespaces, and load types from the source file repository.
Source analysis (DefaultSourceInfo) works, except for a few "limitations". First, there is no support for operator overloading (this mostly affects the method getType(Expression)); secondly, there is no using-alias support. Thirdly, there is a problem with delegates: in the previous example, an access to the delegate mm (which is bound to the method m) also means that m has a reference in the method Main.
Visibility handling. Yes, there are problems here too: C# has the visibility modifier internal, which makes a member visible within the same assembly. However, the components of an assembly are not determinable until compile time, which means that we do not know which classes belong together. Therefore, for convenience, we assume that internal equals public.
There is support for the new primitive types and the two type aliases (object and string). Boxing is resolved when you use a variable as a reference prefix. There is no boxing support when interpreting expressions (again the method getType(Expression)).
DefaultConstantEvaluator should also handle the new primitive types (unsigned int, for example) and their new literals, such as "123ul" (unsigned long). However, there is possibly a design issue with the latter feature, since unsigned longs are mapped onto JAVA longs and may be interpreted as negative numbers.
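As a plain Java illustration (independent of RECODER-CS), the 64-bit pattern of the maximum C# ulong value reads as -1 when stored in a signed Java long:

```java
public class UnsignedLongMapping {
    public static void main(String[] args) {
        // The C# literal 18446744073709551615ul (the maximum ulong value)
        // does not fit into a signed Java long; the same 64 bits are
        // reinterpreted as -1.
        long mapped = Long.parseUnsignedLong("18446744073709551615");
        System.out.println(mapped);                        // -1
        System.out.println(Long.toUnsignedString(mapped)); // 18446744073709551615
    }
}
```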
The class DefaultConstantEvaluator and the method getType(Expression) in DefaultSourceInfo are incomplete. There are also some methods that need to be reviewed to check whether they match the C# specification and not only the JAVA specification. Check the // TODO tags in the sources for more detailed information.
You can do transformations directly on the AST (e.g. insert/remove/replace a node) using the attach() and detach() methods in ChangeHistory. You can write your own transformations by subclassing the Transformation class and using the attach(foo, bar) methods. These methods are the same as in JAVA, so you cannot attach the new C# classes directly (e.g. you cannot attach an EnumDeclaration as such). It may still be possible to attach such nodes if they have an attachable superclass. For example, you can attach an EnumDeclaration using the attach(TypeDeclaration) method, since EnumDeclaration is a TypeDeclaration.
The partial parsing feature is available in CSharpProgramFactory via the parseWhatever() methods.
There is limited support for the higher-level transformations in the recoder.kit packages:
CommentKit is senseless, since it creates JavaDOC comments.
ExpressionKit should be usable, since there is not much difference between the expression logic in C# and JAVA.
NameGenerator was not changed (it seems to contain nothing language-dependent).
ModifierKit can now create C# modifiers. Constants no longer come from recoder.bytecode.AccessModifier, but are kept in the Modifiers interface.
NamespaceKit can only be used to create namespace references.
StatementKit is unchanged, but should mostly work, since C# statements and JAVA statements are very similar. We do not know whether it can deal with get and set accessors.
VariableKit was not changed, but should be usable.
UnitKit contained a lot of methods managing imports (usings); some of them were rewritten, and some became senseless, since C# only has multiple imports.
Please note that the transformations are still untested; use them with care and a lot of testing.
The complete transformations (recoder.kit.transformation) were removed.
These files have not been touched; we therefore suggest that you use them with care.
Here are some examples which demonstrate how you can use RECODER-CS.
The pretty printer is effectively the Hello World application of RECODER-CS.
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;

import recoder.DefaultServiceConfiguration;
import recoder.ParserException;
import recoder.convenience.Naming;
import recoder.csharp.CompilationUnit;
import recoder.csharp.PrettyPrinter;
import recoder.list.CompilationUnitList;
import recoder.service.SourceInfo;

public class ExamplePrinter extends PrettyPrinter {

    public static void main(String[] args)
            throws IOException, ParserException, Exception {
        DefaultServiceConfiguration sc = new DefaultServiceConfiguration();
        ExamplePrinter epr = new ExamplePrinter(sc, new PrintWriter(System.out));
        CompilationUnitList list = sc.getSourceFileRepository().getCompilationUnits();
        epr.printCompilationUnits(list);
    }

    private DefaultServiceConfiguration serviceConfiguration;
    private SourceInfo sourceInfo; // cached

    public ExamplePrinter(DefaultServiceConfiguration sc, Writer out) {
        super(out, sc.getProjectSettings().getProperties());
        this.serviceConfiguration = sc;
        sourceInfo = sc.getSourceInfo();
    }

    public void printCompilationUnits(CompilationUnitList cus) throws IOException {
        for (int i = 0, s = cus.size(); i < s; i += 1) {
            CompilationUnit cu = cus.getCompilationUnit(i);
            printCompilationUnit(cu);
        }
    }

    public void printCompilationUnit(CompilationUnit cu) throws IOException {
        String name = cu.getDataLocation().toString();
        System.out.println("Visiting compilation unit:" + name);
        visitCompilationUnit(cu);
        getWriter().flush();
    }
}
First, we create the ServiceConfiguration; in this case we use the DefaultServiceConfiguration. On initialization, the service configuration loads, parses and analyzes every source file in the input path. Then we instantiate our ExamplePrinter class, which inherits from the recoder.csharp.PrettyPrinter class; this is the actual implementation of the pretty printer, using the Visitor design pattern. We then ask the SourceFileRepository service to give us every compilation unit in the model and use the pretty printer to print them out.
Notes:
A stripped-down ServiceConfiguration that only creates the ProgramFactory service would also be enough to parse the given source files and feed the created compilation units to the PrettyPrinter.
The line String name = cu.getDataLocation().toString(); assumes that every compilation unit has an associated DataLocation. This is the case when they are read from files; programmatically generated compilation units, however, do not always have a location.
You can find more examples in the examples directory.
The SyntaxPrinter program reads and displays the AST of the given source file. It can be used to debug and inspect the output of the parser.
The PlainAnalysis program gives you information about all classes in the input path: it prints information about class members and references. Looking at its source will help you understand how RECODER-CS works.
The Sourcerer program is the back-ported version of Sourcerer from RECODER. It visualizes your classes and their members, so you can see the results of the analysis.
Here is what we have used for testing:
Well, there is still a lot to do. Some of the items:
DefaultConstantEvaluator and getType(Expression) in DefaultSourceInfo - these will be worked on in the future.
+=.