Encoding, AKA character set, describes the way characters are encoded into bytes and vice versa. Famous standard encodings are “UTF-8” or “ISO-8859-1”. The list of available encodings can be determined by calling java.nio.Charset.availableCharsets(). There is also a list of encodings and their canonical Java names in the API docs.
Unfortunately, each platform and/or spoken language tends to define its own native encoding, e.g. Cp1258 on Windows in Vietnamese or MacIceland on Mac OS X in Icelandic.
In an Eclipse workspace, files, folders, projects can have individual encodings, which are stored in the hidden file .settings/org.eclipse.core.resources.prefs in each project. If a resource does not have an explicit encoding, it inherits the one from its parent recursively. Eclipse chooses the native platform encoding as the default for the workspace root. You can change the default workspace encoding in the Eclipse preferences Preferences->Workspace->Default text encoding. If you develop on different platforms, you should consider choosing an explicit common encoding for your text or code files, especially if you use special characters.
While Eclipse allows to define and inspect the encoding of a file, your file system usually doesn’t. Given an arbitrary text file there is no general strategy to tell how it was encoded. If you deploy an Eclipse project as a jar (even a plug-in), any encoding information not stored in the file itself is lost, too. Some languages define the encoding of a file explicitly, as in the first processing instruction of an XML file. Most languages don’t. Others imply a fixed encoding or offer enhanced syntax for character literals, e.g. \uXXXX in Java.
As Xtext is about textual modeling, it allows to tweak the encoding in various places.
The plug-ins created by the New Xtext Project wizard are by default encoded in the workspace’s standard encoding. The same holds for all files that Xtext generates in there. If you want to change that, e.g. because your grammar uses/allows special characters, you should manually set the encoding in the properties of these projects after their creation. Do this before adding special characters to your grammar or at least make sure the grammar reads correctly after the encoding change. To tell the Xtext generator to generate files in the same encoding, set the encoding property in the workflow next to your grammar, e.g.
Generator {
encoding ="UTF-8"
...
As each language could handle the encoding problem differently, Xtext offers a service here. The IEncodingProvider has a single method getEncoding(URI) to define the encoding of the resource with the given URI. Users can implement their own strategy but keep in mind that this is not intended to be a long running method. If the encoding is stored within the model file itself, it should be extractable in an easy way, like from the first line in an XML file. The default implementation returns the default Java character set in the runtime scenario.
In the UI scenario, when there is a workspace, users will expect the encoding of the model files to be settable the same way as for other files in the workspace. The default implementation of the IEncodingProvider in the UI scenario therefore returns the file’s workspace encoding for files in the workspace and delegates to the runtime implementation for all other resources, e.g. models in a jar or from a deployed plug-in. Keep in mind that you are going to loose the workspace encoding information as soon as you leave this workspace, e.g. deploy your project.
Unless you want to enforce a uniform encoding for all models of your language, we advise to override the runtime service only. It is bound in the runtime module using the binding annotation @Runtime:
@Override
public void configureRuntimeEncodingProvider(Binder binder) {
binder.bind(IEncodingProvider.class)
.annotatedWith(DispatchingProvider.Runtime.class)
.to(MyEncodingProvider.class);
}
For the uniform encoding, bind the plain IEncodingProvider to the same implementation in both modules:
@Override
public Class<? extends IEncodingProvider> bindIEncodingProvider() {
return MyEncodingProvider.class;
}
An XtextResource uses the IEncodingProvider of your language by default. You can override that by passing an option on load and save, e.g.
Map<?,?> options = new HashMap();
options.put(XtextResource.OPTION_ENCODING, "UTF-8");
myXtextResource.load(options);
options.put(XtextResource.OPTION_ENCODING, "ISO-8859-1");
myXtextResource.save(options);
The SimpleProjectWizardFragment generates a wizard that clients of your language can use to create model projects. This wizard expects its templates to be in the encoding of the Generator that created it (see above). As for every new project wizard, its output will be encoded in the default encoding of the target workspace.
The source code of the Xtext framework itself is completely encoded in “ISO 8859-1”, which is necessary to make the Xpand templates work everywhere (they use french quotation markup). That encoding is hard coded into the Xtext generator code. You are likely never going to change that.