Panscript is the germ of a proposal to reinvent the language of the web. It replaces the current babel of HTML, CSS, javascript, PHP, SQL and all the rest with a single language for creating, processing and delivering content, whether on a digital device or in print. Given recent developments in the use of HTML as a desktop environment, even coding the entire user experience in a single language appears feasible. Since Panscript began life on the old Linux Format wiki (now archived at https://web.archive.org/web/20160413203438/http://www.linuxformat.com/wiki/index.php/Panscript ), HTML5 has borne out many of my worst fears over the arbitrary and broken code salad that is today's web. I just know that we could - and should - be doing better than this.
Much the same applies to ICT systems generally, so my ideas have now expanded to the most grandiose proportions, with The Pan System reinventing the entire field from the ground up.
Issues with this page: I need to split off the specification as a separate page and add a more comprehensive table of contents to both.
Modern web pages are a horrible mix of languages with widely differing grammars and syntaxes. For example a typical client page contains:
<element property1="foo" property2="bar" ... > content here </element>
element.class {property1:foo; property2:bar; ...}
or embedded in the HTML as:
style="property1:foo; property2:bar; ...;"
instruction1; instruction2 { lots of stuff inside curly brackets }
Then, the source page actually stored on the server may use a different language again, which essentially tells the server how to create the page it actually serves. Such languages include:
Meanwhile it is quite common to want a document available in all of hardcopy print, PC desktop and mobile online formats.
The original content itself may be created in a variety of formats, some open ones being:
Often these source documents will contain features (headers, footers, page breaks and numbering, page cross-references, deep linking to the author's host operating system and such) which are unsuited to web pages. Likewise, HTML and friends contain links and stuff which are not suited to print publishing. Re-purposing content can involve a great deal of effort. Few things in life are more infuriating than being presented with the mobile view on a desktop PC which does not have a touchscreen.
Even in its own small world of web page coding, HTML5 sucks. It is a dog's breakfast introduced by an agonised W3C trying to impose the dogmatic separation of reality into semantics, content and style. In practice they have been compelled to muddle them up and, through their own sense of religious crusade, refused to admit that the job is botched and vacuous. To take just one example from my linked article: we have CSS (style) positioning introduced to avoid using HTML (semantic) tables for layout. Yet on the one hand audio readers for the blind cope far better with table-based layout, while on the other we also have CSS (style) based table (semantic) presentation, ostensibly for when the table semantics have been lost but in truth a simple reversal of the heresy - using style to mimic semantics. It's a horribly self-contradictory piece of hypocritical spaghetti logic and HTML5 is infested with the stuff.
It's all horrible! Uh, sorry, did you catch that? It's horrible! All of it!!
One day I got fed up with the prospect of learning yet more awful languages, one after another - HTML, XHTML, javascript, CSS, PHP, ... just to get a web page looking and working the way it should even when printed out. I thought, one language to do it all - why not? All you would need to do is learn one language, dump your stuff on the page and go. So that's what panscript is all about - one language for creating, processing and delivering page content, whether on the web or in printed documents or, hopefully, other access media such as speech synthesis and Braille.
Clearly, this language will be very rich - probably as complex as many human languages. But it should bring major benefits, such as:
Other anticipated features include:
For example an author could first learn the styling and layout markup, then move on to client-side scripting or page print formatting, already able to understand the basics of the code and needing only to expand their vocabulary a little. Or an application developer could start designing the page layout, again finding they have many of the skills already in place.
XML was the community's first stab at much of this, but it suffers from being long-winded and repetitive (full of word salad) to the point of incomprehensibility - thus defeating the original reason for having all those words in the first place. Also, because it is essentially an interpreted language, processing it is a slow business - and repeatedly processing all that repetitive word salad is badly clogging up the world of web services. It's time to move on.
Many of the existing languages have some great features we wouldn't want to lose:
cellpadding property for Tables - doing that in CSS without endless repetition requires complex class definition. And frames provide a rudimentary form of transclusion - they may have been politically incorrect since the last Century but, for some reason known only to those of us who like to get things done, they just won't go away. Sheesh! How we have to cherry-pick to get a decent feature set.
Different kinds of language have different ideas about how to organise information.
So one problem is how to respect and interact with all four kinds of model. If we want recursive hierarchies, for example calling a script from within a stylesheet that is itself within another script, then we will need to be very clear about these things.
A particular problem occurs with the concept of style, where the W3C community have got into a pretty muddle:
We need to understand the differing natures of all these kinds of "style", and how to relate to them. For now, notice that some come closest to "markup", others to "programming".
Another problem revolves around the identifying of different types of structural element. Here are some examples:
Back in the bad old days there came a time when most presentation formats, such as postscript pages or Word documents, had source code that was at best barely human-readable.
HTML was created as a human-readable presentation language. XML was intended to bring this goodness to more media formats by using richer human-readable markup, and was described by its designers as a "verbose" language. Sadly it went too far, causing two unfortunate effects:
Yet any practical web language must be human-readable. This is because the humble text editor is just too darn useful ever to die - it will always be an important development tool.
So -- how to square the circle?
"Web 2" meant different things to different people.
To some it was the interactive web - blogs, wikis, instant messaging and YouTube. This is one of the main areas where a single development language, which is also a simple-to-learn markup language, can bring great benefits.
To others, Web 2 was the semantic web. There are two approaches to embodying semantics in web content:
Semantics can be multi-layered. Consider an XHTML (strict) element with some microformatted information, such as this fragment from an imaginary article on Sherlock Holmes:
<h3><span class="address">221b Baker Street</span></h3>
The "h3" tag provides semantic information about the place of the fragment within the article - it is a third-level heading. Meanwhile the "address" value provides semantic information about the place of the fragment within Sherlock Holmes' life. Now, suppose we want to add some arbitrary third kind of semantic, say that the address is fictional, or that further information is available to subscribers, or ... . We need a general, extensible semantic framework whose syntax does not distinguish between levels in the way that XHTML tends to.
Also, we want to avoid the risk of snowballing complexity. Suppose I want to add markup compatible with both a "semantic web" XML standard foo and a popular microformat vBar. Here is another imaginary fragment which is trying to do this:
<p class="address">
  <foo:FOO xmlns:foo="http://www.w3.org/2009/04/21-foo-syntax">
    <foo:address class="vBar">
      <foo:line1 class="firstline">221b Baker Street</foo:line1>
    </foo:address>
  </foo:FOO>
</p>
But will the separate standards be able to recognise or ignore each other's markup as required? Will a vBar engine find its marker inside a foo markup tag? How do we indicate that the paragraph class references a CSS stylesheet, while the foo:address class references the vBar microformat? And so on. Oh, and fancy debugging that load of spaghetti? Call it human-readable? I don't!
So it would be nice to define a single "right" way of doing things. One thing seems clear: all those semantic elements and namespace URIs wished on us by XML are just horrible (and the more they get used, the more they weigh web services down with the massive processing overload of all that verbiage). Adding semantics as properties of a single element is far neater. Here's something more like the kind of structure I envisage:
<p class="address" foo="address.line1" vBar="address.firstline">221b Baker Street</p>
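To illustrate why properties on a single element are easier to consume, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the fragment is well-formed XML. foo and vBar are the imaginary standards from the text, and the function names are hypothetical. Each engine simply looks up its own property and ignores the rest.

```python
import xml.etree.ElementTree as ET

# The single-element fragment from the text; foo and vBar are imaginary standards.
FRAGMENT = (
    '<p class="address" foo="address.line1" '
    'vBar="address.firstline">221b Baker Street</p>'
)


def semantic_layers(markup: str) -> dict:
    """Return every property of the element as an independent semantic layer."""
    element = ET.fromstring(markup)
    return dict(element.attrib)


def layer_for(markup: str, vocabulary: str):
    """What one engine (CSS, foo, vBar, ...) would read: only its own property."""
    return ET.fromstring(markup).get(vocabulary)


layers = semantic_layers(FRAGMENT)
print(layers)
print(layer_for(FRAGMENT, "vBar"))
```

Adding a further kind of semantic means adding one more property; nothing else in the element has to change.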
And if you really want to link with your XML based web services, then you can knock up a nice XSLT on your application server and install a couple more CPUs, can't you? :-p
No language, no matter how powerful, will ever exist in isolation. It will always have to interact with other languages. So it must be made possible, even easy, to embed "islands" in a foreign language structure, and likewise to embrace foreign islands.
I wanted a good, memorable name. I also wanted it to start with 'P', so that the "LAMP" architecture could adopt it seamlessly (dream on!).
Pan was the ancient Greek god of shepherds and their flocks, so who better to name my language after than someone who looks after lots of similar things? (Useless factoid: Pan was also the god of popular music. In Kenneth Grahame's book The Wind in the Willows, Pan makes an appearance as the piper at the gates of dawn. Rock fans will immediately recognise the title of the first Pink Floyd album.)
The prefix Pan-, meaning "all", is also ancient Greek, so panscript has a neat double-meaning.
This needs a collaborative effort. I have neither the time nor the skills to do it all myself.
I don't intend to go very deep until the basic idea and syntax have been thrashed out. Think of this page as a kind of working whiteboard.
Things to sort out with the draft spec:
[[ rogue material [contained element] ]]. Need to iron these out.
Foundational concepts include:
By "human" is meant the casual content creator as much as the skilled system developer.
The re-use of modular code leads to the idea of a document as a collection of disparate fragments, stitched together by some common framework. So the highest-level constructs, and where we need to start, are those that create this common framework: a language for stitching fragments into.
The syntax and grammar have to be universal, to work whether a fragment is written in the same language or another one, whether it is embedded in the framework or linked to as an external resource, and whatever kind of stuff that fragment might be.
The parallel with XML data islands should not be missed, nor should the need to be a good deal more readable and write-able.
The generic semantic is therefore just an overall container in which different sub-languages can be written (somewhat similar to the concept in XML, but it needs to be a bit more flexible). To date I anticipate such subsets for the three main presentation formats: text markup, vector graphics and maths equations, and for programmatic scripting. Presentational styling may end up as another, but I am hoping it can simply be a common grammar and vocabulary across the three presentational languages. To date I have begun notes on two of these.
Pantext is a text markup language suited to hand-coding of anything from a simple note to a web page to a publishable-quality illustrated document.
I have begun sketching out the basics without reference to the present panscript syntax. The idea is to then compare the two approaches and pick the best features to create a unified language.
Pangraph is a vector graphics language suited to hand-coding and for parsing to svg by a wiki or similar server.
As with pantext I have begun sketching out the basics without reference to the present panscript syntax - or to pantext - so that I can come back and merge the best of each.
The fundamental building block of Panscript is a plain text file called a module or script.
Character encoding must be Unicode. It should be UTF-8 unless otherwise specified in the file type.
Universal constructs are confined to a basic Latin character set.
To aid in readability, non-Latin character sets are expected to be incorporated, for purpose-specific code, in locales where the human population use non-Latin alphabets and keyboards.
Modules are re-usable - any module will probably invoke many other modules.
A script contains a hierarchy of elements, or objects.
Just to give a flavour, here is a simple "Hello world" example:
[p0.1/My First script; Copyright Guy Inchbald, 2007. Licensed under the GPLv3. [[ [m [[Hello world]] ] ]] My First script]
The detailed syntax borrows a little from the MediaWiki idea of repetitive key strokes, preferably unshifted. The general syntax for a Panscript element comprises a sequence of entities:
[class/id; properties [[content]] id]
where the entities are defined as:
[...] (square brackets) are the opening and closing tags which bound the element.
/ (forward slash) is a separator between the element's class and name, or identifier, while ; (semicolon) is the separator between the id and the properties. All elements and nested elements within the script are assumed to be numbered sequentially, starting from 0 (zero). The whole of /id; is optional - if it is omitted then the sequential number is used as the id.
[[...]] (double square brackets) bound the content. It may be empty, or may contain the basic information expected (text, code, etc), and/or may contain one or more nested elements. If it is empty and there is no closing id, then the [[ ]] may be omitted (or, to put it another way, if [[ ]] is omitted and the closing id is present, then the closing id will be treated as a property).
In general, any sequence of white space characters (spaces, tabs, returns) acts as a simple separator, as if it were a single space character. Exceptions occur for certain kinds of text content. Where the Panscript syntax shows no space between entities, white space may be freely inserted, for example the following is equally valid:
[class /id ; properties [[ content ]] id ]
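The general syntax can be tried out with a small Python sketch that splits one element into its entities. This is only a rough reading of the draft, not a reference parser. It assumes that the class ends at the first white space or "/", that the first [[ opens the content container, and that escapes are ignored. The names Element and parse_element are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    cls: str
    id: Optional[str]          # None means "use the sequential number"
    properties: str
    content: Optional[str]     # None means the [[ ]] container was omitted
    closing_id: Optional[str]


def parse_element(source: str) -> Element:
    """Split one element of the form [class/id; properties [[content]] id]."""
    text = source.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("an element must be bounded by [ and ]")
    inner = text[1:-1]

    # Find the content container, if any, by counting bracket depth so that
    # nested elements inside the content do not end it early.
    content = closing_id = None
    head = inner
    start = inner.find("[[")
    if start != -1:
        depth = 0
        end = None
        for pos in range(start, len(inner)):
            if inner[pos] == "[":
                depth += 1
            elif inner[pos] == "]":
                depth -= 1
                if depth == 0:
                    end = pos
                    break
        if end is None:
            raise ValueError("unclosed [[ container")
        head = inner[:start]
        content = inner[start + 2:end - 1].strip()
        closing_id = inner[end + 1:].strip() or None

    # The class is the first token; an id follows only if introduced by "/".
    head = head.strip()
    cut = 0
    while cut < len(head) and not head[cut].isspace() and head[cut] != "/":
        cut += 1
    cls, rest = head[:cut], head[cut:].lstrip()
    element_id = None
    if rest.startswith("/"):
        id_part, _, rest = rest[1:].partition(";")
        element_id = id_part.strip()
    return Element(cls, element_id, rest.strip(), content, closing_id)


print(parse_element("[m [[Hello world]]]"))
```

Because white space is stripped around every entity, the compact and the spaced-out forms shown above parse to the same result. Nested elements are returned as raw content and would be parsed by calling the function again.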
An alias is an alternative id for an element or class. For example if something called thingummajig.whatsit exists, then we might want to create thing as an alias for it. Then, every time we need to reference the thingummajig.whatsit, we need only write thing.
This allows our code to be human-readable, but not to run away with the word salad problem.
One or more aliases may be established for any element or class. The default id for any element is its sequential number in the script. Any other id provided is effectively an alias for this number.
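As a rough sketch of how alias lookup might behave, here is a small Python table that maps each alias to its target. The class AliasTable and its methods are hypothetical, and the ability to chain one alias to another is my assumption, not something the draft settles.

```python
class AliasTable:
    """Maps aliases to their targets; a target may itself be an alias."""

    def __init__(self):
        self._targets = {}

    def define(self, alias: str, target: str) -> None:
        self._targets[alias] = target

    def resolve(self, name: str) -> str:
        """Follow aliases until a name with no further alias is reached."""
        seen = set()
        while name in self._targets:
            if name in seen:
                raise ValueError(f"alias cycle at {name!r}")
            seen.add(name)
            name = self._targets[name]
        return name


aliases = AliasTable()
aliases.define("thing", "thingummajig.whatsit")
# A user-supplied id is effectively an alias for the element's sequential number.
aliases.define("My First script", "0")

print(aliases.resolve("thing"))
print(aliases.resolve("My First script"))
```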
Some aliases are reserved (predefined), others may be user defined.
Text markup code always needs an escape system so that it is possible to include reserved code characters like [ and ] in text.
It is tempting to reserve \ for the single-character escape as in \[ and \], including escaping a \ character as in \\.
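The single-character escape is easy to sketch in Python. This assumes the only reserved characters are [ and ] plus the backslash itself, and that a backslash always escapes exactly the next character. The function names are hypothetical.

```python
def escape(text: str) -> str:
    """Prefix each reserved character with a backslash."""
    return (
        text.replace("\\", "\\\\")   # backslashes first, so they are not doubled twice
        .replace("[", "\\[")
        .replace("]", "\\]")
    )


def unescape(text: str) -> str:
    """Drop each escaping backslash and keep the character that follows it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            following = next(chars, None)
            if following is None:
                raise ValueError("dangling backslash at end of text")
            out.append(following)
        else:
            out.append(ch)
    return "".join(out)


sample = "array[0] uses a \\ somewhere"
print(escape(sample))
print(unescape(escape(sample)) == sample)
```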
To escape a string, approaches to consider are:
\\\ escaped string goes here \\\. "Passive" escape tags in the string would still need to be actively escaped, as \\ for a single backslash and \\\\\\ for three consecutive backslashes. The odd number - 3 - of backslashes ensures that it cannot be confused for a series of escaped backslashes. But what happens if we meet say \\\[ - is this the string escape with [ as the first character in the string, or an escaped \ followed by an escaped [ ? One possible answer - for a string beginning with a reserved character, add two further backslashes: \\\\\[... .
&gt; to escape >. Effective, but clunky if needed a lot and there are all those codes to learn.
\\\20 escapes the next 20 characters.
There are (provisionally) five top-level classes of element (kinds of stuff) that a script can contain:
Anything outside the script definition will be ignored. This allows a script to be embedded in other kinds of language.
The first element in any script is its definition. This element contains all the others. The syntax for the definition uses the following values:
[panscriptversion/Name of script; properties [[content]] Name of script]
Where:
pversion.
Where a script is embedded in another language it may or may not be possible, or wise, to do this.
For example, here is an empty script (i.e. with no content):
[p0.1/Empty script; Copyright Guy Inchbald, 2007. Licensed under the GPLv3. Empty script]
This is general media content (text, graphic, etc. maybe eventually audio and stuff) to be rendered by the viewing agent.
[media/id; format [[content]] id]
Where:
m
Here is a very simple "Hello world" example:
[m [[Hello world]]]
To create a functional script we put it inside a top-level object, something like:
[p0.1/Example script; Copyright Guy Inchbald, 2007. Licensed under the GPLv3. [[ [m [[Hello world]]] ]] Example script]
I may come back and define some sub-classes such as media-text, media-image (aliases t and i respectively). Who knows.
I don't know a lot about programming languages in general, but here's what seems to be a workable approach:
[executable/id; parameters [[instructions]] id]
Where:
x
Data is stuff that is available for other elements, such as executables or content, to draw on.
[data/id; format [[data]] id]
Where:
d
Comments are indispensable for adding helpful explanations and for hiding unused stuff.
[comment/id; comment]
Where:
c, as in [c comment].
Note that a comment has no content entity, and is effectively an empty element. Any nested elements within the comment entity will be ignored. Any closing id will also technically be treated as part of the comment field, though this does not matter. Thus, in the following element, the "[[ ]] id" is all treated as contained within the comment field.
[comment/id; comment [[ ]] id]
This is a bit unsatisfactory, as it breaks a basic rule of grammar about the [[ ]] container. But it is necessary, since commenting-out blocks of code will often place such brackets in the comment area. Well, it's not strictly necessary, but writing [c [[comment]]] every time would be more of a pain than [c comment].
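The comment rule can be sketched in Python as a pass that drops each comment element whole, brackets and all, by counting bracket depth. This is an illustration only. It assumes a comment opens with [c or [comment followed by white space, "/", ".", ";" or "]", that it is not immediately preceded by another "[", and that escapes are ignored. The name strip_comments is hypothetical.

```python
import re

# Opening of a comment element: "[c" or "[comment", not part of a "[[" pair.
COMMENT_OPEN = re.compile(r"(?<!\[)\[(?:comment|c)(?=[\s/.;\]])")


def strip_comments(script: str) -> str:
    """Remove comment elements, including anything nested inside them."""
    kept = []
    pos = 0
    while True:
        match = COMMENT_OPEN.search(script, pos)
        if match is None:
            kept.append(script[pos:])
            return "".join(kept)
        kept.append(script[pos:match.start()])
        # Everything up to the bracket that balances the opening "[" is comment.
        depth = 0
        for index in range(match.start(), len(script)):
            if script[index] == "[":
                depth += 1
            elif script[index] == "]":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unclosed comment element")
        pos = index + 1


print(strip_comments("[m [[Hello]]] [c old [m [[Bye]]] ] end"))
```

Since the [[ ]] pair and the closing id sit inside the balanced brackets, they vanish along with the rest of the comment, which is the behaviour described above.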
Many sub-classes of the high-level classes will be needed. The syntax is simply:
class.subclass
You may wonder why there are so few top-level ones. For example there are likely to be sub-classes such as javascript, css, image, heading1 and so on. Why are these commonplace things not high-level classes themselves? Wouldn't that make them easier to type, too?
Well, firstly, we can distinguish for example x.javascript from m.javascript and c.javascript. The first of these will be executed, the second treated as rendered media (text) content (very useful for tutorials on javascript!) and the third is commented-out. So the developer can plug code in and out, try it out and present it to the reader and so on, and move from one mode to another just by changing a single character in the code.
Secondly, using aliases we can create javascript or ecma or js or whatever as an alias for executable.javascript. So when we want to add some javascript we don't have to write <script class="javascript"> ... </script> or even [x.javascript [[ ... ]]] but simply [js [[ ... ]]].
So along with the many standard sub-classes that will be needed, there will probably be even more standard aliases.
There might be a need to create further sub-subclasses, such as media.heading.2 or media.list.ordered, and so on. Again, aliases make such things manageable.
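The interplay of top-level classes, sub-classes and aliases can be sketched in Python. The alias table below uses the short names from this page (m, x, d, c and js), but the treatment labels ("render", "execute", "store", "ignore") and the function names are purely illustrative assumptions.

```python
# Aliases taken from the text; js stands for a whole class.subclass path.
ALIASES = {
    "m": "media",
    "x": "executable",
    "d": "data",
    "c": "comment",
    "js": "executable.javascript",
}

# Hypothetical labels for what a processor does with each top-level class.
TREATMENT = {
    "media": "render",
    "executable": "execute",
    "data": "store",
    "comment": "ignore",
}


def expand_class(name: str) -> str:
    """Expand an alias for the whole name, or else for its top-level part."""
    if name in ALIASES:
        return ALIASES[name]
    top, dot, rest = name.partition(".")
    return ALIASES.get(top, top) + dot + rest


def treatment(name: str) -> str:
    """Decide the handling from the top-level class alone."""
    top = expand_class(name).partition(".")[0]
    return TREATMENT.get(top, "unknown")


for cls in ("x.javascript", "m.javascript", "c.javascript", "js"):
    print(cls, "->", expand_class(cls), "->", treatment(cls))
```

Changing the single leading character of the class switches the same javascript between being executed, displayed and commented-out, while the sub-class part is left untouched.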
Links to other objects - Panscript modules or anything else - are embedded in one of the main element types. Not yet sure whether they go in as properties or content, or either depending on their purpose.
The Panscript language is designed so that paths such as high/medium/low/lower/nearly reached me/hello blur away the structural implementation - which is the filename, which the script definition, etc. I have a gut feeling that this is a Good Thing, but need to flesh the principle out a bit.
A typical code object such as a web page or a script may contain several languages. Where multiple Panscript modules are embedded in such a page, each script definition must have an explicit and distinct id. Otherwise they would all default to "0", and it would not be possible to find any given script.
Going to script 0 will always find the first script in the page. If there is only a single script in the page then you can get away with the default id, but this is not recommended in case you later come back and add another script before it.
These are embedded as the content of an appropriate kind of high-level module.
Specifying the language might be done as a property of the module, or as a sub-class. Haven't thought about this yet.
Updated 11 July 2022