Language Design: Force unsigned Bitwise Operations And Precedence of Operations
#1
So I've been working on a compiler-parser set for a small scripting language for the A-Engine (all character by character; no cheap tokenizers or string search techniques involved B) ). It's going very well so far. I've been learning a lot reinventing wheels.

So yes, while I worked on this, I used a lot of bitwise operations to manipulate flags (AND to check, OR to switch on, XOR to switch off, x AND x-1 to check if POT, etc.), and I figured it wouldn't hurt to get those operators into the scripting language. The thing is (I'm really not sure whether I am doing all this properly or not), I got some sort of dynamic typing into this (types are Number, String and None), but since the scripting language is for the A-Engine, a game engine, I don't think people would ever need to do bitwise stuff besides flag manipulation (if that would even be necessary). Now, my Number type is really just float; so for my bitwise operators, I just cast both operands to int, then cast the result back to my type. I was wondering if I should cast to unsigned int instead, so that bitwise NOT on 1 returns 0 and not -2 as a result of NOTing the sign bit. By doing so, however, I'll have to either not support bitwise operations on negative numbers or simply leave out the leftmost bit. I personally am leaning toward just letting the cast be to int, and those who want NOT to convert 1 to 0 and vice versa would use the logical NOT (!). Suggestions?

The next point involves the precedence of operators. One of the weird things that hit me was the annoying decision to give bitwise operators like AND and NOT a lower precedence than most comparison operators (==, >, <, etc.). Later I realized that I wasn't the only one, that this was only in C++, and that there isn't really a standard order everyone follows; e.g. Python's list is very different from C++'s. So yeah, I thought I could try coming up with one myself to make things more intuitive :P I used C++'s as a base, and the first thing to change was of course that. I also did stuff like give bitwise OR and XOR the same precedence, and added a couple of operators like Python's exponentiation (**). Hopefully there wasn't something I overlooked, or was there?

Lastly, I'm considering trying to be very flexible with operand types. e.g:
- "text * number" is 'text' repeated 'number' times. I know this can be done in python.
- "text + number" is length of 'text' + 'number'.
- "text - number" is length of 'text' - 'number'.
- "text / number" is length of 'text' / 'number'.
-etc..basically any attempt to do arithmetic on text that is not otherwise supported will use the length of the text operand(s) instead.
Is this a wise decision? Or should I stick to the classic "unsupported operand types" error messages?

Thanks!
[Image: signature.png]
A-Engine: A new beat em up game engine inspired by LF2. Coming soon

A-Engine Dev Blog - Update #8: Timeout

#2
Regarding bitwise operations:
The best solution I can think of is to introduce a new type, Bitfield that has bitwise operations defined for it, but not any arithmetic operations (so you cannot add them for example). It would be stored as a single 32-bit unsigned integer. Casting between floating points and integers is silly, and way more costly than necessary for no gain, so definitely do not do that.
Enumerations are definitely also worth considering for the added type safety.

Regarding operator precedence:
It is a historical issue (http://programmers.stackexchange.com/que...omparisons), and as a result I would not be so afraid to change it. The only place it might cause trouble is if you want to port some code from the scripting language to C++, where it might be surprising.
Regarding other operators, I like the way Python does comparisons (23<x<42 in Python is equivalent to 23<x && x<42 in C++), so that would be nice. I would also forbid different logical/bitwise operators from appearing on the same level (so a&b|c is forbidden, a && b || c is also forbidden, but a&b == 0 || c&d&e == 42 is fine).

Regarding text:
Please do NOT do that. The first one is not so bad, but the latter are particularly bad because they have different semantics to the first, despite having really similar syntax, but in general I just find that combining very different types leads to confusion in many cases and the alternative is not really that bad:
    C++-Code:
"text".repeat(4); // "texttexttexttext"
"text".length() //4
"text"+to_string(42)
Or whatever similar syntax you prefer. Every single time I have seen some language try to infer programmer intent it has always gone wrong in one way or another.
Age ratings for movies and games (and similar) have never been a good idea.
One can learn a lot from reinventing wheels.
An unsound argument is not the same as an invalid one.
volatile in C++ does not mean thread-safe.
Do not make APIs unnecessarily asynchronous.
Make C++ operator > again
Trump is an idiot.
#3
Thank you very much for your time! I read your reply last night, but sadly didn't have time to reply to it.

Someone else Wrote:The best solution I can think of is to introduce a new type, Bitfield that has bitwise operations defined for it, but not any arithmetic operations (so you cannot add them for example). It would be stored as a single 32-bit unsigned integer. Casting between floating points and integers is silly, and way more costly than necessary for no gain, so definitely do not do that.
I didn't think about the overhead of conversion, I admit, and I really like the idea. I've been thinking about how I'd represent the bitfield type; how does a prefix 'b' sound? 200 is a number while b200 is going to be a bitfield.

(02-01-2016, 08:19 PM)Someone else Wrote:  Enumerations are definitely also worth considering for the added type safety.
You mean implementing enumerations? I'm not sure how that would work with dynamic typing. Plus, there are some other concerns I haven't addressed with giving people the freedom to create values like enumerations do. For example, it's more important that people be able to use expressions in a markup language freely without having to enclose them in anything unnecessary.

Quote:Regarding other operators I like the way python does comparisons (
23<x<42
in python is equivalent to
23<x && x<42
in C++), so that would be nice.
Oh, will do!

Quote:I would also forbid different logical/bitwise operators operators to appear on the same level (so
a&b|c
is forbidden,
a && b || c
is also forbidden, but
a&b == 0 || c&d&e == 42
is fine).
I'm not sure about this, especially the logical && and ||. I've gotten used to || always being processed after &&, which comes in handy: a && b || c && d is (a && b) || (c && d), and is also kind of intuitive. As in real life, you don't hear confusing statements like "get apples or oranges and peaches or pears" very often, but "get milk and biscuits or cola and potato chips" is easy to interpret without much ambiguity. Perhaps it might be different for other people, but I'd still not forbid it; just maybe let it spew a warning message in a text file log (?). For bitwise AND and OR though, I feel OR should get the higher precedence, since it's common to want to check whether your variable contains a number of flags, like so: value & flag1|flag2|flag3|flag4. But maybe, for the sake of everyone's sanity, I can just give bitwise AND, OR and XOR the same precedence, so that they are evaluated first come, first served. I can spew a warning there too if they're on the same level, if you insist.

Quote:Regarding text:
Please do NOT do that. The first one is not so bad, but the latter are particularly bad because they have different semantics to the first, despite having really similar syntax, but in general I just find that combining very different types leads to confusion in many cases and the alternative is not really that bad:
    C++-Code:
"text".repeat(4); // "texttexttexttext"
"text".length() //4
"text"+to_string(42)
Or whatever similar syntax you prefer. Every single time I have seen some language try to infer programmer intent it has always gone wrong in one way or another.
lol, yes sir. I will use methods or anything similar for an alternative.

I really appreciate your insight! Thanks again.
#4
(02-02-2016, 08:04 AM)A-Man Wrote:  I didn't think about the overhead of conversion I admit, and I really like the idea. I've been thinking how I'd represent the bitfield type, and how does a prefix 'b' sound?
200
is a number while
b200
is going to be a bitfield.
Not sure. b200 could also be a variable. Maybe a postfix, although I would say enumerations would still be the better solution.

(02-02-2016, 08:04 AM)A-Man Wrote:  You mean implementing enumerations? I'm not sure how would that work with dynamic typing.
Simple. Don't implement dynamic typing. Like inferring programmer intent, it is something I have never seen actually work in practice.
If you don't want people to write variable types, implement type inference (auto my_var = my_func(); in C++11).

(02-02-2016, 08:04 AM)A-Man Wrote:  Plus, there are some other concerns I haven't addressed with giving people freedom to create values like enumerations do. For example, It's more important that people should be able to use expressions in a markup language freely without having to enclose it with anything unnecessary.
I am not entirely sure what you meant here. You are making a scripting language, not a markup language, right?

(02-02-2016, 08:04 AM)A-Man Wrote:  I'm not sure about this, especially the logical && and ||. I've gotten used to || always being processed after &&, which comes in handy: a && b || c && d is (a && b) || (c && d), and is also kind of intuitive. As in real life, you don't hear confusing statements like "get apples or oranges and peaches or pears" very often, but "get milk and biscuits or cola and potato chips" is easy to interpret without much ambiguity. Perhaps it might be different for other people, but I'd still not forbid it; just maybe let it spew a warning message in a text file log (?). For bitwise AND and OR though, I feel OR should get the higher precedence, since it's common to want to check whether your variable contains a number of flags, like so: value & flag1|flag2|flag3|flag4. But maybe, for the sake of everyone's sanity, I can just give bitwise AND, OR and XOR the same precedence, so that they are evaluated first come, first served. I can spew a warning there too if they're on the same level, if you insist.
I did not even know the precedence of any of these operators, and I will surely forget it again, because I always use parentheses around them anyway. You might have an easy time remembering their precedence, but it is much more important for code to be readable, and the vast majority of people don't remember it, to such a degree that Clang and GCC have implemented warnings for this. Again, the alternative (adding parentheses in the code) only costs two extra characters, but it is guaranteed to be unambiguous to anyone reading it.
The same thing goes for bitwise operations: the price is two extra characters for complete unambiguity, which I consider a small price to pay.
I would also say that a lot of people would expect | and & to have the same precedence as || and &&. Also consider that it is much easier to go from a stricter syntax to a less strict syntax than vice versa, since you break less code (in this case you would break nothing if you were to switch later).
Giving them the same precedence also seems bad, as I cannot think of any practical situations where that is preferred.
#5
(02-02-2016, 03:01 PM)Someone else Wrote:  Not sure.
b200
could also be a variable. Maybe a postfix although I would say enumerations would still be the better solution.
Actually, variables have a '$' prefix, or '@' for "system variables" (those which the engine will be changing, like the position of the player, etc.). 'b' as a postfix is what I'm going with, it seems, as 'b' as a prefix usually denotes base 2 in other languages.


Quote:Simple. Don't implement dynamic typing. Like inferring programmer intent it is something I have never seen actually work in practice.
If you don't want people to write variable types implement type inference (
auto my_var = my_func();
in C++11).
What about Python, or JavaScript (this uses 'var', but yeah, still dynamic)? They seem to have managed that with no problems at all. I still really want to avoid a keyword for that. It's supposed to be a simple scripting language anyway, and I really care about its simplicity.

(02-02-2016, 03:01 PM)Someone else Wrote:  I am not entirely sure what you meant here. You are making a scripting language, not a markup language, right?
It's a scripting language, but there is the equally important markup language. The markup part is like LF2 .dat files', except that the tags can use expressions. Think of it as using HTML with Javascript, except for this, the script will be something users can use to do more hardcore stuff. Possibly as an option to write AI too. You can have a look here for an example.
(the "$l." prefix refers to local variables inside the script, "$g." for global variables shared between all objects, "$p." for permanent variables which last beyond a game session (save data/settings/..etc) or simply '$' for variables local to the object only, including all the scripts inside/imported).

Fair points. I had grown tired from parenthesis jumbles before I learnt to break down long conditions into smaller parts.

Quote:I would also say that a lot of people would expect
|
and
&
to have the same precedence as
||
and
&&
. Also consider that it is much easier to go from a stricter syntax to a less strict syntax than vice versa since you break less code (in this case you would break nothing if you were to switch later).

Quote:Giving them the same precedence also seems bad as I cannot think of any practical situations where that is preferred.
Because it's intuitive, I think? I mean, before we learnt about Exponents-Division/Multiplication-Addition/Subtraction (mistakenly taught to younger students as the "BODMAS rule"), I'd do everything in order; 5 - 5 * 3 equals 0, and to hell with all those teachers who argued! But yes, the point is, I believe people not familiar with stuff like bitwise operators will expect them to run in order. Those who have some experience in programming will just parenthesize their way through. I'll consider it.

While I personally think, and I'm sure you agree, that extensive early syntax errors pay off by saving you the time you'd spend fixing more troublesome semantic errors, I also think that overdoing it by rubbing lots of error messages in people's faces puts them off. No language I've used has forbidden that before, so I'd rather not either. Though it seems a warning message will be necessary.

on a side note: I always laugh when I see you've added one other statement to your signature, as they're often directly calling to me. I've asked before why one wouldn't use NULL, and you gave me some valid reasons, but why not rand? Is it because RAND_MAX differs across compilers and stuff? What if I redefine it to what I want, or just cycle it around the range I want with %?

Thanks!
#6
At this point I have stated the main reasons I have. I would say though that warnings should generally be considered as errors anyway, and teaching anyone to just ignore warnings is bad.

(02-02-2016, 06:47 PM)A-Man Wrote:  on a side note: I always laugh when I see you've added one other statement to your signature, as they're often directly calling to me. I've asked before why one wouldn't use NULL, and you gave me some valid reasons, but why not rand? Is it because RAND_MAX differs across compilers and stuff? What if I redefine it to what I want, or just cycle it around the range I want with %?
Redefining RAND_MAX first of all results in undefined behaviour, so don't do that. In practice, if you did, it would not change the return value of rand(); it would just make your code more error-prone because the two are no longer related to each other.

Secondly, rand() is not guaranteed to be thread-safe, so calling it from multiple threads will require you to synchronize your calls to it (through a mutex or similar), which is unnecessarily slow and error-prone.

Thirdly, modulus is also bad. Consider:
    C++-Code:
//RAND_MAX is 32767
cout << rand()%100 << endl;//non-uniform [0,67] is more likely than [68,99]


Lastly there are better means:
    C++-Code:
#include <array>
#include <random>
#include <iostream>
 
using namespace std;
 
template <typename T>
mt19937 CreateMersenneTwister(T &rng){
    array<random_device::result_type,8> Seed;
    for(auto &e:Seed) e = rng();
 
    seed_seq SeedSeq(Seed.begin(),Seed.end());
    return mt19937(SeedSeq);
}
 
int main(){
    random_device rd;//hopefully non-deterministic CSPRNG, though not the case on MinGW which sucks
    auto mt = CreateMersenneTwister(rd);
 
    uniform_int_distribution<int> Dist(-23,42);
 
    for(int i = 0;i<20;++i) cout << Dist(mt) << endl;//print uniformly distributed random numbers in range [-23,42]
}

The seed comes from a good source (and it has more entropy), and the distribution is uniform. While a single Mersenne Twister is not thread-safe, you can just create more so that each thread has its own generator. The generated numbers are also of much higher quality than those from rand(), and more.
Here is a good talk if you want to hear more: http://channel9.msdn.com/Events/GoingNat...ed-Harmful

If you are on MinGW or another platform that has a bad random_device, you can still just use whatever you would call srand() with, though the seeding would probably not be as good as with a proper random_device.
#7
(02-01-2016, 05:41 PM)A-Man Wrote:  been learning a lot reinventing wheels.
More like reinventing the car. You're literally writing a script engine from scratch.

I wonder if you're using any kind of source control tool (usually 'git'). Don't wanna sound off topic, but this is incredibly important. It doesn't matter whether you want to share the source or not, you should be using it somehow (either locally or, even better, via GitHub). This makes it possible to revert things if something goes horribly wrong. You can also clearly see what you have changed with diff. Any programmer should (must?) know a fair amount about source control / versioning systems.

(02-01-2016, 05:41 PM)A-Man Wrote:  The next point involves the precedence of operators. So one of the weird things that hit me was the annoying decision of giving bitwise operator like AND and NOT a lower precedence than most equality related operators (==, >, <..etc). Later I realized: I wasn't the only one, this was only in C++, and that there isn't really a standard order everyone follows; e.g: python's list is very different from C++. So yeah, I thought I could try coming up with one myself to make things more intuitive :P I used C++'s as a base, and the first thing to change was of course that. I also did stuff like give bitwise OR and XOR the same precedence, and added a couple of operators like python's exponentiation (**). Hopefully, there weren't something I overlooked, or was there?
This may come in handy:
http://dlang.org/spec/expression.html
I highly suggest you not make it any different from what the C language has, since its creator(s) were actually mathematicians. Don't claim you know better than them ;) So you should not give the AND operator higher precedence than comparison operators; otherwise, how do you expect it to work?
    C++-Code:
x <= 50 && x > 0

It will first try to execute 50 && x which doesn't make sense in any way.

Ohh, and I'm completely against Pythonic comparison (from a compiler writer's point of view) like 2<x<9. You should not bother making such syntactic sugar, otherwise you are gonna have a bad time. Don't make exceptional stuff, because you will end up fighting with it all over the place. You need to think of every possible situation and will need to put that exceptional code everywhere... Let the system work in its normal, consistent way.

(02-01-2016, 05:41 PM)A-Man Wrote:  Lastly, I'm considering trying to be very flexible with operand types. e.g:
- "text * number" is 'text' repeated 'number' times. I know this can be done in python.
- "text + number" is length of 'text' + 'number'.
- "text - number" is length of 'text' - 'number'.
- "text / number" is length of 'text' / 'number'.
-etc..basically any attempt to do arithmetic on text that is not otherwise supported will use the length of the text operand(s) instead.
No no no not never ever do this kind of things.
Ultimately, my constant dissatisfaction with the way things are becomes the driving force behind everything I do.
[Image: sigline.png]
LF2 IDE - Advanced visual data changer featuring instant data loader
LF2 Sprite Sheet Generator - Template based sprite sheet generator based on Gad's method
[Image: sigline.png]
There is no perfect language, but C++ is the worst.
#8
Someone else Wrote:At this point I have stated the main reasons I have. I would say though that warnings should generally be considered as errors anyway, and teaching anyone to just ignore warnings is bad.
Yes, and I appreciate it a lot. I will consider more than a warning for boolean/bitwise operators of the same level ambiguity, but I'd still sneak in an option to turn such errors off or reduce them to warnings.

I've thoroughly read your explanation, and will further check that video soon!

Nightmarex1337 Wrote:More like reinventing car. You're literally writing a script engine from scratch.
It's totally justified, I think. I mean, I'm doing this for a simple game engine that also targets an uninformed audience, and going with a full-fledged programming language may be overdoing it (and will probably require hacks and bodges to make things work like I want, and I hate that).

Nightmarex1337 Wrote:I wonder if you're using any kind of source control tool (usually 'git'). Don't wanna sound off topic but this is incredibly important. Doesn't matter if you want to share the source or not, you should be using it somehow (either locally or even better via GitHub). This makes it possible to revert things if something goes horribly wrong. Also you clearly see what you have changed with diff. Any programmer should (must?) know a fair amount of source control / versioning system.
Uhh, I'm totally guilty for having no experience in any source control stuff, but it's been on my todo-list for a while >.<
I will do soon.

Nightmarex1337 Wrote:This may come in handy:
http://dlang.org/spec/expression.html
I highly suggest you to not make it any different than what C language has, since it's creator(s) were actually mathematicians. Don't claim you know better than them ;) So you should not give AND operator higher precedence than comparison operators otherwise how do you expect it to work?
Code:
x <= 50 && x > 0
It will first try to execute  50 && x which doesn't make sense in any way.
Thanks! I will check it out.
I'm not claiming anything lol, but I should say that when I said AND, I meant bitwise AND :P Like, one would expect x & y > 2 to first evaluate x&y and then compare the result with 2. It also appears, according to the SE question Someone else linked, that this is really a remnant of a past design, when & and && were collectively just &, and what it did was deduced from the context.
Regardless, with 50 && x, 50 is True because it's non-zero, and x would also be True if it's non-zero (or False otherwise). I'm sure this is how almost every language (except C, I think) would do it.

Nightmarex1337 Wrote:Ohh, and I'm completely against pythonic comparison (from a compiler writer's point) 2<x<9. You should not bother making such syntactic sugars otherwise you are gonna have a bad time, don't make exceptional stuff because you will end up fighting for it all over the place. You need to think every possible situation and will need to put that exceptional code everywhere... Let the system work in it's normal consistent way.

If you're worried about the trouble implementing it would cause me, then I think I really have this sorted. I went with parsing everything into data trees instead of Polish notation or anything of that sort. So at one level, I have got an operator stack and an operand stack. What I'll do is just check if I have 2 operators from >, <, >= or <= after each other and do my stuff if so. Something like 6>5>4 would parse normally anyway, so the only exception would be in the expression evaluation function (which is a really simple one).
I'd really like to have the x>y>z syntax myself because it's shorter and is what people do in maths to test if a number is within an interval.

Nightmarex1337 Wrote:No no no not never ever do this kind of things.
Right! I've already cut on the plan and all the code related to it XD

Thanks!
#9
(02-03-2016, 09:34 AM)A-Man Wrote:  I'm not claiming anything lol, but I should say when I said AND, I meant bitwise AND :P Like, one would expect
x & y > 2
to first evaluate the x&y and then compare with 2. It also appears, according to the SE question link Someone else has provided, that this is really a remnant of the past design, when & and && were collectively just &, and what it does was deduced by the context.
Ohh, what a relief... Well, now I'm on your side :D

(02-03-2016, 09:34 AM)A-Man Wrote:  Regardless, with
50 && x
, 50 is True because its a non-zero, when x would also be True if its a non-zero (or False otherwise). I'm sure this is how almost every language (except C, I think) would do it.
There is no modern language I know of that allows you to do that; it has proved to be a bad design choice to allow boolean operations on different types of operands. A compiler-enforced bool makes it a lot easier to catch little (and hard to see) mistakes like this one:
    CSHARP-Code:
if(x = 50)
    fiftyCent = new FiftyCent();
else
    fiftyCent = null;
// Error: expression 'x = 50' not implicitly convertible to bool


(02-03-2016, 09:34 AM)A-Man Wrote:  If you're worried about the trouble implementing it would get me, then I think I really have this sorted. I went with parsing everything to data trees instead of Polishnotation or anything of that sort. So at one level, I have got an operator stack and an operand stack. What I'll do is just check if I have 2 operators in >, <, >= or <= after each other and do my stuff if True. Something like 6>5>4 would parse normally anyway, so the only exception would be in the expression evaluation function (which is a really simple one).
I'd really like to have x>y>z syntax myself because it's shorter and is what people do in maths to test if a number in within an interval.
Let's see...
#10
(02-03-2016, 06:49 PM)Nightmarex1337 Wrote:  There is no modern language I know that allows you to do that, it's proved to be a bad design choice to let boolean operations on different types of operands.
Well, C++ does, but I would agree that integer-to-bool conversions are not a good idea. Pointers are usually fine in my experience.
The way GCC and Clang deal with the dreaded if(a = b) problem is to warn if you don't add an extra set of parentheses:
    C++-Code:
if(a = b){ /* ... */ }//warning
if((a = b)){ /* ... */ }//ok
So far I have not seen a case where that has not caught a bug where the programmer meant == instead of =.