A discussion of how to do Computer Science well, particularly writing code and architecting program solutions.
Wednesday, November 22, 2017
Review: Languages and Code Quality in GitHub
The study is several years old, but was recently reprinted in the Communications of the ACM. In it, the authors mined GitHub data from active open-source projects, collecting defect and development rates. They classified the defects by type and the development languages by their features. And they found that language choice matters, marginally. Some types of bugs are far more common in some languages, such as memory-management bugs in C / C++. Functional languages have the lowest defect rate; however, this analysis is based only on the commit history and does not also account for development time or for differences between programmers. So the takeaway is that language features do matter, but programmers just write buggy code regardless of language.
Friday, March 4, 2016
Conference Attendance SIGCSE 2016 - Day 2
After lunch when we are all in food comas, let's attend the best paper talk!
A Multi-institutional Study of Peer Instruction in Introductory Computing -
This study followed 7 instructors across different institutions as they used peer instruction. It showed that the technique is generally recognized as valuable, while also touching on ways in which it can go awry. Tell students why the technique is being used and what its effect is. Hard questions are good questions to ask, as students will discuss and learn from them. This requires that questions are graded for participation and not *correctness*. Possible questions and material for peer instruction are available.
Development of a Concept Inventory for Computer Science Introductory Programming -
A concept inventory is a set of questions that carefully tease out student misunderstandings and misconceptions. Take the exams and identify both the learning objective and the misconception that results in incorrect answers.
#include <stdio.h>

int addFiveToNumber(int n)
{
    int c = 0;
    // Insert line here
    return c;
}

int main(int argc, char** argv)
{
    int x = 0;
    x = addFiveToNumber(x);
    printf("%d\n", x);
    return 0;
}
a) scanf("%d", &n);
b) n = n + 5;
c) c = n + 5;
d) x = x + 5;
Each incorrect answer illustrates a different misconception: for example, that input must come from the keyboard, or that variables are passed by reference.
Overall, this study illustrated how the concept inventory was developed, but not the impact of having it, or what it showed in the students and their learning.
Uncommon Teaching Languages - (specifically in intro courses)
An interesting effect of using an uncommon language in an introductory course is that novices and experts start with similar skills. Languages should be chosen to minimize churn across the curriculum; otherwise students feel that they haven't mastered any language. Related to this point, languages also exist in an institutional ecosystem. Furthermore, we want to minimize the keywords / concepts required for a simple program: a novice will adopt these keywords, but they are also "magic" and arcane. Finally, consider how long the programs need to be, as we want novices to only have to write short programs to start.
I also attended the SIGCSE business meeting and then the NCWIT reception. I have gone to NCWIT every year at SIGCSE, as I want to know what I should do (or not do) to not bias anyone's experience in Computer Science.
Wednesday, August 29, 2012
Computer Science Education (part 0 of N)
Besides my usual semester of computer science courses and research, this fall I'm cross-enrolled at a neighboring university that offers education classes. Last night's class had some very interesting conversations. We are starting to prepare syllabi for the course we'd either ideally teach or want to be prepared to teach. I was in a group with two education PhDs (as most of the class are). They consented to consider an introductory computer science course and answer two questions.
What are common naive theories that students have entering the course?
How might the course be designed to encourage students to reconstruct their theories?
So what naive theories do students have?
First, computers are magical. No, computers do exactly what a programmer tells them to do. (More advanced students learn about race conditions, compiler influence on correctness, etc.) Which, unfortunately, means that if a computer is not doing what you want it to do, then you instructed it incorrectly (cf. "The rat is always right").
Second, I'm going to be a game programmer. No, most computer scientists do not write games (or at least, aren't paid to). But we find many other interesting parts to the field. Besides, many game programmers are treated little better than grad students.
Do you know other naive theories?
Then after class, I spent some time discussing more "advanced" theories in computer science.
Functional versus imperative programming. Does one paradigm exist to rule them all? Is one class of programming languages sufficient? Do students gain by learning about both paradigms? I discussed this briefly in Is versus ought, and have been regularly reading a strongly functional view in Existential Type.
Big 'O' notation and algorithm / data structure selection. I previously discussed this some in Know your N. I was also a co-author on a paper, "Brainy: effective selection of data structures", which demonstrated that the actual best data structure for a program is not always the one the "Big 'O'" analysis suggests.
Language equivalence. Related to functional versus imperative, and to one of my first posts, "Problem Solving via Programming", programming languages are theoretically equivalent (i.e., Turing complete). But in practice, languages should be selected for particular problems. What problems are best suited to specific languages?
What are some other major theories about computer science that students should know?
Saturday, March 26, 2011
Functional vs Imperative Programming
I was recently provided a link to another "programming" blog. The blog, Existential Type, is written by a professor from my undergrad institution, Carnegie Mellon. In it, he appears to be chronicling his efforts at teaching functional programming to freshman computer science students. I have been deeply engrossed in reading the posts thus far, and they have revived memories of past debates about language choice, etc. You may look forward to several future posts delving deeper into this issue. But today is Saturday and I have more research to do.
Wednesday, January 12, 2011
From C to C#
My recent work has led me to port several programs that I'd previously developed in C into C#. In a previous post, I explained some of my preference for C, but work being work (or research).... After these efforts, I thought I'd expand upon some of the joys and sorrows experienced along the way.
First, overall the process was surprisingly easy. Most code required little to no changes. Often, entire code bodies were copied between projects with a single find / replace: "." in place of "->".
Second, the minor changes were basically those within the object-based model. Fields of a struct become public (or private). And global functions need to be assigned to a class.
Third, some types and constructs do not have an easy conversion. While not difficult, it was nonetheless annoying to have to redo arrays as having run-time defined length. Or to change types to UInt32.
The final aspect of porting between languages is the run-time / OS / system interfaces, which do change. So while the internals of the magical program box remain relatively constant, the externals of how it interacts with the system to take in / send out data do change. Fortunately, for my porting tasks, this code was relatively modest. (One of the more difficult parts of this step is covered in Ray Tracing ... Part 2).
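To make the first couple of points concrete, here is a small hypothetical C fragment of the sort that ported almost verbatim; the names are invented for illustration, and the comments note what the C# version looks like after the "." for "->" swap, with the struct becoming a class with public fields and the global function assigned to that class as a method.

// Hypothetical C code of the kind that ports with little more than find / replace.
// In the C# port: struct node becomes a class with public fields, walkList()
// becomes a method on that class, and every "->" becomes ".".
struct node {
    struct node* next;
    int value;
};

int walkList(struct node* head)
{
    int sum = 0;
    while (head != NULL)
    {
        sum += head->value;   // in C#: sum += head.value;
        head = head->next;    // in C#: head = head.next;
    }
    return sum;
}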
Tuesday, October 26, 2010
A Pointer on Pointers Part 1
People often express uncertainty about the working of pointers, structures, etc. They may not be my readers, but hearing their plight moves me to write nonetheless. Others have noticed this too, again looking at StackOverflow. Everything in this post is written from a C-centric point of view, unless otherwise noted.
There are two separate items to understand about pointers. First, where they come from. And second, what they point to. We will roughly be working off of the following picture of memory usage, but understand that this is so simplified that it is almost wrong.
Memory can be allocated from two places: stack and heap. Both are generalizations of the actual mechanics. Stack allocations are explicitly named variables in the source that are either made globally or local to a function. Global variables use space delineated in the binary image, rather than the stack pages, but they should be considered identically by a programmer in that the allocations are only good within their scope. This leads to my first example:
struct foo* newFoo()
{
    struct foo F;
    return &F;
}
Managed language programmers often attempt the above example. newFoo() creates a foo and returns a reference; however, the referenced foo is on the stack. In case there is uncertainty, let's see the assembly for function newFoo():
; Using 32bit x86 assembly, as 64bit has calling convention differences
push ebp
mov ebp, esp
sub esp, 0x10 ; Allocate 16 bytes for a "foo"
mov eax, esp ; Put the address into the return value
mov esp, ebp ; Cleanup
pop ebp
ret
The esp register points to the top of the stack, so the subtract instruction moves it, thereby setting aside the required space for an object foo on the stack, but the space is only available while in the function. When the cleanup section is reached, the space is "reclaimed" as esp is restored. (The funny thing is, an optimizing compiler may inline newFoo() and thereby change the scope of this stack allocation.) So to make an allocation persist beyond its allocation scope, we need the "heap".
Heap memory is memory that won't be reclaimed until the program explicitly requests it (or the program terminates). While malloc() is the most common method for heap allocations, it is not the only one. Whichever method is used, a section of memory is set aside for the program's use. To again simplify our picture of computing, a Java implementation of newFoo would disassemble to something like the following:
push ebp
mov ebp, esp
sub esp, 0x4 ; Space for arguments
mov [esp], 0x10 ; "Push argument"
call malloc
mov esp, ebp ; Cleanup
pop ebp
ret
A language like Java might prohibit making an allocation of an object foo on the stack, and therefore force the allocation to be made from the heap. It does so by not using "sub esp, 0x10" to make the allocation, but instead calling out to malloc() to set aside the necessary space. Now, on returning from this function, the address returned is not reclaimed.
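To close the loop in C itself, here is a minimal sketch of a heap-allocating newFoo(); the 16-byte struct layout is only an assumption to match the example above, and now the returned pointer stays valid after the call until the caller frees it.

#include <stdlib.h>

struct foo { int a, b, c, d; };   // assumed 16-byte layout, matching the example

struct foo* newFoo(void)
{
    // The allocation comes from the heap, so it survives the return
    struct foo* F = (struct foo*) malloc(sizeof(struct foo));
    if (F == NULL)
        return NULL;              // caller must check for allocation failure
    return F;                     // caller is responsible for calling free()
}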
An entire series of courses would be required to fully explain the trade-offs and reasoning, and since the length is growing long, I will conclude here and discuss the usage of this memory in a later post.
Thursday, October 21, 2010
Accessing Structs with Side Effects
In my recent research, I've needed to take certain additional actions when the fields in a data structure are changed. Since all code involved is my own, my initial approach was to annotate any access with explicit calls. Obviously, this isn't sustainable in larger projects, but for quick prototyping, it is sufficient. Let's see a simple example (with pseudo-C as usual):
typedef struct _tree {
    struct _tree *left, *right;
    int data;
} Tree, *pTree;

Tree t;
t.left = t.right = NULL;
// Now take action
Since we want to ensure that we take action after every update, get / set routines could be introduced for each field. Yet now I have the overhead of calling these routines on every access, and I still have to trust that all accesses will use the routines (though in another language like C++, I could make the fields private and force access through the routines).
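For comparison, a rough sketch of what that explicit approach might look like in C; the helper names (treeSetLeft, treeGetLeft) are made up for illustration, and nothing stops a caller from bypassing them and writing to the field directly.

void treeSetLeft(pTree t, pTree value)
{
    if (t->left == value)
        return;               // discard redundant stores
    t->left = value;
    // Now take action (the side effect that must follow every update)
}

pTree treeGetLeft(pTree t)
{
    return t->left;
}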
My research is currently using C#, so I have further tools available to me, specifically accessors. Initially, they seemed to be yet another silly hoop to jump through while writing code. However, they provide a great tool for my present problem, where modifications to fields need to incur side effects. Now for a C# snippet:
private Node pLeft;   // Internal storage
public Node left      // Public accessor
{
    get { return pLeft; }
    set
    {
        if (pLeft == value) return; // discard redundant stores
        pLeft = value;
        pStack.Push(this);          // side effect on update
    }
}
So my data structure now has implicit side effects from updates (good for the research). And so any future development doesn't need explicit annotations. All in all, I was quite pleased with this change (except for spending the time debugging the differences between implicit and explicit detection).
Monday, September 27, 2010
Stack Overflow: String Replace
I've recently begun exploring Stack Overflow. Helping to answer people's questions has a considerable appeal to one, such as I, who enjoys teaching. Sometimes you can find a question that no one has answered, as few know the answer. Other times, almost any professional programmer can answer the question, and often these are from CS students trying to solve their homework. On occasion, I may repost interesting questions here.
For my first repost, we have a StringReplace function that needs optimizing. Independent of the questioner's implementation, let's consider the core algorithm.
StringReplace(OldText, NewText, BaseString)
    for SubString of length OldText in BaseString
        if (OldText == SubString) {Replace text}
This is how we want to replace. Find each instance of OldText in BaseString and replace. Given the nature of the question, our implementation will be written in C and not use any libraries (like regex, CRT, etc).
Working backwards, we'll need to return a final string of some indeterminate size. The BaseString, as it is modified, is stored in this final string in the following form, where each base is a substring of the original:
base1 + NewText + base2 + NewText + ... + baseN
When the problem is viewed this way, a recursive implementation suggests itself. The following code expands upon this approach (though it glosses over some casting requirements, the necessity of sizeof(char), and the possibility of Unicode support).
#include <stdlib.h>
#include <string.h>

int Match(char* OldText, char* BaseString);  // string-compare helper (see below)

#define StringReplace(x, y, z) StringReplaceHelper(x, y, z, 0)

char* StringReplaceHelper(char* OldText,
                          char* NewText,
                          char* BaseString,
                          unsigned long space)
{
    char *start = BaseString, *ret = NULL;
    unsigned long offset = 0;

    while (*BaseString != '\0')
    {
        if (Match(OldText, BaseString))
        {
            offset = (BaseString - start);
            // The next search will begin after
            // the replacement text
            ret = StringReplaceHelper(OldText, NewText,
                                      BaseString + strlen(OldText),
                                      space + offset + strlen(NewText));
            break;
        }
        BaseString++;
    }

    // If the end of string, then this is the last piece
    // Else copy in subset and replacement piece
    if (*BaseString == '\0')
    {
        offset = (BaseString - start);
        ret = (char*) malloc(space + offset + 1);
        memcpy(ret + space, start, offset);
        ret[space + offset] = '\0';
    }
    else
    {
        memcpy(ret + space, start, offset);
        // Don't actually do the following, the processor
        // will have to walk NewText twice
        memcpy(ret + space + offset, NewText, strlen(NewText));
    }
    return ret;
}
First, the routine Match() is basically a string compare, which can have its own interesting optimizations, including partial skip-ahead. Second, a final version wouldn't use strlen, except perhaps at the start; the lengths can be passed between iterations. Finally, using memcpy and malloc isn't cheating, as I've written my own implementations in the past, and their omission here is just for the sake of brevity.
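For the curious, here is one possible way to write Match() without leaning on the CRT; this is just a sketch of mine, not the code from the original answer, and it returns nonzero only when OldText occurs at the very start of BaseString.

int Match(char* OldText, char* BaseString)
{
    while (*OldText != '\0')
    {
        if (*OldText != *BaseString)
            return 0;          // mismatch before OldText was exhausted
        OldText++;
        BaseString++;
    }
    return 1;                  // every character of OldText matched
}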
But this is just what I'd currently write for a string replace. You can follow the link to Stack Overflow and see the original request (and my suggestion for improvement). Or perhaps consider your own.
Thursday, September 9, 2010
Problem Solving via Programming Languages
As an interlude to the OO series, I want to clarify something about language preference.
BASIC, C, C++, C#, IA64, Java, Lisp, MIPS, ML / OCAML, Perl, Python, and X86 / X64 -- I've written executable code in them all (some far more than others). While C dominates my coding experience, and is my general preference, I recognize that sometimes another language provides better support. Usually, this support is primarily a reduction in development time, but I'd like to delve a little deeper into why I choose three particular languages: C, C#, and Perl.
C -- "Machine independent assembly"
When I choose to write C code, I'm often thinking of precisely how a computer will execute the high-level solution that I've sketched. Usually, given a problem, I am decomposing it into executable bits, like looping through a list, building a tree just so, etc. I know what I want, and if I'm wrong, then it's my own foot that I'm shooting.
C# -- "Rapid application development"
I'm only beginning to use C# regularly, but I'm finding it very useful for achieving simple tasks, particularly those that require a UI. .NET provides an extensive collection of support libraries, so that loading graphics, interacting with them, etc. is mostly a matter of connecting the pieces together. To me, C# is called for when my problem is UI-centric: click the button and an image appears....
Perl -- "Text, files and strings"
My Perl use has been declining over the years, not from a dislike of the language, but rather from a shortage of problems that call for it. Perl is my tool of choice when I need to work with lots of text and files. I've used it to write everything from scripts that copy files between directories and rename them, to parsing MBs of raw data and computing statistics for particular results of interest. (The latter was eventually rewritten in C for speed and portability.)
I'm always thinking that one of these days I'll try learning more about a functional language. However, I quickly fall into the trap of thinking procedurally. "I know you're on an x86, so just let me write my assembly and we can be done."