In this file: [overview] [optimizations] [todo]

=======================================================================

OVERVIEW
The IDL compiler as a whole is composed of these modules:
    - libIDL
        Reads in a .idl file, filters it through the C preprocessor,
        and parses it into a tree of data structures that represent
        the .idl file.
    - driver
        Runs libIDL on the input .idl file and hands the resulting
        IDL_tree (plus associated information) to the C backend.
    - C backend
        Takes the IDL_tree (and associated information) from the driver
        and outputs a header file, client-side stubs, server-side
        skeletons, and miscellaneous routines for a specific interface.

        - header
            The typedefs and function prototypes for
            modules/interfaces.
        - stubs
            Client-side marshal/send request/demarshal routines.
        - skeletons
            Server-side demarshal/upcall/marshal routines.
        - common
            Routines, such as the type allocation/de-allocation
            routines, that are needed by both stubs & skeletons.
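
As a rough illustration of what those outputs look like, here is a
tiny, hypothetical interface under the standard CORBA C language
mapping (the interface and all names are illustrative only):

    /* IDL:  interface Echo { string echoString(in string s); }; */

    /* header: the type and prototype a client sees */
    typedef CORBA_Object Echo;
    CORBA_char *Echo_echoString(Echo obj,
                                const CORBA_char *s,
                                CORBA_Environment *ev);

The stub with this prototype marshals `s', sends the request and
demarshals the reply; the skeleton does the reverse on the server side
and then calls the implementation.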

All four categories of generated routines use the same basic approach:
recursively walk the IDL_tree and, whenever an "interesting" IDL element
is found, output the appropriate C code for it.
-- Elliot

=======================================================================

OPTIMIZATIONS

. When marshalling or demarshalling a parameter, if you know the
  alignment of the previous parameter, you already know whether you need
  to align the current parameter.  [implemented, AFAIK] -ECL
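
  A minimal sketch of the idea (not the actual generated code; align4()
  is a hypothetical helper that rounds a pointer up to the next 4-byte
  boundary):

      unsigned char *cur = buf;            /* raw incoming data        */
      CORBA_long a, b;

      cur = align4(cur);                   /* align once...            */
      memcpy(&a, cur, 4); cur += 4;
      memcpy(&b, cur, 4); cur += 4;        /* ...cur is still 4-byte
                                              aligned, so no second
                                              align4() call is needed  */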

. When doing byteswapping, we should do it in place instead of using the
  function pointer.  [NYI, but is very easy to change - just rewrite
  GET_ATOM()] -ECL
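
  A minimal sketch of that change for the 4-byte case (GET_ATOM_SWAPPED
  is an invented name, not ORBit's real macro; GUINT32_SWAP_LE_BE is
  glib's byteswapping macro):

      #define GET_ATOM_SWAPPED(dest, cur)            \
          do {                                       \
              guint32 _v;                            \
              memcpy(&_v, (cur), 4);                 \
              (dest) = GUINT32_SWAP_LE_BE(_v);       \
              (cur) += 4;                            \
          } while (0)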

. When demarshalling structures with only fixed-size elements in them,
  we should be able to memcpy() directly off the wire.  In the normal
  case, for a

      struct {
          int int1;
          char *string1;
          int int2;
      };

  we have to pull off an int, then pull off a string, then pull off an
  int.

  Now consider

      struct {
          int int1;
          float float1;
      };

  If we have to byteswap this, no gain.  But in the "don't need to
  byteswap" case, we can directly memcpy() this struct from the raw
  data buffer for a nice gain. -ECL
  [Implemented]
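
  A sketch of the memcpy() path in the "don't need to byteswap" case
  (align4() is the same hypothetical helper as above; this assumes the
  wire layout of the members matches the C struct layout, which holds
  for two 4-byte fields):

      struct fixed { CORBA_long int1; CORBA_float float1; };
      struct fixed dest;

      cur = align4(cur);
      memcpy(&dest, cur, sizeof(dest));    /* one copy for the whole
                                              struct instead of one per
                                              member                   */
      cur += sizeof(dest);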

. There needs to be a way to say "add an iovec to the list" so that
  things like constant strings can be marshalled super-easily.
  [implemented]

  [Note that since ORBit's IIOP module uses writev() to send out data,
  and the list of vectors is generated to point at the data to be sent,
  ORBit will probably perform extremely well if you send large arrays
  or sequences of basic types, or strings, across the network.  If
  you're just calling a void dosomething(long n1, long n2, long n3);
  all day, it will probably be less than optimal.  If, on certain
  architectures, we can count on n1, n2, and n3 being consecutive in
  memory, it might be possible for the IDL compiler to recognize this
  and output code that does

      marshal_value_at_address(&n1 /* base address */,
                               sizeof(long) * 3 /* length in bytes */);

  (The giop_message_buffer_append*() routines already try to recognize
  appends of consecutive memory regions and coalesce them, but that
  code is untested, and slower than the above.)]
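
  A sketch of the underlying mechanism using the plain POSIX interface
  (header, header_len and fd are assumed to exist already; ORBit's
  giop_message_buffer layer builds this kind of vector list itself):

      #include <sys/uio.h>

      static const char greeting[] = "Hello, world";
      struct iovec vecs[2];

      vecs[0].iov_base = (void *) header;    /* already-built message
                                                header                 */
      vecs[0].iov_len  = header_len;
      vecs[1].iov_base = (void *) greeting;  /* constant string is sent
                                                without being copied   */
      vecs[1].iov_len  = sizeof(greeting);

      writev(fd, vecs, 2);                   /* one system call        */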

If an optimization will slow the IDL compiler down by an order of
magnitude, that's fine - the idea is to do lots of work at compile
time in order to save work at runtime.

. For the server-side, we can use gperf to generate a nice hash of the
  operation names that we know at compile time, for doing the
  operation name -> class_specific_POA_data conversion.
  [This would give a nice gain - any takers?]
  [Implemented, not using gperf but a switch statement.]
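
  One possible shape for the switch-based lookup (every name here is
  invented for illustration): dispatch on the operation-name length
  first, then strcmp() only against the few candidates of that length.

      /* needs <string.h> */
      static void *
      get_skel(const char *opname, EchoServantEpv *epv)
      {
          switch (strlen(opname)) {
          case 10:
              if (!strcmp(opname, "echoString"))
                  return epv->echoString_skel;
              break;
          case 8:
              if (!strcmp(opname, "shutdown"))
                  return epv->shutdown_skel;
              break;
          }
          return NULL;
      }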

. Use alloca() in skels to get memory.
  [Not needed - we just have straight variables on the stack, which is
  even faster ;-]
  [We should use alloca() instead of typename__alloc() whenever
  possible.]
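
  A sketch of the difference (Foo_Bar is a made-up fixed-size struct;
  Foo_Bar__alloc() follows the usual typename__alloc() convention):

      Foo_Bar  on_stack;                     /* plain stack variable   */
      Foo_Bar *p = alloca(sizeof(Foo_Bar));  /* still on the stack, but
                                                sized at runtime       */
      Foo_Bar *q = Foo_Bar__alloc();         /* heap; only needed when
                                                the value must outlive
                                                the upcall             */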

. For the last generated demarshaller that will ever use a recv_buffer,
  we don't need to increment the recv_buffer->cur pointer afterwards.
  [A pain to implement.]

. Direct mem append of string lengths instead of indirect, in certain
  cases.

. Given a known number of fixed-length values, marshal them into an
  on-stack buffer and then pass that in one call to append_mem.
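
  A sketch of the idea, reusing the dosomething(n1, n2, n3) example
  from above (append_mem() and send_buffer stand in for whatever the
  real append routine and output buffer are):

      struct { CORBA_long a, b, c; } scratch = { n1, n2, n3 };

      append_mem(send_buffer, &scratch, sizeof(scratch));  /* one append
                                                              instead of
                                                              three     */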

. We could read from & write to a socket at the same time
  (multithreading).