a. Arduino/Raspberry Pi related workshops
b. AutoCAD for Inventor 2013 workshops
This is a repost of my answer on Quora.
Whether it's hardware, software, or any other product, spend a lot of time and energy on customer development. Finding out what users want and what they think they will pay for is more critical than what we think will be cool for them.
Since hardware involves a lot of capital, you need to get the above part right. Once you have it, you need to figure out the device/product environment conditions (environmental limitations and regulatory restrictions/guidelines/requirements, whatever you call them), and very likely you will change the product spec or operating conditions to match the new reality.
Then come up with the product design (look and feel): in layman's terms, the product casing, or how users should feel when they see the product. Because of the product casing/packaging, you will very likely revise your spec again.
Next, from the spec you came up with by talking to users, figure out what it translates to in technical components: how many HDMI, VGA, optical, and USB ports; which network interfaces you support; what the capacity of the hard drive will be and whether it should be an SSD; and so on.
In the case of an Apple TV, for example, you might need video encoding/decoding, so you need a good video decoder; check who sells video decoder chips (probably ARM or Imagination Technologies). Depending on what additional functionality you need, you will also need a main CPU, so figure out whether it's going to be ARM, MIPS, or Intel Atom. While you are figuring this out, also decide which embedded OS you want the device to run, and then check the compatibility of the OS with the hardware at the device-driver level.
Check all potential vendors for each component; many times they will give you sample pricing (typically at a volume of 10K units, and sometimes they might give you pricing for 1K pieces as well).
Once you have selected vendors, figure out the power consumption and see if the power budget makes sense. Figure out whether you want your device to run both in Africa and in Siberia; accordingly, revise your power budget, and you may therefore change the product spec and/or product casing.
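As a back-of-envelope illustration of the power-budget check described above, here is a small Python sketch. The component names, power numbers, and the 25% worst-case margin are all made-up assumptions, not figures from any real design:

```python
def power_budget(components_mw, derating=1.25):
    """Sum per-component power draws (mW) and apply a derating margin
    for worst-case environments (hot ambient, supply variation).
    The 25% margin is an illustrative assumption."""
    total = sum(components_mw.values())
    return total * derating

# Hypothetical component draws for a set-top-box-like device.
draws = {"soc": 1800, "wifi": 450, "dram": 300, "misc": 150}
print(power_budget(draws))  # 2700 mW nominal -> 3375 mW budgeted
```

If the budgeted number blows past what your enclosure can dissipate or your adapter can supply, that is exactly the point where the spec or casing gets revised.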
Next, implement the design spec using platforms such as Upverter and simulate the design. Then select a PCB vendor who can do the design and manufacturing for you. Board design/packaging might impact a lot of factors, from the design spec all the way to the casing, so this is important.
Add up all the pricing to get the BOM (bill of materials). Check if the BOM makes sense for the price range at which you are planning to sell. If your initial BOM at this stage is, for example, $25, triple that number, because you are still missing PCB design and manufacturing, board packaging, yield, lab bring-up, compliance testing, QA, and other NRE costs. Now check if you are still making money at the price your customer is willing to pay. If you initially planned to sell for $50 but your BOM is around $75, you are losing money, and unless you have a lot of money and can afford losses initially to gain traction, you might have to change your spec again. Most likely you will remove or scale back some components; for example, you might add only 2 HDMI ports instead of 4, or reduce the processor speed from 1 GHz to 750 MHz.
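The BOM arithmetic above can be sketched in a few lines of Python. The 3x loading factor is the rule of thumb from this post; treat it as an assumption, not a law:

```python
def loaded_bom(component_bom, overhead_factor=3.0):
    """Rule of thumb: the raw component BOM roughly triples once PCB
    design/manufacturing, packaging, yield, bring-up, compliance
    testing, QA, and NRE are folded in. The 3x factor is an assumption."""
    return component_bom * overhead_factor

def unit_margin(selling_price, loaded_cost):
    """Per-unit gross margin; negative means you lose money on every unit."""
    return selling_price - loaded_cost

cost = loaded_bom(25.0)         # $25 of components -> $75 loaded
print(unit_margin(50.0, cost))  # planned $50 price -> -$25 per unit
```

A negative margin here is the trigger for the spec-trimming step that follows.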
Now go back to your customer and convince them that what you are delivering is still a lot of value, that you are still solving their pain points, and that you are only removing the nice-to-have features. If they agree, proceed further.
While you are doing all this, start your embedded and software development in parallel using developer or emulator prototyping boards. Have this ready first so that you can demo to potential customers/investors how the product will perform once the hardware is done, and show them that once it is manufactured, it's plug and play. Get their reactions and feedback; very likely you will change your spec totally or in part. Once you have done this 2-3 times and are feeling confident, accelerate your PCB design and manufacturing and get the prototype out. As a startup, never go to manufacturing without getting the software/embedded part ready. It's essentially a lean-startup development model for hardware.
If you are building an ASIC, as Apple does for its Apple TV, things are different in a lot of ways, and I wouldn't recommend it unless you can raise $10-20M easily and can hire a 10-20 person hardware engineering team. Of course, not every product needs an ASIC or even an FPGA, and you don't have to rely on leading-edge manufacturing technologies.
As others have commented, you need to worry about whether you are infringing IP, and if you are creating IP, make IP protection part of your design thinking and strategy process. You can start with a provisional patent to buy time and give yourself an opportunity to figure out whether you still want to go ahead after the feedback from the software/product demo.
BTW, try to use open-source technologies and customize them where you can, especially the design tools; some tools are very expensive. Some vendors give their tools away free for folks using their IC platform. Another piece of good advice: limit the number of vendors as much as you can; it will save you a lot of headaches in a lot of areas.
I might have left out a lot of other pieces, but this should give you a good idea of the process involved. If you need details or clarifications, feel free to ping me.
There was an interesting question asked during an EE Times interview with Moshar from Broadcom:
EE Times: Are you using formal verification, and does it reduce the need for simulation and acceleration?
Moshar: We are using formal verification, but I don’t believe it is reducing the scope of the work we need to do. It will help you make sure that your IP is golden, but formal verification really does not apply at the SoC level. You have to go through all the traffic scenarios you need to cover.
It would have been nice if a more detailed answer had been given. From the question posed and the answer, it appears as if this were a limitation of formal verification. I think that if you describe the traffic in terms of formal properties using PSL, it would still be possible to formally verify the traffic between the various cores on an SoC.
I think it would be interesting to know how companies operating in the ESL space view this. As SystemC and C-based design languages become popular and support TLM, it would be interesting to see how one can extract information from these abstract models and verify the design intent.
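To make the idea concrete, here is a toy Python sketch of what "going through all the traffic scenarios" means at a tiny scale: a two-master fixed-priority arbiter is modeled as a state machine, every reachable state is enumerated, and a mutual-exclusion safety property is checked in each one. Real formal tools (and PSL properties) operate on actual RTL and vastly larger state spaces; this only illustrates the principle of exhaustive reachability checking:

```python
from collections import deque
from itertools import product

# Toy model of bus traffic: two masters and a fixed-priority arbiter.
# State = (req0, req1, gnt0, gnt1); the grant logic is combinational
# and master 0 wins ties. The model is illustrative only.
def next_state(new_reqs):
    req0, req1 = new_reqs
    gnt0 = req0
    gnt1 = req1 and not req0
    return (req0, req1, gnt0, gnt1)

def check_mutual_exclusion():
    """Enumerate every reachable state and check the safety property
    'gnt0 and gnt1 are never both high' -- the flavor of exhaustive
    check a formal tool performs, shrunk to a toy state space."""
    start = (False, False, False, False)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state[2] and state[3]:
            return False  # property violated in a reachable state
        for reqs in product([False, True], repeat=2):  # all request scenarios
            nxt = next_state(reqs)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

The scaling problem the interview hints at is visible even here: the state space grows exponentially with the number of masters and the depth of their interactions, which is why exhaustive checking that is easy at the IP level becomes hard at the SoC level.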
I was just looking at some of the latest announcements and prototypes at the recent CES show. If you look at the trend and the products consumers are crazy about and try to sum it up, the market is all about how much personalization users can do in the product. Every consumer wants to see himself/herself in the product. So what does this translate to for the semiconductor industry folks? "Personalization of silicon."
Look at the famous Apple iPod; many see it as a symbol of youth. So the closer the silicon is to the hearts of consumers, the better the chances of product success. But this personalization comes at a price, and everyone knows pricing is one key element that determines the reach and success of a product.
So the big question is: how can the IC industry reduce its IC design and packaging costs (I mean chip packaging, not feature/product packaging) and still be profitable? The answer partly comes from the EDA industry: automation of the design flow. What matters is the degree of automation and the accuracy of the results it can deliver over a short span of time. Multi-million-transistor (10M+) designs and SoCs are becoming common, and TTM is becoming shorter day by day. For example, within a few days/weeks of Apple's iPhone announcement, LG released a competitive product with almost the same functionality. Imagine how fierce the competition is; every single day counts.
How can design teams manage this pressure? Marketing teams often want to add new features at the last minute. People involved in IC design know what this translates to; it is not a matter of a simple ECO. Sometimes a small feature addition can lead to a couple of weeks of delay. I know a customer who had to go through the entire design cycle twice because marketing asked them to add 2 new features each time, and this affected their tape-out schedule by 6 weeks. Not every semiconductor company can afford such a delay (especially folks operating in the consumer electronics market segment). So the solution doesn't come from adding a few more engineers or simply delaying the product launch; you need design tools that are smart enough to detect incremental changes anywhere in the flow and automatically run the appropriate steps in the design flow with no or minimal intervention from the designer.
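The incremental-change detection described above is, at its core, dependency tracking over the design flow. A minimal sketch follows; the flow graph and step names are illustrative assumptions, not any particular vendor's flow:

```python
# Toy incremental-flow tracker: given which inputs changed, compute the
# downstream flow steps that must re-run. The graph below is a
# hypothetical, simplified flow.
FLOW = {
    "rtl": ["synthesis"],
    "constraints": ["synthesis", "placement"],
    "synthesis": ["placement"],
    "placement": ["cts"],
    "cts": ["routing"],
    "routing": [],
}

def steps_to_rerun(changed):
    """Walk the flow graph from the changed nodes; every step reachable
    downstream is stale and must be re-executed."""
    stale, stack = set(), list(changed)
    while stack:
        node = stack.pop()
        for dep in FLOW.get(node, []):
            if dep not in stale:
                stale.add(dep)
                stack.append(dep)
    return stale

print(sorted(steps_to_rerun(["cts"])))  # a CTS-level tweak only re-runs routing
```

A late RTL change invalidates everything downstream, while a change confined to a late stage re-runs only what depends on it; that asymmetry is where the schedule savings of true flow automation would come from.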
Not many EDA companies recognize the importance of automation, and their perception of automation is awfully wrong. I would say that in the years to come, the company that manages to bring true "automation" of the design flow to the industry will emerge as the real winner.
If your customer makes money, you make money. It is as simple as that.
BTW, you need engineers to find smarter ways to design and to bring "true" personalization of silicon.
I have recently seen many folks asking questions like "How do I write a synthesis script?" or "What is in a synthesis environment?" Many freshers right out of school claim to know about synthesis but in reality don't have a clue where to start.
When I refer to synthesis in this post, I mean logic synthesis; I mostly cover only logic synthesis, and it doesn't include STA (static timing analysis). I will cover some points that overlap between logic synthesis and front-end timing optimization, and I will write a separate post on front-end timing optimization and physical synthesis (floorplanning, global placement and routing, CTS).
Synthesis is not just a script you write for a specific tool; it is actually much more than that. I have seen many folks loosely couple it with a specific tool like Synopsys DC or Magma Blast Create. It is, in a sense, a methodology that evolves over time through many runs as the block/chip evolves. Before you start synthesis, one has to know:
1. What are the area goals? If area is one of the important criteria, then one needs to know what area reduction techniques are available from the tool. I'm assuming the RTL conforms to best design practices. Check the number of logic levels and cells used after the first iteration, and accordingly decide what your strategy for area reduction should be. One benefit of reducing the area is that it helps you come up with a relatively smaller floorplan. Of course, most EDA folks prefer bigger floorplans so that their floorplanning or placement tools can easily do the job and easily avoid congestion and crosstalk issues :). But I strongly suggest that synthesis folks have some understanding of physical synthesis.
2. What are the power goals? Some tools have advanced power-saving schemes in synthesis itself, like power-gating flops/retention flops. They do this when the RTL designer uses special pragmas in the RTL code, which the tools capture when they parse the RTL.
3. Is clock gating allowed? Are there any modules/blocks for which clock gating has to be disabled? You need to understand why it has to be enabled/disabled and should be aware of its impact. You should know whether the technology library has support for ICGs (integrated clock gating cells).
4. How much hierarchy has to be kept? Keep in mind that logical hierarchy is different from physical hierarchy. When you decide to maintain hierarchy, consider that some optimization algorithms are limited by module hierarchies/boundaries, so take care in deciding whether and how much QoR can be sacrificed.
5. Is flattening allowed? The answer depends partly on the decision you make on the question above. If yes, is it allowed on the entire design? If not, can you at least flatten selectively? Another relevant thing you need to know is whether RTL-inferred models can be flattened.
6. What is your DFT strategy and methodology? Many might wonder why this is important to consider at the logic synthesis stage. It is especially important when you plan for DFT during RTL development. For example, you might have declared the test/scan ports in RTL, and since scan insertion will not be done until after synthesis, many optimization algorithms in the synthesis engine will see them as floating and blow them away. So you might need to instruct the tool not to touch them.
7. Resource sharing and operator merging: If these options are available in the synthesis tool and you don't have any constraint or reason not to use them, then it is highly recommended to take advantage of them. But take care, as some formal tools either don't have good support for this or don't support it at all.
8. Datapath architecture selection: Some advanced synthesis tools allow you to configure/pre-select the datapath architectures. If timing is critical for a particular block, then you might want to override the area optimization steps by selecting the fastest architecture available for all the datapath components in that block. Do remember that selecting the fastest architecture might sometimes blow up the area.
9. Formal verification: FV tools don't do aggressive optimizations the way synthesis tools do (or should I say, it is not a good idea for them to do so). So it is highly important that you let your FV tool know about the synthesis options wherever possible; you should try to mimic the synthesis environment in your FV environment. Otherwise you can see some false failures.
10. Hard macros: When macros are used and some of their inputs/outputs are unused, they might be removed. So if any macros or hand-instantiated gates are present, you should set the appropriate commands, like force keep or set_dont_touch.
11. Spare gates/registers: When spare registers are described in the RTL itself, synthesis tools should be instructed to preserve them; otherwise they are treated as unreachable registers and might be thrown away during synthesis (dead-code removal). Some people add spare registers in the backend and sprinkle them evenly.
12. It is important to analyze the technology library for cell delays and area. Another important factor most people forget is to consider the effects of EM (electromigration) and yield. All the bad cells (those with high delays, or bad EM, yield, or area characteristics) should be hidden or disabled from use by the synthesis engine. Forgetting to disable cells that are bad for EM or yield affects timing closure with crosstalk/SI during the backend.
Apart from this, make sure all complex cells like AOI, OAI, full adders, half adders, XOR, etc. are available to the synthesis tool. This helps save area and increases drive capability, resulting in less buffering.
13. Don't use the highest effort levels in synthesis by default (unless you know what you are doing). Some optimization algorithms might hurt your design by being too aggressive. Synthesis knobs have to be used with care and by studying what they do to the design.
14. DesignWare usage: Sometimes it makes sense to use DesignWare components in RTL. Make sure the synthesis tool can detect the DesignWare components, understand them, and synthesize them. Some synthesis tool vendors change them to their own equivalent models (for example, Lavaware components from Magma). If not, you might need to black-box them and read in the gate-level netlists of those DesignWare components after synthesis is done.
15. Pipelining: Almost all synthesis tools support this. Where necessary, the designer or synthesis expert has to know which block/module needs pipelining, how many stages are required, what the latency at each stage is, etc.
16. Retiming: Sometimes, when the RTL has those DesignWare or Lavaware components, some synthesis tools automatically apply retiming, while others require you to explicitly set the relevant retiming configs. But keep in mind that retiming is not supported that well in FV tools.
17. Don't-care optimization: If the RTL contains don't-cares (X), many synthesis tools allow you to choose whether you want X to be treated as "0" or "1". I suggest leaving this to the synthesis tool to decide: most don't-care algorithms select the value of X that results in a smaller circuit area if area optimization is enabled, or better timing if timing mode is enabled.
18. Clock edge mapping: Some synthesis tools map to negative-edge flops and add an inverter if they see that this gives better area savings than picking a positive-edge flop. Some design methodologies, especially back-end teams, don't always prefer this, so you need to set the configs accordingly.
19. If there are any complex cells like full adders or half adders with multiple outputs in your library, most synthesis tools don't utilize them, so if you want the synthesis tool to use them, you have to hand-instantiate them in RTL. Remember that these cells will be timed, but will not be inferred or decomposed into simpler cells/logic during the optimization phases.
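For the pipelining decision in item 15, a rough back-of-envelope model helps decide how many stages are worth it. This sketch assumes the combinational delay splits evenly across stages and uses a hypothetical 0.1 ns per-stage register overhead; real numbers come from your library and STA:

```python
def pipeline_estimate(comb_delay_ns, stages, reg_overhead_ns=0.1):
    """Rough per-stage period, max frequency, and total latency for an
    N-stage pipeline, assuming the combinational delay splits evenly
    across stages. The register overhead value is an assumption."""
    stage_delay = comb_delay_ns / stages + reg_overhead_ns
    fmax_mhz = 1000.0 / stage_delay
    latency_ns = stage_delay * stages
    return stage_delay, fmax_mhz, latency_ns
```

For a 4 ns combinational path split into 4 stages, this gives roughly a 1.1 ns stage delay (around 900 MHz) at 4.4 ns total latency; the frequency-versus-latency tradeoff is exactly what item 15 asks you to decide per block.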
With all that said, I can't stress enough how important it is for RTL coders to follow best practices. There is a lot of information out there, or one can refer to the STARC methodology guide or the Reuse Methodology Manual for more information.
I have been hearing about design/IP reuse from time to time. Today there was an article on EE Times, which you can read by clicking this link: Design-Reuse.
I think many big companies have adopted the reuse methodology and realize its benefits; even small and medium companies reuse blocks or IPs in most of their chips. So I don't think there is any need to keep repeating its importance over and over again. Most companies cut down their design cycle time and costs using this reuse methodology. I think one should now look forward and see whether a good prototyping flow can be put in place. I recently worked on prototyping efforts, and the flow correlated very well at 90nm, 130nm, and 180nm. For 65nm and below, special considerations have to be adopted, like taking parasitic effects into account. With this flow, the time to achieve timing closure is significantly smaller, as the designer gets early feedback.
I would appreciate it if anyone with experience developing or using a prototyping flow could share their concerns; it would make for a very informative and good discussion.
The de-coupling point principle, put forth by Clayton M. Christensen, Christopher S. Musso, and Scott D. in their article "Maximizing the Returns from Research", says: "The company developing a new technology must plan to integrate forward from the point at which a new technology is developed, across every interdependent interface in the chain of value-adding activities out to that point at which there is a modular interface with the next stage of value-added." They say that it is the activity just before the de-coupling point where the most attractive profitability in the value chain can be achieved. The reason is that performance in a modular product is not determined within the product's architecture, but within the subsystems from which the modular product is assembled. At the stage of value-added just before the de-coupling point, performance differences are determined primarily by the interdependent architecture and less by the components that are used.
The EDA industry is very competitive and more technology-driven than any other field in the IT industry. As design complexities increase, designers demand more innovative and complex tools, and as time-to-market pressures increase (for many chips the window is less than 6-9 months), EDA companies are always under great pressure to roll out sophisticated tools that understand and solve design complexities. Having said this, it can be easily understood that the EDA industry works in tight integration with the design companies and the manufacturers. Each chip is designed in a different way, and methodology engineers propose new methodologies and flows, which indirectly puts pressure on the EDA industry.
Now let's look at various perspectives. An EDA startup can roll out sophisticated software for a specific stage in the ASIC/FPGA/structured-ASIC design flow. The problems a typical EDA startup faces are these: it has to make sure that the algorithm on which it built the software delivers both performance (how fast the tool can analyze the design sources, say RTL or a netlist) and design capacity (a 20-million-gate design is very typical nowadays), and the output of the tool has to be compatible with the tools from the four big giants (Cadence, Synopsys, Mentor, and Magma) so that the designer can take the output and use it in the next steps of the design flow with the implementation tools. Tool interoperability issues are always a painful task for EDA engineers, and the success rate for EDA startups is very low for the reasons stated earlier; they have to find a way for their tool suite to be seamlessly integrated. So for EDA startups, the de-coupling point lies in effectively addressing the design complexities and in handling higher capacity and performance than the competitors' tools. But these companies have to keep in mind that the de-coupling point might shift to a higher point in the value chain.
For an already big company like Cadence, Synopsys, or Mentor, the de-coupling point doesn't lie in new tool offerings, but rather in understanding the design flow gaps in the tools already offered and quickly filling them, thus enabling a better and fully integrated platform.
If anyone out there has any ideas/opinions on the de-coupling point, please comment.
Floorplanning is a visual art. I believe this process can never be fully automated. What I mean by this is that you cannot simply push a button, let the tool do the job, and get a production-quality floorplan the very first time.
You can automate the process, but the user should still drive it. The user should be able to define all the requirements, let the tool do its job, and then review the result.
The intent of automation at the floorplanning level is only to reduce TAT and arrive at the final result quickly. The user has to understand that there can never be one flow/size that fits all; since each design is unique, you might need to provide all the inputs to the tool so that it can converge on a good-quality floorplan in a decent time, and the user can then spend just a few hours turning it into a production floorplan, as opposed to spending a couple of days or weeks.
There are often cases where we don't know the cause of congestion. You can rarely point at one single source that contributes to it, so it has to be tracked from very early in the flow (logic synthesis).
I will try to highlight some of the things to check based on my experience. By no means is this a complete reference.
1. Does your cell count look suspicious? Even if the area is comparable, a very high cell count can give P&R tools a problem placing and routing so many cells, which leads to congestion.
2. Is the synthesis tool using a lot of complex gates, or are there any big muxes inferred from the RTL itself? Then you might need to recode the RTL to make the routing job easier.
3. If the RTL is OK and synthesis is inferring complex gates, it can help at times to decompose that logic.
4. Sometimes logic restructuring with a cone depth greater than the tool's default will help.
5. Check if you are over-constraining or giving very aggressive slack targets to the logic synthesis tools.
6. See if you can flatten some smaller modules where constraints are not set; this helps all optimization commands.
7. Sometimes don't-touch/force-keep attributes prevent synthesis tools from remapping.
8. Check the library to see if any functionally equivalent cells with a smaller area footprint should have been used. For example, if you have hidden a DFFS flop and only a DFFRS (set-reset flop) is available, you are adding one more pin and a higher cell area. Check for these.
9. Incorrect constraints, whether synthesis, timing, floorplanning, or placement constraints, also often lead to the problem. If the problem is here, don't expect the tool to override them; it's a user issue, and the tool tries to honor the constraints.
10. If you see too many levels of logic, you might want to collapse them. One more point that pops up in this context is hierarchy maintenance: check whether you can do selective hierarchy maintenance and whether it is set up correctly.
11. Check for HFNs (high-fanout nets).
12. Check the secondary cost-function objectives.
13. If DFT is already done, check the number of test points (TPs) inserted; you might be inserting too many TPs for a very small coverage gain. Consult with your DFT team to quantify how much you can really sacrifice; it varies for every design.
14. Check whether a particular block/module can be optimized for area while the timing-critical part of the circuit is optimized for delay.
15. The majority of congestion issues can be traced to the floorplan.
Things to check at the floorplan stage, in no specific order:
a) Is the congestion around the channels between macros? You might need to resize the channels so that all macro pins can be accessed by the router.
b) See if you are wasting too much space on channel/island widths, etc. You might not need channels all the time. An example is CAMs, where a pair of macros have to be aligned: you can abut the macro pair on the side where there are no pins, which saves some space.
c) Check the pin density for the overall block.
d) Check if there are routes around the macro corners.
e) Check the average and peak track overflows on each metal layer. This will give you hints about the likely reason.
f) Use blockages cautiously.
g) Did you set the highest routing layer incorrectly?
h) Is too much wire causing the issue?
i) Are connected endpoints placed far apart? Check why.
j) Is it because of scan chain wirelength? Did the scan optimization happen correctly? At times, incorrect scan order constraints prevent scan chain wirelength optimization.
k) Check the scan repartitions (often tools like LV will print the compatible groups and hence what can be reordered and optimized).
l) Check if the tool is buffering a lot to fix timing issues.
m) Check whether the pre-placed analog blocks, etc. are in optimal locations.
n) Check whether the floorplan grid is defined and set correctly; if not, you are wasting some routing resources. Setting right and efficient routing constraints is essential to get the best routing results.
o) Check whether the MBIST controllers are placed at optimal locations relative to the memories they control. Optimal sharing of memories and the mode of MBIST sharing (serial or parallel) also play a significant role.
p) If the macro placement floorplan provided for MBIST insertion is different from the actual floorplan you are trying to come up with, then expect congestion issues, as the memories shared are not correct in the revised floorplan. Floorplan changes should only be incremental. This is a bit subjective and has to be reviewed on a case-by-case basis.
q) Check the cell density. Are there decent empty spaces in the floorplan while some areas are heavily congested? This happens when the tool is trying to squeeze the logic in order to meet timing.
r) Check for overlaps. Is there enough space for all the macros/standard cells to legalize?
s) Check your power planning (power mesh/rail creation).
If you have checked all of the above and still cannot resolve the problem, maybe you are trying to stuff in too much logic and need to expand/grow your floorplan. This often involves top-level floorplanning changes.
If anyone has any other suggestions/tips for congestion analysis, or ideas or methods for predicting congestion itself, I would welcome that feedback as well.
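Two quick sanity numbers worth computing during congestion analysis are the placement utilization and the per-layer track overflow (item e above). A minimal sketch with hypothetical numbers; real figures come from your P&R tool's reports:

```python
def utilization(cell_area_um2, core_area_um2):
    """Placement utilization: fraction of the core area occupied by
    standard cells. Very high values leave the router little room."""
    return cell_area_um2 / core_area_um2

def track_overflow(demand, capacity):
    """Per-layer routing overflow: tracks demanded beyond what each
    metal layer supplies. Positive entries flag congested layers."""
    return {layer: max(0, demand[layer] - capacity[layer]) for layer in demand}

# Hypothetical numbers, for illustration only.
print(utilization(600_000, 1_000_000))            # 0.6
print(track_overflow({"M2": 120, "M3": 90},
                     {"M2": 100, "M3": 100}))     # {'M2': 20, 'M3': 0}
```

A layer with persistent positive overflow, or a utilization that leaves no slack after legalization, points you back at the floorplan checks in the list above rather than at the router.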
Let's face it: you have run the logic synthesis/physical synthesis tool and you have a problem. You haven't met slack. Now what? Where do you start? Each issue depends on the design and process node, so I'm going to keep it simple and list a few pointers/tips as to where we should be looking. These are just for guidance; you as the designer/CAD engineer/applications engineer have to dig deep and find the root cause. I'm assuming you have stopped the flow right after global placement and routing and haven't entered CTS (of course, it doesn't even make sense to do CTS when you are failing timing).
Before we start discussing debugging the timing issue/QoR, I'm assuming that the floorplan is of good quality. Please remember that a bad floorplan can give you a bad QoR no matter how hard the tool tries; don't even bother to do any debugging, correct your floorplan first. If you are still in the early exploration phase, then don't complain about QoR; rather, concentrate on a correct-by-construction approach to get the best results.
Let's use the old and well-known divide-and-conquer approach and break the problem down into 2 areas:
1. Your logic synthesis tool did a good job, and it's your physical synthesis that made things worse.
2. The QoR after your logic synthesis is already bad.
Below is a sort of checklist of what you want to do if you have an issue with either of the above 2 items.
A1) What does the critical path look like? Is it a datapath, a register-IO path, an IO-IO path, or some macro-reg path?
Look at the timing histogram and see how many paths fail; this gives you a good idea of how bad the timing looks on your design. Check the top 15-20 critical paths. If it is an IO path, check the IO constraints. Also, relax the IO margins so that they don't fail and become critical, and do incremental timing optimization. Now check again how the timing looks. Is the critical path a reg-reg path? If so, check step A2.
A2) For a register-register path, check the detailed timing path and see:
A2.1) Check if it's a high-fanout issue, and if yes, whether the path has been buffered/cloned correctly.
A2.2) Also check if the path is over-buffered or under-buffered.
A2.3) Check if the tool has picked the correct drive-strength cells. At times, tools don't upsize/downsize the cells correctly, and as a result more buffers/inverters are added, making the timing worse. Remember that at 65nm a buffer has around 50ps of delay.
A2.4) Also check if the tool has picked multi-stage cells. It is always a good idea to give the tool the freedom to pick a cell and buffer it, rather than allowing the tool to select an available multi-stage cell. If you are trying to extract every picosecond out of the tool, you might want to check this.
A3) Check if the tool complains about congestion. If you see congestion, check the utilization. A quick visual inspection of the floorplan will tell you if there is more space available in the primary inner shape for the tool to move things around. If this is the issue, then you have a placement issue.
A4) If cloning/buffering doesn't seem to be the issue, then cross-probe the timing path into the layout window using flylines and see how the path is laid out. Is the path very long and jogging all around? If yes, then it's a global placement/routing issue.
A5) Check the macro placement. How does it look? Does it look optimal? Did you create halos around the macros? What about placement blockages? If these are missing, provide them and re-run physical synthesis.
A6) If you are using the auto-floorplan capabilities of modern tools, then you had better check the quality of the floorplan. Many tools have issues creating good, optimal rectilinear floorplans.
A7) If all else looks good, check the number of logic levels on the register-register timing path. If this seems too high or suspicious, check the logic levels at the end of logic synthesis. If you still see the same thing there, you have a logic synthesis issue rather than a physical synthesis issue.
A8) To debug a logic synthesis issue, check the timing in front-end STA for the same path you see at the end of physical synthesis. Check whether it is a datapath, a control path, or some kind of distributed logic shared between two modules.
A9) For a datapath, check if the tool has picked the correct architecture. For example, in the case of adders it might have selected a ripple-carry adder when it could have selected a carry-lookahead adder. Similarly, check whether the tool can pick better architectures for other datapath components.
A10) If it is a control path, check how the logic is written in the RTL. Is it deeply nested if-else logic with mutually exclusive conditions? Should it really create a deep mux chain? How was the case logic written? Maybe the tool is inferring a big shifter when only a partial shifter is actually used, so the tool unnecessarily created the full shifter logic there.
A11) At times, it is a problem with the optimization itself. Maybe the tool could have knocked off the extra registers by doing more aggressive constant-flop optimization and dead-code removal.
A12) Sometimes users mistakenly set unnecessary dont_touch (Synopsys) or force keep (Magma) attributes, or they mess with configs that force the tool to retain floating logic. Be careful about what you want to retain and what can be knocked off.
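In a Synopsys-style shell, hunting for stray dont_touch attributes might look roughly like this (the instance name is hypothetical, and the exact attribute name to filter on depends on your tool version):

```tcl
# Sketch: list cells carrying a dont_touch attribute, then clear an
# unintended one. "u_core/u_legacy_block" is a placeholder name.
get_cells -hierarchical -filter "dont_touch == true"
set_dont_touch [get_cells u_core/u_legacy_block] false
```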
A13) Yes, the world is not always flat. Some users want to keep all the hierarchy. You don't want even that. Many optimization algorithms work best when there is no hierarchy; whenever there is a hierarchy, the scope of the optimizations is limited to within that module. So only retain hierarchy on which timing constraints are present, or when there are special requirements from 3rd-party tools about maintaining the hierarchy they introduced. Check if any of the logic in the critical path can be flattened.
A14) Sometimes hiding the high-drive-strength cells in the library, or preventing the tool from using very complex gates like XOR/XNOR and in some cases AOI/OAI cells, helps to improve the timing. But this should not be done blindly. Check the library and the cells the tool is picking, then decide whether selective hiding is the way to go.
A15) Over-synthesis: Many users blindly push the tool to meet a high target slack in the design. I'm not referring to the clock uncertainty you normally account for; this target slack is applied only for front-end synthesis. Be reasonable in what you want the target slack to be. Normally around 15% of the clock period is decent enough.
A16) Over-constraining: This is a setup margin you apply all across the flow until CTS is done. Again, be reasonable in how much you constrain for. Over-constraining can sometimes break the algorithms: the optimization algorithm sees more negative slack than there really is, and in order to meet slack it tries over-buffering/cloning/bad sizing and then gives up. Just for the record, this is not a bug in the tool; it has more to do with constraining the design correctly.
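A reasonable pre-CTS setup margin is often expressed as clock uncertainty rather than as a blanket over-constraint. A sketch, with purely illustrative names and values:

```tcl
# Sketch: ~10-15% of a 2.0 ns clock period applied as pre-CTS setup
# margin via clock uncertainty. "clk" is a placeholder name.
create_clock -name clk -period 2.0 [get_ports clk]
set_clock_uncertainty -setup 0.25 [get_clocks clk]
```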
A17) One more thing to check is the slew limits and fanout limits, and whether the tool is honouring them.
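In SDC form, these design-rule limits are typically set and then audited roughly like this (the values are placeholders; take the real limits from your library and methodology):

```tcl
# Sketch: slew (transition) and fanout limits on the whole design.
set_max_transition 0.5 [current_design]
set_max_fanout 32 [current_design]
# Then check whether the tool is honouring them:
report_constraint -all_violators
```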
A18) Sometimes, because of incorrectly set false paths or multicycle paths, you are misguiding the tool. Remember, folks: too many exceptions kill the design. Don't set a false path unless it is really needed; perhaps setting a multicycle path is the way to go.
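As an illustration of that preference (the register, clock, and pattern names below are hypothetical):

```tcl
# Sketch: data launched from u_cfg_reg is sampled only every 2nd
# cycle, so a multicycle path is the honest constraint:
set_multicycle_path -setup 2 -from [get_cells u_cfg_reg*]
set_multicycle_path -hold 1 -from [get_cells u_cfg_reg*]
# Reserve false paths for paths that truly never matter, e.g. between
# asynchronous clock domains:
# set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]
```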
A19) Perhaps this should have been mentioned at the beginning: the quality of constraints dictates your timing results. Bad constraints lead to bad QoR. So check your timing constraints before you start your timing analysis. There are many tools out there that can help you with this. As a rule of thumb:
A19.1) Check your IO constraints.
A19.2) Check your exceptions (multicycle/false paths).
A19.3) Check if there are unconstrained nodes in the design. You should not have any unconstrained nodes.
A19.4) Check if there are too many events happening on a given node. Say, 12 or 16 timing events happening on the same node is not a good sign; check that node.
A19.5) Also check the timing event density. This will tell you if you have overlapping or conflicting constraints, or a high number of timing events in the design. Either way, it's not good. For example, when some 3rd-party DFT tools write out their post-scan SDC, they sometimes put a large number of timing events on certain nodes, and this causes havoc on the timing algorithms.
A19.6) Also check if there are any nodes with zero timing events. This is not the same as leaving the design unconstrained.
A19.7) Check the clock definitions and units.
A19.8) Check the generated clock definitions and whether the source clock is specified correctly.
A19.9) Check whether there are any cases in the design where a clock becomes data. If so, the timing analysis tools will treat that node as a data node as opposed to a clock node.
A19.10) Check if the case analysis constraints have been set up correctly.
A19.11) If you have a clock gating cell, say CKG1, between two registers, say FF1/Q and FF2/D, the tool will see two paths: FF1/Q to CKG1/EN, and CKG1/OUT to FF2/D. But in reality there is only one path, FF1/Q to FF2/D, and the setup check is done at FF2/D. So you have to tell the tool somehow to consider the delay through the clock gate, and that the clock cycle time runs from FF1/CK to FF2/CK. The way you do it is to apply a negative margin at the clock pin of the clock gating cell and a setup margin on the clock gating output pin, and constrain the path.
A19.12) The cleaner and better the constraints are, the better the timing results will be.
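A few of the checks above can be sketched in SDC/shell form (all names and values are placeholders for your design):

```tcl
# A19.7/A19.8: clock and generated-clock definitions, with the source
# clock stated explicitly.
create_clock -name clk_main -period 2.0 [get_ports clk_in]
create_generated_clock -name clk_div2 -source [get_ports clk_in] \
    -divide_by 2 [get_pins u_div/q]
# A19.10: case analysis, e.g. tying off a test mode for functional STA.
set_case_analysis 0 [get_ports test_mode]
# A19.3: Synopsys-style tools can flag unconstrained endpoints and
# missing clock definitions:
check_timing
```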
Misc Tips: It is always a good idea to study the library. It gives you a good idea of what cells, and at what drive strengths, are available. It helps you fine-tune your optimization and guide your implementation tools.
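In a Synopsys-style shell, a quick library survey might look like this (the library name and cell-name pattern are hypothetical and depend on your library's naming convention):

```tcl
# Sketch: list buffer variants to see the available drive strengths.
get_lib_cells */BUF*
# Summary of a library: its cells, units, and operating conditions.
report_lib my_stdcell_lib
```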