Programming is a form of human communication, mostly with other humans; incidentally, also with the computer by instructing it to execute a function for us. Therefore, when implementing something, we need to carefully consider how to communicate what is done and how, similar to when we talk to each other about a manual task using natural language. We need to negotiate why and how we do things to ensure that everyone understands what to do and in which way. Otherwise, misinterpretation, confusion, and conflicts are inevitable.
With current software development techniques such as pull/merge requests, the saying that code is read much more often than it is written or changed is probably more valid than it ever used to be. Reading is only enjoyable and delivers the required insights if the literature you are reading is well written and not a convoluted mess of illogical mysteries. Similarly, reviewing a pull request only works well if the proposed changes are presented in a way that is understandable.
Clean and self-documenting code is an important technique to ensure a reasonable reading experience. However, the code itself mostly explains what and how something is realized. The really important pieces of information usually hide behind "why" questions. Why did I use this design over the more obvious one? Why this algorithm and not the other one? Why is this special case needed to fulfill our business requirements? Why do we need to make this change at all? These are actually the important pieces of information that will likely cause confusion sooner or later if omitted. Bugs might be introduced in future refactorings if business requirements and their special cases are not known to someone reworking the code. New features might break the intended design if it wasn’t clearly presented. Finally, a pull request review is much more productive and also more pleasing for the reviewer if requirements, design choices, and motivations are known.
Code comments can address many of these issues if done properly, and much has been written about how to create useful comments (e.g. [Atwood2006], [McConnell2004] chapter 32, [Ousterhout2018]). Good and concise recommendations are:
Comments augment the code by providing information at a different level of detail. Some comments provide information at a lower, more detailed, level than the code; these comments add precision by clarifying the exact meaning of the code. Other comments provide information at a higher, more abstract, level than the code; these comments offer intuition, such as the reasoning behind the code, or a simpler and more abstract way of thinking about the code.
or, more compactly:
Comments should say things about the code that the code can’t say about itself—at the summary level or the intent level.
So, the essence is always that we need additional ways of explaining the things that the code itself cannot tell. While comments can explain the current structure of the code at a certain point in time quite well, they are actually pretty bad at explaining the evolution of a piece of software. Having a long change list at the beginning of each source file is probably not a good idea. What if that file vanishes or is split up at some point in time? This is where the commit log of version control systems such as Git comes into play as an important tool for communication.
The commit log in version control systems is the primary means of documenting the evolution of a software project. Before Git, systems such as CVS or SVN didn’t have real options to manipulate the commit log after the fact. Whenever a commit was formed, it was more or less set in stone. Anything that needed to be reworked or fixed afterwards became a new commit, and iteration was not possible. Therefore, one could easily end up with a commit log that looked like the following:
Implement cool new feature, also refactor foo
Fix compilation issue introduced in previous commit
Fix typos in new code comments and fix old bug in tape ejection
Continue work
Change feature implementation from switch case to state pattern
Address review comments
TEMP will continue tomorrow
Finalize second feature
Such a commit log documents the editing history. This history definitely contains valuable information that goes beyond what the source code shows. However, it also contains a lot of noise that hides the essential messages such as the implemented features and their design decisions. In a pull request review, such a commit log does not give the reviewer a concise picture of what has been done. When digging out old commits for debugging or understanding, a half-baked commit with the message "TEMP will continue tomorrow" will leave the reader puzzled at best.
Some parts of this commit log could have been better even when using VCSs without rebasing features. For instance, TEMP commits could have been avoided entirely, and some commits could have been made more atomic and with better commit messages. But humans still make errors and need means to correct and improve. At the very least, feedback from a code review could never have been incorporated into existing commits after the fact. Therefore, even with great editing care, when rebasing isn’t available or used, the commit log will remain a record of editing history. As such, it will never be an optimal tool for deliberate communication with other programmers.
With Git (and some more esoteric VCSs) including interactive rebasing support, we are now in the fortunate situation that commits are not immutable anymore and can be changed once new insights have been gained. Therefore, we can now use the commit log to tell a story about our implementation and the reasoning behind it, similarly to how code comments can be used. That way, the commit log becomes a tool that can be used consciously to document things the code itself cannot tell. By being a series of sequential code states, the commit log is particularly well-suited for describing the evolution of software and contained features / bug fixes.
Going back to the previous example, the commit log could have easily been rewritten to this more concise and descriptive version:
Decouple foo by introducing the observer pattern
Fix null pointer issue when ejecting tape
Implement cool new feature
Implement second feature
This gives a much cleaner and less cluttered view of what has actually been done. A pull request reviewer easily gets a high-level view of what is really contained in a pull request. With well-structured commits, the review of a PR of considerable size also becomes tractable because commits can be reviewed in isolation. The commit message body of each commit is the place where the important why questions are answered. That way, a reviewer can understand the motivations and overarching decisions before looking at the first line of code. Confusion and misunderstanding are less likely in this situation, and feedback becomes more valuable because the context for the proposed changes is clear. Also, from the perspective of someone digging up an old commit, such a state gives more insight because each commit tells a cohesive story and can be understood in isolation.
When is a commit log suitable to aid in the typical reading tasks for code? What are the properties of good commits to achieve this? To my mind, there are two important aspects.
The first important aspect is to decide what forms a good commit. To my mind, the principles that lead to a good software architecture also apply to commits: cohesion, separation of concerns, or the single responsibility principle [Martin2014]. A commit is comprehensible if it does only one thing. That one thing can easily be described in the commit message, and the commit can be understood without distractions and side tracks. Thus, each commit should be a change set with high cohesion and a single responsibility. If multiple things need to be done, apply separation of concerns and split up the work into multiple commits, one for each concern. This leads to more but smaller commits. Apart from usually being more comprehensible, such commits are also nicer targets for cherry-picking or reverts, with less potential for conflicts. So, good commits and the ability to split work into small units also align with the ideas of agile software development, where many small increments are favored. It takes some practice to understand how to separate work into distinct commits. A good hint that work should be (or should have been) split is when the subject of the commit message is hard to find because the single purpose of the commit is not clear or important side tracks exist.
Apart from forming small and cohesive commits, the most important tool for communicating in each commit is the commit message. A lot has been written about how to create good commit messages. A good overview of different references is given in the blog post How to Write a Git Commit Message. Many of the references put a lot of emphasis on the style of the message. A cohesive and readable commit style ensures that the actual content can easily be discovered, especially when working with the Git command line. However, what is even more important is the content of the message. We want to use commit messages to guide a reader through the code and its evolution. Therefore, the commit message must clearly describe what was done and why. The what can often already be covered by the commit subject line, maybe with a short additional paragraph. If not, the commit is probably too big. What should take more space is answering the why questions. Try to imagine what a future reader might want to know about why something was done, and why in exactly this way, and provide these answers upfront in the commit message. It’s much cheaper to take a few minutes to do so than to spend hours debugging because the important information is lost or impossible to find later on. Not including answers to any why question is the most common issue of bad commit messages that I have observed over the years.
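To illustrate, here is a hypothetical commit message (the feature and the reasoning in the body are invented for this example) that keeps the what in the subject line and reserves the body for the why:

Decouple foo by introducing the observer pattern

The tape handling code previously polled foo for state changes, which
made the new export feature impossible to implement without duplicating
the polling logic. The observer pattern was chosen over a plain callback
because three independent components need to react to the same events.
The polling interval setting is removed because nobody used it.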
The key to achieving a commit history that can be read like a book is iteration. When things don’t look right, you need to know the tools to improve the situation. For Git, these are interactive rebasing and amending. Unfortunately, these tools need a bit of practice before they can be used fluently. I recommend the respective section on rewriting history in the Git book or the tutorial Beginner’s Guide to Interactive Rebasing as a good starting point for learning interactive rebasing. Just as when learning test-driven development, take some time to experiment with the tools on a toy project.
So, whenever you propose some changes via a PR, first take some time to review your commit history and clean it up in case something looks awkward or doesn’t explain the why questions before submitting the PR. Moreover, once you receive feedback, most of the time the proposed changes belong to one of the commits you proposed. So, instead of creating a new commit "address all PR comments", which doesn’t form a cohesive unit, add the requested changes to the existing commits where possible. Of course, some review feedback can best be addressed with a new commit, because the resulting changes form a cohesive unit. But especially small cleanup tasks don’t deserve their own commits, and the commit log can be made more concise and descriptive by fixing up the original commits. Sometimes it is even a good idea to completely rebuild the commits in the PR if larger changes become necessary. Don’t be afraid to do this.
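As a rough sketch of how review feedback can be folded into an existing commit (the file path, commit hash, and branch name are placeholders), Git’s fixup commits combined with autosquash do most of the work:

# stage the changes that address the review comment
git add src/foo.c
# record them as a fixup of the commit they logically belong to
git commit --fixup=abc1234
# rewrite the branch; the fixup is squashed into abc1234 automatically
git rebase -i --autosquash origin/main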
Finally, in case your feature branch gets outdated, try to rebase it on main instead of merging. Linear commit histories are much easier to understand than many interleaved merge commits. Only if rebasing becomes too complicated, resort to a merge commit instead. In any case, if something goes wrong while rebasing, nothing is lost. A messed-up rebasing attempt can always be recovered with git rebase --abort.
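A minimal sketch of this workflow, assuming the remote is called origin and the target branch is main:

# fetch the latest state of the target branch
git fetch origin
# replay the feature branch commits on top of it
git rebase origin/main
# if the rebase turns out to be too messy, nothing is lost:
git rebase --abort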
You should not change history on any permanent branch such as main or v2.0. All the things discussed here are tools for cleaning up proposed changes. Once they have been accepted and become part of the mainline, commits should be treated as immutable. Recovering a local Git clone from a changed upstream history is possible, but brings a lot of trouble, and users will not be happy if this happens.
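Should it ever happen anyway, a collaborator can usually recover with something along these lines (a sketch, assuming the affected branch is called feature and the remote origin; local changes on that branch are lost):

# fetch the rewritten history
git fetch origin
# hard-reset the local branch to the rewritten upstream state
git checkout feature
git reset --hard origin/feature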
Why does the structure of test code actually matter? Why should one bother with achieving clean test cases if a convoluted test function in the end verifies the same aspects of the code? There are (at least) two good reasons that explain why investing work in the structure of test cases is important.
Although reliably verifying that code performs and continues to perform the intended function is probably the primary reason for writing automated tests, well-structured tests can serve more purposes for developers, most of them boiling down to communication. The controversial Uncle Bob Martin has coined a famous quote in this regard:
Indeed, the ratio of time spent reading versus writing is well over 10:1. We are constantly reading old code as part of the effort to write new code. … so making it easy to read makes it easier to write.
(Unit) tests play an important role in guiding a reading developer through a code base. If written with care, they tell stories about the requirements of the system under test, its architecture, and noteworthy corner cases to consider. The test code gives a second perspective on the system and frees readers from analyzing the actual implementation to understand its behavior and requirements. This is only possible if the author of the test cases intended the tests to convey this meaning and to fulfill the goal of guiding others through the code. To my understanding, communicating with test cases is one of the most important aspects of creating useful test code. Similar to code comments, there’s always more knowledge about a system than is told by its direct implementation, and tests play an important role in communicating this additional information. Of course, as already noted by Eric Evans, any code should always do its best to tell the correct story:
The assertions in a test are rigorous, but the story told by variable names and the organization of the code is not. Good programming style keeps this connection as direct as possible, but it is still an exercise in self-discipline. It takes fastidiousness to write code that doesn’t just do the right thing but also says the right thing.
Efficiently communicating requirements and telling the story of the system that is not or cannot be conveyed by the production code is one of the most important aspects of tests, apart from their function as automated verification.
Another useful benefit of well-written tests is that they can help find the source of bugs, especially those introduced during refactorings or changes to existing code. A badly written test case that fails cannot provide more information than "this huge mess of code doesn’t work anymore after your changes". However, with nicely arranged single-purpose test cases, the message of a failing test case is much more granular and helps the developer by saying "most things work, only this special case isn’t handled properly anymore after your refactoring." Such a message makes it much easier to correct the introduced bug and usually avoids a lot of additional debugging work.
Although this might sound surprising, the most important techniques for achieving well-structured tests are not directly related to the test code. Instead, the key to being able to write good tests at all lies in the structure of the tested code. Highly coupled code, mixing multiple concerns, is as hard to test as it is hard to comprehend or to maintain. Therefore, applying well-known design patterns and techniques to the tested code base is the primary means of enabling good tests. Hence, I can very much confirm the well-known claim of the TDD community that TDD leads to a better software architecture. However, this only works if some basics of good design are known. While software architecture and code structure fill books and decades of discussion and a comprehensive treatment is out of scope here, I will highlight two techniques that are of crucial importance for being able to test. Surprisingly, most failed testing attempts I have seen in the past were the result of ignoring these basic software engineering principles: abstraction and dependency injection.
A key principle of software engineering is being able to reason about a specific problem while avoiding distractions from lower-level programming details. If I implement a complex algorithm for allocating parcels to carriers, I don’t want to think about network transport errors while finding out the capacity of each carrier in the course of this algorithm. Therefore, a common technique is to abstract away the ugly and unimportant details through a class or interface. Introducing an appropriate abstraction for "find out the carrier capacity" lifts reasoning in the capacity planning algorithm to a single level of … abstraction and removes the lower-level networking details from the reasoning process. By following this route, I am – in turn – able to write tests that focus on a single level of abstraction without mixing business problems with infrastructure concerns. Moreover, I can test business (capacity planning) and infrastructure (network communication) concerns in distinct tests. This makes the implementations and the tests easier to formulate and to understand.
Apart from simplifying reasoning, abstraction brings the most value if technically realized using interface thinking, as already described in 1995 in the classic Gang of Four book with their "principle of reusable object-oriented design":
Program to an interface, not an implementation.
That means our abstraction for "find out the carrier capacity" is an interface resembling the level of reasoning required for the capacity planning algorithm and the actual realization with the ugly details of network programming will then be an implementation of this interface:
// The abstraction at the level of the algorithm.
// We are only concerned about requesting capacities for carriers.
public interface CapacityRequestStrategy {
    public int getCapacity(String carrierId);
}

// One implementation of the abstraction
public class NetworkedCapacityRequestStrategy implements CapacityRequestStrategy {
    @Override
    public int getCapacity(String carrierId) {
        // all the ugly HTTP details here
    }
}

// The algorithm implementation is free from HTTP details
public class CapacityPlanner {

    private CapacityRequestStrategy requestStrategy = new NetworkedCapacityRequestStrategy();

    public void planParcels() {
        while (parcelsRemain) {
            for (String carrier : availableCarriers) {
                // No networking details here!
                final int capacity = requestStrategy.getCapacity(carrier);
                // do some magic to distribute parcels
                // ...
            }
        }
    }
}
Introducing such abstractions avoids having to deal with multiple concerns in the same unit under test and therefore also makes it possible to formulate test cases reflecting the different concerns. Consequently, tests become easier, because concerns are separated and each test only deals with a single concern and fewer cases to look at.
Given the code shown above, one might ask how to actually write a test for the capacity planning algorithm that doesn’t need mixing abstraction levels. CapacityPlanner still depends on the concrete network-based CapacityRequestStrategy implementation. We would have to employ fancy things like stubbing the HTTP API used to determine capacities to actually test this class, thereby again resorting to mixing abstraction levels. Yuck…
Fortunately, a cure for this issue is pretty easy: dependency injection. Instead of directly instantiating the NetworkedCapacityRequestStrategy inside the CapacityPlanner, let someone else provide an appropriate instance to the planner by passing it to the planner’s constructor:
public class CapacityPlanner {

    private CapacityRequestStrategy requestStrategy;

    public CapacityPlanner(CapacityRequestStrategy requestStrategy) {
        this.requestStrategy = requestStrategy;
    }

    public void planParcels() {
        // ...
    }
}
Enabling dependency injection on a tested unit opens up the opportunity to install a test double [Fowler2006] inside the automated tests that never has to deal with the complexity of networking:
// This is a test double for the production CapacityRequestStrategy
class ConstantCapacityStrategy implements CapacityRequestStrategy {

    private int capacity;

    public ConstantCapacityStrategy(int capacity) {
        this.capacity = capacity;
    }

    @Override
    public int getCapacity(String carrierId) {
        return this.capacity;
    }
}

// This test can now be written without having to think about HTTP
class CapacityPlannerTest {

    @Test
    public void rejectsParcelsIfNoCapacityRemains() {
        CapacityPlanner planner = new CapacityPlanner(new ConstantCapacityStrategy(0));
        assertThrows(
            NoMoreCapacityException.class,
            () -> planner.planParcels()
        );
    }
}
Installing test doubles is a real pain without dependency injection, because mocking would be necessary, which is pretty error-prone and usually depends on low-level programming language constructs, thereby bloating test cases with technical details.
Moreover, without an appropriate abstraction, the installed test double would again leak details into the algorithm discussions. If an HttpClient were injected instead of an abstraction, I could still install a double, but providing the appropriate behavior in the double would violate the abstraction level suitable for the test and the implementation by resorting to networking again.
Now that the preconditions for being able to write useful tests are met, I can outline my recommendations on how to write the tests themselves.
Apart from the fact that writing good tests is much easier if the code is structured well, the following sections describe a few guidelines that I would recommend following when actually writing the test cases.
Often, one can find test cases that look like this:
# floating point equality should be handled properly in real code
def test_it_works():
    assert divide(1.0, 1.0) == 1.0
    assert divide(2.0, 1.0) == 2.0
    assert divide(1.0, 2.0) == 0.5
    with pytest.raises(ValueError):
        divide(1.0, 0.0)
This test fails as a debugging aid, because the only feedback I get when something is broken is that… something is broken. The feedback at the level of failing test cases isn’t more specific, because there’s only one test case that either fails or succeeds as a whole. Wouldn’t it be much better if there was direct feedback from test execution that everything works apart from handling division by zero?
To avoid this trap it is much better to write one test case (function) per tested condition. A much better version with the same assertions could look like:
def test_same_numerator_and_denominator_is_one():
    assert divide(1.0, 1.0) == 1.0

def test_numerator_higher_than_denominator_is_above_one():
    assert divide(2.0, 1.0) == 2.0

def test_numerator_lower_than_denominator_is_below_one():
    assert divide(1.0, 2.0) == 0.5

def test_division_by_zero_is_rejected():
    with pytest.raises(ValueError):
        divide(1.0, 0.0)
Now, in case my implementation of divide does the math correctly but I just messed up the exception handling, the test results will immediately show this and I know where to start debugging:
test.py::test_same_numerator_and_denominator_is_one PASSED [ 25%]
test.py::test_numerator_higher_than_denominator_is_above_one PASSED [ 50%]
test.py::test_numerator_lower_than_denominator_is_below_one PASSED [ 75%]
test.py::test_division_by_zero_is_rejected FAILED [100%]
Besides providing valuable debugging aids, building test cases per tested aspect also helps to communicate the requirements on the tested code effectively. I can now use the test case names to communicate what I require from my code to fulfill its technical and, more importantly, business value. Without such test cases, these requirements are often only implicitly represented in the code base. The test code therefore provides additional documentation and explanations that would otherwise be missing, and it cannot become outdated the way comments or external documentation can.
So, in case conditions like the following match a test function or method, the test case should probably be split: conditional assertions (such as if foo: assert …), for instance, are a telltale sign that multiple requirements are tested in a single test case.

As a follow-up on the previous rule of using test cases to verify individual requirements, another important aspect for the effective communication of requirements is that they are actually readable as natural language from the code. Humans are much better at understanding natural language than they are at reading highly abbreviated code.
Imagine the division by zero example from above were tested like this:
def test_exception():
    with pytest.raises(ValueError):
        divide(1.0, 0.0)
If this test fails, would you know which requirement is currently unmet in case test reports show the following?
test.py::test_exception FAILED [100%]
Probably not.
Therefore, use test case names to effectively communicate the imposed requirement on the system under test with natural language. A good test case name includes a verb and can be read as a sentence clearly expressing the verified requirement such as in:
def test_division_by_zero_is_rejected():
    # ...
Some languages such as Kotlin allow making this even more readable by supporting (close to) arbitrary characters in method names:
@Test
fun `division by zero is rejected`() {
    // ...
}
Especially when testing the actual business logic of a system, such a way of naming is of real value, because business experts or the product owner can then understand whether the developers have realized the correct requirements by browsing through the test case names (guided by the developers). In case a requirement was forgotten during implementation, this should become clear to business experts, because they can spot the gap in the natural-language specification of what is tested. Therefore, merely by naming test cases properly, a large step towards acceptance testing with value for non-technical stakeholders can be taken. Of course, the next developer of a system or your future self will also value expressive test cases that explain the system and its requirements clearly.
From time to time I’ve stumbled across a pretty interesting testing pattern. Instead of verifying their own requirements, tests were written against framework and language features. For instance, one case in Python looked close to this one:
def my_function(a_param: str) -> int:
    if not isinstance(a_param, str):
        raise ValueError("unsupported type")
    return 42

@pytest.mark.parametrize(
    "param_value",
    [None, 42, 0.0, re.compile(r'.*'), ...]
)
def test_my_function_rejects_other_types(param_value) -> None:
    with pytest.raises(ValueError):
        my_function(param_value)
The affected project had decided to use Python type hints, and mypy was used strictly. Therefore, all tooling was set up to write Python close to what a statically typed language would look and feel like. Yet, the test authors somehow repeated a lot of what the tooling was already enforcing using unit tests. Apart from the question of where to stop in this specific case (there are infinitely more types to test here), this approach largely increases the amount of test code to maintain and the test runtime while creating close to no benefit. You should generally trust the tools you select to an appropriate level, or you probably shouldn’t use them. Doing someone else’s work by verifying their implementation shifts a large burden onto your own code base that will eventually result in more maintenance work without ever ensuring that your code actually continues to meet its own requirements.
One can say that the previous example at least exercises your own code and verifies that a single conditional works as expected. Even worse are situations like the following one.
class TestMyNetworkedServiceAdapter:
    def test_requests_works(self):
        with pytest.raises(ConnectionError):
            requests.get("http://unknown-host")
Such experiments – most likely used to understand the functioning of an upstream library – remain in the test code surprisingly often. This is really nothing more than code bloat and doesn’t help at all for your own project. So just avoid this.
Of course, there are times when some bug or peculiarity of a used library is causing trouble for your code. But you have probably noticed this because of some missed requirement for your own code. Therefore, whenever possible, try to find a test case that reproduces issues with used tools through special cases of calling your own code. That way these special case tests contribute to the set of requirements imposed by your tests on the system under test and they remain valid even if you later decide to completely drop the buggy dependency. Not having to change test code alongside production code changes is always a good thing and reduces necessary work during refactorings.
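As a rough sketch of this idea (all names here are hypothetical and only for illustration): instead of asserting that requests raises ConnectionError, the test exercises our own adapter and verifies the requirement we actually have, namely that connection problems surface as our own domain error. The HTTP call is injected so the test never touches the network:

import pytest
import requests


class CarrierServiceUnavailable(Exception):
    """Domain-level error raised by our own adapter."""


class MyNetworkedServiceAdapter:
    # the HTTP call is injected so tests can replace it with a double
    def __init__(self, http_get=requests.get) -> None:
        self._http_get = http_get

    def fetch_capacity(self, carrier_id: str) -> int:
        try:
            response = self._http_get(
                "http://carriers.example.com/capacity/" + carrier_id
            )
        except requests.exceptions.ConnectionError as error:
            raise CarrierServiceUnavailable() from error
        return int(response.text)


def _raising_get(url):
    # stand-in for requests.get that simulates the connection problem
    raise requests.exceptions.ConnectionError()


def test_connection_problems_surface_as_service_unavailable():
    adapter = MyNetworkedServiceAdapter(http_get=_raising_get)
    with pytest.raises(CarrierServiceUnavailable):
        adapter.fetch_capacity("carrier-1")

If the troublesome dependency is later replaced, this test remains valid because it only encodes a requirement on our own code.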
Some test frameworks provide versatile features for writing tests in concise ways. Especially pytest has accumulated an enormous ecosystem of plugins for solving various (repetitive) tasks through syntactic sugar. The general recommendation with fancy tooling is to use it as a means of improving the documentation quality of test code, not for the sake of applying all plugins that are available. For example, despite using the common feature of parametrizing test cases to reduce the amount of test code, the following test function still weakens the documentation capabilities of the test:
@pytest.mark.parametrize(
    ["numerator", "denominator", "expected_value", "expect_exception"],
    [
        (1.0, 1.0, 1.0, False),
        (2.0, 1.0, 2.0, False),
        (1.0, 2.0, 0.5, False),
        (1.0, 0.0, 0.0, True),
    ]
)
def test_divide(numerator, denominator, expected_value, expect_exception):
    if expect_exception:
        with pytest.raises(ValueError):
            divide(numerator, denominator)
    else:
        assert divide(numerator, denominator) == expected_value
While maintaining less code is always something valuable to consider, less code but with higher complexity and lower self-documentation abilities is probably not worth achieving.
This version of the tests is still better than the initial test_it_works version, because the test runner reports success and failure for parameter combinations individually and debugging is easier, but the requirements are diluted. Therefore, use such fancy features for the purpose of making the requirement descriptions stronger, not weaker.
A good example where parametrization is beneficial is to increase coverage within a single requirement:
@pytest.mark.parametrize(
    ["numerator", "denominator"],
    [
        pytest.param(1.0, 1.0, id="basic case"),
        pytest.param(2.0, 2.0, id="ne 1.0 works"),
        pytest.param(-1.0, -1.0, id="both values negative"),
        pytest.param(1.5, 1.5, id="fractional"),
    ]
)
def test_same_numerator_and_denominator_is_one(numerator, denominator):
    assert divide(numerator, denominator) == 1.0
By using named parameters we can even increase the ability of the test cases to explain their exact requirements.
While the aforementioned guidelines mostly focused on the design of individual test cases, there is also the question of which tests to write and what the target of these tests should be. Tests are an extremely valuable tool that greatly helps in the development process. Not writing tests is rarely a good option for almost any software project. Yet, test code is as important as the actual production code, and therefore any test that is written adds to the size of the code base and increases maintenance efforts. Fortunately, implementing software is a creative process with a lot of freedom, and whenever we want to realize a new requirement or fix a bug, we have the freedom to decide where and how to test it to weigh the benefits of tests against the drawbacks of adding more code. In this regard, [PercivalGregory2020] provides an interesting perspective on this problem:
Every line of code that we put in a test is like a blob of glue, holding the system in a particular shape. The more low-level tests we have, the harder it will be to change things.
Based on this idea they propose to favor higher-level unit tests when possible, and to drop down to testing individual units for specific problems. They also call this "testing in high and low gear". In the end, the test code will be a larger collection of loosely coupled tests using higher-level abstractions such as DDD application services, enhanced with a set of individual unit tests for covering and gaining confidence in complex or tricky cases. The low-level tests are highly coupled to the implementation code and therefore prone to changes alongside refactorings, but they will be relatively few. On the other end, the higher-level tests are also more likely to contribute to the aforementioned aspect of documenting business requirements ([PercivalGregory2020], p. 74). The proper functioning of a getter method is close to irrelevant to the business perspective, but whether I can pay in money and then request the final amount (indirectly through that getter) is a lot more relevant.
This perspective has a close relationship to the distinction between solitary and sociable unit tests [Fowler2014]. Testing on the higher level, including (parts of) the object graph below the high-level entrypoint, will result in a sociable unit test. The tested unit will include (most) parts of its production object graph and socially interact with these real objects instead of being created in solitude via test doubles. Testing in high gear therefore prefers sociable unit tests.
What would it look like to test on the high level? Here’s a simplified example code base:
from __future__ import annotations


class Currency:
    def __init__(self, euros: int, cents: int) -> None:
        ...  # assign members

    def subtract(self, value: Currency) -> Currency:
        ...  # some actual implementation goes here


class BankAccount:
    def __init__(self, balance: Currency) -> None:
        self.balance = balance

    def pay_out(self, desired: Currency) -> None:
        new_balance = self.balance.subtract(desired)
        if new_balance.is_negative():
            raise ValueError()
        self.balance = new_balance
Testing in low gear would mean:
class TestCurrency:
    def test_subtract_works(self) -> None:
        assert Currency(3, 0).subtract(Currency(2, 50)) == Currency(0, 50)
Using an explicit subtract method is cumbersome. Python provides operators, and I could easily implement addition and subtraction operators for the Currency class. But my test for Currency is highly coupled to the specifics of how this class is implemented and would need changes to reflect the new operators along with this refactoring, thereby adding work to the refactoring task.
The high gear approach instead would look like this:
class TestBankAccount:
    def test_paying_out_works_if_balance_is_high_enough(self) -> None:
        account = BankAccount(Currency(3, 0))
        account.pay_out(Currency(2, 50))
        assert account.balance == Currency(0, 50)
In high gear, I have simply left out the detailed test for the Currency class, assuming that its proper functionality can be assured when my higher-level tests succeed. Code coverage is a good hint to find out whether high-level tests cover my lower-level units sufficiently or not. This high-gear test is not coupled to the exact set of methods available on the Currency class. Refactoring subtract to an operator would be invisible to this test and it could simply remain as is, causing less work during the refactoring.
The downside of this approach is that when something fails, I lose the exact feedback where the failure comes from. Fortunately, testing as performed by developers is not a form of black-box testing. I can try to judge from the underlying code structure and complexity if detailed feedback will be required or not. If things are complex, add some low-gear tests to get targeted requirements and debugging feedback. If the internal structure of a higher-level unit is quite simple and high-level testing does not hinder development and debugging much, then avoid adding low-level tests that do not add much benefit.
An important note on this procedure is that it mainly applies to testing the business logic of your code base. Even in high-gear sociable unit tests you should provide test doubles for persistence and IO. Otherwise, the runtimes of your unit tests will start to increase and the benefit of instant feedback will be lost. Moreover, especially network protocols can have many interesting error conditions that are hard to test from a high-level business perspective. For these things you are better off isolating them in solitary narrow integration tests [Fowler2018]. There, you can easily construct all kinds of intricate failure situations, which is often necessary to ensure the proper functioning of such adapters to external systems.
Dependency injection enables us to effectively swap out production collaborators of a tested unit with an implementation tailored to the specific test. These replacements are called test doubles [Fowler2006] and different techniques exist for creating such doubles. The most prominent ones are stubs and mocks. The term stub is used with slightly varying meanings in different places. The common use of the word stub doesn’t distinguish clearly between stubs and fakes as defined in [Fowler2006]. Any kind of implementation, either providing canned answers to expected calls, or providing a minimalistic implementation of the full protocol, is often blurred under the name stub.
On the other end of the spectrum are mocks. These test doubles are usually created through special mocking libraries in a declarative way by expressing the expected calls and potential answers to them. This second aspect lets mocks also act as stubs. However, as already outlined in [Fowler2007], there’s still an important difference: because exact call sequences are used to define their behavior, mocks are coupled more strongly to the exact way the tested unit interacts with the test double than fakes would be. Mocks primarily verify whether the protocol spoken between the tested unit and the mock works as expected. When using stubs, the exact call sequences are of lower importance, because only the outcome is verified, not the steps towards this outcome. Therefore, stubs allow greater freedom for refactoring without having to change the test code, whereas mocks usually exhibit more coupling with the production code ([PercivalGregory2020], p. 51). Moreover, stubs (fakes) are often easier to comprehend. I can give meaningful names to my stubs and their implementations are usually simple code that’s easy to read. On the other end, the mock declarations (when using typical mocking frameworks) are often harder to read:
mock = MagicMock(spec=SomeStrategy)
mock.get_name.return_value = "static-error"
mock.compute_stuff.side_effect = ValueError()
That’s ok’ish to read, but a manual stub implementation (or fake implementation, to be precise) is easier to comprehend in most cases:
class RaisingStrategyFake(SomeStrategy):
    def get_name(self):
        return "static-error"
    def compute_stuff(self):
        raise ValueError()
5 instead of 3 code lines. An acceptable price given the gained clarity.
Another problem with mocks is that the declarations of desired functionality and expected calls are usually repeated (with slight variations depending on the actual test case) throughout the test code. Refactorings therefore often result in huge change sets in the test code, because mocking code has to be adapted close to everywhere.
So what is the recommendation here? First, if your code base is testable and uses correct abstractions with dependency injection, then using stubs is an easy task. This avoids one of the primary reasons for using mocks in dynamic languages such as Python, where mocks and monkey patching are used to overcome such design deficiencies. Therefore, my general recommendation – in line with [PercivalGregory2020] – is to use stubs whenever possible and when correct abstractions exist. If these abstractions are lacking, it might be time to add them.
However, mock-style testing also has its role. Especially at the boundaries of a system, where interactions with third-party libraries or systems exist (i.e. ports and adapters), using mocks as a means of exercising and triggering specific boundary conditions is easier. Moreover, for protocol adapters, expecting certain calls is a typical way of thinking and an essential aspect of verifying that the protocol is obeyed by the implementation. Such a requirement naturally maps to mock-style behavior verification ("When a new product is POSTed, the createProduct method on my ProductManagementService must be called.", "When requesting an HTTP resource, first open a channel, second, initiate TLS, …"). Therefore, using mocks with their focus on the spoken protocol is always a good idea when the actual interaction protocol is what matters. This pretty much leads to the testing strategy outlined in [Richardson2018], p. 309.
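A minimal sketch of such behavior verification in Python (class and method names are invented for the example, and the HTTP layer from the quote is reduced to a plain method call):

from unittest.mock import create_autospec


class ProductManagementService:
    def create_product(self, name: str) -> None:
        ...  # the production implementation would talk to the database


class ProductController:
    def __init__(self, service: ProductManagementService) -> None:
        self._service = service

    def post_product(self, payload: dict) -> None:
        self._service.create_product(payload["name"])


def test_posting_a_product_calls_create_product():
    service = create_autospec(ProductManagementService, instance=True)
    ProductController(service).post_product({"name": "parcel scale"})
    # behavior verification: the interaction itself is the requirement
    service.create_product.assert_called_once_with("parcel scale")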
Hopefully, the explanations in this post have at least shown that it is important to think about how to write good test cases. Well-structured test cases provide many benefits and once the mechanics needed to achieve them have become part of your coding muscle memory, the initial overhead of writing clean test code becomes negligible. Ultimately, clean test code is something many people including yourself will value sooner or later and should never be neglected.
I will first explain the details of the setup that we use, which is necessary to understand some parts of the bug hunt.
The server I am talking about is using Debian testing as its host system with Docker being installed using the provided Debian packages.
This daemon uses the automatically created docker0 bridge (plus some additional networks).
For launching a completely disparate daemon to be used for Jenkins slaves, a different bridge network is required.
Unfortunately, Docker itself cannot configure anything else apart from docker0:

The -b, --bridge= flag is set to docker0 as default bridge network. It is created automatically when you install Docker. If you are not using the default, you must create and configure the bridge manually or just set it to ‘none’: --bridge=none
Therefore, we had to manually create a bridge device for this daemon, which we did using the Debian network infrastructure:
auto dockerbrjenkins
iface dockerbrjenkins inet static
address 10.24.0.1
netmask 255.255.255.0
bridge_ports none
Therefore, apart from separating the socket and runtime directories of the two daemons, one important difference is that the main daemon creates the bridge device on its own and the daemon for CI slaves uses a pre-created bridge.
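For illustration, the second daemon instance could be launched roughly like this (a sketch only; the paths are examples, the socket and bridge names match the ones used in this post, and in practice such a daemon is usually wrapped in a dedicated systemd unit with equivalent options):

dockerd \
    -H unix:///var/run/docker-jenkins.sock \
    --bridge=dockerbrjenkins \
    --data-root=/var/lib/docker-jenkins \
    --exec-root=/var/run/docker-jenkins \
    --pidfile=/var/run/docker-jenkins.pid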
What we have observed is that from time to time builds in the spawned Docker containers failed with DNS resolution errors. Thus, the first thing we did was to manually start a container on the same daemon in which we tried to count how often DNS resolution failed by frequently resolving the same host. Failures weren’t very frequent: out of 450,000 requests over the course of two days, only 33 or so failed. This didn’t match the expectation, because build failures seemed to occur more often. However, at least we saw some errors.
Due to the manually created bridge network, the internal DNS server of Docker is not involved and the containers’ /etc/resolv.conf directly points to the DNS servers configured on the host system. Therefore, the first suspicion was that the configured DNS servers had sporadic problems. Because of firewall settings, no external DNS servers could be reached for a comparison of error rates on the affected server. However, executing the same long-running experiment directly on the host and not in a Docker container showed no resolution errors at all. Therefore, the DNS servers could not be the reason and the problem was narrowed down to the affected host system.
One thing that was still strange with the experiment so far was the low error rate and the observation that most resolution problems in production appeared at night. This is the time when nightly builds were running on the Jenkins instances. Hence, I restarted the periodic DNS resolution task and triggered one of the nightly builds, and it immediately became apparent that the running builds increase the chance of DNS resolution errors.
In order to take Jenkins out of the equation, I tried to mimic what Jenkins does on the Docker side by starting a loop that frequently creates new containers and removes them again shortly after:
while true
do
echo "next"
docker -H unix:///var/run/docker-jenkins.sock run --rm debian:latest bash -c "sleep 2"
sleep 1
done
Creating and removing containers was sufficient to increase the DNS resolution errors again. This was the point in time where we had the idea to replicate this setup on the main Docker daemon. Interestingly, the same loop that constantly adds and removes containers didn’t affect DNS resolution on this Docker daemon. Therefore, we somehow came to the conclusion that this must be related to the different network bridges used by the two daemons. Of course, comparing all parameters of the bridges using ip addr, brctl, etc. didn’t show any important differences in the parameterization of the bridges.
In order to find out what is actually going on, another experiment we did was to record the network communication on the system using tcpdump. Looking at the Wireshark visualization of the packets, you can see the following:
So, what is visible here is that the packets are first sent from the container’s IP (10.x) to the DNS server and something remaps them to originate from the public IP address of the host system (x.136). These requests are then answered by the DNS servers and routed back the same way to the container. Packets appear multiple times because all interfaces of the system were recorded.
In case of the unsuccessful request, the pattern of packets changes:
Suddenly, nothing maps the requests to originate from the public IP address of the server and the requests never get replies. So, effectively the requests are stuck at the bridge network (10.x) and never reach the outer world. The DNS lookup then iterates over all available DNS servers, but with no success.
Why are packets transformed in the way visible in the first image? Because the bridge device has no configured upstream interface and Docker itself instead configures iptables to perform NAT and forwarding. This somehow seems to break down when containers are added and removed. Therefore, another suspicion we had was that Docker reconfigures iptables with every container change and this results in short outages.
So, we removed this possibility. Docker can be configured to not handle iptables at all using --iptables=false. Hence, we set it up that way and configured the required iptables rules by hand instead via:
iptables -A FORWARD -o brtest -i externalinterface -j ACCEPT
iptables -A FORWARD -i brtest -o externalinterface -j ACCEPT
iptables -t nat -A POSTROUTING -j MASQUERADE -s 10.24.0.0/24 -d 0.0.0.0/0
However, nothing changed and DNS requests were still interrupted while containers were added and removed.
As described before, the only real difference in terms of networking between the two daemons is how the bridge interface was created. Since we couldn’t find any real difference in the bridge properties, the next thing that we tried was to find out whether the issue could be reproduced with the bridge device itself and without any Docker involved.
So the first thing was to replace the loop spawning new temporary containers with something that replicates network devices joining the bridge and leaving it again. This is what Docker does internally using veth devices. Hence, the loop for interrupting DNS migrated to this one:
while true
do
echo "next"
ip link add vethtest0 type veth peer name vethtest1
brctl addif brtest vethtest0
sleep 2
brctl delif brtest vethtest0
ip link del vethtest0
sleep 1
done
brtest is the name of the bridge device the daemon is using. This was also sufficient to reproduce the observed DNS resolution errors.
The next step was to also remove the Docker container in which DNS resolution is tested and to replace it with something that limits the DNS traffic to the bridge device. Here, Linux network namespaces come into play. Docker uses them internally as well. Basically, the idea is to create a virtual namespace of network devices in which only a limited set of devices is visible to processes that are executed inside the namespace. A virtual veth pair can then be used to attach to the bridge on the host side and to provide a single device that only communicates via the bridge in an isolated namespace for testing DNS resolution. This way it is ensured that resolution never directly communicates with the outside world.
For easy replication, we have automated the whole setup and test through the following bash script:
#!/bin/bash
set -e
# teardown
function cleanup {
set +e
brctl delif brtest vethtest0
ip link del vethtest0
iptables -t nat -D POSTROUTING -j MASQUERADE -s 10.12.10.0/24 -d 0.0.0.0/0
iptables -D FORWARD -i brtest -o enp0s31f6 -j ACCEPT
iptables -D FORWARD -o brtest -i enp0s31f6 -j ACCEPT
ip link delete veth0
ip link set down brtest
brctl delbr brtest
ip netns del test
}
trap cleanup EXIT
ip netns add test
brctl addbr brtest
ip addr add 10.12.10.1/24 dev brtest
ip link set up dev brtest
ip link add veth0 type veth peer name veth1
ip link set veth1 netns test
brctl addif brtest veth0
ip link set up dev veth0
ip netns exec test ip addr add 10.12.10.42/24 dev veth1
ip netns exec test ip link set up dev veth1
ip netns exec test ip route add default via 10.12.10.1 dev veth1
# change external interface name
iptables -A FORWARD -o brtest -i enp0s31f6 -j ACCEPT
iptables -A FORWARD -i brtest -o enp0s31f6 -j ACCEPT
iptables -t nat -A POSTROUTING -j MASQUERADE -s 10.12.10.0/24 -d 0.0.0.0/0
while true; do ip netns exec test python3 -c "import socket; socket.gethostbyname('example.org')" && echo success; sleep 1; done
Even without Docker the errors still appear. This was the point where we basically gave up and requested help on the Linux netdev mailing list.
Fortunately, Ido Schimmel soon had an answer to the request:
The MAC address of the bridge (‘brtest’ in your example) is inherited from the bridge port with the “smallest” MAC address. Thus, when you generate veth devices with random MACs and enslave them to the bridge, you sometimes change the bridge’s MAC address as well. And since the bridge is the default gateway sometimes packets are sent to the wrong MAC address.
So the culprit here was that no fixed MAC address had been assigned to the manually created bridge device. The default behavior of deriving the MAC address from the (potentially changing) attached ports was actually pretty unexpected to us, and I wonder what the original motivation for this was. However, with this information, network communication is finally stable.
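The fix, then, is to pin the bridge’s MAC address so that enslaved veth devices can no longer change it. A minimal sketch of how this could look in the Debian network configuration shown above (the MAC address is an arbitrary locally administered example; alternatively, ip link set dev dockerbrjenkins address … achieves the same at runtime):

auto dockerbrjenkins
iface dockerbrjenkins inet static
    address 10.24.0.1
    netmask 255.255.255.0
    bridge_ports none
    # pin the MAC so that joining and leaving veth devices cannot change it
    hwaddress ether 02:00:00:24:00:01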
So far, autosuspend has been pretty useful for suspending a server in case it was not in use. However, what was missing was the ability to wake up the system again to perform scheduled actions. This was a long-standing user request and with version 2.0 I have finally implemented this feature.
In addition to the existing configurable checks for currently ongoing activities that should prevent suspending the system, there are now additional checks for scheduled future activities.
Common examples include backups, planned TV recordings with software such as Tvheadend, or general times at which the system should be usable without delays.
These requested cases have now been implemented.
Thus, the autosuspend logic has been changed to first periodically check for current activities. If no activity exists for a specified time and the system is about to suspend, the new checks for scheduled activities are used to determine the closest time in the future at which the system is needed again. If this time is far enough in the future that suspending is worth it, a wake-up shortly before the pending activity is scheduled (using an RTC alarm) and the system is suspended. Otherwise, if the pending activity is close enough, the system will just wait for it while staying awake. All timeouts are configurable, of course.
In addition to this new feature, many small details have been improved and new checks have been added.
The changelog gives an overview on the noteworthy changes.
With the addition of scheduled wake-ups, autosuspend is probably “the most versatile solution” for suspending and waking up a system.
A new package for Arch Linux has been pushed to AUR and for Debian the updated package is currently going through unstable. In case someone would be willing to contribute packages for other distributions, I’d be glad to hear about it. In case of issues or feature requests, please use the issue tracker of the GitHub project.
The first thing you have to do before being able to see a LaTeX document is to compile your document into (probably) a PDF. This process is a total mess in LaTeX and, depending on the features and packages you use, requires multiple iterations of calling the compiler with calls to utility programs at the right time in between. This is highly complicated and hard to get right. Don’t even dare to create a custom Makefile for this purpose. I have seen countless examples of broken Makefiles for LaTeX that miss parts of these iterations, swallow important messages, or do not recover from errors. Instead, use an established build tool specifically crafted for LaTeX. A few options exist, such as rubber (ever tried googling for latex and rubber to get help?), arara or latexrun. However, my impression with all of them is that they are more or less unmaintained and lack important features. Thus, the only viable option is to use the Perl-based latexmk, which is one of the oldest build tools. It is included in every major LaTeX distribution.
latexmk is configured through a latexmkrc file beside your main document. This file is a Perl script. Probably the first thing you have to do is to set PDF mode (who still uses DVI?):
$pdf_mode = 1;
In case your project contains custom classes that need to be added to the tex search path, you can use something like:
$ENV{'TEXINPUTS'}='./texmf//:';
Btw, the double slash instructs LaTeX to also search all subfolders of the specified texmf folder.
After configuring latexmk, you have multiple options for how to compile your document. If you want a single compilation run, latexmk main.tex should be enough. However, the real power of latexmk is continuous compilation. If you start it with latexmk -pvc main.tex, latexmk compiles your document, opens a PDF viewer and from then on continuously monitors your document (with all included files and images) for changes and updates your preview on the fly. In case something got stuck and you need to clean your document folder from intermediate files, you can of course use your VCS (git clean -fdx) or latexmk via latexmk -C main.tex.
One common issue that often arises is that you have some images or other artifacts that should end up in the document, but they are in a format that is not compatible with LaTeX. Thus, you first have to convert them to a different format (for instance, PDF for vector graphics or JPEG for pixel images). People usually do this conversion manually and put the resulting files into their VCS, too. But as usual with generated files, in case someone changes the source image, you have to remember to manually regenerate the LaTeX-compatible output file, too.
With latexmk you can get rid of this manual and error-prone process by letting the build tool do the conversion automatically. For this, you have to configure conversion rules in the latexmkrc file. These rules use file extensions. In case a document contains a reference to foo.pdf (e.g., an \includegraphics{foo.pdf}) and this file doesn’t exist in the source tree, latexmk uses the available conversion rules to find a file with the same basename from which the desired PDF file can be generated. For instance, automatically converting SVG vector images and Graphviz .dot files to PDF can be achieved with these rules in the latexmkrc:
add_cus_dep('svg', 'pdf', 0, 'svg2pdf');
sub svg2pdf {
    system("inkscape --export-area-drawing --export-pdf=\"$_[0].pdf\" \"$_[0].svg\"");
}

add_cus_dep('dot', 'pdf', 0, 'dot2pdf');
sub dot2pdf {
    system("dot -T pdf -o \"$_[0].pdf\" \"$_[0].dot\"");
}
You can even generate LaTeX sources from other documents in case you have tools that output the required .tex files. I have used this to automatically convert online questionnaires to TeX for the appendix of my thesis using my own quexmltolatex:
add_cus_dep('quexml', 'tex', 0, 'quexml2tex');
sub quexml2tex {
    system("quexmltolatex -p \"${\basename($_[0])}\" \"$_[0].quexml\" > \"$_[0].tex\"");
}
Another currently popular tool for diagrams is draw.io. With the following rule and my own drawio-batch you can automatically convert those diagrams to PDF:
add_cus_dep('xml', 'pdf', 0, 'drawio2pdf');
sub drawio2pdf {
    system("drawio-batch -f pdf \"$_[0].xml\" \"$_[0].pdf\"");
}
For further useful conversion rules, have a look at this GitHub repository.
After configuring images for automatic generation, remember to exclude the generated outputs from your VCS (.gitignore
).
In case you want latexmk to clean up the generated images on -C
as well, the following setting is required:
$cleanup_includes_cusdep_generated = 1;
In ancient times, using non-ASCII characters in LaTeX documents was a nightmare and special codes like \"a
had to be used in case you wanted an ä
.
This is still documented this way in many places on the internet, despite being horrible to type and to read.
Nowadays, you can more or less safely use common umlauts etc. by using UTF-8 encoded files with the correct header declaration:
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
Another common source for problems with umlauts is the bibliography. The ancient BibTeX has many issues with UTF-8. However, BibLaTeX and biber (see below) are designed to work with UTF-8.
Apart from using UTF-8 consistently, I recommend following the convention of placing a single sentence of your document per line of the source file.
The rationale is that this makes VCS diffs easily readable without requiring additional options like --word-diff
for git to track changes.
This would be necessary if you put each paragraph into a single line or, even worse, if you configure your editor to automatically rewrap paragraphs at something like 80 characters line length.
Don’t do this.
Even a single change at the beginning of a paragraph can reformat the whole paragraph then, resulting in massive and unreadable diffs.
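To illustrate the convention, the source for a paragraph would simply look like this (LaTeX joins consecutive lines into one paragraph anyway):
This is the first sentence of the paragraph.
This is the second sentence, which can now be changed without touching the first one in the diff.
An empty line still starts a new paragraph.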
Citations are a necessity for scientific work and BibTeX has been used for ages for generating the necessary bibliographies and citations. Most conference templates still use it today. However, BibTeX has some serious issues. Apart from the aforementioned lack of UTF-8 support, modifying style files is hard and it lacks support for many important fields in the database entries. Thus, there are some replacements for BibTeX, with natbib previously being a good choice. However, nowadays BibLaTeX (mind the difference, which Google loves to swallow) is probably the most versatile and handy solution you can and should use.
The most important thing to do is to use BibLaTeX together with the biber .bib
file processor instead of the old bibtex
binary.
Only this way do you gain full UTF-8 support, document validation, and filtering capabilities.
Thus, you should load the package at least with the backend=biber
option (should be the default in modern versions).
\usepackage[backend=biber,style=alphabetic]{biblatex}
This way, you can use UTF-8 safely in your .bib
files.
BibLaTeX has an extended set of entry types and fields in the database, which is very well documented in the “Database Guide” section of the BibLaTeX manual.
For instance, new types include @online
for websites, @patent
for patents, @standard
for standardization documents, and @report
for research reports.
Moreover, many new fields for entries exist that make filtering the database easier and give bibliography and citation styles more possibilities to include the relevant information in an appropriate format.
Especially the doi
field is important nowadays to support readers in finding the exact publications you are referring to.
A more or less complete entry for a conference paper in BibLaTeX might look like this:
@inproceedings{Yu2015,
    author = {Yu, Jingjin and Aslam, Javed and Karaman, Sertac and Rus, Daniela},
    bookpagination = {page},
    booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    date = {2015},
    doi = {10.1109/IROS.2015.7354122},
    eventdate = {2015-09-28/2015-10-02},
    eventtitle = {2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    isbn = {978-1-4799-9994-1},
    pages = {5279--5286},
    publisher = {IEEE},
    title = {Anytime planning of optimal schedules for a mobile sensing robot},
    venue = {Hamburg, Germany}
}
All database entries are based on a declared schema, which can also be validated by biber using the --validate-datamodel
command line switch.
This can be enabled in the latexmkrc
file by overriding the default biber call:
$biber = 'biber --validate-datamodel %O %S';
This way, errors in the bibliography database will automatically be reported.
Regarding the contents of individual entry fields in the database, there are still some common pitfalls and illogical things being documented in other places:
Double curly braces instruct BibLaTeX/BibTeX to interpret a string or value exactly as written.
You don’t want to do this everywhere automatically.
For instance, this prevents name parsing and {{Peter Miller}}
will not be sorted along his surname, neither will his forename be abbreviated in styles that do so.
Moreover, setting the paper title in double curly braces will also disable the automatic title casing in the IEEE style.
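For illustration (the field values are made up), let BibLaTeX parse names and protect only the parts of a title that must keep their capitalization:
author = {Miller, Peter},
title = {An {FPGA}-based approach},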
In case you are citing documents in different languages, define at least the langid
field of the entry to ensure that proper hyphenation is used in the bibliography.
langid
is passed to the babel package.
Thus, all identifiers that babel supports are accepted here (e.g., american, french, british).
The same identifiers can also be used in the language
field to add a note to bibliography entries in case a cited document is in a different language than the document itself.
Unfortunately, the values here are not validated by biber.
You can add the following code to your preamble to ensure that the provided language is known by BibLaTeX (hopefully this will still work in the future):
\makeatletter
\DeclareIndexListFormat{language}{%
  \ifboolexpr{ test {\ifbibstring{#1}} or test {\ifbibstring{lang#1}} }
    {}
    {\blx@warning@noline{Unknown language '#1' in 'language' field of\MessageBreak '\thefield{entrykey}'}}}
\AtDataInput{\indexlist{language}}
\makeatother
Although you could continue to simply use \cite
to cite bibliography content, BibLaTeX has a lot more utility commands to simplify common situations.
Generally, it is a good idea to switch over to the \autocite
command.
The behavior of this command (inline citation, footnote, etc.) can be configured using a package option.
Thus, you can easily change the way citations are printed with a single change to package options.
In case you want to use a publication as a subject or object in a sentence, \textcite
can be used.
This command will automatically insert (depending on the configured citation style) something like the author names so that a real subject exists.
For instance, “\Textcite{Foo1999} shows that” might result in: “Foo and Bar [FB99] shows that”.
The capitalized versions of the commands ensure that the resulting text starts with a capital letter at the beginning of sentences.
It is also possible to extract individual keys from bibliography entries using special citation commands.
In case you want to highlight a seminal book or something like that you can use \citetitle
and author names can be gathered using \citeauthor
.
This way, things are not duplicated and potential typos can be corrected in a single place.
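A short usage sketch of these commands, reusing the Yu2015 entry from above (the surrounding sentences are invented):
\textcite{Yu2015} present an anytime planning approach.
The approach is described in \citetitle{Yu2015} by \citeauthor{Yu2015} \autocite{Yu2015}.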
Often, you do not maintain your .bib
file manually but instead you use a reference manager like Mendeley or Citavi.
Thus, you only have limited control of what will end up in the generated file.
For instance, the entry Yu2015
shown above is generated by Citavi and contains a huge level of detail.
BibLaTeX will happily print all available details in the bibliography and thus even the conference dates will be printed.
This might be acceptable for a longer document, but in a space-limited conference paper, this eats all your space and might also violate the publication guidelines.
You could delete the offending keys from the .bib
file manually, but that has to be redone every time you regenerate the database.
Moreover, if you share the .bib
file between different documents with different requirements on the printed fields, this won’t work at all.
Fortunately, BibLaTeX in combination with biber allows declaring filters and maps to fix up such issues. In your preamble you can use something like the following to delete fields depending on the entry types:
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    % remove fields that are always useless
    \map{
      \step[fieldset=abstract, null]
      \step[fieldset=pagetotal, null]
    }
    % remove URLs for types that are primarily printed
    \map{
      \pernottype{software}
      \pernottype{online}
      \pernottype{report}
      \pernottype{techreport}
      \pernottype{standard}
      \pernottype{manual}
      \pernottype{misc}
      \step[fieldset=url, null]
      \step[fieldset=urldate, null]
    }
    \map{
      \pertype{inproceedings}
      % remove mostly redundant conference information
      \step[fieldset=venue, null]
      \step[fieldset=eventdate, null]
      \step[fieldset=eventtitle, null]
      % do not show ISBN for proceedings
      \step[fieldset=isbn, null]
      % Citavi bug
      \step[fieldset=volume, null]
    }
  }
}
As you can see with the volume
field in @inproceedings
, this can also be used to fix at least some of the annoying errors that most reference managers make.
Btw, for Citavi, be sure to use the export style for BibLaTeX and not the BibTeX one.
Another useful application for source maps is to find your own publication in a list of publications based on the author name for printing a separate list of these publications.
\map[overwrite=true]{
  \step[fieldsource=author, match=Wienke, final]
  \step[fieldset=keywords, fieldvalue=ownpub]
}
With this map, every publication containing my surname in the author
field gets the keyword ownpub
.
I can then use something along the following lines to put my own publications in front of the general bibliography:
\printbibliography[heading=bibintoc,keyword=ownpub,title={Own publications}]{}
\printbibliography[heading=bibintoc,notkeyword=ownpub,title={Other publications}]{}
In some situations you have to know what you are doing to achieve results that correctly reflect the rules of typography. Some of those are of general nature, but some also are specific to LaTeX.
Typography knows different types of dashes and not every dash you will be using is represented as a minus sign in the source code. LaTeX knows the following dashes:
-: a hyphen to combine compound words or for hyphenation
--: an en-dash, somewhat longer than the hyphen
---: an em-dash, even longer
Different people promote different rules for when to use these different dashes. Just have a look at this Stackexchange question to get an impression of the different schools, pick the convention that you like best, and apply it consistently. Knowing the difference and using the different visual pauses the dash types create in the text flow consistently and with purpose is the most important aspect.
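A small made-up example sentence using all three:
The state-of-the-art method covers pages 10--20 of the proceedings---at least in theory.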
LaTeX distinguishes between different types of spaces. Depending on the conventions commonly used in different countries, spaces between words might be smaller than the ones between sentences. Thus, LaTeX must be able to distinguish these two types. Generally, a space is treated as an inter-word space as long as the character before the space is not a period. A period is thus interpreted as the end of a sentence and the following space is assumed to be an inter-sentence space. Thus, if you use an abbreviation with a period at the end, you have to instruct LaTeX that this period does not end the current sentence by escaping the following space:
This is a sentence w.\ a forced inter-word space.
If you don’t do this, you randomly get larger spaces after abbreviations that interrupt the visual flow.
One exception to this rule is if the period is preceded by a capital letter.
In this case, LaTeX assumes an abbreviation and continues to use an inter-word space.
You can reverse this behavior by placing an \@
before the period.
This is done by XYZ\@. This is a new sentence.
If you are using common abbreviations in English, e.g. this very one (i.e. exempli gratia), you can use the foreign package to get commands that handle the spacing correctly:
\usepackage[abbreviations,british]{foreign}
...
This is an example, \eg{} spacing is right here.
british
in the package options here means that no comma is placed automatically behind the abbreviations.
Placing correct quotation marks requires some thought, especially in other languages than English. For instance, in German you would have to type this to get correct opening and closing marks:
\glqq{}this is the quoted text\grqq{}
If you want to avoid remembering how each language and publisher style handles quotation marks, simply use the csquotes package, which automatically selects the correct way for the current document language:
\textquote{this is the quoted text}
If the quotation is actually from another publication that has to be cited, you can use the following shortcut to also get the correct citation through BibLaTeX:
\textcquote[42]{Yu2015}{this is the quoted text}
As you can see, this also supports the typical pre and post notes for BibLaTeX citation commands to place page numbers.
If your quotation is a sentence that ends with a period, some languages have different rules whether to include the period in the quotation marks or not. csquotes can handle these cases automatically (and with configuration options), but you have to indicate the final period manually:
\textcquote[42]{Yu2015}[.]{This is the quoted text and a full sentence with period}
Finally, it is common to strip some parts of quoted material or add annotations to make it understandable in the current context. csquotes provides commands for these purposes so that you do not have to remember the exact rules in each language. For instance, to replace some fraction with a summary you can use:
\textcquote[42]{Yu2015}[.]{This is \textelp{garbage} with period}
\textins
adds an annotation without indicating the omission of material.
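For example, again reusing the made-up quotation from above:
\textcquote[42]{Yu2015}{This \textins{annotated} text is the quoted material}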
LaTeX usually hyphenates words automatically.
However, this behavior is switched off as soon as the word contains a hyphen to create a compound word such as architecture-aware
.
Suddenly, LaTeX will not hyphenate the individual parts, despite knowing hyphenation patterns for them.
This will likely result in overfull hbox warnings.
The same will also happen for other special characters such as /
.
To avoid this issue, the hyphenat package provides commands that still allow hyphenation of individual parts.
Thus, architecture/building-aware
becomes:
architecture\fshyp{}building\hyp{}aware
Not exactly nice to read, but it does the job.
Some things require consistency and are repetitive, and I will briefly introduce some packages for such cases.
In case you are typesetting numbers with physical units or numbers in general, some typographic rules have to be followed for a correct representation. Fortunately, the siunitx package implements these rules, selects appropriate ones automatically, and makes them configurable globally. For software engineers, the following package option also load binary units:
\usepackage[binary-units=true]{siunitx}
\sisetup{detect-all}
The second line detects all features of the document font to adapt the number formatting accordingly. Afterwards, you can do such crazy things and everything will be set correctly:
We measure in \si{\kilogram\metre\per\second}.
The brightness was \SI{.23e7}{\candela}.
The experiment resulted in \SIlist{0.13;0.67;0.80}{\milli\metre}.
We need \numlist{10;30;50;70} participants.
The angle was \ang{12.3}.
If you have to mention dates and times frequently, the datetime2 package provides macros to do this consistently:
\usepackage[useregional]{datetime2}
% ...
We took the measurements at \DTMDate{2017-08-07}.
In case a) you like code structure, b) use inline enumerations, and c) want a package for this, use paralist. This sentence could then be realized as:
In case
\begin{inparaenum}[a)]
\item you like code structure,
\item use inline enumerations, and
\item want a package for this, use paralist.
\end{inparaenum}
LaTeX is very strict about its typographic rules and in case it can’t make some part of your document fit into the typographic rules, it will generate a warning and will break the rules in the resulting document. The common result is that a line of text will flow out of the text border because no valid hyphenation would resolve all constraints. This is a common cause for the “Overfull hbox” warnings most people see and ignore in their documents. While some obviously need fixing, because the page margin gets flooded, others are harder to spot.
So what to do about these issues? First of all, these warnings shouldn’t exist in the final document and in case one of them appears, you should be able to easily spot them. This is best achieved if usually there are no warnings at all. However, many packages create some warnings when being included that can simply not be solved and are acceptable. These warnings flood the error window of your editor and make it hard to detect the actual problems.
To avoid that acceptable warnings hide unacceptable ones, you can use the silence package for hiding warnings based on the source package and string matching for the warning message. As several unfixable warnings appear already when importing packages, it is usually a good idea to include the silence package as the first package:
\RequirePackage{silence}
\WarningFilter{scrreprt}{Usage of package `titlesec'}
\WarningFilter{scrreprt}{Activating an ugly workaround}
\WarningFilter{titlesec}{Non standard sectioning command detected}
\WarningFilter{microtype}{protrusion codes list}
\WarningFilter{latexfont}{Font}
\WarningFilter{latexfont}{Some font shapes}
\documentclass[...]{...}
As said, while some overfull hbox warnings can easily be spotted, others are harder to catch. However, when adding the following line to your preamble, a thick black bar marks every line that even slightly violates the margins. This makes spotting them much easier.
\overfullrule=2cm
The following image shows the visual result of this setting in case of an overfull hbox.
In contrast to the previous examples, where LaTeX created lines that were too long, you sometimes also get “underfull hbox” warnings.
This happens in case something doesn’t fill the space it should.
A common source for this are manual line breaks using \\
.
If you can, avoid these and use better suited constructs where possible.
At least, don’t end a paragraph with a forced line break (\\
followed by an empty line).
This will always create a warning.
Even though LaTeX is already pretty good at producing high-quality documents, a single package can still greatly improve the situation: microtype. This package improves the visual appearance of documents by fiddling with the different spacings at the level of individual letters. This creates a document where the visual weight of the different letters is reflected in the way they are laid out on the page. For instance, the text borders get much smoother this way. Another side effect is that documents with microtype enabled usually have fewer bad boxes.
As a small example, this document compares the layouting of the same text in two narrow columns:
\documentclass[a5paper,twocolumn]{article}
\usepackage[english]{babel}
\usepackage{blindtext}
\usepackage{geometry}
\geometry{
  a4paper,
  total={100mm,257mm},
  left=55mm,
  top=55mm,
  columnsep=2.8cm,
}
\usepackage[activate={true,nocompatibility},final,tracking=true,kerning=true,factor=1100,stretch=10,shrink=10]{microtype}
\begin{document}
\microtypesetup{activate=false}
\blindtext
\newpage
\microtypesetup{activate={true,nocompatibility}}
\blindtext
\end{document}
The compiled result is the following:
While the left column without microtype enabled produces 4 underfull hbox warnings, the right one has only one.
Moreover, the right border looks a lot less jerky with microtype, partially because fewer words are hyphenated.
So, whenever possible, it is a good idea to enable microtype.
However, remember that at some places little shifts of individual words and letters might not be desirable.
For instance, the table of contents might be a place where you may want to temporarily disable protrusion using \microtypesetup{protrusion=false}
.
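A sketch of how this could look around the table of contents (assuming microtype is loaded as described above):
\microtypesetup{protrusion=false}
\tableofcontents
\microtypesetup{protrusion=true}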
Cross-references are a common feature in close to any document. As soon as you have a floating figure, you probably want to reference it from the text. I’ve seen countless documents where people manually added the type of the reference target to the reference. Something along the following lines:
Please refer to Fig.~\ref{fig:xyz}.
First of all, you have to remember to put a non-breaking space (~
) to avoid splitting this.
Second, you have to remember for each type of reference, how you refer to it for being consistent.
Was it “Figure”, “figure”, “Fig.”, or “fig.”?
And if you decide to change this convention afterwards, this means some manual work across the whole document.
Fortunately, there are different packages that solve this problem.
In smaller documents and in case you have no desperate need to modify the default, the \autoref
command provided by the popular hyperref package is a solid choice.
The previous example would reduce to:
Please refer to \autoref{fig:xyz}.
Much easier and automatically consistent.
In case you want to change the way a figure is called, this can be done via the following code in the preamble:
\addto\extrasenglish{\def\figureautorefname{Fancy figure}}
In case you want more flexibility, cleveref is a solid package choice.
In longer documents that are primarily provided in printed form, referring to the number of a figure or chapter might not be enough to make the document easily readable. In case the chapter or figure is further away than the facing page, the reader has to start searching for it, and providing a page number will make this much easier. Of course, you can do this manually:
Please refer to \autoref{fig:test} on the facing page.
Please refer to \autoref{fig:bla} on page \pageref{fig:bla}.
But how do you know whether the floating figure really ends up on the facing page or isn’t placed on the same page as your reference? In that case, adding a page number just looks silly. The package varioref automates the process of adding these page references. If the target is close, it adds things like “on the previous/next/facing page”; if targets are further away, a page number is added. All this with some automatic variations to avoid sounding repetitive.
For using varioref, it is best combined with hyperref and cleveref, but a specific order of package imports has to be used:
\usepackage{varioref}
\usepackage{hyperref}
\usepackage{cleveref}
Afterwards, you can use \vref
or \Vref
to get the automatic references.
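For example, reusing the label from above:
Please refer to \vref{fig:xyz}.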
Please be warned, varioref is a monster with a disputable maintenance state and some ugly errors that can happen. Yet, I am not aware of a better alternative. Due to the nature of how LaTeX documents are compiled incrementally, you might end up in endless compile loops. Imagine the following situation:
1. \vref to a figure sees that this figure is currently placed on the facing page. Thus, it adds the text “on the facing page”, which is quite long.
2. pdflatex sees the long addition and, as a result, the figure moves one page further; references are invalidated again.
3. pdflatex sees the shorter text and the figure flips back to the facing page.
This process can now continue from the beginning and never stops. Generally, this is a problem that can always appear in LaTeX in case the delayed processing of invalidated references significantly changes the amount of text that is produced. So, in case latexmk stops compiling your document with an error message that the maximum number of passes was exceeded, this is a likely source. The only option to cure this is to change your document in such a way that the reference target will not move around.
As it is sometimes hard to find out which reference causes this issue, you might add the following code to your preamble:
\makeatletter
\def\@testdef #1#2#3{%
\def\reserved@a{#3}\expandafter \ifx \csname #1@#2\endcsname
\reserved@a \else
\typeout{^^Jlabel #2 changed:^^J%
\meaning\reserved@a^^J%
\expandafter\meaning\csname #1@#2\endcsname^^J}%
\@tempswatrue \fi}
\makeatother
This will spit out all references that changed during each compile run.
Compare these outputs between the different compiler invocations (diff
) to find the offending labels.
If you use hyperref and provide PDFs digitally, you get nice clickable links. However, for floating environments like figures, these links usually point to the caption text and not to the top of the environment. Thus, it might happen that clicking a link makes the PDF viewer scroll to the caption and the image above is cut off. You can avoid these issues by loading the package hypcap, which needs to be imported after hyperref.
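A minimal sketch of the import order; the all option is what I would use to cover every float type, but check the hypcap documentation for your needs:
\usepackage{hyperref}
\usepackage[all]{hypcap}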
Providing code listings is a common task in case you are in a technical discipline.
The listings package with its lstlisting
environment and lstinputlisting
command is a common and simple solution for this.
However, providing good code highlights is a hard problem and the rules used by this package are pretty simple.
Other, standalone code highlighters provide much better highlights.
The minted package includes the pretty well-known highlighter Pygments to provide nicer highlights.
Just have a look at the output of this test document:
\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{color}
\definecolor{bluekeywords}{rgb}{0.13, 0.13, 1}
\definecolor{greencomments}{rgb}{0, 0.5, 0}
\definecolor{redstrings}{rgb}{0.9, 0, 0}
\definecolor{graynumbers}{rgb}{0.5, 0.5, 0.5}
\usepackage{listings}
\lstset{
  columns=fullflexible,
  commentstyle=\color{greencomments},
  keywordstyle=\color{bluekeywords},
  stringstyle=\color{redstrings},
  numberstyle=\color{graynumbers},
  basicstyle=\ttfamily\small,
}
\usepackage{minted}
\begin{document}
\section*{listings}
\lstinputlisting[language=python]{test.py}
\section*{minted}
\inputminted{python}{test.py}
\end{document}
For listings, this is the minimal setup required to get colored output with an editor-like appearance. For minted, the defaults already provide nice results:
Pygments does a much better job at highlighting things nicely with the default setup.
As minted relies on an external program for highlighting, of course, you need to ensure that Pygments is installed.
Moreover, the LaTeX compiler has to be instructed to allow calls to external programs.
This can be done in the latexmkrc
file by redefining the compiler call to include the -shell-escape
flag:
$pdflatex = 'pdflatex -shell-escape -interaction=nonstopmode';
For getting nice-looking tables, you should at least use the booktabs package, which defines different rules for the top, the bottom, and the inside of tables. The package documentation gives some basic hints on how to design tables correctly. Basically, the common advice boils down to: use horizontal rules sparingly, never use vertical rules, and never use double rules.
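As a small sketch (the table contents are made up), such a table could look like this:
\usepackage{booktabs}
% ...
\begin{tabular}{lrr}
  \toprule
  Method   & Precision & Recall \\
  \midrule
  Baseline & 0.71      & 0.65   \\
  Proposed & 0.83      & 0.79   \\
  \bottomrule
\end{tabular}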
Others have written good and more detailed guides:
As you have seen, there are a few common pitfalls and a lot of things you have to remember when using LaTeX. At least for some of the typical issues, linters exist that try to point them out automatically. A good editor should be able to integrate these linters in a way that you get instant feedback.
For issues regarding the use of LaTeX itself, I recommend chktex, which comes with any major LaTeX distribution. It produces warnings like these:
Apart from linting the technical use of LaTeX, it is also a good idea to look through your prose for common issues. The least you should do, and probably the most widely accepted, is to use a spell checker. But there are further, more interesting tools to use for this purpose as well.
For finding grammar errors in LaTeX documents you can use the pretty good LanguageTool.
This grammar checker works on plain text files.
Obviously, LaTeX is not plain text with all the commands in between.
If your editor doesn’t have a reasonable integration of LanguageTool that strips away all the commands (while preserving the exact line and column positions of words), you can take the scripts in this gist as a starting point for a preprocessor and integration with LanguageTool.
detex.py
strips away a manually curated list of commands and replaces them with plain text representations that preserve the exact position in the source file (adds multiple spaces if necessary).
detex-python.py
is a replacement for the LanguageTool binary that automatically pipes the target file through detex.py
before handing the stripped down text to LanguageTool for grammar checking.
If you are lucky, you can configure your editor to use this script instead of the original LanguageTool binary.
Obviously, the set of replaced commands needs to be matched with what you use in your document.
The extra effort this takes is well worth it, as LanguageTool is pretty good at spotting even more complex grammar issues.
At least for the English language, there are also multiple tools to actually lint the content of your text for common issues such as weasel words, passive voice, potentially offensive speech, etc.
While many exist, my verdict was that all relevant ones can be replaced with Vale, which uses a set of different styles to search for common problems.
The default configuration is pretty lax and doesn’t check much, so you need to search through the styles folder and select additional styles to apply to your text.
These can be put directly into your project and a .vale
file is the configuration for the linter.
What I have been using was this config:
StylesPath = .vale-styles
MinAlertLevel = suggestion
[*]
BasedOnStyles = vale, proselint, TheEconomist, 18F, PlainLanguage, mystuff
proselint.Annotations = NO
write-good.E-Prime = NO
write-good.Passive = NO
TheEconomist.UnexpandedAcronyms = NO
TheEconomist.Punctuation = NO
18F.UnexpandedAcronyms = NO
18F.Abbreviations = NO
18F.Contractions = NO
PlainLanguage.Contractions = NO
PlainLanguage.Slash = NO
In mystuff
I created a custom style to remind myself to be consistent.
For instance, .vale-style/mystuff/Lists.yml
checks that list items start with a capital letter:
extends: existence
message: "Use capital letters for list items"
ignorecase: false
level: warning
raw:
- \\item(\[.*?\])?\s+([[:lower:]])
.vale-styles/mystuff/ConsistentSpelling.yml
was used to check that common terms that I used were written consistently:
extends: substitution
message: Use '%s' instead of '%s'
ignorecase: true
level: warning
swap:
  meta data: metadata
  data set: dataset
  data sets: datasets
  meta model: metamodel
  meta models: metamodels
  timeseries: time series
  data is: data are
  open source: open-source
  opensource: open-source
  run-time: runtime
  run time: runtime
  front-end: front end
  front-ends: front ends
  javascript: JavaScript
  Java Script: JavaScript
Vale doesn’t handle LaTeX commands.
Thus, you also need to put the detex.py
before the call to Vale.
However, once this is set up, extending rules is pretty simple and greatly helps to enforce consistency.
I hope this somewhat random list of things regarding writing documents in LaTeX helps someone to improve their writing experience and the final document. Probably, many things are missing here, but there are a lot of further resources around. One important issue here is to look for recent articles and blog posts. Even for such an ancient tool as LaTeX, the ecosystem is evolving and better packages appear, others become unmaintained, and some are just not needed anymore with more modern distributions.
In case you have remarks, corrections, or further hints what to add here, feel free to contact me via mail.
The first Vim plugin that I have been using in order to replicate a SLIME-like behavior was vim-slime, which uses tools like screen or tmux to get the text into the target REPL application. That means you have to start IPython inside tmux and then configure vim-slime to connect to a specific session. While this works reasonably well, it adds a tmux or screen layer around your IPython experience with all the clunky keyboard shortcuts that you then have to remember. It gets even worse if you want to use the Neovim terminal, which adds another layer of escape sequences around this. Also, it feels quite weird to start tmux inside the Neovim terminal just for this purpose. Finally, with some changes to IPython over time, suddenly some heuristics for sending multi-line text stopped working correctly, and I assume this might happen over and over again.
Therefore, my second attempt was to try something more principled. Since Jupyter started to appear in the Python ecosystem, where IPython is just a kernel and several clients can attach to a running kernel, it would be nice if there was a way for Neovim to directly connect to a running IPython kernel and send input there. nvim-ipy implements this for Neovim as a Python plugin. However, there are several drawbacks of this particular implementation that led me to stop using it:
With the aforementioned drawbacks, I finally moved to iron.nvim.
This doesn’t use the kernel API of Jupyter but instead starts IPython inside a Neovim terminal and multiple other REPLs and languages are supported, too.
This means that the additional buffer of nvim-ipy is gone, executed code fragments are directly available in the history of the REPL, and also tmux isn’t needed anymore.
iron.nvim also correctly picks up IPython from Miniconda via a correctly set PATH
variable.
Btw., in case you are interested in my Neovim configuration: it is available on Github: languitar/config-vim.
The backup option of TWRP produces different files for the different partitions of your device. These files are (for backups without compression) simple tar archives of the data on your phone. In case of large volumes, the archives might be split into multi-part tar archives. A listing of the backup directory might look like this:
boot.emmc.win
boot.emmc.win.md5
data.ext4.win000
data.ext4.win000.md5
data.ext4.win001
data.ext4.win001.md5
data.ext4.win002
data.ext4.win002.md5
data.info
recovery.log
system.ext4.win
system.ext4.win.md5
system.info
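Before extracting anything, it may be worth verifying the archives against the accompanying checksum files (assuming they contain standard md5sum output; adjust the file name to your backup):
md5sum -c data.ext4.win000.md5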
The app data is stored inside the data partition. So the first step is to extract this partition:
tar -xvf data.ext4.win000
This will result in a folder called data
which will eventually contain the application data in the subfolder data
(yes, same name).
In order to push data from individual application to the phone we first need to restart adb on the phone with root permissions:
adb root
Afterwards, individual application data folders can be pushed to the telephone, e.g.:
adb push data/data/com.example.app /data/data/com.example.app
Now the harder part begins. Android uses a single Unix user per application. First, the pushed files need to become owned by the user of the application.
For this purpose, first the user id of the application needs to be found out.
This requires the application to be installed already.
On the phone, e.g. through adb shell
do:
dumpsys package com.example.app | grep userId
This will print out the user id of the app, with which the files can be changed:
chown -R $id:$id /data/data/com.example.app
This is still not sufficient for the app to function properly again. If you try to launch the app now, chances are high that it complains about SQLite databases not being readable. This seems to be caused by SELinux, which is used by Android. The SELinux attributes of the files need to be restored, as described here:
restorecon -Rv /data/data/com.example.app
Finally, the app should be restored.
Everything started with an old Android telephone that I didn’t need anymore and the thought that online music streaming and remote control for music replay might be a nice idea. I put XBMC (now Kodi) on that phone with some music on the SD card and a remote on my new phone and then things started to evolve. Setting up an Android device so that it automatically powers up with XBMC running and switches off when not needed anymore worked somehow but was quite cumbersome at that time. Also, the music on the SD card never matched my actual music library and parallel maintenance was annoying. So I got a small NAS, put the music on that NAS and found an automatic SMB mounting solution for Android. It worked, but the sound quality of the audio jack of the phone actually wasn’t too good and administration was annoying. So I switched to a Raspberry Pi with XBMC and also connected it to my TV via the HDMI port. However, XBMC was sluggish on the Pi, so I tried MPD instead, which was blazing fast but soon lacked several features with my ideas of what can be done with such a system evolving. Therefore, I replaced my Pi and the ready-made Synology NAS with a custom-built Linux-based NAS, which is also the media center server, connected to my TV and my stereo sound system. This is now the current state of the hardware evolution.
On my custom-built server I am running the following software:
I decided to use two different solutions for audio and video in parallel because:
As remote controls for Mopidy I use the command-line based ncmpcpp on OS X and Linux and Remotedy for Android, which works slightly better than MPD-based clients. For Kodi most things can be done with the official Android remote control Kore. Only for some URL sharing tasks Yatse works better.
While Mopidy can be started as a system service, I opted to start Kodi as an autostart application of a special user of my system, which is automatically logged in. That way, I can also use the system as a normal computer using a wireless keyboard. I also had to configure pulse to ignore the external USB soundcard so that Mopidy / snapcast could use it without problems.
Finally, snapcast is the newest addition to the setup. It allows synchronized replay of audio on multiple devices, like e.g. Sonos does, but for free. Mopidy outputs the audio to snapcast for distributed replay and one snapcast client already runs on the NAS to feed the external audio interface. I have set up a second client on the Raspberry Pi (also running Archlinux ARM) and connected it to other loudspeakers in my kitchen. The snapclient is started as a system service and automatically (re-)connects to the server so that just switching on the system results in instant replay. An Android app allows controlling which clients play at which volume level and clients can even be switched between different streams. This works really well and I didn’t have any issues with snapcast so far.
autosuspend
periodically checks a running Linux system for certain conditions which shall prevent the system from going to sleep.
Such conditions can be logged in users, external TCP connections, music playing on MPD, X11 activity etc.
In case the configured checks did not indicate activity for a certain amount of time, the system is automatically suspended.
You should configure Wake-on-LAN for being able to bring the system back up in case you need it again.
autosuspend
is implemented in Python 3, all checks are configurable via configuration file and custom checks can be added easily.
The repository includes a systemd unit which I use to run the daemon on my Archlinux setup.
For Archlinux, there is also a PKGBUILD on AUR.
I’d be glad if someone finds this script useful, too. In case of issues, please use the issue tracking in the Github project.
The solution is based on sampling the 3D space and computing a distance to the separating hyperplane for each sample.
Afterwards, I derived the isosurface at distance 0 using the marching cubes implementation in scikit-image
.
The resulting mesh can be plotted using existing methods in matplotlib.
This is an example of this technique (based on the 2D example for the One-class SVM in scikit-learn):
from matplotlib import cm
from mpl_toolkits.mplot3d import axes3d
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from skimage import measure
from sklearn import svm
import matplotlib.font_manager
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
SPACE_SAMPLING_POINTS = 100
TRAIN_POINTS = 100
# Define the size of the space which is interesting for the example
X_MIN = -5
X_MAX = 5
Y_MIN = -5
Y_MAX = 5
Z_MIN = -5
Z_MAX = 5
# Generate a regular grid to sample the 3D space for various operations later
xx, yy, zz = np.meshgrid(np.linspace(X_MIN, X_MAX, SPACE_SAMPLING_POINTS),
                         np.linspace(Y_MIN, Y_MAX, SPACE_SAMPLING_POINTS),
                         np.linspace(Z_MIN, Z_MAX, SPACE_SAMPLING_POINTS))
# Generate training data by using a random cluster and copying it to various
# places in the space
X = 0.3 * np.random.randn(TRAIN_POINTS, 3)
X_train = np.r_[X + 2, X - 2, X + [2, 2, 0]]
# Generate some regular novel observations using the same method and
# distribution properties
X = 0.3 * np.random.randn(20, 3)
X_test = np.r_[X + 2, X - 2, X + [2, 2, 0]]
# Generate some abnormal novel observations using a different distribution
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 3))
# Create a OneClassSVM instance and fit it to the data
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
# Predict the class of the various input created before
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
# And compute classification error frequencies
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size
# Calculate the distance from the separating hyperplane of the SVM for the
# whole space using the grid defined in the beginning
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel(), zz.ravel()])
Z = Z.reshape(xx.shape)
# Create a figure with axes for 3D plotting
fig = plt.figure()
ax = fig.gca(projection='3d')
fig.suptitle("Novelty Detection")
# Plot the different input points using 3D scatter plotting
b1 = ax.scatter(X_train[:, 0], X_train[:, 1], X_train[:, 2], c='white')
b2 = ax.scatter(X_test[:, 0], X_test[:, 1], X_test[:, 2], c='green')
c = ax.scatter(X_outliers[:, 0], X_outliers[:, 1], X_outliers[:, 2], c='red')
# Plot the separating hyperplane by recreating the isosurface for the distance
# == 0 level in the distance grid computed through the decision function of the
# SVM. This is done using the marching cubes algorithm implementation from
# scikit-image.
verts, faces = measure.marching_cubes(Z, 0)
# Scale and transform to actual size of the interesting volume
verts = verts * \
    [X_MAX - X_MIN, Y_MAX - Y_MIN, Z_MAX - Z_MIN] / SPACE_SAMPLING_POINTS
verts = verts + [X_MIN, Y_MIN, Z_MIN]
# and create a mesh to display
mesh = Poly3DCollection(verts[faces],
                        facecolor='orange', edgecolor='gray', alpha=0.3)
ax.add_collection3d(mesh)
# Some presentation tweaks
ax.set_xlim((-5, 5))
ax.set_ylim((-5, 5))
ax.set_zlim((-5, 5))
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.legend([mpatches.Patch(color='orange', alpha=0.3), b1, b2, c],
          ["learned frontier", "training observations",
           "new regular observations", "new abnormal observations"],
          loc="lower left",
          prop=matplotlib.font_manager.FontProperties(size=11))
ax.set_title(
    "error train: %d/200 ; errors novel regular: %d/40 ; "
    "errors novel abnormal: %d/40"
    % (n_error_train, n_error_test, n_error_outliers))
fig.show()
The resulting plot will look like this:
Concerning the Seafile solution, the most prevalent reasons why I wanted to switch away from it were:
For these reasons I was looking for a different solution, but this was never really an urgent issue.
Several months ago I stumbled across homesick, which synchronizes dotfiles using Git. I talked to a colleague about this, we both liked the idea, and then more or less forgot about this issue until a few weeks ago. At that time, the colleague told me that he had looked at homesick, decided against a Ruby dependency just for managing dotfiles and started using homeshick (mind the H), which is more or less a clone of homesick written in bash. Therefore, I also gave this a go and finally moved all dotfiles to the new system.
The first issue when starting to use homeshick or homesick is to decide how many git repositories (called castles) to use. Since several parts of my configuration differ between private computers, work hosts and servers, I decided to use a modular approach and created a castle for each program I use. Configuring the castles worked like expected apart from the fact that I had to fix some symlinks manually, which already existed from my previous solution. There is no way for homeshick to understand which symlinks to keep and which to replace in this situation, so I can’t blame anyone about this.
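For reference, a typical homeshick workflow looks roughly like this; the castle name and file are placeholders, so consult the homeshick documentation for the exact commands:
homeshick generate config-vim        # create a new, empty castle
homeshick track config-vim ~/.vimrc  # move a dotfile into the castle and symlink it back
homeshick clone languitar/config-vim # fetch an existing castle from GitHub
homeshick link                       # (re-)create the symlinks in $HOME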
Since I spent some time to configure my environment and people sometimes ask about my configuration files I decided to host them publicly on GitHub. At least all configuration files where I could remove sensitive information are now on my GitHub account as repositories starting with the “config-” prefix. Feel free to use these configurations and give feedback on them.
So far I am quite happy with the solution and everything works as expected. I did not investigate how to manage the different castles, e.g. using repo or myrepos as proposed by the homeshick authors, but maybe I will try this out in the future.
]]>After installing the Python package, you can use the provided logging handler by configuring the logging system appropriately, e.g. via a config file like the following one:
[loggers]
keys=root
[handlers]
keys=broadcastHandler
[formatters]
keys=simpleFormatter
[logger_root]
level=DEBUG
handlers=broadcastHandler
[handler_broadcastHandler]
class=broadcastlogging.BroadcastHandler
level=DEBUG
args=('192.168.0.255',55555)
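Alternatively, the handler can be attached programmatically. This is just a sketch assuming that the constructor takes the broadcast address and port, as suggested by the args line above:
import logging

from broadcastlogging import BroadcastHandler

# send all log records to the local broadcast address on port 55555
handler = BroadcastHandler('192.168.0.255', 55555)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)
root.debug("hello over the network")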
In order to receive the logging messages, the broadcastlogging
module also provides an executable which replicates the log messages in a local program. It can be launched e.g. as follows:
python -m broadcastlogging -h -c -b '192.168.0.255' 55555
For further details, please refer to the online help of the module and the README file.
The broadcast-logging package is available on my GitHub account and on PyPI. Feel free to try this out and please give feedback via GitHub issues in case of problems or enhancement ideas.
def add_subplot_zoom(figure):

    # temporary store for the currently zoomed axes. Use a list to work around
    # python's scoping rules
    zoomed_axes = [None]

    def on_click(event):
        ax = event.inaxes

        if ax is None:
            # occurs when a region not in an axis is clicked...
            return

        # we want to allow other navigation modes as well. Only act in case
        # shift was pressed and the correct mouse button was used
        if event.key != 'shift' or event.button != 1:
            return

        if zoomed_axes[0] is None:
            # not zoomed so far. Perform zoom

            # store the original position of the axes
            zoomed_axes[0] = (ax, ax.get_position())
            ax.set_position([0.1, 0.1, 0.85, 0.85])

            # hide all the other axes...
            for axis in event.canvas.figure.axes:
                if axis is not ax:
                    axis.set_visible(False)

        else:
            # restore the original state
            zoomed_axes[0][0].set_position(zoomed_axes[0][1])
            zoomed_axes[0] = None

            # make other axes visible again
            for axis in event.canvas.figure.axes:
                axis.set_visible(True)

        # redraw to make changes visible.
        event.canvas.draw()

    figure.canvas.mpl_connect('button_press_event', on_click)
Applying this method to a figure will allow you to use Shift+Click
to let a single subplot fill the whole canvas and a second click of this kind will restore the original state.
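A minimal usage sketch (assuming the function above is defined in the same script):
import matplotlib.pyplot as plt
import numpy as np

# a figure with several subplots; shift-clicking one of them zooms it
fig, axes = plt.subplots(2, 2)
for axis in axes.flat:
    axis.plot(np.random.randn(50))
add_subplot_zoom(fig)
plt.show()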
This solution is partially based on the Stackoverflow question Matplotlib: Grab Single Subplot from Multiple Subplots but improves it for several corner cases.
Still some issues remain.
I couldn’t find out how to get the state of the usual navigation buttons in the event handler.
Therefore the code might interfere with the other navigation operations and also the home button might not work as expected.
In case someone finds out, please let me know.
pass-git-helper is a Python script which implements the Git credential API to act as a credential helper.
Since Git requests credentials using the host name and the pass database is usually not organized this way, some kind of adjustment of these concepts is required.
My basic design decision with respect to this issue was that I did not want to artificially structure my password database just to match the Git host concept.
Therefore I opted for a mapping-based solution.
In order to use pass-git-helper you need to specify how hosts are mapped to entries in your password store using a file, usually called ~/.git-pass-mapping
, which might look like this:
[github.com]
target=dev/github
[*.fooo-bar.*]
target=dev/fooo-bar
Additionally, the helper needs to be configured inside Git, e.g. using:
git config credential.helper '!pass-git-helper $@'
As usual for pass, the first line of a password store entry is assumed to contain the password and the second line is interpreted as the user name, if present.
Git credential helper can also offer abilities to update saved credentials or store completely new ones. This is currently not supported, but I’d be happy to integrate such a feature if desired.
pass-git-helper is available on GitHub and licensed as LGPLv3+. I’d be glad to receive some feedback and hope this little helper is useful for someone.
For my portfolio I have used Jekyll as a well known and easily maintainable static site generator. I am quite happy with how things worked out for that site but there were a few things which led me to try out something different for the blog:
With that in mind I explored pythonic static site generators. First stop here was the quite popular Pelican. I didn’t notice any considerable technological improvement compared to Jekyll apart from being implemented in Python. Also, most existing templates looked worse than for Jekyll, so I dropped Pelican.
Having experience with writing technical documentation using the Sphinx documentation generator I had the idea that it might be nice to have a blogging platform based on Sphinx. Sphinx offers a lot of semantic additions to the usual reStructured Text repertoire, which makes it quite appealing on first sight. Indeed, there are blogging platforms based on Sphinx and I started to explore Tinkerer. The problem I had with Tinkerer is that things which are easily possible with e.g. Jekyll were actually a nightmare to realize using Tinkerer because of the much more rigid structure of the processing backend. One example is adding a teaser image per post. This is not directly supported by Tinkerer and I would have had to implement an extension just for this purpose. In Jekyll I can just use an item in the front matter and access it in the post list template. Therefore, I also dropped Tinkerer.
In the end, I decided to use Middleman for this site, a Ruby-based solution again. ;) Middleman provides a lot of flexibility while still being easily usable and maintainable. It is not only intended for blogging and e.g. adding post images was a matter of a few lines of template code. So this works quite well. Sometimes I had to dig a bit deeper into the documentation or the forum to understand some aspects, but generally, the documentation quality is good and the system is quite understandable. I have decided to use kramdown as the markdown engine because of the richer feature set like definition lists.
For theming I decided to start a new theme more or less from scratch instead of using an existing one to avoid a lot of the overhead present in the existing themes.
The HTML code uses modern semantic HTML 5 elements like article
, section
or nav
and CSS uses level 3 features.
The layout scales to portable devices using CSS media queries.
I tried to focus on the article contents and removed a lot of the previously existing sidebars etc.
gpg-agent
and Enigmail’s internal passphrase management cannot be used anymore. Therefore, a setup is required that enables the gpg
processes spawned by Enigmail to talk to a running gpg-agent
instance.
gpg-agent
is a small utility daemon that handles passphrase caching. A gpg
process can talk to a running gpg-agent
once it knows where the agent can be reached. This information is obtained from the environment variable GPG_AGENT_INFO
. The content of this variable looks something like this: /Users/youruser/.gnupg/S.gpg-agent:38959:1
. In case of programs started in a shell, starting gpg-agent
on demand and exporting the correct environment variable is easily possible with a few lines of shell configuration code. However, Thunderbird and therefore also Enigmail are usually started via the OSX GUI and not from within a shell. Therefore, the GPG_AGENT_INFO
variable needs to be exported by the process which manages launching graphical programs (via Spotlight). This is launchd
on OSX. Fortunately, launchd
has command line options to control the environment it uses to start new processes, which we can take advantage of.
For the setup I am using now, I start a gpg-agent
process with my graphical login. To do so, I have adapted this solution, which starts gpg-agent
at login, but does not export the environment variables inside launchd
.
The first step is to create a plist-file, which is a configuration for launchd
instructing it to start gpg-agent
at login. Create ~/Library/LaunchAgents/org.gnupg.gpg-agent.plist
with the following contents (shamelessly stolen from the aforementioned blog post):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>org.gnupg.gpg-agent</string>
    <key>ProgramArguments</key>
    <array>
      <string>/Users/youruser/bin/start-gpg-agent.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
  </dict>
</plist>
Replace youruser
with your actual OSX username. launchd
automatically processes plist files from the ~/Library/LaunchAgents
directory.
As the plist only dispatches to a shell script called /Users/youruser/bin/start-gpg-agent.sh
, we need to create this script, which does the real magic of starting gpg-agent
and exporting the variables. Create it with the following contents (adapt the gpg-agent
path to your installation):
#!/bin/bash

if test -f "$HOME/.gpg-agent-info" && \
    kill -0 "$(cut -d: -f 2 "$HOME/.gpg-agent-info")" 2>/dev/null
then
    echo "already running" > /dev/null
else
    /usr/local/bin/gpg-agent -c --daemon --write-env-file > /dev/null
fi

if [ -f "${HOME}/.gpg-agent-info" ]
then
    socket="$(cut -d= -f2 "$HOME/.gpg-agent-info")"
    launchctl setenv GPG_AGENT_INFO "${socket}"
else
    echo "gpg-agent did not write info file"
fi
The first part of the script starts gpg-agent
in case no other gpg-agent
instance is already running. A running instance of gpg-agent
puts its connection information in a file called ~/.gpg-agent-info
in case it was started with the --write-env-file
option. The second half of the shell script parses this file and exports the GPG_AGENT_INFO
variable inside launchd
with the launchctl setenv
command. Afterwards, every graphically launched program has knowledge about the running gpg-agent
instance and can connect to it.
In case a gpg
process (e.g. spawned by Enigmail) now wants to interact with one of your keys, it will dispatch the passphrase work to gpg-agent
. If a passphrase has not been provided to gpg-agent
yet, or the last entry is longer ago than the configure TTL (time to live), gpg-agent
needs a way to prompt for a new passphrase. Therefore it is important that a graphical pinentry program is configured for gpg-agent
. This is done inside the file ~/.gnupg/gpg-agent.conf
. Ensure that in this file at least the following line is present (adapt the path as required):
pinentry-program /usr/local/bin/pinentry-mac
This instructs gpg-agent
to use the pinentry-mac program (from the GPGTools project, can e.g. be installed via homebrew) for requesting a passphrase.
After performing all these steps, you can logout and back in again. In a terminal you should now see that gpg-agent
is running, e.g. via ps -ef | grep gpg-agent
and also the GPG_AGENT_INFO
variable should be present (echo $GPG_AGENT_INFO
). Enigmail should be able to interact with gpg-agent
and passphrases will only be requested once per TTL.
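The TTL itself can be adjusted in the same ~/.gnupg/gpg-agent.conf file; the values below (in seconds) are only examples:
default-cache-ttl 3600
max-cache-ttl 14400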
Starting with GPG 2.2 this is probably not necessary anymore, since it seems that GPG itself has now implemented a system to start the agent if it is not running.
During this process I also updated the software section in this blog and used some Wordpress plugin magic to auto-generate an always up-to-date overview page, which I previously always forgot to update.
git diff stash@\{0\}^1 stash@\{0\} -- path/to/your/file | git apply
Based on the Stackoverflow question: How would I extract a single file (or changes to a file) from a git stash?
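If this is needed more often, it can be wrapped in a small shell function (just a sketch, the function name is made up):
# apply the changes a given stash entry made to a single file onto the working tree
stash-apply-file() {
    local stash="$1" file="$2"
    git diff "${stash}^1" "${stash}" -- "${file}" | git apply
}
# usage: stash-apply-file stash@{0} path/to/your/file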
languitar@cinnabar:~$ python2 -c 'print("test", "42")'
('test', '42')
languitar@cinnabar:~$ python3 -c 'print("test", "42")'
test 42
The first solution I tested was ownCloud. My experiences weren't too good, especially because I was completely unable to manually compile the client on my work computer. After endless tries I finally ended up with a launchable version, which, however, didn't synchronize at all. After some more web browsing, which turned up endless remarks about bad support and many open issues, I decided against ownCloud.
The solution I ended up with is Seafile, a not so well known open-source solution with some Chinese developers and good support and issue response times. People might be suspicious because of its origins, but so far I couldn't find any obvious security issues and things work quite well.
For my setup I followed the installation instructions on the Github wiki using the pre-compiled packages. I am operating Seafile on a subfolder of an SSL-enabled subdomain by using Apache as a proxy. Actually, all required information on how to set up this configuration can be found on the wiki with some browsing. Things could be better described, but I can also imagine much more horrifying installation instructions.
After installing the Seafile server and adding a user account, clients can connect to the server and synchronize multiple libraries. This is a nice concept not directly available in Dropbox etc. Each library is a folder structure with separate settings for the kind of history that is maintained and the ability to enable client-side encryption. Moreover, libraries are the atomic unit that can be shared between users on the same server. E.g. in my setup I created a library for sharing files with others where I completely disabled the history feature and another one for the config files where I preserve a few revisions of history. Across the different libraries, Seafile implements a de-duplication strategy to save server-side storage space. Clients exist for all major operating systems (including a console-based daemon for Linux) and also for Android.
My experiences with Seafile are quite good so far and most things work well. The following issues are the ones I noticed and that could be improved.
One kind of service I have been using for some time now is a news feed reader / aggregator: first Google Reader and, after its shutdown, Feedly. I never used any collaborative feature of a news feed reader, so apart from the avoided setup overhead there is really no benefit in using a public service. So I started to search for a privately hosted alternative. Basically I had only a few requirements:
With these requirements I ended up with Tiny Tiny RSS. Tiny Tiny RSS is a PHP-based web service providing functionality comparable to what Google Reader and Feedly offered. Moreover, several front-ends exist, e.g. for Android.
I have been using Tiny Tiny RSS for several weeks now and I am quite satisfied. The service has worked very reliably so far without any issues. The setup itself was simple following the installation guide. The default theme for the web interface looks a bit oldish, so one modification I made was installing tt-rss-feedly-theme, which provides a Feedly-like style:
For mobile reading on my Android devices I am using TTRSS-Reader. Unlike the "official" open-source Android client, which for strange reasons requires a paid unlock for full functionality, this one is completely free and works well. Swiping between articles could be a bit smoother and there is sometimes a rendering bug when going back to the article list. But apart from that, everything works well.
So, Tiny Tiny RSS works very well for me and I do not miss anything I previously had with Google Reader or Feedly. There are also a lot of other front-ends available besides the ones I have shown, which allows for a lot more customization.
alias makej="make -j $(cat /proc/cpuinfo | grep processor | wc | sed -r 's/^ +([0-9]+).*/\1/')"
Mac:
alias makej="make -j $(sysctl hw.ncpu | awk '{print $2}')"
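On Linux systems with GNU coreutils, nproc can be used as a simpler alternative to parsing /proc/cpuinfo:
alias makej='make -j "$(nproc)"'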
.gitconfig file (along with other configuration files) into a folder inside the Dropbox (or other syncing tool), create a symlink to that file in the original location in your home directory and let Dropbox do the syncing.
However, what if some parts of the git config are device-specific? E.g. the visual difftool on my Mac is different from the one on my Linux workstation, as is the keychain helper. The solution I came up with is to let the shared .gitconfig include a device-specific file for those settings, using the include feature available since 1.7.10:
[include]
path = .gitlocal
Now I can configure device-specific settings in the ~/.gitlocal file, which is not synced by Dropbox.
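On the Mac, for example, the ~/.gitlocal could contain something like the following (the concrete tools are only examples):
[diff]
    tool = opendiff
[credential]
    helper = osxkeychain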
Unfortunately, some of the computers I am working with run older git versions that do not support the include feature. For these machines I can at least put all settings shared between them in the usual .gitconfig file and override them for the other computers in their .gitlocal files. Not perfect, but it works for my setup.
To overcome at least the problem of setting up a usable build system at all, I created a tutorial with example projects for C++, Python and Java some time ago at work. This is probably also usable outside my university, so feel free to use that tutorial for your own needs: Build System Essentials.
is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value." This means the expression "a is b" only returns True if both a and b point to the exact same object instance. Hence, you should always use is when you want to compare against specific instances. For the advice to compare with None using is, this means that there is only a single instance of None. The same is also true for Ellipsis.
Not knowing the difference between the comparison operators can actually lead to quite confusing results because of Python's operator overloading abilities. Imagine you need to check in some code whether an arbitrary variable you receive is Ellipsis or anything else. If you use == instead of is for that check, this might work in many cases, but it might also break completely if that variable points to an object with a custom implementation of the __eq__ function:
>>> a = "a test string"
>>> a == Ellipsis
False
>>> import numpy as np
>>> a = np.array(range(5))
>>> a == Ellipsis
array([False, False, False, False, False], dtype=bool)
>>> a is Ellipsis
False
>>> a == None
False
In the example, the string behaves correctly for the test against Ellipsis, but comparing a numpy ndarray against Ellipsis with == doesn't yield a boolean value as expected, but instead another array containing booleans, because ndarray implements __eq__ and maps the comparison operator to an element-wise comparison. This probably results in an exception in production use. The a is Ellipsis line shows that is ignores the custom comparison method and correctly returns the expected boolean value.
Finally, it is noteworthy that the __eq__ implementation of ndarray seems to be aware of the fact that people constantly compare against None with ==. This is the reason why the last two lines of the listing return only a single boolean value and not an array. However, you should normally not rely on this fact as it is not given for all comparison implementations.
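A small constructed example shows how an overly permissive __eq__ can mislead such checks, while is stays reliable:
class Weird(object):
    def __eq__(self, other):
        # deliberately broken comparison that claims equality with everything
        return True

w = Weird()
print(w == None)  # True, although w is clearly not None
print(w is None)  # False, the identity check is not fooled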
Other languages actually have a similar concept but they expose it in a more obvious way. In Java the == operator always checks for object identity and you explicitly need to implement and call the equals method if you want to have a custom comparison.
Of course, as usual with Samsung devices, I had to work through several bricked states before I finally got a running installation. I also ended up being stuck in an endless recovery mode loop caused by Clockwork Mod Recovery, which was fortunately cured by this solution (German) after the second attempt. The only thing left to find out now is the battery consumption…
By the way, why are all these people on the Android hacker forums so unbelievably brief and pseudo-geeky with their language? It is really a pain to read posts and descriptions there…
First of all, this wasn’t a portfolio at all, lacking any selection and reasonable presentation. Essentially I am using flickr now for any kind of photo and not only for selected ones. Also, in line with the ongoing shift to other platforms, I have started putting fine-art-like photos on other sites like 500px. Hence, flickr wasn’t the right basis anymore. Second, Flash is a dying web technology and has several drawbacks for photography; most importantly, it doesn’t support color management well. So dropping this viewer was also an important step.
As a consequence I have created a dedicated website for my photography work: https://www.johanneswienke.de. This website will also contain a blog for the photography-related things I am doing.
The blog here will still continue to exist and I will try to update it from time to time, especially with more tech-related stuff. As such things are mainly discussed in English, I have also switched the language to English now by replacing the underlying Wordpress installation. To make the maintenance of this blog and the portfolio easier I have also switched to the network features of Wordpress, which allow running multiple sites on the same Wordpress installation.
The script can be found here.
Usually, it is a good idea to make the software you build relocatable. This means that the created binaries can be moved around and still work after the location change. This includes the ability to still find the correct versions of dynamic libraries. One simple way to achieve this is by setting the rpath relative to the current location of the binary. In CMake this can be achieved like this:
# for all binaries created in a CMake project:
SET(CMAKE_INSTALL_RPATH "\$ORIGIN/../lib:\$ORIGIN/")
# for certain targets
SET_TARGET_PROPERTIES(target1 target2 ...
    PROPERTIES INSTALL_RPATH "\$ORIGIN/../lib:\$ORIGIN/")
Please note that the backslash is required.
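To verify the result, the dynamic section of an installed binary can be inspected, e.g. (the binary path is just a placeholder):
readelf -d path/to/installed/binary | grep -i rpath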
The Jenkins continuous integration server uses the notion of file fingerprints for this purpose. The upstream job is built by Jenkins and produces one or several so-called artifacts, the results of the build process. The artifacts are archived by Jenkins and fingerprints (hash sums) for each artifact are created and stored along with the build number of the job. When the downstream job starts to build, it downloads the (most recent) artifacts from the upstream job and uses them for its purposes, i.e. building and running its own source code. By comparing the fingerprints of the downloaded artifacts with the stored fingerprints, Jenkins knows which version of each upstream job was involved in a build and can track which upstream build number broke the downstream job. Jenkins will only issue notifications if this fingerprinting mechanism is properly configured; triggering one build after another is not sufficient to receive these notifications. Moreover, the Blame Upstream Commiters plugin needs to be used and enabled for each downstream job, or the global property hudson.upstreamCulprits (will this ever be renamed?) needs to be set.
The rationale behind this rather complex mechanism is that it enables a high amount of parallelism for building jobs. While the downstream job builds, the upstream job can already operate again without affecting the downstream job. Such interference would occur if, e.g., a central installation location were shared between both jobs: if the upstream job installs new files while the downstream job is still building, this will certainly result in hard-to-debug errors. Moreover, this also allows running the downstream job on a different build slave (assuming similar systems), which also would not be possible with a central installation location in a file system.
For Java projects (where Jenkins comes from) the explained mechanism usually works well. The upstream job produces one or several jar files containing all resources for the project, such as images, and fingerprints them; no preprocessor is involved that configures the Java code according to the installation setup, and no source code is generated based on this setup. For C++ projects this is usually different, because the language includes a preprocessor and it is common practice to set certain code lines according to the installation location, e.g. to find additional files like images, because they cannot be packaged into a jar file. Also, C++ projects usually consist of many more files, considering all the headers, compared to Java. This provides more chances to mix something up.
So assume an upstream C++ job A (using CMake; other build solutions are not covered in this post but the techniques can be applied there, too) which is built in Jenkins. It will usually be configured with an installation location, e.g. inside the job’s workspace like /jenkins/workspace/A/install.
Often, CMake will use this location, e.g. to generate a config.h which tells that images are found at /jenkins/workspace/A/install/share/A/images etc.
To use the CMake dependency mechanism, it will generate an AConfig.cmake file and install it also to the share folder (cf. the CMake documentation for find_package). The file might look like:
SET(A_LIBRARIES "/jenkins/workspace/A/install/lib/libA.so")
SET(A_INCLUDE_DIRS "/jenkins/workspace/A/install/include/A")
After building the project the job will e.g. use a compression tool to create a single archive compressing all contents of /jenkins/workspace/A/install/, archive this artifact and generate the fingerprints for it.
Both issues mentioned above will prevent the dependency tracking of Jenkins from functioning properly, because the downstream job will download the artifacts to its own workspace, e.g. to /jenkins/workspace/B/upstream/A, and unpack them there. Cf. the issues: both the paths compiled into the code via config.h and the paths in AConfig.cmake point to A’s workspace and not to the downloaded artifacts.
To enable reliable dependency tracking in Jenkins, the solutions are:
Do not use this technique at all. The software is generally more flexible if no hard-coded locations are assumed, and more situations are covered without recompiling.
The idea here is to make all paths given in the config file (AConfig.cmake) relative to its current location on disk. This will look like this:
GET_FILENAME_COMPONENT(CONFIG_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
SET(A_LIBRARIES "${CONFIG_DIR}/../../lib/libA.so")
SET(A_INCLUDE_DIRS "${CONFIG_DIR}/../../include/A")
Now the CMake script of B will use the correct downloaded headers, libraries etc. for A from its own workspace.
These two aspects make it possible to use fingerprinting in Jenkins for dependency tracking with notifications for upstream committers. Especially the first aspect requires some care while designing the project, but there is no other solution I can think of.
Please note that for executing any tests in the downstream job B you have to set the LD_LIBRARY_PATH so that the right upstream libraries are found as well.
Some more care needs to be taken to not mix up the dependency tracking again:
The downstream job needs to make sure that the latest downloaded artifact is really used to build its own source code. So it is a good idea to simply remove the upstream directory as the first step of the build.
The downloaded artifacts (as explained above, the generated archive files) need to be kept after extracting them, because the downstream job also has to generate fingerprints for them (and not for the extracted files) to create a match with the fingerprints stored for the upstream job.
In order to enable the downstream CMake project to find the upstream project, use the _DIR variable for the CMake call as defined in the CMake documentation, e.g. -DA_DIR="${WORKSPACE}/upstream/A/share/A"
If your upstream project contains a version or revision number in the extracted folder name (e.g. ${WORKSPACE}/upstream/A-0.35/) and you want your downstream job to be resilient against version changes in the upstream project, you can use some find magic on UNIX to automatically locate the folder:
A=`find "${WORKSPACE}/upstream" -maxdepth 1 -type d -name "A-*"`
If you are using pkg-config instead of or in addition to the CMake config file mechanism, you can use the --define-variable command line argument to achieve similar flexibility, assuming that all your absolute paths depend on a single prefix variable in the pc file.
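Assuming the pc file of A derives all paths from such a prefix variable, a call in the downstream job could look roughly like this:
pkg-config --define-variable=prefix="${WORKSPACE}/upstream/A" --cflags --libs A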
One thing I was not happy with so far was the missing integration of open-source coverage tools for Linux. Here, Gcov can be used to generate more or less precise coverage reports for projects compiled with GCC. Unfortunately, Gcov itself does not provide tools to export the results in any common or even nicely readable format. Until now, the only working solution I found was to use the Gcov front-end LCOV to generate an HTML report. This report is nice to read, but it cannot be tracked by Jenkins, with the drawback that no trend report for the code coverage can be generated. Nevertheless, I’ve wrapped the creation of such an HTML report in a CMake function and worked with it so far.
Today, I searched again for cheap solutions to overcome this drawback (this means without writing a custom Jenkins plugin for Gcov coverage files). While searching the net, I found the gcovr script, which parses Gcov result files and is able to convert them into XML files that satisfy the format generated by Cobertura, which is a coverage tool for Java with an existing plugin for Jenkins.
As far as I have tested it, this script works well with the Jenkins plugin, so I integrated the execution of this script into my existing coverage function for CMake, which is available in the RSC library. This library also contains additional CMake wrappers for tools that can be used to generate trend reports in Jenkins, like cppcheck. Now our Jenkins can also generate coverage trend reports for C++ projects.
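For illustration, a gcovr invocation producing such a Cobertura-style XML report could look roughly like this (run from the build directory after executing the tests):
gcovr --root . --xml --output coverage.xml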
def foo(arg=MyClass()):
pass
As I am currently programming a lot in C++, this did not look suspicious to me. But from Python’s point of view it absolutely makes sense that the default value for arg is already constructed at module load time and not only on a call to that function. Moreover, it is important to remember that this default argument is constructed only once for all calls to the function, as stated here.
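If a fresh instance per call is actually intended, the usual way around this is a None default as a sentinel (a sketch, reusing MyClass from the snippet above):
def foo(arg=None):
    # construct the default on every call instead of once at module load time
    if arg is None:
        arg = MyClass()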
boost::shared_ptr<Foo> p(new Foo);
boost::thread t(boost::bind(&Foo::method, p.get()));
This prevents the lifecycle management of shared_ptr, or any other smart pointer, from being effective in the thread. Hence, the object pointed to by p may get destructed even though the thread is still active.
Boost bind can handle smart pointers, so instead use the smart pointer itself as the instance argument for bind:
boost::thread t(boost::bind(&Foo::method, p));
$HOME. $HOME, in turn, is filled from the user definition in /etc/passwd. pwd always returns the current working directory without a trailing slash, but the entry for my user in /etc/passwd contained a trailing slash. Removing this slash from the file solves the problem and the prompt uses ~ again for the home directory.
sudo update-alternatives --config x-www-browser
Spring rolls, chili-pepper dip, rice, iceberg lettuce with carrots, coconut quark
-Wall -Wextra will generate a lot of useful warning messages about unused parameters etc.
Unfortunately, this is not common practice, and often your own warning-level settings result in dozens of warnings from the system headers your code relies on, making it impossible to spot warnings from your own code in the endless mass of console output.
Fortunately, GCC has a way to ignore warnings from foreign headers. Instead of using -I to specify an include path, -isystem tells the compiler to treat the includes from the given path as system headers where no warnings should be reported.
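On a plain compiler command line this looks like the following (the include path is only an example):
g++ -Wall -Wextra -isystem /opt/somelib/include -c main.cpp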
If CMake is used to create the Makefile, a special argument to the INCLUDE_DIRECTORIES function generates these compiler flags:
INCLUDE_DIRECTORIES(SYSTEM /usr/include /and/other /system/paths)
Change the compiler flags. This is the worst solution: in other cases these warnings are really helpful. Generally, I really like to compile with the highest warning level. Most of the warnings (at least for GCC) really have something to tell.
Use a macro to flag a variable as being unused, e.g. from Qt:
void foo(int aParameter) {
    Q_UNUSED(aParameter)
}
Even though this looks like a reasonable solution, it has the drawback of a potential for confusion. Maybe some time later you need the parameter but forget to remove the Q_UNUSED macro, hence generating something like this:
void foo(int aParameter) {
    Q_UNUSED(aParameter)
    // 30 lines of other method code
    int anotherVariable = 30 * aParameter;
}
Suddenly your code documents that a variable isn’t used although it actually is, and the compiler can’t detect this error.
Simply comment out the parameter name:
void foo(int /*aParameter*/) {
    // 30 lines of other method code
    int anotherVariable = 30 * aParameter;
}
Now the warning is gone, too, and the compiler can detect either the illegal use of the variable in line three or the now-wrong claim that this variable is unused on purpose.
SET(HS folder/test.h folder/other/test2.h)
A simple call to INSTALL doesn’t preserve the folder structure:
INSTALL(FILES ${HS} DESTINATION include)
This results in all files being directly under $prefix/include.
To preserve the structure you can use this simple macro:
MACRO(INSTALL_HEADERS_WITH_DIRECTORY HEADER_LIST)
FOREACH(HEADER ${${HEADER_LIST}})
STRING(REGEX MATCH "(.*)[/]" DIR ${HEADER})
INSTALL(FILES ${HEADER} DESTINATION include/${DIR})
ENDFOREACH(HEADER)
ENDMACRO(INSTALL_HEADERS_WITH_DIRECTORY)
INSTALL_HEADERS_WITH_DIRECTORY(HS)
#include <sstream>
#include <boost/test/included/unit_test.hpp>
#include <gtest/gtest.h>
#include <gmock/gmock.h>
using namespace std;
using namespace testing;
class BoostTestAdapter: public EmptyTestEventListener {
virtual void OnTestStart(const TestInfo& /*testInfo*/) {
}
virtual void OnTestPartResult(const TestPartResult& testPartResult) {
if (testPartResult.failed()) {
stringstream s;
s << "Mock test failed (file = '"
<< testPartResult.file_name()
<< "', line = "
<< testPartResult.line_number()
<< "): "
<< testPartResult.summary();
BOOST_FAIL(s.str());
}
}
virtual void OnTestEnd(const ::testing::TestInfo& /*testInfo*/) {
}
};
Every time a partial test result of googletest is reported that fails, BOOST_FAIL is called.
The only thing that is left now is to install the newly created adapter in the googletest framework and initialize it properly. This can be done using a fixture class in Boost Test:
class TestFixture {
public:
TestFixture() {
InitGoogleMock(
&boost::unit_test::framework::master_test_suite().argc,
boost::unit_test::framework::master_test_suite().argv);
TestEventListeners &listeners = UnitTest::GetInstance()->listeners();
// this removes the default error printer
delete listeners.Release(listeners.default_result_printer());
listeners.Append(new BoostTestAdapter);
}
~TestFixture() {
// nothing to tear down
}
};
BOOST_GLOBAL_FIXTURE(TestFixture)
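As a hypothetical usage sketch appended to the same file (assuming BOOST_TEST_MODULE is defined as usual for Boost Test), a test case with a made-up mock could look like this; a failing EXPECT_CALL would then be reported through BOOST_FAIL by the adapter:
// purely illustrative interface and mock, not part of the original post
class Calculator {
public:
    virtual ~Calculator() {}
    virtual int add(int a, int b) = 0;
};

class MockCalculator : public Calculator {
public:
    MOCK_METHOD2(add, int(int, int));
};

BOOST_AUTO_TEST_CASE(mockExpectationsAreChecked) {
    MockCalculator calc;
    EXPECT_CALL(calc, add(1, 2)).WillOnce(Return(3));
    BOOST_CHECK_EQUAL(calc.add(1, 2), 3);
}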
With this, Doxygen tags in comments are highlighted and comments with parameters etc. are generated automatically like in Java. What’s still missing is the auto-formatter support for these comments and tag-completion.
The game we developed is based on the well-known Pong but extends it with some interesting tweaks like two paddles per player etc.
Here are the downloads:
The game should be playable on every modern C64 emulator as well as on the real machine. It was developed on VICE.
The variant converted to *.kml had this problem. What did work, however, was creating, saving, and reloading a track in Google Earth. Comparing my KML file with the one generated by Google Earth itself revealed a nice bug: Google apparently did a bit too much i18n or l10n, so that the locale-specific decimal separator (i.e. a comma with a German locale) is used when creating and loading files. According to the GPX and KML specifications it is of course complete rubbish to use coordinates of the form 52,xxx. The following trick convinced Google Earth to use a dot as the decimal separator after all:
LANG="" googleearth
This starts Google Earth in English.
typename T::ThereReallyIsNoMemberByThisNameInT vertices(T const&);
// The graph is passed by *const* reference so that graph adaptors
// (temporaries) can be passed into this function. However, the
// graph is not really const since we may write to property maps
// of the graph.
The compiler error messages are fun as well:
make all Scanning dependencies of target imnbase [ 7%] Building CXX object src/CMakeFiles/imnbase.dir/algorithms/DefaultAlgorithmFactory.cpp.o [ 14%] Building CXX object src/CMakeFiles/imnbase.dir/algorithms/parallelScan/ParallelScan.cpp.o /usr/local/include/boost/mpi/datatype.hpp: In function »ompi_datatype_t\* boost::mpi::get_mpi_datatype(const T&) [with T = std::basic_string, std::allocator >]«: /usr/local/include/boost/mpi/detail/mpi_datatype_primitive.hpp:96: instantiated from »void boost::mpi::detail::mpi_datatype_primitive::save(const T&) [with T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:95: instantiated from »static void boost::archive::save_access::save_primitive(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:212: instantiated from »static void boost::archive::detail::save_non_pointer_type::save_primitive::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:294: instantiated from »static void boost::archive::detail::save_non_pointer_type::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:506: instantiated from »void boost::archive::save(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/mpi/detail/ignore_skeleton_oarchive.hpp:46: instantiated from »void boost::mpi::detail::ignore_skeleton_oarchive::save_override(const T&, int) [with T = std::basic_string, std::allocator >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/archive/detail/interface_oarchive.hpp:64: instantiated from »Archive& boost::archive::detail::interface_oarchive::operator<<(T&) [with T = const std::basic_string, std::allocator >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/serialization/nvp.hpp:78: instantiated from »void boost::serialization::nvp::save(Archivex&, unsigned int) const [with Archivex = boost::mpi::detail::mpi_datatype_oarchive, T = const std::basic_string, std::allocator >]« /usr/local/include/boost/serialization/access.hpp:93: instantiated from »static void boost::serialization::access::member_save(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = const boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/split_member.hpp:43: instantiated from »static void boost::serialization::detail::member_saver::invoke(Archive&, const T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/split_member.hpp:69: instantiated from »void boost::serialization::split_member(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/nvp.hpp:88: instantiated from »void boost::serialization::nvp::serialize(Archive&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = const std::basic_string, std::allocator >]« /usr/local/include/boost/serialization/access.hpp:109: instantiated from »static void 
boost::serialization::access::serialize(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:74: instantiated from »void boost::serialization::serialize(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:133: instantiated from »void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:220: instantiated from »static void boost::archive::detail::save_non_pointer_type::save_only::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:294: instantiated from »static void boost::archive::detail::save_non_pointer_type::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:506: instantiated from »void boost::archive::save(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/mpi/detail/ignore_skeleton_oarchive.hpp:46: instantiated from »void boost::mpi::detail::ignore_skeleton_oarchive::save_override(const T&, int) [with T = boost::serialization::nvp, std::allocator > >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/archive/detail/interface_oarchive.hpp:64: instantiated from »Archive& boost::archive::detail::interface_oarchive::operator<<(T&) [with T = const boost::serialization::nvp, std::allocator > >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/serialization/collections_save_imp.hpp:60: instantiated from »void boost::serialization::stl::save_collection(Archive&, const Container&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, Container = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/vector.hpp:53: instantiated from »void boost::serialization::save(Archive&, const std::vector&, unsigned int, mpl_::false_) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/vector.hpp:123: instantiated from »void boost::serialization::save(Archive&, const std::vector&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/split_free.hpp:45: instantiated from »static void boost::serialization::free_saver::invoke(Archive&, const T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/split_free.hpp:74: instantiated from »void boost::serialization::split_free(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« 
/usr/local/include/boost/serialization/vector.hpp:147: instantiated from »void boost::serialization::serialize(Archive&, std::vector&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:133: instantiated from »void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:144: instantiated from »void boost::archive::detail::oserializer::save_object_data(boost::archive::detail::basic_oarchive&, const void*) const [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /home/languitar/workspace/iMN/src/algorithms/parallelScan/ParallelScan.cpp:128: instantiated from here /usr/local/include/boost/mpi/datatype.hpp:184: Fehler: keine passende Funktion für Aufruf von »assertion_failed(mpl_::failed*\*\*\*\*\*\*\*\*\*\*\* boost::mpi::is_mpi_datatype, std::allocator > >::\*\*\*\*\*\*\*\*\*\*\*\*)« /usr/local/include/boost/mpi/detail/mpi_datatype_cache.hpp: In member function »ompi_datatype_t\* boost::mpi::detail::mpi_datatype_map::datatype(const T&, typename boost::disable_if, void>::type*) [with T = std::basic_string, std::allocator >]«: /usr/local/include/boost/mpi/datatype.hpp:185: instantiated from »ompi_datatype_t* boost::mpi::get_mpi_datatype(const T&) [with T = std::basic_string, std::allocator >]« /usr/local/include/boost/mpi/detail/mpi_datatype_primitive.hpp:96: instantiated from »void boost::mpi::detail::mpi_datatype_primitive::save(const T&) [with T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:95: instantiated from »static void boost::archive::save_access::save_primitive(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:212: instantiated from »static void boost::archive::detail::save_non_pointer_type::save_primitive::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:294: instantiated from »static void boost::archive::detail::save_non_pointer_type::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:506: instantiated from »void boost::archive::save(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/mpi/detail/ignore_skeleton_oarchive.hpp:46: instantiated from »void boost::mpi::detail::ignore_skeleton_oarchive::save_override(const T&, int) [with T = std::basic_string, std::allocator >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/archive/detail/interface_oarchive.hpp:64: instantiated from »Archive& boost::archive::detail::interface_oarchive::operator<<(T&) [with T = const std::basic_string, std::allocator >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/serialization/nvp.hpp:78: instantiated from »void boost::serialization::nvp::save(Archivex&, unsigned int) const 
[with Archivex = boost::mpi::detail::mpi_datatype_oarchive, T = const std::basic_string, std::allocator >]« /usr/local/include/boost/serialization/access.hpp:93: instantiated from »static void boost::serialization::access::member_save(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = const boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/split_member.hpp:43: instantiated from »static void boost::serialization::detail::member_saver::invoke(Archive&, const T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/split_member.hpp:69: instantiated from »void boost::serialization::split_member(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/nvp.hpp:88: instantiated from »void boost::serialization::nvp::serialize(Archive&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = const std::basic_string, std::allocator >]« /usr/local/include/boost/serialization/access.hpp:109: instantiated from »static void boost::serialization::access::serialize(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:74: instantiated from »void boost::serialization::serialize(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:133: instantiated from »void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:220: instantiated from »static void boost::archive::detail::save_non_pointer_type::save_only::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:294: instantiated from »static void boost::archive::detail::save_non_pointer_type::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:506: instantiated from »void boost::archive::save(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/mpi/detail/ignore_skeleton_oarchive.hpp:46: instantiated from »void boost::mpi::detail::ignore_skeleton_oarchive::save_override(const T&, int) [with T = boost::serialization::nvp, std::allocator > >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/archive/detail/interface_oarchive.hpp:64: instantiated from »Archive& boost::archive::detail::interface_oarchive::operator<<(T&) [with T = const boost::serialization::nvp, std::allocator > >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/serialization/collections_save_imp.hpp:60: instantiated from »void boost::serialization::stl::save_collection(Archive&, const Container&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, Container = std::vector, 
std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/vector.hpp:53: instantiated from »void boost::serialization::save(Archive&, const std::vector&, unsigned int, mpl_::false_) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/vector.hpp:123: instantiated from »void boost::serialization::save(Archive&, const std::vector&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/split_free.hpp:45: instantiated from »static void boost::serialization::free_saver::invoke(Archive&, const T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/split_free.hpp:74: instantiated from »void boost::serialization::split_free(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/vector.hpp:147: instantiated from »void boost::serialization::serialize(Archive&, std::vector&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:133: instantiated from »void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:144: instantiated from »void boost::archive::detail::oserializer::save_object_data(boost::archive::detail::basic_oarchive&, const void*) const [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /home/languitar/workspace/iMN/src/algorithms/parallelScan/ParallelScan.cpp:128: instantiated from here /usr/local/include/boost/mpi/detail/mpi_datatype_cache.hpp:68: Fehler: keine passende Funktion für Aufruf von »assertion_failed(mpl_::failed*\*\*\*\*\*\*\*\*\*\*\* boost::mpi::is_mpi_datatype, std::allocator > >::\*\*\*\*\*\*\*\*\*\*\*\*)« /usr/local/include/boost/mpi/detail/mpi_datatype_oarchive.hpp: In constructor »boost::mpi::detail::mpi_datatype_oarchive::mpi_datatype_oarchive(const T&) [with T = std::basic_string, std::allocator >]«: /usr/local/include/boost/mpi/detail/mpi_datatype_cache.hpp:75: instantiated from »ompi_datatype_t\* boost::mpi::detail::mpi_datatype_map::datatype(const T&, typename boost::disable_if, void>::type*) [with T = std::basic_string, std::allocator >]« /usr/local/include/boost/mpi/datatype.hpp:185: instantiated from »ompi_datatype_t* boost::mpi::get_mpi_datatype(const T&) [with T = std::basic_string, std::allocator >]« /usr/local/include/boost/mpi/detail/mpi_datatype_primitive.hpp:96: instantiated from »void boost::mpi::detail::mpi_datatype_primitive::save(const T&) [with T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:95: instantiated from »static void boost::archive::save_access::save_primitive(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, 
std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:212: instantiated from »static void boost::archive::detail::save_non_pointer_type::save_primitive::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:294: instantiated from »static void boost::archive::detail::save_non_pointer_type::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/archive/detail/oserializer.hpp:506: instantiated from »void boost::archive::save(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::basic_string, std::allocator >]« /usr/local/include/boost/mpi/detail/ignore_skeleton_oarchive.hpp:46: instantiated from »void boost::mpi::detail::ignore_skeleton_oarchive::save_override(const T&, int) [with T = std::basic_string, std::allocator >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/archive/detail/interface_oarchive.hpp:64: instantiated from »Archive& boost::archive::detail::interface_oarchive::operator<<(T&) [with T = const std::basic_string, std::allocator >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/serialization/nvp.hpp:78: instantiated from »void boost::serialization::nvp::save(Archivex&, unsigned int) const [with Archivex = boost::mpi::detail::mpi_datatype_oarchive, T = const std::basic_string, std::allocator >]« /usr/local/include/boost/serialization/access.hpp:93: instantiated from »static void boost::serialization::access::member_save(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = const boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/split_member.hpp:43: instantiated from »static void boost::serialization::detail::member_saver::invoke(Archive&, const T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/split_member.hpp:69: instantiated from »void boost::serialization::split_member(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/nvp.hpp:88: instantiated from »void boost::serialization::nvp::serialize(Archive&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = const std::basic_string, std::allocator >]« /usr/local/include/boost/serialization/access.hpp:109: instantiated from »static void boost::serialization::access::serialize(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:74: instantiated from »void boost::serialization::serialize(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:133: instantiated from »void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:220: instantiated from »static void 
boost::archive::detail::save_non_pointer_type::save_only::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:294: instantiated from »static void boost::archive::detail::save_non_pointer_type::invoke(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:506: instantiated from »void boost::archive::save(Archive&, const T&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = boost::serialization::nvp, std::allocator > >]« /usr/local/include/boost/mpi/detail/ignore_skeleton_oarchive.hpp:46: instantiated from »void boost::mpi::detail::ignore_skeleton_oarchive::save_override(const T&, int) [with T = boost::serialization::nvp, std::allocator > >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/archive/detail/interface_oarchive.hpp:64: instantiated from »Archive& boost::archive::detail::interface_oarchive::operator<<(T&) [with T = const boost::serialization::nvp, std::allocator > >, Archive = boost::mpi::detail::mpi_datatype_oarchive]« /usr/local/include/boost/serialization/collections_save_imp.hpp:60: instantiated from »void boost::serialization::stl::save_collection(Archive&, const Container&) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, Container = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/vector.hpp:53: instantiated from »void boost::serialization::save(Archive&, const std::vector&, unsigned int, mpl_::false_) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/vector.hpp:123: instantiated from »void boost::serialization::save(Archive&, const std::vector&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/split_free.hpp:45: instantiated from »static void boost::serialization::free_saver::invoke(Archive&, const T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/split_free.hpp:74: instantiated from »void boost::serialization::split_free(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/serialization/vector.hpp:147: instantiated from »void boost::serialization::serialize(Archive&, std::vector&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, U = std::basic_string, std::allocator >, Allocator = std::allocator, std::allocator > >]« /usr/local/include/boost/serialization/serialization.hpp:133: instantiated from »void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /usr/local/include/boost/archive/detail/oserializer.hpp:144: instantiated from »void boost::archive::detail::oserializer::save_object_data(boost::archive::detail::basic_oarchive&, const void*) const [with Archive = 
boost::mpi::detail::mpi_datatype_oarchive, T = std::vector, std::allocator >, std::allocator, std::allocator > > >]« /home/languitar/workspace/iMN/src/algorithms/parallelScan/ParallelScan.cpp:128: instantiated from here /usr/local/include/boost/mpi/detail/mpi_datatype_oarchive.hpp:36: Fehler: keine passende Funktion für Aufruf von »assertion_failed(mpl_::failed*\*\*\*\*\*\*\*\*\*\*\* boost::mpi::is_mpi_datatype, std::allocator > >::\*\*\*\*\*\*\*\*\*\*\*\*)« make[2]: \*\*\* [src/CMakeFiles/imnbase.dir/algorithms/parallelScan/ParallelScan.cpp.o] Fehler 1 make[1]: \*\*\* [src/CMakeFiles/imnbase.dir/all] Fehler 2 make: \*\*\* [all] Fehler 2
Very clear…
Double tram stop
gpg: Schwierigkeiten mit dem Agenten - Agent-Ansteuerung wird abgeschaltet
Long story short: gpg-agent was running, but it was missing the corresponding program to display the passphrase entry dialog. Since gpg-agent was started from my KDE session, pinentry-qt was missing. Strangely, Jaunty under KDE4 does not use pinentry-qt4, which was already installed.
Unfortunately, this has partly not yet fully arrived in the C++ world, so here is a short reminder of how such “out-of-source builds” can be done with CMake.
The nice thing about CMake is that you actually don’t have to do anything special to create an out-of-source build. Assume the project structure looks like this:
languitar@bird /tmp/cmakeexample $ find .
./bin
./src
./src/yourCodeHere.cpp
./CMakeLists.txt
To create an out-of-source build in which all compiled files end up in bin, it is sufficient to change into the bin directory and call cmake .. there. All files that CMake needs internally, including the generated Makefile, and all results of the compiler / linker are created in bin, and both src and the project folder remain untouched. With CMake, this approach has one more particular advantage: CMake is actually supposed to detect changes to the CMakeLists.txt on its own and regenerate the Makefile while building. Quite often, however, this does not work correctly. If CMake had been called in the project directory, you would now have to tediously delete all the files it generated there, such as the CMakeCache.txt, to start from scratch. In the case of an out-of-source build, you can instead simply and safely empty the build folder.
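In terms of commands this boils down to:
cd bin
cmake ..
make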
A few small things still have to be considered for clean out-of-source builds with CMake, though. Even if CMake performs the actual build and the management of its internal files cleanly inside the build folder, it does not prevent mistakes by the user. If, for example, you use CMake yourself to generate files (e.g. pkg-config files), these should also be created in the build folder. This directory can be queried in CMake with the variables CMAKE_BINARY_DIR and CMAKE_CURRENT_BINARY_DIR. Files should therefore always be written via these variables and not via CMAKE_SOURCE_DIR or CMAKE_CURRENT_SOURCE_DIR. In the case of an “in-source” build (i.e. calling cmake . in the project folder), CMAKE_BINARY_DIR and CMAKE_SOURCE_DIR etc. point to the same directories, by the way.
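As a small sketch, generating a pkg-config file into the build directory could look like this (the file names are only examples):
CONFIGURE_FILE(${CMAKE_CURRENT_SOURCE_DIR}/myproject.pc.in
               ${CMAKE_CURRENT_BINARY_DIR}/myproject.pc
               @ONLY)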
One often sees implementations like this one:
public class ObserverTest {
private static interface Observer {
void process();
}
private static Set observers = new HashSet();
public static void addObserver(Observer observer) {
observers.add(observer);
}
public static void removeObserver(Observer observer) {
observers.remove(observer);
}
private static void notifyObservers() {
for (Observer o : observers) {
o.process();
}
}
// a main method is still missing
}
In many cases this works without problems, but not in the following one:
public static void main(String[] args) {
addObserver(new Observer() {
@Override
public void process() {
removeObserver(this);
}
});
notifyObservers();
}
If you run the program like this, Java throws a ConcurrentModificationException:
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
at java.util.LinkedList$ListItr.next(LinkedList.java:696)
at ObserverTest.notifyObservers(ObserverTest.java:21)
at ObserverTest.main(ObserverTest.java:37)
The problem here is that essentially all standard implementations of Collection do not allow modifications while iterating. Normally, however, this restriction should not be passed on to the observers, because it severely limits what they can do (a common use case is that an observer unregisters itself once it has received the desired information). Fortunately, there is a very simple solution to this problem: before iterating, you take a copy of the collection and iterate over that copy (note that this is a “shallow copy”, i.e. the elements themselves are not copied, only the container).
With the following modification the code runs through without problems:
private static void notifyObservers() {
Set observersCopy = new HashSet(observers);
for (Observer o : observersCopy) {
o.process();
}
}
Maybe it is time to look for an alternative after all…
The book has been published by O’Reilly under ISBN 978-0-596-51613-0.
-Doctal, --debug=octal
[...]
--debug=help shows these debugging values.
Number Description
1 Generally helpful progress information
2 Invocation and status of maintainer scripts
10 Output for each file processed
100 Lots of output for each file processed
20 Output for each configuration file
200 Lots of output for each configuration file
40 Dependencies and conflicts
400 Lots of dependencies/conflicts output
10000 Trigger activation and processing
20000 Lots of output regarding triggers
40000 Silly amounts of output regarding triggers
1000 Lots of drivel about e.g. the dpkg/info directory
2000 Insane amounts of drivel
Strangely, this cannot be found in the online version of this man page on the Ubuntu site…
The entity is indexed normally using the Hibernate Search annotations, but for all entries whose visibility needs to be controlled a custom FieldBridge is used, which uses a combination of visibility and the desired field name as the field name in the Lucene Document, so that the entries in the Document could end up looking like this:
EXTERNAL_name: Mr. X
COMMUNITY_name: Mr. X
EXTERNAL_project: XYZ
COMMUNITY_age: 24
COMMUNITY_phone: 1234567890
Important for my approach is that entries which are visible to everyone are additionally stored with the visibility for logged-in users as well (or, if there are more levels, correspondingly for all further “smaller” levels).
The question now is how to search in a document structured like this. Unfortunately, I have not found a solution for this that could use Lucene’s existing parser for search expressions. Therefore, a custom parser is needed first, which can be implemented e.g. with javacc. The basic idea is that this parser uses the Lucene API to build search expressions on the fields stored in the Document (here “name”, “project”, “age” and “phone”) and prefixes each field name with the currently required visibility level (“COMMUNITY” or “EXTERNAL”), so that only the fields of the corresponding level can be searched.
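As a rough sketch of that idea (the class and method are made up; TermQuery and Term are from the Lucene API), such a parser could emit queries like this:
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class VisibilityQueryExample {
    // Build a query that only searches the field variant matching the
    // caller's visibility level, following the document layout shown above.
    public static Query buildQuery(boolean loggedIn, String field, String value) {
        String level = loggedIn ? "COMMUNITY" : "EXTERNAL";
        return new TermQuery(new Term(level + "_" + field, value));
    }
}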
import org.junit.After;
import org.junit.Before;

public abstract class DatabaseTestCase {

    @Before
    public void initDatabase() {
        System.out.println("Initializing DB");
        // init db stuff
    }

    @After
    public void clearDatabase() {
        System.out.println("Clearing DB");
        // set db to old state
    }
}
A concrete test that uses the database could then look like this:
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class FooDBTest extends DatabaseTestCase {

    @Before
    public void setUp() {
        System.out.println("Initializing test case");
    }

    @Test
    public void test() {
        System.out.println("I should test the DB");
    }

    @After
    public void tearDown() {
        System.out.println("Tear down of test case");
    }
}
The crucial point here is that JUnit 4 guarantees that methods annotated with @Before in a superclass run before the methods annotated with @Before in the subclass, and that methods annotated with @After in the superclass run only after the methods annotated with @After in the subclass. In other words, FooDBTest can already rely on the database being initialized in its setUp method.
Of course, this behavior can be used for any other test fixture as well.
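Running FooDBTest should therefore print the fixture messages in this nesting order:

Initializing DB
Initializing test case
I should test the DB
Tear down of test case
Clearing DB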
At least on my machine this is reproducible when copying an HTML file to the clipboard in the Package Explorer.
Tunnel on the cycle route along the Telemark Canal (stitched with Hugin)
For the photos from Norway I am once again using the open-source panorama stitcher Hugin, which as a rule already delivers excellent results. For some of the more difficult shots, however, I have so far lacked the background knowledge to tune the relevant parameters. While looking for a good explanation I stumbled upon this tutorial, which gives many tips with considerably more technical background than other Hugin tutorials.
Westend menu
Found at a combined furniture store, artist’s studio, and boat rental.
As far as anything about this vacation can be called planned at all, it is the route along the Telemark Canal. We want to cycle it during the first week and then, depending on our mood, either paddle back by canoe, or take a boat back in a single day and then go canoeing on a different body of water.
I wonder what the difference between Bockwurst and Bockwurst(1,2,3) is?
One of the most important points in the text mentioned appears under “Be intuitive”: “Use consistent parameter ordering across methods”. Here is an example from the C API that I stumbled over today thanks to a segfault:
int puts(const char *s);
int fputs(const char *s, FILE *stream);
vs.
int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
fprintf expects the stream as its first argument, while fputs expects it as its last argument. Not exactly consistent and intuitive. And it goes even further: puts is convenient above all because it takes care of the newline character itself. Why can’t fputs do that too? “fputs() writes the string s, without its trailing '\0', to the output stream stream.” (from the manpage).
The segfault that led me here occurred in my PyGTK project, where the usual import sequence looks like this:
import pygtk
pygtk.require('2.0')
import gtk
Up to the require line none of this is a problem, but import gtk triggered the segfault in some situations. Since today I know that this segfault occurs whenever a module that is linked against GTK has already been loaded beforehand, in my project for example the loader:
from ship.icewing import loader
import pygtk
pygtk.require('2.0')
import gtk
This variant reliably runs into the segfault, whereas the following is no problem:
import pygtk
pygtk.require('2.0')
import gtk
from ship.icewing import loader
So at least I now know how to work around the error systematically, but it is still not pretty. Above all, users of my API now have to pay attention to the import order. Does anyone have a better solution, or an idea where the problem comes from?
/* Return value for opts_value_set() if error occurs */
#define OPTS_SET_ERROR (-9999)
Unfortunately, -9999 can also be a valid return value of the function that uses this constant.
#include <stdio.h>

int main(void) {
    int foo = 1;
    int bar = 0;
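    /* Intended: printf("%d, %d\n", foo, bar); – with the "&&" below, only one
       argument is passed for the two "%d" conversions. */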
    printf("%d, %d\n", foo && bar);
    return 0;
}
Quickly pipe the computation of a boolean variable into printf, forget to replace the && with a comma, and you immediately see the strangest results:
0, -1079242984
I wonder which memory location printf looks at for the second integer? How nice exceptions are… You can spend ages hunting for a mistake like this.