niklas muhs

flows

heroshot

timeframe:

4 days

role:

interaction conceptualization, implementation

tldr:

an exploration of an integrated workflow for automatic prompt engineering, built on dspy and extended with an interface.

prompt_quality

when developing applications on top of a large language model (llm) in an exploitative setting, where the aim is to limit uncertain outcomes, it can be difficult to determine when a prompt is accurate and adaptable enough for deployment.

in addition, identifying the remaining issues and finding systemic ways to address them becomes more difficult when evaluating prompts at scale.

output

conventionally, we write prompts hoping that the model will infer the desired outcomes from the inputs. this process is opaque because it is unclear which affordances of the llm are relevant to the task.

prompt

however, by treating the llm as the black box that it is, the llm could instruct itself using its own affordances, while the user could focus on input and desired output. frameworks such as dspy already allow for optimizations like this.

llms can generate and test a variety of prompts to align with the desired outcomes.
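as a rough illustration of the "generate and test" idea, the sketch below scores a handful of candidate instructions against evaluated examples and keeps the best one. it is not dspy's api: `Model`, `score_prompt`, `pick_best_prompt`, the prompt layout and the exact-match metric are all assumptions made for this example, and in practice the candidates would be proposed by an llm.

```python
from typing import Callable, Sequence, Tuple

# hypothetical types for this sketch (not part of dspy):
# a model is any callable that maps a full prompt string to an output string
Model = Callable[[str], str]
Example = Tuple[str, str]  # (input, desired output)


def score_prompt(model: Model, instruction: str, examples: Sequence[Example]) -> float:
    """fraction of examples where the model output exactly matches the desired output."""
    if not examples:
        return 0.0
    hits = sum(
        model(f"{instruction}\n\ninput: {x}").strip() == y.strip()
        for x, y in examples
    )
    return hits / len(examples)


def pick_best_prompt(
    model: Model, candidates: Sequence[str], examples: Sequence[Example]
) -> Tuple[str, float]:
    """return (instruction, score) for the best candidate; ties go to the earliest one."""
    scored = [(score_prompt(model, c, examples), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

exact match is the simplest possible metric. for free-form outputs it would likely need to be replaced with a task-specific check or a graded comparison.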

cycle

in addition, llms can help create new examples that use the optimized prompt and generate edge-case inputs that the user may not have considered, leading to iterative improvement.
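the cycle could be sketched roughly as follows. every name here (`improvement_cycle`, `optimize`, `propose_inputs`, `run`, `review`) is hypothetical and stands in for a step of the workflow: prompt optimization, llm-generated edge cases, running the current prompt, and the user correcting or accepting the draft output. it is a sketch under those assumptions, not the prototype's implementation.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, evaluated output)


def improvement_cycle(
    examples: List[Example],
    optimize: Callable[[List[Example]], str],              # examples -> optimized prompt
    propose_inputs: Callable[[str, List[Example]], List[str]],  # llm suggests new inputs
    run: Callable[[str, str], str],                        # (prompt, input) -> draft output
    review: Callable[[str, str], str],                     # user accepts or corrects the draft
    rounds: int = 3,
) -> Tuple[str, List[Example]]:
    """alternate between optimizing the prompt and growing the evaluated example set."""
    examples = list(examples)  # work on a copy, leave the caller's list untouched
    prompt = optimize(examples)
    for _ in range(rounds):
        known = {x for x, _ in examples}
        # drop duplicates and inputs that have already been evaluated
        fresh = [x for x in dict.fromkeys(propose_inputs(prompt, examples)) if x not in known]
        if not fresh:
            break  # nothing new to learn from, stop early
        for x in fresh:
            draft = run(prompt, x)
            examples.append((x, review(x, draft)))
        prompt = optimize(examples)  # re-optimize against the larger set
    return prompt, examples
```

keeping `review` as a separate step reflects the idea that the user only judges inputs and outputs, while the prompt itself stays in the hands of the optimizer.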

value

creating a large number of accurately evaluated examples supports both adaptability to new inputs and accuracy in achieving the desired outputs.

heroshot

try the prototype here: flows.niklas.space

the workflow offers other advantages as well:

when prompt performance reaches a plateau, we can move on to other methods. this makes the transition between prompt engineering and, for example, finetuning clearer and easier, especially since the evaluated examples can serve as a base for finetuning.
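one simple way to make "reaches a plateau" concrete is to compare the best recent score with the best earlier score. the function name `has_plateaued`, the window of three rounds and the 0.01 threshold are assumptions chosen for illustration, not values taken from the prototype.

```python
from typing import Sequence


def has_plateaued(scores: Sequence[float], window: int = 3, min_gain: float = 0.01) -> bool:
    """true if the best of the last `window` scores beats the earlier best by less than `min_gain`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(scores) <= window:
        return False  # not enough history to compare against
    earlier_best = max(scores[:-window])
    recent_best = max(scores[-window:])
    return recent_best - earlier_best < min_gain
```

for example, `has_plateaued([0.5, 0.7, 0.8, 0.8, 0.805, 0.8])` returns `True`, since the last three rounds improve on the earlier best of 0.8 by less than 0.01.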

a database of examples also facilitates comparison with more efficient models that may have lower affordance discoverability but similar capabilities.

finally, linking this workflow to traces in the production system could lead to continuous product improvement.
